Tag Archives: cURL and libcurl

what’s --next for curl

curl is finally getting support for doing multiple independent requests specified in the same command line, which allows users to make even better use of curl’s excellent persistent connection handling and more. I don’t know when I first got the question of how to do a GET and a POST in a single command line with curl, but I do know that we’ve had the TODO item about adding such a feature mentioned since 2004 – and I know it wasn’t added there right away…

Starting in curl 7.36.0, we can respond with a better answer: use the --next option!

curl has been able to work with multiple URLs on the command line virtually since day 1, but the command line options would then mostly apply to all the specified URLs.

This new --next option introduces a “boundary”, or a wall if you like, between options on the command line. The options set before --next are handled as one request and the options set after --next start building up another request. You of course then need to specify at least one URL per such section, and you can use any number of --next separators on the command line. If the command line gets too long, the same logic and sequence is also supported in “config files”, which is the way you can put command line arguments into a text file and have curl read them from there using -K or --config.

Here’s a somewhat silly example to illustrate. This first makes a POST and then a HEAD to two different pages on the same host:

curl -d FOO example.com/input.cgi --next --head example.com/robots.txt
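The same two requests could also be expressed in a config file for -K/--config, roughly like this (a sketch using the long option names):

data = "FOO"
url = "example.com/input.cgi"
next
head
url = "example.com/robots.txt"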

Thanks to Steve Holme for his hard work on implementing this!

http2 in curl

While the first traces of http2 support in curl were added already back in September 2013, it wasn’t until recently that it actually became useful. There’s been a lot of http2 related activity in the curl team recently, and in late January 2014 we could run our first command line interop tests against public http2 (draft-09) servers on the Internet.

There’s a lot to be said about http2 for those not into its nitty gritty details, but I’ll focus on the curl side of this universe in this blog post. I’ll do separate posts and presentations on http2 “internals” later.

A quick http2 overview

http2 (without the minor version, as per what the IETF working group has decided) is a binary protocol that multiplexes many logical streams over the same physical TCP connection, features compressed headers in both directions, and has stream priorities and more. It is being designed to maintain the user concepts and paradigms from HTTP 1.1, so web sites don’t have to change contents and web authors won’t need to relearn a lot. The web will not break because of http2, it will just magically work a little better, a little smoother and a little faster.

In libcurl we build http2 support with the help of the excellent library called nghttp2, which takes care of all the binary protocol details for us. You’ll also have to build it with a new enough version of the SSL library of your choice, as http2 over TLS will require use of some fairly recent TLS extensions that not many older releases have and several TLS libraries still completely lack!

The need for an extension comes from the fact that when speaking TLS over port 443, which is what HTTPS implies, the current and former web infrastructure assumes we will speak HTTP 1.1 over it, while we now want to be able to say that we want to talk http2 instead. When Google introduced SPDY they pushed for a new TLS extension called NPN to do this, which, when taken through standardization in the IETF, has been forked, changed and renamed to ALPN with roughly the same characteristics (I don’t know the specific internals so I’ll stick to how they appear from the outside).

NPN and especially ALPN are fairly recent TLS extensions, so you need a modern enough SSL library to get that support. OpenSSL and NSS both support NPN and ALPN in recent enough versions, while GnuTLS only supports ALPN. You can build libcurl to use any of these three libraries to get it to talk http2 over TLS.

http2 using libcurl

(This still describes what’s in curl’s git repository, the first release to have this level of http2 support is the upcoming 7.36.0 release.)

Users of libcurl who want to enable http2 support will only have to set CURLOPT_HTTP_VERSION to CURL_HTTP_VERSION_2_0 and that’s it. It will make libcurl try to use http2 for the HTTP requests you do with that handle.
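In code, that is a single curl_easy_setopt() call. Here’s a minimal sketch (example.com is only a placeholder and error checking is left out):

#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/");
    /* ask libcurl to attempt http2 for transfers done with this handle */
    curl_easy_setopt(curl, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
  }
  return 0;
}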

For HTTP URLs, this will make libcurl send a normal HTTP 1.1 request with an offer to the server to upgrade the connection to version 2 instead. If it does, libcurl will continue using http2 in the clear on the connection and if it doesn’t, it’ll continue using HTTP 1.1 on it. This mode is what Firefox and Chrome will not support.

For HTTPS URLs, libcurl will use NPN and ALPN as explained above and offer to speak http2, and if the server supports it there will be http2 sweetness from that point onwards. Or the server selects HTTP 1.1 and then that’s what will be used. The latter is also what gets picked if the server doesn’t support ALPN or NPN.

Alt-Svc and ALTSVC are new things planned to show up in time for http2 draft-11 so we haven’t really thought through how to best support them and provide their features in the libcurl API. Suggestions (and patches!) are of course welcome!

http2 with curl

Hardly surprising, the curl command line tool also has this power. You use the --http2 command line option to switch on the libcurl behavior described above.
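Something like this (example.com again just being a placeholder, with -v added so you can see what actually gets negotiated):

curl -v --http2 https://example.com/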

Translated into old-style

To reduce transition pains and problems, and to work with the rest of the world to the highest possible degree, libcurl will (decompress and) translate received http2 headers into HTTP 1.1 style headers so that applications and users get a stream of headers that looks very much the way you’re used to, and it will produce an initial response line that says HTTP 2.0 blabla.
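So even when http2 is spoken on the wire, the headers an application sees might look roughly like this (illustrative only, not exact curl output):

HTTP/2.0 200 OK
content-type: text/html
content-length: 1234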

Building (lib)curl to support http2

See the README.http2 file in the lib/ directory.

This is still a draft version of http2!

I just want to make this perfectly clear: http2 is not out “for real” yet. We have tried our http2 support somewhat at the draft-09 level and Tatsuhiro has worked on the draft-10 support in nghttp2. I expect there to be at least one more draft, but perhaps even more, before http2 becomes an official RFC. We hope to be able to stay on the frontier of http2 and deliver support for the most recent draft going forward.

PS. If you try any of this and experience any sort of problems, please speak to us on the curl-library mailing list and help us smoothen out whatever problem you got!


My FOSDEM 2014

I’m back home after FOSDEM 2014. A big THANK YOU from me to the organizers of this fine and totally free happening.

Europe’s (the World’s?) biggest open source conference felt even bigger and more crowded this year. There seemed to be more talks that got full, longer lines for food and a worse parking situation.

Nothing of that caused any major concern for me though. I had a great weekend and I met up with a whole busload of friends from all over. Many of them I only meet at FOSDEM. This year I had some additional bonuses by for example meeting up with long-term committers Steve and Dan from the curl project whom I had never met before IRL. Old buddies from Haxx and Rockbox are kind of default! 🙂

Talk-wise this year was also extra good. I’ve always had a soft spot for the Embedded room but this year there was fierce competition for my attention so I spread my time among many rooms and got to see stuff about: clang the compiler, lots of really cool stuff on GDB, valgrind and helgrind, power efficient software, using the GPU to accelerate libreoffice, car automation and open source, how to run Android on low-memory devices, Firefox on Android and more.

I missed out on the kdbus talks since they took place in one of the smaller devrooms, even though there was “celebrity warning” all over them with Lennart Poettering. In general there’s sometimes this problem at FOSDEM that the talks in a devroom vary a lot in popularity, so the size of the room may be too large or too small depending on the particular topics and speakers. But yeah, I understand it is a very hard problem for the organizers to improve.

As a newbie Firefox developer at Mozilla I found it fun to first hear the Firefox on Android talk for an overview of how things run on that platform now, and to then also get references to Firefox in both the helgrind talk and the low-memory Android talk. In both negative and positive senses.

As always at FOSDEM some talks are not super good and we get unprepared speakers who talk quietly, monotonously and uninspired, but then there are the awesome people who, in spite of accents and the challenge of speaking English as a non-native language, can deliver inspiring and enticing talks that make me just want to immediately run home and try out new things.

The picture on the right is a small tribute to the drinks we could consume to get our spirits up during a talk we perhaps didn’t find the most interesting…

This year I found the helgrind and the gdb-valgrind talks to be especially good together with Meeks’ talk on using the GPU for libreoffice. We generally found that the wifi setup was better than ever before and worked basically all the time.

Accordingly, there were 8333 unique MAC addresses used on the network through the two days, which we can then use to guesstimate the number of attendees. Quite possibly upwards of 6000…

See you at FOSDEM 2015. I think I’ll set myself up to give a talk then; I didn’t give any this year.

My first Mozilla week

Working from home

I get up in the morning, shave, eat breakfast and make sure all family members get off as they should. Most days I walk my son to school (some 800 meters) and then back again. When they’re all gone, the house is quiet and then me and my cup of coffee go upstairs and my work day begins.

Systems and accounts

I have spent time this week setting up accounts and signing up for various lists and services. Created profiles, uploaded pictures, confirmed passwords. I’ve submitted stuff and I’ve signed things. There are quite a lot of systems in use.

My colleagues

I’ve met a few. The Necko team isn’t very big but the entire company is huge and there are just so many people and names. I haven’t yet had any pressing reason to meet a lot of people nor learn a lot of names. I feel like I’m starting out this really slowly and gradually.

Code base

Firefox is a large chunk of code. It takes some 20 minutes to rebuild on my 3.5GHz quad-core Core-i7 with SSD. I try to pull code and rebuild every morning now so that I can dogfood and live on the edge. I also have a bunch of local patches now, some of which I want to have stewing in my own browser for a while so that I know they at least don’t have any major negative impact!

Figuring out the threading, XPCOM, the JavaScript stuff and everything is a massive task. I really cannot claim to have done more than just scratched the surface so far, but at least I am scratching and I’ve “etagged” the whole lot and I’ve spent some time reading and reviewing code. Attaching a gdb to a running Firefox and checking out behavior and how it looks has also helped.

Netwerk code size

“Netwerk” is the directory name of the source tree where most of the network code is located. It is actually not so ridiculously large as one could fear. Counting only C++ and header files, it sums up to about 220K lines of code. Of course not everything interesting is in this tree, but still. Not mindbogglingly large.
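For the record, a rough count of that kind can be had with something like this from the Firefox source root (a sketch; the exact number of course depends on which files you include):

find netwerk -name '*.cpp' -o -name '*.h' | xargs cat | wc -l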

Video conferencing

I’ll admit I’ve not participated in this sort of large scale video conferences before this. With Vidyo and all the different people and offices signed up at once – it is a quite impressive setup actually. My only annoyance so far is that I didn’t get the sound for Vidyo to work for me in Linux with my headphones. The other end could hear me but I couldn’t hear them! I had to defer to using Vidyo on a windows laptop instead.

Doing the video conferencing on a laptop instead of on my desktop machine has its advantages when I do them during the evenings when the rest of the family is at home since then I can move my machine somewhere and sit down somewhere where they won’t disturb me and I won’t disturb them.

Bugzilla

The bug tracker is really in the center for this project, or at least for how I view it and work with it right now. During my first week I’ve so far filed two bug reports and I’ve submitted a suggested patch for a third bug. One of my bugs (Bug 959100 – ParseChunkRemaining doesn’t detect chunk size overflow) has been reviewed fine and is now hopefully about to be committed.

I’ve requested commit access (#961018) as a “level 1” and I’ve signed the committer’s agreement. Level 1 is entry level and only lets me push to the Try server but still, I fully accept that there’s a process to follow and I’m in no hurry. I’ll get to level 3 soon enough I’m sure.

Mercurial

What can I say. After having used it a bit this week without any particularly fancy operations, I prefer git so much more. Of course I’m also much more used to git, but I find that for a lot of the stuff where both have similar concepts I prefer the git way. Oh well, it’s just a tool. I’ll get by. Possibly I’ll try out the git mirror soon and see if that provides a more convenient environment for me.

curl

What impact did all this new protocol and network code stuff during my work days have on my curl activities?

I got inspired to fix both the chunked encoding parser and the cookie parser’s handling of max-age in libcurl.

What didn’t happen

I feel behind in the implementing-http2 department. I didn’t get my new work laptop yet.

Next week

More of the same, land more patches and figure out more code. Grab more smallish bugs others have filed and work on fixing them as more practice.

Also, there’s a HTTPbis meeting in Zürich on Wednesday to Friday that I won’t go to (I’ll spare you the explanation why) but I’ll try to participate remotely.

I go Mozilla


In January 2014, I start working for Mozilla

I’ve worked in open source projects for some 20 years and I’ve maintained curl and libcurl for over 15 years. I’m an internet protocol geek at heart and Mozilla seems like a perfect place for me to continue to explore this interest of mine and combine it with real open source in its purest form.

I plan to bring my experiences from all my years of protocol fiddling and making stuff work on different platforms against random server implementations to the networking team at Mozilla and work on improving Firefox and more.

I’m putting my current embedded Linux focus to the side and plunging into a worldwide-known company with worldwide-known brands to do open source within the internet protocols I enjoy so much. I’ll be working out of my home, just outside Stockholm, Sweden. Mozilla has no office in my country and I have no immediate plans of moving anywhere (with a family, kids and all established here).

I intend to bring my mindset on protocols and how to do things well into the Mozilla networking stack and world and I hope and expect that I will get inspiration and input from Mozilla and take that back and further improve curl over time. My agreement with Mozilla also gives me a perfect opportunity to increase my commitment to curl and curl development. I want to maintain and possibly increase my involvement in IETF and the httpbis work with http2 and related stuff. With one foot in Firefox and one in curl going forward, I think I may have a somewhat unique position and attitude toward especially HTTP.

I’ve not yet met another Swedish Mozillian but I know I’m not the only one located in Sweden. I guess I now have a reason to look them up and say hello when suitable.

Björn and Linus will continue to drive and run Haxx with me taking a step back into the shadows (Haxx-wise). I’ll still be part of the collective Haxx just as I was for many years before I started working full-time for Haxx in 2009. My email address, my sites etc will remain on haxx.se.

I’m looking forward to 2014!

source code survival rate

The curl project has its roots in late 1996, but we haven’t kept track of all of the early code history. We imported our code to Sourceforge in late 1999 and that’s how far back we can see in our current git repository. The exact date is “Wed Dec 29 14:20:26 UTC 1999”. So, almost 14 years of development.

Warning: this blog post contains more useless info and graphs than many mortals can handle. Be aware!

How much old code remains in the current source tree? Or perhaps put differently: what is the refresh rate of the code? We fix bugs, we change things, we add features. Surely we slowly, over time, rewrite the old code and replace it with new, more shiny and better working code? I decided to check this. Here’s what I found!

The tools

We have all code in git. ‘git blame’ is the primary tool I used as it lists all lines of all source code and tells us when it was added. I did some additional perl scripting around it.

The code

I decided to check all code in the src/ and lib/ directories in the curl and libcurl source tree. The source code is used to create both the curl tool and the libcurl library and back in 1999 there was no libcurl like today so we do get a slightly better coverage of history this way.

In total this sums up to some 112000 lines in the current .c and .h files.

To count the total amount of commits done to those specific files through history I ran:

git log --oneline src/*.[ch] lib/*.[ch] | wc -l

6047 commits in total. (if I don’t specify the files and count all commits in the repo it ends up at 16954)

git stats

We run gitstats on the curl repo every day so you can go there for more and current stats. Right now it tells us that the average number of commits is 4.7 per active day (that is, days when something was actually committed), or 3.4 over all days in the entire period. There was git activity on 3576 days in total. By 224 authors.

Surviving commits

How much of the code would you think still remains that were present already that December day 1999?

How much of the code in the current code base would you think was written the last few years?

Commit vs Author vs Date

I wanted to see how much old code exists, or perhaps how the age of the code is represented in the current code base. I decided to base my logic on the author time that git tracks. That is basically the time when the author of a change commits it to his/her local tree; the change can then be applied later on by a committer who may be someone else, but the author time remains the same. Sometimes a committer commits multiple patches at once, possibly at a much later time, so I figured the author time would be a better time stamp. I also decided to track the date instead of just the commit hash so that I can sort the changes properly and also make interesting graphs based on that time. I use the time with second precision, so changes done a second apart are recorded as two separate changes, while two commits with the same author time stamp are counted as the same change time.

I had my script run ‘git blame --line-porcelain’ for all files and had it sum up all changes done at the same time.
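The heart of that boils down to roughly a one-liner per file; something like this counts the surviving lines per author time stamp for a single file (lib/url.c picked just as an example):

git blame --line-porcelain lib/url.c | grep '^author-time ' | sort | uniq -c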

Some totals

The code base contains changes written at 4147 different times. Converted to UTC times, they happened on 2076 unique days. On 167 unique months. That’s every month since the beginning.

We’re talking about 312 files.

Number of lines changed over time

A graph with changes over time. The Y axis is the number of lines that were changed at that particular time. (click for higher res)

Lines changed over time

Ok you object, that doesn’t look very appealing. So here’s the same data but with all the changes accumulated over time.

accumulated

Do you think the same as I do? Isn’t it strangely linear? It seems that the number of added lines that remain in the code today is virtually the same over time! But fair enough, the changes on the X axis are not distributed according to the time/date they represent, so we shouldn’t be fooled by the time, but certainly we can see that changes in general only bring in a certain amount of surviving modified lines.

Another way to count the changes is then to check all the ~4000 change times of the present code, and see how many days there are between them:

delta

Ah, now finally we’re seeing something. Older code that is still present was clearly made with longer periods between the changes that have lasted. That makes perfect sense to me, since the many years of development that followed have probably overwritten a lot of code that was written in between.

Also, it is clear that the more recent changes that have survived were often done on the same day or just a few days away from another lasting change.

Grouped on date ranges

The number of modified lines, split up by the individual year the change came in.

year

Interesting! The general trend is clear and not surprising. Two years stand out from the trend, 2004 and 2011. I have not yet investigated which particular larger changes made in those years have survived. The bump for 1999 is simply the original import and most of those lines are preprocessor lines like #ifdef and #include or just opening and closing braces { and }.

Splitting up the number of surviving lines on the specific year+month they were added:

month

This helps us analyze the previous chart. As we can see, the rather tall bars from 2004 and 2011 actually span several months, which explains the bumps in the year chart. Clearly we made some larger efforts during those periods that were good enough to still remain in the code.

Correlate to added or removed lines?

So, can we perhaps see if some years’ higher activity in the number of added or removed source lines can be tracked back to explain the number of surviving source code lines? I ran “git diff [hash1]..[hash2] --stat -- lib/*.[ch] src/*.[ch]” for all years to get a summary of the number of source code lines added and removed that year. I added those numbers to the table with surviving lines and then I made another graph:

year-again

Funnily enough, we see an almost exact correlation there for the first eight years and then the pattern breaks. From the year 2009 the number of removed lines went down, but the amount of surviving lines still went up quite a bit, and then the graphs jump around a bit.

My interpretation of this graph is boringly this: the amount of surviving code in absolute numbers clearly correlates with the amount of added code. And we removed more code yearly in the 2000-2003 period than what has survived.

But notice how the blue line is closing the gap to the orange/red one over time, which means that percentage wise there’s more surviving code in more recent code! How much?

Here’s the amount of surviving lines/added lines and a second graph looking at surviving lines/(added + removed) to see if the mere source code activity would be a more suitable factor to compare against…

relation survival vs added and removed lines

Code committed within the last 5 years is basically 75% still left, but then it goes downhill, down to the 18% survival rate of the 1999 code import.

If you can think of other good info to dig out, let me know!

year,surviving lines
1999,1699
2000,1115
2001,3061
2002,2432
2003,2578
2004,7644
2005,4016
2006,5101
2007,7665
2008,7292
2009,9460
2010,11762
2011,19642
2012,11842
2013,16844

he forked off libgnurl

Everyone and anyone is of course entitled to fork a project that is released under an open source license. This goes for my projects as well and I don’t mind it. Go ahead.

I think it may be a bit shortsighted and a stupid decision, but open source allows this and it sometimes actually leads to goodness.

libgnurl

Enter libgnurl. A libcurl fork created by Christian Grothoff.

For most applications, the more obscure protocols supported by cURL are close to dead code — mostly harmless, but not useful

<sarcasm>Of course a libcurl newcomer such as Christian knows exactly what “most applications” want and need and thus what’s useful to them….</sarcasm>

cURL supports a bunch of crypto backends. In practice, only the OpenSSL, NSS (RedHat) and GnuTLS (Debian) variants seem to see widespread deployment

Originally he mentioned only OpenSSL and GnuTLS there, until someone pointed out the massive number of NSS users and the page got updated. Quite telling, I think. Lots of Windows users these days use the schannel backend, Mac OS X users use the darwinssl backend and so on. Again, statements based on his views and opinions, most probably without any closer checks done or even attempted.

As a side-note we could discuss what importance (perceived) “widespread deployment” has when selecting what to support or not, but let’s save that for another blog post on a rainy day.

there exist examples of code that deadlocks on IPC if cURL is linked against OpenSSL while it works fine with GnuTLS

I can’t argue against something I don’t know about. I’m not aware of any bug reports on anything like this. libcurl is not fully SSL layer agnostic; the SSL library choice “leaks” through to applications, so yes, an application can very well be written to be “forced” to use a libcurl built against a particular backend. That doesn’t seem to be what he’s complaining about here though.

Thus, application developers have to pray that the cURL version deployed by the distribution is compatible with their needs

Application developers that use a library – any library – surely always hope that it is compatible with their needs!?

it is also rather difficult to replace cURL for normal users if cURL is compiled in the wrong way

Is it really? As with most autotools based projects, you just run configure --prefix=blablabla and install a separate build in a custom directory and then use that for your special-needs projects. I suppose he means something else. I don’t know what.

For GNUnet, we need a modern version of GnuTLS. How modern? Well, while I write this, it hasn’t been released yet (update: the release has now happened, the GnuTLS guys are fast). So what happens if one tries to link cURL against this version of GnuTLS?

To verify his claims that building against the most recent gnutls is tricky, I tried:

  1. download 3.2.5 tarball
  2. unpack it
  3. configure --prefix=$HOME/build-gnutls-3.2.5
  4. make
  5. make install
  6. cd [curl source tree]
  7. configure --without-ssl --with-gnutls=$HOME/build-gnutls-3.2.5 [and some more options if wanted]
  8. make
  9. invoke “./src/curl -V” to verify that the build is using the latest. Yes it does. Case closed.

How does forking fix it? Easy. First, we can get rid of all of the compatibility issues

That’s of course hard to argue with. If you introduce a brand new library it won’t have any compatibility issues, since nobody used it before. It is kind of a shortsighted solution though, since as soon as someone starts to use it, compatibility becomes something to pay attention to.

Also, since Christian is talking about doing some changes to accomplish this new grand state, I suspect he will do this by breaking compatibility with libcurl in some aspects and then gnurl won’t be libcurl compatible so it will no longer be that easy to switch between them as desired.

Note that this pretty much CANNOT be done without a fork, as renaming is an essential part of the fix.

Is renaming the produced library really that hard to do without forking the project? If I want to produce a renamed output from an open source project out there, I apply a script or hack the makefile of that project and keep that script or diff on my end. No fork needed. I think I must’ve misunderstood some subtle angle of this…

Now, there might be creative solutions to achieve the same thing within the standard cURL build system, but I’m not happy to wait for a decade for Daniel to review the patches.

Why would he need to send me such patches in the first place? Why would I have to review the patches? Why would we merge them?

That final paragraph is probably the most telling of his entire page. I think he did this entire fork because he is unhappy with the lack of speed in the reviewing and responses to his patches he sent to the libcurl mailing list. He’s publicly complained and whined about it several times. A very hostile attitude to actually get the help or review you want.

I want to note that the main motivations for this fork are technical

Yes sure, they are technical but also based on misunderstandings and just lack of will.

But I like to stress again that I don’t mind the fork. I just mind the misinformation and the statements made as if they were facts and as if they represented what we stand for in the actual curl project.

I believe in collaboration. I try to review patches and provide feedback as soon as I can. I wish Christian every success with gnurl.

Testing curl_multi_socket_action

We’re introducing a brand new way to test the event-based socket_action API in libcurl! (available in curl since commit 6cf8413e3162)

Background

Since 2006 we’ve had three major API families in libcurl for doing file transfers:

  1. the easy interface – a synchronous and yes, easy, interface for getting things done
  2. the multi interface – a non-blocking interface that allows multiple simultaneous transfers (in both directions) in the same thread
  3. the socket_action interface – a brother of the multi interface but designed for use with an event-based library/engine for high performance and large scale transfers

The curl command line tool uses the easy interface and our test suite for curl + libcurl consists of perhaps 80% curl tests, while the rest are libcurl-using programs testing both the easy interface and the multi interface.

Early this year we modified libcurl’s internals so that the functions driving the easy interface transfer use the multi interface internally. Then all of a sudden, all the curl-using tests that use the easy interface also, by definition, test that the operation works fine with the multi interface. Needless to say, this pushed several bugs up to the surface that we could fix.

So the multi and the easy interfaces are tested by many hundred test cases on a large number of various systems every day around the clock. Nice! But what about the third interface? The socket_action interface isn’t tested at all! Time to change this sorry state.

Event-based test challenges

The event-based API has its own set of challenges; like it needs to react on socket state changes (only) and allow smooth interactions with the user’s own choice of event library etc. This is our newest API family and also the least commonly used. One reason for this may very well be that event-based coding is generally harder to do than more traditional poll-based code. Event-based code forces the application into using state-machines all over to a much higher degree and the frequent use of callbacks easily makes the code hard to read and its logic hard to follow.

So, curl_multi_socket_action() acts in ways that aren’t done or even necessary when the regular select-oriented multi interface is used. Code that then needs to be tested to remain working!

Introducing an alternative curl_easy_perform

As I mentioned before, we made the general multi interface widely tested by making sure the easy interface code uses the multi interface under the hood. We can’t easily do the same operation again, but instead this time we introduce a separate implementation (for debug-enabled builds) called curl_easy_perform_ev that instead uses the event-based API internally to drive the transfer.

curl_multi_socket_action() is meant to be used with an event library to work really well cross-platform, or with something like epoll directly if Linux-only functionality is fine for you. curl and libcurl are quite likely among the most portable code you can find, so after having fought with this agony for a while (how to introduce event-based testing without narrowing the set of tested platforms too much) I settled on a simple yet brilliant solution (I can call it brilliant since I didn’t come up with the idea on my own):

We write an internal “simulated” event-based library with functionality provided by the libcurl internal function Curl_poll() (the link unfortunately goes to a line number, you may need to move around in the file to find the function). It is in itself a wrapper function that can work with either poll() or select() and should therefore work on just about any operating system written since the 90s, and most of the ones from before that as well! Doing such an emulation may not be the cleverest move if the aim were to write a high performance, low latency application, but since my objective is to exercise the API and make a curl_easy_perform clone, it is perfect!

It should be carefully noted that curl_easy_perform_ev is only for testing, will only exist in debug-enabled builds and is therefore not considered stable nor a part of the public API!
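To give an idea of the shape of a poll()-driven socket_action loop, here’s a minimal single-transfer sketch. This is not the actual curl_easy_perform_ev code: it tracks only one socket, skips most error handling and uses example.com as a placeholder.

#include <curl/curl.h>
#include <poll.h>

static curl_socket_t tracked = CURL_SOCKET_BAD;
static int wanted = 0;        /* CURL_POLL_* state libcurl asked for */
static long timeout_ms = -1;  /* timeout libcurl asked for */

static int socket_cb(CURL *easy, curl_socket_t s, int what, void *userp, void *socketp)
{
  if(what == CURL_POLL_REMOVE) {
    tracked = CURL_SOCKET_BAD;
    wanted = 0;
  }
  else {
    tracked = s;
    wanted = what; /* CURL_POLL_IN, CURL_POLL_OUT or CURL_POLL_INOUT */
  }
  return 0;
}

static int timer_cb(CURLM *multi, long t_ms, void *userp)
{
  timeout_ms = t_ms;
  return 0;
}

int main(void)
{
  CURL *easy = curl_easy_init();
  CURLM *multi = curl_multi_init();
  int running = 1;

  curl_easy_setopt(easy, CURLOPT_URL, "http://example.com/");
  curl_multi_setopt(multi, CURLMOPT_SOCKETFUNCTION, socket_cb);
  curl_multi_setopt(multi, CURLMOPT_TIMERFUNCTION, timer_cb);
  curl_multi_add_handle(multi, easy);

  /* kick-start; libcurl then tells us which socket to watch via socket_cb */
  curl_multi_socket_action(multi, CURL_SOCKET_TIMEOUT, 0, &running);

  while(running) {
    struct pollfd pfd;
    int nfds = 0;
    if(tracked != CURL_SOCKET_BAD) {
      pfd.fd = tracked;
      pfd.events = (short)(((wanted & CURL_POLL_IN) ? POLLIN : 0) |
                           ((wanted & CURL_POLL_OUT) ? POLLOUT : 0));
      nfds = 1;
    }
    if(poll(nfds ? &pfd : NULL, nfds, (timeout_ms >= 0) ? (int)timeout_ms : 1000) > 0) {
      int ev = ((pfd.revents & POLLIN) ? CURL_CSELECT_IN : 0) |
               ((pfd.revents & POLLOUT) ? CURL_CSELECT_OUT : 0);
      curl_multi_socket_action(multi, tracked, ev, &running);
    }
    else /* timeout expired (or poll error): let libcurl handle its timeouts */
      curl_multi_socket_action(multi, CURL_SOCKET_TIMEOUT, 0, &running);
  }

  curl_multi_remove_handle(multi, easy);
  curl_easy_cleanup(easy);
  curl_multi_cleanup(multi);
  return 0;
}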

Running event-based tests

The fake event library works with the curl_multi_socket_action() family of functions, and when curl is invoked with --test-event it will call curl_easy_perform_ev instead of curl_easy_perform and the transfer should then work exactly as without --test-event.

The main test suite running script, called ‘runtests.pl’, now features the option -e that will run all ~800 curl tests with --test-event. It will skip tests it can’t run event-based – basically all the tests that don’t use the curl tool.
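In other words, from the tests directory something like this runs the whole lot event-based:

./runtests.pl -e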

Many sockets is slow if not done with events

The picture on the right shows some very old performance measurements done on libcurl in the year 2005, but the time spent grows exponentially when the number of sockets grows, which is exactly why you want to use something event-based instead of something poll or select based.

See also my document discussing poll, select and event-based.

dotdot removal in libcurl 7.32.0

Allow as much as possible and only sanitize what’s absolutely necessary.

That has basically been the rule for the URL parser in curl and libcurl since the project was started in the 90s. The upside of this is that you can use curl to torture your web servers with tests and you can hand-craft really imaginative stuff to send and thus subsequently to receive. It kind of assumes that the user truly gives curl the URL the user wants to use.

Why would you give curl a broken URL?

But of course life and internet protocols, and perhaps in particular HTTP, is more involved than that. It soon becomes more complicated.

Redirects

Everyone who’s writing a web user-agent based on RFC 2616 soon faces the fact that redirects based on the Location: header are a source of fun and head-scratching. The spec defines it as only allowing “absolute URLs”, but the reality is that web servers have provided relative ones right from the start, so browsers of course support that (and the pending HTTPbis document is already making this clear). curl thus also adopted support for relative URLs, meaning the ability to “merge” or “add” a relative URL onto a previously used absolute one had to be implemented. Even illegally constructed URLs are handled this way, and in the grand tradition of web browsers, they have not tried to stop users from doing bad things; they have instead adapted and now try to convert it to what the user could’ve meant, like for example a white space within the URL sent in a Location: header. Even curl has to sanitize that so that it works more like the browsers.
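As an example of such a merge: a relative Location: ../c.html received for the page http://example.com/dir/page.html resolves to http://example.com/c.html (example.com of course only being a placeholder here).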

Relative path segments

The path part of a URL is truly to be seen as a path, in that it is a hierarchical scheme where each slash-separated part adds a piece. Like “/first/second/third.html”.

As it turns out, you can also include modifiers in the path that have special meanings. The “..” (two dots or periods next to each other), known from shells and command lines to mean “one directory level up”, can also be used in the path part of a URL, so that “/one/three/../two/three.html” equals “/one/two/three.html” once the dotdot sequence is handled. This dot removal procedure is documented in the generic URL specification RFC 3986 (published in January 2005) and is completely protocol agnostic. It works like this for HTTP, FTP and every other protocol you provide a path part for.
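For reference, these are the dot-segment removal examples straight from RFC 3986 section 5.2.4:

"/a/b/c/./../../g" becomes "/a/g"
"mid/content=5/../6" becomes "mid/6"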

In its traditional spirit of just accepting and passing along, curl didn’t use to treat “dotdots” in any particular way but handed them over to the server to deal with. There probably aren’t that terribly many such occurrences either, so it never really caused any problems or made any users hit any particular walls (or they were too shy to report it); until one day back in February this year… so we finally had to do something about this. Some 8 years after the spec saying it must be done was released.

dotdot removal

Alas, libcurl 7.32.0 (once it gets released around August 12th) now features full traversal and handling of such sequences in the path part of URLs. It also handles single dot sequences like in “/one/./two”. libcurl will detect such uses, convert the path to a sequence without them and continue on. This will of course cause a slightly altered behavior for the possibly small portion of users out there who use dotdot sequences and actually want them sent as-is, the way libcurl has been doing it. I decided against adding an option for disabling this behavior, but of course if someone experiences terrible pain and can report about it convincingly to us we could possibly reconsider that decision in the future.

I suspect (and hope) this will just be another little change along the way that makes libcurl act more standard and more like the browsers, and thus cause fewer problems for users without people having to care much about how or why.

Further reading: the dotdot.c file from the libcurl source tree!

Bonus kit

A dot to dot surprise drawing for you and your kids (click for higher resolution)

curl dot-to-dot

tailmatching them cookies

A brand new libcurl security advisory was announced on April 12th, detailing how libcurl can leak cookies to the wrong domains due to a flaw in its cookie tailmatching. Let me explain the details.

(Did I mention that security is hard?)

curl first implemented cookie support way back in the early days, in the late 90s. I participated in the IETF work that much later documented how cookies work in real life. I know how cookies work, and yet this flaw still existed in the curl cookie implementation for over 13 years. Until someone spotted it. And once again that sense of gaaaah, how come we never saw this before!! came over me.

A quick cookie 101

When cookies are used over HTTP, it is (if we simplify things a little) only a name=value pair that is set to be valid for a certain domain and a path. But the path only specifies a prefix, and the domain only specifies the tail part. This means that a site can set a cookie for the entire site under the path /members, so that it will be sent by the browser for /members/names/ as well as for /members/profile/me etc. The cookie will then not be sent to the same domain for pages under a different path, such as /logout or similar.

A cookie’s domain can be set to be valid for example.org and then it will be sent by the browser also for www.example.org and www.sub.example.org, but not at all for example.com or badexample.org.

Unless of course you have a bug in the cookie tailmatching function. The bug libcurl had until 7.30.0 was released made it send cookies set for the domain example.org also to sites that have the same tail but a different prefix, like badexample.org.
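To illustrate the class of bug (this is not the actual libcurl code, just a sketch that also ignores details like leading dots in stored cookie domains): a naive suffix comparison happily matches badexample.org against a cookie domain of example.org, while a correct check also requires the match to start at a dot boundary:

#include <string.h>
#include <strings.h>

/* buggy: a host of "badexample.org" matches a cookie domain of "example.org" */
static int naive_tailmatch(const char *cookie_domain, const char *host)
{
  size_t cl = strlen(cookie_domain);
  size_t hl = strlen(host);
  return (hl >= cl) && !strcasecmp(cookie_domain, host + (hl - cl));
}

/* safer: an exact match, or a suffix match that starts right after a dot */
static int safer_tailmatch(const char *cookie_domain, const char *host)
{
  size_t cl = strlen(cookie_domain);
  size_t hl = strlen(host);
  if(hl == cl)
    return !strcasecmp(cookie_domain, host);
  return (hl > cl) && (host[hl - cl - 1] == '.') &&
         !strcasecmp(cookie_domain, host + (hl - cl));
}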

Let me try a story on you

It might not be obvious at first glance how terrible this can become to users. Let me take you through an imaginary story, backed up by some facts:

Imagine that there’s a known web site out there on the internet that provides an email service to users. Users login on a form and they read email. Or perhaps it is a social site. Preferably for our story, the site is using HTTP only but this trick can be done for most HTTPS sites as well with only a mildly bigger effort.

This known and popular site runs its services on ‘site.com’. When you’re logged in to site.com, your session is a cookie that keeps getting sent to the server and the server sometimes updates the contents and sends it back to the browser. This is the way millions of sites work.

As an evil person, you now register a domain and setup an attack server. You register a domain that has the same ending as the legitimate site. You call your domain ‘fun-cat-and-food-pics-from-site.com’ (FCAFPFS among friends).

Mr evil person also knows that there are several web browsers, typically special purpose ones for different kinds of devices, that use libcurl as their base. (But it doesn’t have to be a browser, it could be other tools as well, but for this story a browser fits fine.) Let’s say you know a person or two who use one of those browsers on site.com.

You send a phishing email to these persons. Or post a funny picture on the social site. The idea is to have them click your link to follow through to your funny FCAFPFS site. A little social engineering, who on the internet can truly resist funny cats?

The visitor’s browser (which uses a vulnerable libcurl) does the wrong “tailmatch” on the domain for the session cookie and gladly hands it off to the attacker site. The attacker site could then use that cookie to access site.com and hijack the user’s session. Quite likely the attacker would immediately change password or something and logout/login so that the innocent user who’s off looking at cats will get a “you are logged-out” message when he/she returns to site.com…

The attacker could then use “password reminder” features on other sites to get emails sent to site.com to allow him to continue attacking the user’s other accounts on other services. Or if site.com was a social site, the attacker would post more cat links and harvest more accounts etc…

End of story.

Any process improvements?

For every security vulnerability a project gets, it should be a reason for scrutinizing what went wrong. I don’t mean in the actual code necessarily, but more what processes we lack that made the bug sneak in and remain in there for so long without being detected.

What didn’t we do that made this bug survive this long?

Obviously we didn’t review the code properly. But this is a tricky beast that was added a very long time ago, back in the days when the project was young and not that many developers were involved. Before we even had a test suite. I do believe that we have slightly better reviews these days, but I will also claim that it is far from sure that we would detect this flaw by a sheer code review.

Test cases! We clearly lacked the necessary test case setup that tested the limitations of how cookies are supposed to work and get sent back and forth. We’ve added a few new ones now that detect this particular flaw fine, but I think we have reasons to continue to search for various kinds of negative tests we should do. Involving cookies of course, but also generally in other areas of the curl project.

Of course, we’re all just working voluntarily here on spare time so we can’t expect miracles.

(an attack, picture by Andy Gardner)