Better pipelining in libcurl 7.30.0

Back in October 2006, we added support for HTTP pipelining to libcurl. The implementation was naive and simple: it basically preferred to pipeline everything on the single connection to a given host if it could. It works only with the multi interface and if you do a second request to the same host it will try to pipeline that.

pipelines

Over the years the feature was bugfixed and improved slightly, which proved that at least a couple of applications actually used it – but it was never any particularly big hit among libcurl’s vast amount of features.

Related background information that gives details on some of the problems with pipelining in the wild can be found in Mark Nottingham’s Making HTTP Pipelining Usable on the Open Web internet-draft, Mozilla’s bug report “HTTP pipelining by default” and Chrome’s pipelining docs.

Now, more than six years later, Linus Nielsen Feltzing (a colleague and friend at Haxx) strikes back with a much improved and almost completely revamped HTTP pipelining support (merged into master just hours before the new-feature window closed for the pending 7.30.0 release). This time, the implementation features and provides:

  • a configurable number of connections and pipelines to each unique host name
  • a round-robin approach that favors starting new connections first, and then pipeline on existing connections once the maximum number of connections to the host is reached
  • a max-depth value that when filled makes the code not add any more requests on that connection/pipeline
  • a pipe penalization system that avoids adding new requests to pipes that are known to be receiving very large contents and thus possibly would stall subsequent requests for an extended period of time
  • a server blacklist that allows the application to specify a list of servers for which HTTP pipelining should not be attempted – real world tests has proven that some servers are too broken to be allowed to play the game

The code also adds a feature that helps out applications to do massive amount of requests in a controlled manner:

A hard maximum amount of connections, that when reached makes libcurl queue up easy handles internally until they can create a new connection or re-use a previously used one. That allows an application to for example set the limit to 50 and then add 400 handles to the multi handle but it will still only use 50 connections as a maximum so over time when requests get completed it will start new transfers on the requests that are waiting in line and thus shrinking the queue and keeping the maximum amount of connections until there’s less than 50 left to do…

Previously that kind of queuing had to be done by the application itself, but now with the much more extensive pipelining support it really isn’t as easy for an application to know when the new request can get pipelined or create a new connection so this logic is now provided by libcurl itself. It is likely going to be appreciated and used also by non-pipelining applications…

This implementation is accompanied with a bunch of new test cases for libcurl (and even a new HTTP test server for the purpose), and it has been tested in the wild for a while with libcurl as the engine in a web browser implementation (the company doing that has requested to remain anonymous). We believe it is in fairly decent state, but as this is a large step and the first release it is shipped with I expect there to be some hiccups along the way.

Two things to take note of:

  1. pipelining is only available for users of libcurl’s multi interface, and only if explicitly enabled with CURLMOPT_PIPELINING
  2. the curl command line tool does not use the multi interface and thus it will not use pipelining

Why no curl 8

no 8In this little piece I’ll explain why there won’t be any version 8 of curl and libcurl in a long time. I won’t rule out that it might happen at some point in the future. Just that it won’t happen anytime soon and explain the reasons why.

Seven point twenty nine, really?

We’ve done 29 minor releases and many more patch releases since version seven was born, on August 7 2000. We did in fact bump the ABI number a couple of times so we had the chance of bumping the version number as well, but we didn’t take the chance back then and these days we have a much harder commitment and determinism to not break the ABI.

There’s really no particular downside with having a minor version 29. Given our current speed and minor versioning rules, we’ll bump it 4-6 times/year and we won’t have any practical problems until we reach 256. (This particular detail is because we provide the version number info with the API using 8 bits per major, minor and patch field and 8 bits can as you know only hold values up to 255.) Assuming we bump minor number 6 times per year, we’ll reach the problematic limit in about 37 years in the fine year 2050. Possibly we’ll find a reason to bump to version 8 before that.

Prepare yourself for seven point an-increasingly-higher-number for a number of years coming up!

Is bumping the ABI number that bad?

Yes!

We have a compatibility within the ABI number so that a later version always work with a program built to use the older version. We have several hundred million users. That means an awful lot of programs are built to use this particular ABI number. Changing the number has a ripple effect so that at some point in time a new version has to replace all the old ones and applications need to be rebuilt – and at worst also possibly have to be rewritten in parts to handle the ABI/API changes. The amount of work done “out there” on hundreds or thousands of applications for a single little libcurl tweak can be enormous. The last time we bumped the ABI, we got a serious amount of harsh words and critical feedback and since then we’ve gotten many more users!

Don’t sensible systems handle multiple library versions?

Yes in theory they do, but in practice they don’t.

If you build applications they have the ABI number stored for which lib to use, so if you just keep the different versions of the libraries installed in the file system you’ll be fine. Then the older applications will keep using the old version and the ones you rebuild will be made to use the new version. Everything is fine and dandy and over time all rebuilt applications will use the latest ABI and you can delete the older version from the system.

In reality, libraries are provided by distributions or OS vendors and they ship applications that link to a specific version of the underlying libraries. These distributions only want one version of the lib, so when an ABI bump is made all the applications that use the lib will be rebuilt and have to be updated.

Most importantly, there’s no pressing need!

If we would find ourselves cornered without ability to continue development without a bump then of course we would take the pain it involves. But as things are right now, we have a few things we don’t really like with the current API and ABI but in general it works fine and there’s no major downsides or great pains involved. We simply do not have any particularly good reason to bump version number or ABI version. Things work pretty good with the current way.

The future is of course unknown and at some point we’ll face a true limitation in the API that we need to bridge over with a bump, but it can also take a long while until we hit that snag.

Update April 6th: this article has been read by many and I’ve read a lot of comments and some misunderstandings about it. Here’s some additional clarifications:

  1. this isn’t stuff we’ve suddenly realized now. This is truths and facts we’ve learned over a long time and this post just makes it more widely available and easier to find. We already worked with this knowledge. I decided to blog about it since it struck me we didn’t have it documented anywhere.
  2. not doing version 8 (in a long time) does not mean we’re done or that the pace of development slows down. We keep doing releases bimonthly and we keep doing an average of 30 something bugfixes in each release.

some missing github features

github-social-coding

I think github is a lovely resource for collaborating on source code with my friends all over the globe. Among other things, we host the primary curl repository there and we’ve been doing so for almost three years now. This experience has led me to discover a bunch of things I miss in the service…

github is clearly aimed at repositories run by one person or a small set of persons, while in the projects I run I try to involve as many as possible in wide collaboration and I put efforts into informing everyone to get the widest possible attention and feedback. I may have created the account and “own” the repository, but I want the work to be done by a large team and I want everything that happens to it to be seen by a large audience. This is not always possible to do easily with the existing github services.

To further this spirit and to widen cooperation more, I would like to see the following improvements:

  • pull requests can’t be disabled nor can i control to which email address to send the notification. In our project I want all patches posted to the mailing list for review, archiving and discussions before I get a pull request, and I don’t use GitHub’s merge feature since it is hardly ever good enough (I want fast-forward and I usually feel a need to edit the commit message ever so slightly etc). I want the pull request to get translated into a patch review submission to the mailing list.
  • similarly, I cannot redirect where notifications are sent when someone comments a commit or a source line and this is highly annoying since we merge a lot of outsiders’ patches etc and as they may still read the mailing list we want the discussion there! Many times the contributors don’t have GitHub accounts and of course we don’t want to require that.
  • after the death of the CIA.vc service, the current IRC notification service offered by GitHub is nothing but inferior. The stupid bot has to join, tell its message and leave again. It is not an IRC friendly behavior and I can’t make it announce exactly what I’d like it to say…
  • I wish it had much better email notification on commits that would allow me to customize what it sends out without forcing me to write a full blown replacement. I want a unified diff included!

I realize GitHub has features that offer me to create an “organization” to host a repository instead of it being owned by me as a person, but I don’t think that should be a requirement to get this functionality. And I don’t know if GitHub truly offers better group functionality then either.

Now 1K+ awesome contributors

On March 20 1998 the first curl release shipped – but as it was built on a previous project there was already a handful of contributors. It was then just a modest little project with but a few thousand lines of code in total.

As time has passed, the project has grown and development has been going on at a rapid speed in a never-ending cycle of releases and bug fixes. The first couple of years I didn’t keep track of every single contributor properly, but one day in 2005 I decided to go back and try to collect the names of all the helpers so far. Names that had been mentioned in the changelogs and comments etc. When looking at our history, it is therefore not really sensible to look at the numbers before that cut-off date as I didn’t keep the count and logs properly before then.

All since then, I make an effort in properly giving credit to all contributors: patch producers, bug reporters, documentation writers as well as people “just” providing good advice. I try to mention all contributors. To give credit where credit is due. In a volunteer driven project, that is after all the best compensation I can offer.

Looking back over the years it seems the pace of newcomers have been quite steady. The 437 names in 2005 have grown to 1005 in less than eight years. Roughly 70 new contributors every year or 6 per month.

Now, in February 2013 with the release of 7.29.0, we surpassed the magic 1000 named contributors limit in the project. 1000 contributors in about 5437 days. On average one new soul every 5th day over a period of almost 15 years. Fascinatingly enough, even if I count a more recent period like the last 6 years the pace is only a little faster with one new just a little faster than every 5 days!

I originally planned to present the data as a graph in this post, but since the development has been so extremely linear it turned out so boring I scrapped it! Instead I’ll show you the top-10 most common first names among curl contributors:

  1. 23 David
  2. 14 Peter
  3. 14 John
  4. 13 Michael
  5. 11 Eric
  6. 10 Mark
  7. 9 Tom
  8. 9 Tim
  9. 9 Daniel
  10. 9 Chris

The names that didn’t quite make it and that all exist 8 times in the THANKS file are Steve, Robert, Richard, James, Dan, Christian, Andrew and Andreas. As I’ve mentioned before: not that many females in there.

(Yes, I’ve ignored that possibly some Daniels go by the name of Dan, some Chrises might be Christian and so on, I’ve only counted the actual names people have used.)

Why the latest security vulnerability in curl happened

In the end of January 2013 we got a fresh security vulnerability pointed out to us in the curl project (it was publicly announced on Feb 6). Another buffer overflow. This time in the SASL Digest-MD5 handling for POP3, IMAP and SMTP. It is the 16th security flaw during curl’s life-time of almost 15 years so it isn’t a disaster but still of course it is never fun when it happens. I put a lot of my own effort and pride into this project so every time something like this floats to the surface my pride and self-esteem get damaged a bit.

Everyone who’s concerned about open source and security and foremost in a reliable and secure libcurl of course now wonders: how did this happen? How could this piece of security problem get into libcurl and what are we doing to make sure it doesn’t happen again?

Let me tell you the story. It is not as interesting nor full of conspiracies as you’d like. It is instead rather dull and boring but nevertheless the truth.

I’m the lead developer and maintainer of curl and libcurl. I personally have done some 65% of all commits in the project and I do the majority of all code reviews on the mailing list. Our code might be used by some 500 million users, but the number of regulars that can be considered the “core team” can still basically be counted on a single hand. Also, we all do this primarily on our spare time.

During intense development periods we get flooded by bug reports and patch submissions and my backlog grows. It’s really not possible to foresee when these periods come, but occasionally it seems the planets align in this way and work piles up.

In order to then proceed the best way in the project, I try to focus on the architectural and “deep” matters that need me and my particular knowledge most. I then try to leave the “easy” problems that are easier to work on to others, and I try to stay away from the issues that already seem to be under control by some of the existing regulars in the project. I also have to let other “elders” in the project push things with slightly less scrutiny just to be able to plow through the work better. Unfortunately this leads to the occasional flaw getting through and in this case it was even a security vulnerability that when you look back on the code you really cannot understand how we could miss this.

We do take security seriously though and we make a big effort on handling all security reports swiftly and accurately. Even if this was the 16th time we let our guard down, I want to think that we at least react responsibly and in a good way when we realize our mistakes.

Please don’t judge us due to this. Please instead consider joining us and help us review code and help us find the next flaw before we merge it into mainline or at least before we do a public release with the code!

sasl-patch

curl and libcurl 7.29.0

As a representative for the team behind curl and libcurl, we’re of course proud to yet again having shipped a release to the public today. Over 240 commits, with in total almost 10000 lines added and 6000 removed since the previous release in November 2012. We’re only a month away until the curl project turns 15 years old.

Some highlights this time include:

  • We fixed a nasty overflow vulnerability we have been shipping in a few previous releases. The flaw existed in code used by IMAP, POP3 and SMTP.
  • We introduced a new test suite output mode that is “automake compliant”. This can help linux distros and others who want to run many test suites and have a unified way of parsing the results and outcome. It follows the spirit of ptest and I believe it will be used in the future.
  • The IMAP support got a lot of improvements and lots of login and authentication fixes were brought in. Now libcurl supports the sasl methods digest-md5, cram-md5, ntlm and login., and it also recognizes the login disabled server capability.
  • Architecture wise, we remodeled the internals quite a lot and made it “always-multi“. This improves readability and internal complexity and is all just goodness. The short-term downside is possibly the risk for a temporary increase in bug reports due to this…
  • 35 specified bug fixes were crammed in as well, and there are a bunch more we haven’t mentioned that just “silently” improved the multi interface functionality.

the new bug tracker on sourceforge

A while ago Sourceforge gave me the offer to upgrade curl’s bug tracker to “the new one” they offer. They do offer some arguments to why you would want to do this but they don’t elaborate much on the transition for existing projects. Since I’ve been annoyed and disappointed on the old one for years I decided to dive right in. I decided to post this blog entry to possibly encourage others as well, or at least explain how upgrading worked for us.

I’ll start by explaining a bit about what’s so bad about the old Sourceforge bug tracker. Anyone who has tried to use it for anything “real” most likely already know about these things and then I figure my list can be used for a comparison if we’ve gotten annoyed by the same things.

  1. They use a global bug id which makes all bug entries get very large numbers that aren’t in sequence and are fairly hard to remember.
  2. You can’t respond to bug reports by mail, so you are forced to use the heavy ad-filled web site.
  3. Ridiculous URLs to the bug tracker and each individual bug entry. I created a bounce CGI years ago on the curl web site to avoid having to use the overly long ones anywhere.
  4. When sending out email notifications, it prepends the new comments while having the older ones below which basically is an odd-order top-posting style a lot of people and projects have a hard time to get accustomed to.
The new tracker addresses all of these issues and I agreed to allow it to make curl use their new tracker. And this is the outcome:
  • All the existing bug tracker entries were converted. They all now get numbered sequentially in a private number series so no more bug #31234234 and instead the 1100 or so bug reports became bug #1 to bug #1169.
  • The new bug entries have a different set of meta-data but the ‘status’ and ‘owner’ etc seemed to get translated pretty good. The new ‘milestone’ got populated wrongly for me, but it didn’t matter much to me because I simply cleared it.
  • There’s no visible way to translate from old style bug numbers to the new bug numbers. When I go to the URL for the old number it redirects me to the new bug so clearly sourceforge has created a look-up table it can use.
  • There’s now a sensible public URL to point out the “home” for the curl bug tracking.

Annoying things with the new tracker:

  • It splits up a the comments to a single report into several “pages” far too early and forces you to click through annoying “page 2” or “next page” links to see the latest comments.

Summary: the upgrade was totally worth it. A much better bug tracker with much more useful interfaces, both the web interface and the ability to respond to it by email etc. And still room for improvements!

internally, we’re all multi now!

libcurl internals suddenly become a lot cleaner and neater to work with when we made all code assume and work with the multi interface!

libcurl was initially created slightly after the birth of the curl tool. After the tool started to get some traction and use out in the world, requests and queries about a library with its powers started to drop in. Soon enough, in the year 2000 we shipped the first release of libcurl and it featured a synchronous API (the “easy” interface) that performs the complete operation and then returns. I think we can now say that the blocking easy interface was successful and its ease of use has been very popular and appreciated by many users.

During 2002 the need for a non-blocking API had been identified and we introduced the multi interface. The multi interface is kind of a super-set as it re-uses the same handles as is used with the easy interface, so it cleverly makes it fairly easy for a standard application to move from the easy interface to the multi.

Basically since that day, we’ve struggled in the source code structure to handle the fact that we have both a blocking and a non-blocking API. In lots of places we’ve had different code paths and choices done depending on which API that was used. It made the source code hard to follow and it occasionally introduced hard to track bugs which could lead to the multi and easy interface not behaving the same way to the underlying network or protocol behavior. It was clear very early on that it wasn’t an ideal design choice, but it was a design choice that was spread out among the code and it stuck.

During November 2012 I finally took on the code that we’ve had #ifdef’ed since around 2005 which makes the blocking easy interface operation a wrapper function around the non-blocking multi interface functions. Using this method, all internals should be considered non-blocking and there is no need left to treat things differently depending on which API that was used because everything is now multi interface == non-blocking.

On January 17th 2013 the big patch was committed. 400 added lines, 800 removed over 54 modified files…

cURL

The curl year 2012

2012

So what did happen in the curl project during 2012?

First some basic stats

We shipped 6 releases with 199 identified bug fixes and some 40 other changes. That makes on average 33 bug fixes shipped every 61st day or a little over one bug fix done every second day. All this done with about 1000 commits to the git repository, which is roughly the same amount of git activity as 2010 and 2011. We merged commits from 72 different authors, which is a slight increase from the 62 in 2010 and 68 in 2011.

On our main development mailing list, the curl-library list, we now have 1300 subscribers and during 2012 it got about 3500 postings from almost 500 different From addresses. To no surprise, I posted by far the largest amount of mails there (847) with the number two poster being Günter Knauf who posted 151 times. Four more members posted more than 100 times: Steve Holme (145), Dan Fandrich (131), Marc Hoersken (130) and Yang Tse (107). Last year I sent 1175 mails to the same list…

Notable events

I’ve walked through the biggest changes and fixes and here are the particular ones I found stood out during this otherwise rather calm and laid back curl year. Possibly in a rough order of importance…

  1. We started the year with two security vulnerability announcements, regarding an SSL weakness and an injection flaw. They were reported in 2011 though and we didn’t get any further security alerts during 2012 which I think is good. Or a sign that nobody has been looking close enough…
  2. We got two interesting additions in the SSL backend department almost simultaneously. We got native Windows support with the use of the schannel subsystem and we got native Mac OS X support with the use of Darwin SSL. Thanks to these, we can now offer SSL-enabled libcurls on those operating systems without relying on third party SSL libraries.
  3. The VERIFYHOST debacle took off with “security researchers” throwing accusations and insults, ending with us releasing a curl release with the bug removed. It did however unfortunately lead to some follow-up problems in for example the PHP binding.
  4. During the autumn, the brokeness of WSApoll was identified, and we now build libcurl without it and as a result libcurl now works better on Windows!
  5. In an attempt to allow libcurl-using applications to avoid select() and its problems, we introduced the new public function curl_multi_wait. It avoids the FD_SETSIZE limit and makes it harder to screw up…
  6. The overly bloated User-Agent string for the curl tool was dramatically shortened when we cut out all the subsystems/libraries and their version numbers from the string. Now there’s only curl and its version number left. Nice and clean.
  7. In July we finally introduced metalink support in the curl tool with the curl 7.27.0 release. It’s been one of those things we’ve discussed for ages that finally came through and became reality.
  8. With the brand new HTTP CONNECT support in the test suite we suddenly could get much improved test cases that does SSL or just tunnel through an HTTP proxy with the CONNECT request. It of course helps us avoid regressions and otherwise improve curl and libcurl.

What didn’t happen

  1. I made an attempt to get the spindly hacking going, but I’ve mostly failed with that effort. I have personally not had enough time and energy to work on it, and the interest from the rest of the world seems luke warm at best.
  2. HTTP pipelining. Linus Nielsen Feltzing has a patch series in the works with a much improved pipelining support for libcurl. I’ll write a separate post about it once it gets in. Obviously we failed to merge it before the end of the year.
  3. Some of my friends like to mock me about curl not being completely IPv6 friendly due to its lack of support for Happy Eyeballs, and of course they’re right. Making curl just do two connects on IPv6-enabled machines should be a fairly small change but yet I haven’t yet managed to get into actually implementing it…
  4. DANE is SSL cert verification with records from DNS thanks to DNSSEC. Firefox has some experiments going and Chrome already supports it. This is a technology that truly can improve HTTPS going forwards and allow us to avoid the annoyingly weak and broken CA model…

I won’t promise that any of these will happen during 2013 but I can promise there will be efforts…

The Future

I wrote a separate post a short while ago about the HTTP2 progress, and I expect 2013 to bring much more details and discussions in that area. Will we get SRV record support soon? Or perhaps even URI records? Will some of the recent discussions about new HTTP auth schemes develop into something that will reach the internet in the coming year?

In libcurl we will switch to an internal design that is purely non-blocking with a lot of if-then-that-else source code removed for checks which interface that is used. I’ll make a follow-up post with details about that as well as soon as it actually happens.

Our Responsibility

curl and libcurl are considered pillars in the internet world by now. This year I’ve heard from several places by independent sources how people consider support by curl to be an important driver for internet technology. As long as we don’t have it, it hasn’t really reached everyone and that things won’t get adopted for real in the Internet community until curl has it supported. As father of the project it makes me proud and humble, but I also feel the responsibility of making sure that we continue to do the right thing the right way.

I also realize that this position of ours is not automatically glued to us, we need to keep up the good stuff to make it stick.

cURL

curl, open source and networking