Testing curl_multi_socket_action

We’re introducing a brand new way to test the event-based socket_action API in libcurl! (available in curl since commit 6cf8413e3162)

Background

Since 2006 we’ve had three major API families in libcurl for doing file transfers:

  1. the easy interface – a synchronous and yes, easy, interface for getting things done (a minimal sketch follows right after this list)
  2. the multi interface – a non-blocking interface that allows multiple simultaneous transfers (in both directions) in the same thread
  3. the socket_action interface – a brother of the multi interface but designed for use with an event-based library/engine for high performance and large scale transfers
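
To make the distinction concrete, here is roughly what the easy interface looks like in its most minimal form. This is just a hedged sketch: the URL is a placeholder and error handling is kept to a minimum.

    #include <stdio.h>
    #include <curl/curl.h>

    int main(void)
    {
      /* one handle, one blocking call, one transfer */
      CURL *curl = curl_easy_init();
      if(curl) {
        CURLcode res;
        curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/");
        res = curl_easy_perform(curl);   /* blocks until the transfer is done */
        if(res != CURLE_OK)
          fprintf(stderr, "transfer failed: %s\n", curl_easy_strerror(res));
        curl_easy_cleanup(curl);
      }
      return 0;
    }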

The curl command line tool uses the easy interface and our test suite for curl + libcurl consists of perhaps 80% curl tests, while the rest are libcurl-using programs testing both the easy interface and the multi interface.

Early this year we modified libcurl’s internals so that the functions driving the easy interface transfer use the multi interface internally. Then all of a sudden, all the curl-using tests that exercise the easy interface by definition also tested that the operation worked fine with the multi interface. Needless to say, this pushed several bugs up to the surface that we could fix.
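
Conceptually, driving an easy handle with the multi interface boils down to a loop like the one below. This is a simplified sketch and not the actual internal code; curl_multi_wait() is used here just to keep the example short.

    #include <curl/curl.h>

    static CURLcode easy_perform_via_multi(CURL *easy)
    {
      CURLM *multi = curl_multi_init();
      CURLMsg *msg;
      CURLcode result = CURLE_OK;
      int still_running = 0;
      int msgs_left = 0;

      if(!multi)
        return CURLE_OUT_OF_MEMORY;

      curl_multi_add_handle(multi, easy);
      do {
        int numfds;
        curl_multi_perform(multi, &still_running);      /* drive the transfer */
        curl_multi_wait(multi, NULL, 0, 1000, &numfds); /* wait for socket activity */
      } while(still_running);

      /* fish out the transfer's return code */
      while((msg = curl_multi_info_read(multi, &msgs_left))) {
        if(msg->msg == CURLMSG_DONE && msg->easy_handle == easy)
          result = msg->data.result;
      }

      curl_multi_remove_handle(multi, easy);
      curl_multi_cleanup(multi);
      return result;
    }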

So the multi and the easy interfaces are tested by many hundreds of test cases on a large number of systems every day, around the clock. Nice! But what about the third interface? The socket_action interface isn’t tested at all! Time to change this sorry state.

Event-based test challenges

The event-based API has its own set of challenges: it needs to react to socket state changes (and only those) and it must allow smooth interaction with the user’s own choice of event library, etc. This is our newest API family and also the least commonly used. One reason for this may very well be that event-based coding is generally harder to do than more traditional poll-based code. Event-based code forces the application into using state machines all over to a much higher degree, and the frequent use of callbacks easily makes the code hard to read and its logic hard to follow.

So, curl_multi_socket_action() does things in ways that aren’t done, or even necessary, when the regular select-oriented multi interface is used. Code that therefore needs to be tested to make sure it keeps working!

Introducing an alternative curl_easy_perform

As I mentioned before, we made the general multi interface widely tested by making sure the easy interface code uses the multi interface under the hood. We can’t easily repeat that operation, so this time we instead introduce a separate implementation (for debug-enabled builds) called curl_easy_perform_ev that uses the event-based API internally to drive the transfer.

curl_multi_socket_action() is meant to be used with an event library to work really well across platforms, or with something like epoll directly if Linux-only functionality is fine for you. curl and libcurl are quite likely among the most portable code you can find, so after having fought with this agony for a while (how do you introduce event-based testing without narrowing the set of tested platforms too much?) I settled on a simple yet brilliant solution (I can call it brilliant since I didn’t come up with the idea on my own):

We write an internal “simulated” event-based library with functionality provided by the libcurl internal function Curl_poll(). It is in itself a wrapper function that can work with either poll() or select() and should therefore work on just about any operating system written since the 90s, and most of the ones from before that as well! Such an emulation may not be the most clever approach if the aim were to write a high performance and low latency application, but since my objective is to exercise the API and make a curl_easy_perform clone, it was perfect!
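
To give an idea of the shape of such an emulation, here is a simplified sketch of how a plain poll()-backed loop can drive curl_multi_socket_action(). This is not the actual curl_easy_perform_ev code: the fixed-size socket array and the absent error handling are just there to keep the illustration short.

    #include <string.h>
    #include <poll.h>
    #include <curl/curl.h>

    /* keep track of what libcurl wants us to wait for */
    #define MAX_SOCKS 16

    struct events {
      struct pollfd fds[MAX_SOCKS];
      int num;
      long timeout_ms;
    };

    /* CURLMOPT_SOCKETFUNCTION: libcurl tells us which socket to watch and for what */
    static int socket_cb(CURL *easy, curl_socket_t s, int what, void *userp,
                         void *socketp)
    {
      struct events *ev = userp;
      int i;
      (void)easy;
      (void)socketp;

      for(i = 0; i < ev->num && ev->fds[i].fd != s; i++)
        ;
      if(what == CURL_POLL_REMOVE) {
        if(i < ev->num)
          ev->fds[i] = ev->fds[--ev->num];   /* forget this socket */
      }
      else if(i < MAX_SOCKS) {
        ev->fds[i].fd = s;
        ev->fds[i].events = (short)((what & CURL_POLL_IN ? POLLIN : 0) |
                                    (what & CURL_POLL_OUT ? POLLOUT : 0));
        if(i == ev->num)
          ev->num++;
      }
      return 0;
    }

    /* CURLMOPT_TIMERFUNCTION: libcurl tells us how long it wants to wait at most */
    static int timer_cb(CURLM *multi, long timeout_ms, void *userp)
    {
      struct events *ev = userp;
      (void)multi;
      ev->timeout_ms = timeout_ms;
      return 0;
    }

    /* drive all added easy handles to completion, curl_easy_perform style */
    static void perform_event_based(CURLM *multi)
    {
      struct events ev;
      int running = 1;

      memset(&ev, 0, sizeof(ev));
      ev.timeout_ms = 1000;

      curl_multi_setopt(multi, CURLMOPT_SOCKETFUNCTION, socket_cb);
      curl_multi_setopt(multi, CURLMOPT_SOCKETDATA, &ev);
      curl_multi_setopt(multi, CURLMOPT_TIMERFUNCTION, timer_cb);
      curl_multi_setopt(multi, CURLMOPT_TIMERDATA, &ev);

      /* kickstart: makes libcurl set up its sockets and call the callbacks */
      curl_multi_socket_action(multi, CURL_SOCKET_TIMEOUT, 0, &running);

      while(running) {
        struct pollfd ready[MAX_SOCKS];
        int num = ev.num;
        int i;

        memcpy(ready, ev.fds, sizeof(ready));
        if(poll(ready, num, ev.timeout_ms < 0 ? 1000 : (int)ev.timeout_ms) <= 0) {
          /* timeout (or error): let libcurl run its internal timers */
          curl_multi_socket_action(multi, CURL_SOCKET_TIMEOUT, 0, &running);
          continue;
        }
        /* tell libcurl about each socket with activity; the callbacks may alter
           ev.fds while we do this, hence working on the copy */
        for(i = 0; i < num; i++) {
          if(ready[i].revents) {
            int mask = (ready[i].revents & POLLIN ? CURL_CSELECT_IN : 0) |
                       (ready[i].revents & POLLOUT ? CURL_CSELECT_OUT : 0);
            curl_multi_socket_action(multi, ready[i].fd, mask, &running);
          }
        }
      }
    }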

It should be carefully noted that curl_easy_perform_ev is only for testing and will only exist in debug-enabled builds, and is therefore not considered stable nor a part of the public API!

Running event-based tests

The fake event library works with the curl_multi_socket_action() family of functions, and when curl is invoked with --test-event it will call curl_easy_perform_ev instead of curl_easy_perform, and the transfer should then work exactly as it does without --test-event.

The main test suite running script, ‘runtests.pl’, now features the option -e that will run all ~800 curl tests with --test-event. It will skip tests it can’t run event-based – basically all the tests that don’t use the curl tool.

Handling many sockets is slow if not done with events

Some very old performance measurements done on libcurl back in 2005 show the time spent growing steeply as the number of sockets grows, and that is exactly why you want to use something event-based instead of something poll or select based.

See also my document discussing poll, select and event-based APIs.

dotdot removal in libcurl 7.32.0

Allow as much as possible and only sanitize what’s absolutely necessary.

That has basically been the rule for the URL parser in curl and libcurl since the project was started in the 90s. The upside of this is that you can use curl to torture your web servers with tests and you can hand-craft really imaginative stuff to send and thus subsequently receive. It kind of assumes that the user truly gives curl a URL the user wants to use.

Why would you give curl a broken URL?

But of course life and internet protocols, and perhaps in particular HTTP, is more involved than that. It soon becomes more complicated.

Redirects

Everyone who’s writing a web user-agent based on RFC 2616 soon faces the fact that redirects based on the Location: header are a source of fun and head-scratching. The spec only allows “absolute URLs” there, but the reality is that web servers have provided relative ones from the start, so browsers of course support that (and the pending HTTPbis document is already making this clear). curl thus also adopted support for relative URLs, meaning the ability to “merge” or “add” a relative URL onto a previously used absolute one had to be implemented. Even illegally constructed URLs get sent this way, and in the grand tradition of web browsers, browsers have not tried to stop users from doing bad things; they have instead adapted and now try to convert the input into what the user could’ve meant. Like, for example, a white space within the URL sent in a Location: header. Even curl has to sanitize that so that it works more like the browsers.
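
For libcurl users, all of this Location: merging happens automatically when redirect following is enabled. A small hedged fragment (the URL is just a placeholder):

    #include <stdio.h>
    #include <curl/curl.h>

    int main(void)
    {
      CURL *curl = curl_easy_init();
      if(curl) {
        char *final_url = NULL;
        curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/old-page");
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L); /* follow Location: headers */
        if(curl_easy_perform(curl) == CURLE_OK) {
          /* the URL we actually ended up on, after all merging of relative
             (or absolute) Location: values */
          curl_easy_getinfo(curl, CURLINFO_EFFECTIVE_URL, &final_url);
          printf("landed on: %s\n", final_url);
        }
        curl_easy_cleanup(curl);
      }
      return 0;
    }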

Relative path segments

The path part of a URL is truly to be seen as a path, in that it is a hierarchical scheme where each slash-separated part adds a piece. Like “/first/second/third.html”.

As it turns out, you can also include modifiers in the path that have special meanings. For example “..” (two dots or periods next to each other), known from shells and command lines to mean “one directory level up”, can also be used in the path part of a URL, so that “/one/three/../two/three.html” equals “/one/two/three.html” once the dotdot sequence is handled. This dot removal procedure is documented in the generic URL specification RFC 3986 (published January 2005) and is completely protocol agnostic. It works like this for HTTP, FTP and every other protocol you provide a path part for.
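
For the curious, the procedure is straightforward to implement. Here is a standalone sketch of the algorithm from RFC 3986 section 5.2.4 – not libcurl’s actual dotdot.c code, just an illustration of what the path squashing does:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* remove "." and ".." segments from a path, RFC 3986 style */
    static char *remove_dot_segments(const char *path)
    {
      char *in = strdup(path);              /* input buffer, consumed from the front */
      char *out = malloc(strlen(path) + 1); /* output buffer, appended to */
      char *ip = in;
      size_t olen = 0;

      if(!in || !out) {
        free(in);
        free(out);
        return NULL;
      }

      while(*ip) {
        if(!strncmp(ip, "../", 3))       /* A: a leading "../" is dropped */
          ip += 3;
        else if(!strncmp(ip, "./", 2))   /* A: a leading "./" is dropped */
          ip += 2;
        else if(!strncmp(ip, "/./", 3))  /* B: "/./rest" becomes "/rest" */
          ip += 2;
        else if(!strcmp(ip, "/."))       /* B: a trailing "/." becomes "/" */
          ip[1] = '\0';
        else if(!strncmp(ip, "/../", 4) || !strcmp(ip, "/..")) {
          /* C: "/../rest" becomes "/rest" and the last output segment goes away */
          if(ip[3])
            ip += 3;
          else
            ip[1] = '\0';                /* "/.." becomes "/" */
          while(olen && out[olen - 1] != '/')
            olen--;
          if(olen)
            olen--;                      /* drop the '/' before the removed segment */
        }
        else if(!strcmp(ip, ".") || !strcmp(ip, ".."))
          break;                         /* D: a lone "." or ".." is dropped */
        else {
          /* E: move the first segment (with its leading '/') to the output */
          if(*ip == '/')
            out[olen++] = *ip++;
          while(*ip && *ip != '/')
            out[olen++] = *ip++;
        }
      }
      out[olen] = '\0';
      free(in);
      return out;
    }

    int main(void)
    {
      char *clean = remove_dot_segments("/one/three/../two/three.html");
      printf("%s\n", clean);   /* prints "/one/two/three.html" */
      free(clean);
      return 0;
    }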

In its traditional spirit of just accepting and passing along, curl didn’t use to treat “dotdots” in any particular way but handed them over to the server to deal with. There probably aren’t terribly many such occurrences either, so it never really caused any problems or made any users hit any particular walls (or they were too shy to report it); until one day back in February this year… so we finally had to do something about it, some 8 years after the release of the spec saying it must be done.

dotdot removal

At last, libcurl 7.32.0 (once it gets released around August 12th) features full handling of such sequences in the path part of URLs. It also covers single dot sequences like in “/one/./two”. libcurl will detect such uses, convert the path to a sequence without them and continue on. This will of course cause slightly altered behavior for the presumably small portion of users out there who use dotdot sequences and actually want them sent as-is, the way libcurl has been doing it. I decided against adding an option for disabling this behavior, but of course if someone experiences terrible pain and can report it convincingly to us, we could possibly reconsider that decision in the future.

I suspect (and hope) this will just be another little change along the way that makes libcurl act more standards-compliant and more like the browsers, and thus cause fewer problems for users, without people having to care much about how or why.

Further reading: the dotdot.c file from the libcurl source tree!

Bonus kit

A dot to dot surprise drawing for you and your kids

tailmatching them cookies

A brand new libcurl security advisory was announced on April 12th, which details how libcurl can leak cookies to the wrong domains due to a flawed tail match. Let me explain the details.

(Did I mention that security is hard?)

curl first implemented cookie support way back in the early days, in the late 90s. I participated in the IETF work that much later documented how cookies work in real life. I know how cookies work, and yet this flaw still existed in the curl cookie implementation for over 13 years. Until someone spotted it. And once again that sense of gaaaah, how come we never saw this before!! came over me.

A quick cookie 101

When cookies are used over HTTP, a cookie is (if we simplify things a little) only a name = value pair that is set to be valid for a certain domain and a path. But the path only specifies a prefix, and the domain only specifies the tail part. This means that a site can set a cookie for the part of the site that is under the path /members, so that it will be sent by the browser for /members/names/ as well as for /members/profile/me etc. The cookie will then not be sent to the same domain for pages under a different path, such as /logout or similar.

A cookie’s domain can be set to be valid for example.org, and then it will be sent by the browser for www.example.org and www.sub.example.org as well, but not at all for example.com or badexample.org.

Unless of course you have a bug in the cookie tailmatching function. The bug libcurl had until 7.30.0 was released made it send cookies set for the domain example.org also to sites that merely have the same tail but a different prefix, like badexample.org.
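
To illustrate what a correct check needs to do, here is a simplified sketch of domain tail matching. This is not libcurl’s actual code, and a real cookie engine also has to deal with leading dots, IP address hosts and more:

    #include <stdio.h>
    #include <stdbool.h>
    #include <string.h>
    #include <strings.h>

    static bool tailmatch(const char *cookie_domain, const char *hostname)
    {
      size_t clen = strlen(cookie_domain);
      size_t hlen = strlen(hostname);

      if(hlen < clen)
        return false;
      if(strcasecmp(hostname + (hlen - clen), cookie_domain))
        return false;                /* the tails differ */
      if(hlen == clen)
        return true;                 /* exact match: example.org == example.org */

      /* the crucial boundary check a flawed tail match misses: the character
         just before the matching tail must be a dot, so that www.example.org
         matches but badexample.org does not */
      return hostname[hlen - clen - 1] == '.';
    }

    int main(void)
    {
      printf("%d\n", tailmatch("example.org", "www.example.org")); /* 1 */
      printf("%d\n", tailmatch("example.org", "badexample.org"));  /* 0 */
      return 0;
    }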

Let me try a story on you

It might not be obvious at first glance how terrible this can become to users. Let me take you through an imaginary story, backed up by some facts:

Imagine that there’s a known web site out there on the internet that provides an email service to users. Users log in on a form and read their email. Or perhaps it is a social site. Preferably for our story, the site uses HTTP only, but this trick can be done for most HTTPS sites as well with only a mildly bigger effort.

This known and popular site runs its services on ‘site.com’. When you’re logged in to site.com, your session is kept in a cookie that keeps getting sent to the server, and the server sometimes updates its contents and sends it back to the browser. This is the way millions of sites work.

As an evil person, you now register a domain and set up an attack server. You register a domain that has the same ending as the legitimate site. You call your domain ‘fun-cat-and-food-pics-from-site.com’ (FCAFPFS among friends).

Mr evil person also knows that there are several web browsers, typically special purpose ones for different kinds of devices, that use libcurl as their base. (It doesn’t have to be a browser, it could be other tools as well, but for this story a browser fits fine.) Let’s say you know a person or two who use one of those browsers on site.com.

You send a phishing email to these persons. Or post a funny picture on the social site. The idea is to have them click your link to follow through to your funny FCAFPFS site. A little social engineering, who on the internet can truly resist funny cats?

The visitor’s browser (which uses a vulnerable libcurl) does the wrong “tailmatch” on the domain for the session cookie and gladly hands it off to the attacker site. The attacker site could then use that cookie to access site.com and hijack the user’s session. Quite likely the attacker would immediately change password or something and logout/login so that the innocent user who’s off looking at cats will get a “you are logged-out” message when he/she returns to site.com…

The attacker could then use “password reminder” features on other sites to get emails sent to site.com to allow him to continue attacking the user’s other accounts on other services. Or if site.com was a social site, the attacker would post more cat links and harvest more accounts etc…

End of story.

Any process improvements?

For every security vulnerability a project gets, it should be a reason for scrutinizing what went wrong. I don’t mean in the actual code necessarily, but more what processes we lack that made the bug sneak in and remain in there for so long without being detected.

What didn’t we do that made this bug survive this long?

Obviously we didn’t review the code properly. But this is a tricky beast that was added a very long time ago, back in the days when the project was young and not that many developers were involved. Before we even had a test suite. I do believe that we have slightly better reviews these days, but I will also claim that it is far from certain that we would detect this flaw by code review alone.

Test cases! We clearly lacked the necessary test case setup that tested the limitations of how cookies are supposed to work and get sent back and forth. We’ve added a few new ones now that detect this particular flaw fine, but I think we have reasons to continue to search for various kinds of negative tests we should do. Involving cookies of course, but also generally in other areas of the curl project.

Of course, we’re all just working voluntarily here in our spare time so we can’t expect miracles.

Monitoring my voip line

My “landline” phone in my house is connected over voip through my fiber, and I’m using the service provided by Affinity Telecom – a company I had never heard of before and which I can only presume is a fairly small one.

Everything is working out fine, apart from one annoying little glitch: every other month or so my phone reports itself to callers as busy (or just as if nobody picks up the phone), and the pingcom NetPhone Adapter 201E voip box I have needs to be restarted for the phone line to get back to normal (I haven’t figured out whether the box or the service provider is the actual villain).

In my household we usually discover the problem only after several days of this situation, since we don’t get many calls and we don’t make many calls. (The situation is usually also visible on the voip box’s set of LED lights, which flash when they are otherwise solid, but the box is not placed anywhere we would notice that either.) Several days of the phone beeping busy to callers is a bit annoying, and I’ve decided to try to remedy that somehow. Luckily the box has a web interface that allows me to administer it and check status, and after all, I know a tool I can use to script HTTP against the thing, extract the status and send me a message when it needs some love!

Okay, so I just need to “log in” to the box, get the status page, extract the info for the phone line and I’m done. I’ve done this dozens if not hundreds of times on sites all over the net during the last decade. I merrily transferred the device info page “http://pingcom/Status/Device_Info.shtml” with curl and gave it a glance…

Oh. My. God. This is a little excerpt from the javascript magic that handles the password I enter to login to the web interface:

    /*
     * Get the salt from the router
     */
    (code gets salt from a local URL)

    var salt = xml_doc.textdoc;
    /*
     * Append the password to the salt
     */
     var input = salt + password;
    /*
     * MD5 hash of the salt.
     */
    var hash = hex_md5(input);
    /*
     * Append the MD5 hash to the salt.
    */
    var login_hash = salt.concat(hash);
    /*
     * Send the login hash to the server.
     */
    login_request = new ajax_xmlhttp("/post_login.xml?user=" + escape(username) + "&hash=" +
         escape(login_hash), function(xml_doc)

    [cut]

Ugha! So it downloads a salt, does hashing, salting and md5ing on the data within the browser itself before it sends it off to the server. That is so annoying, and sure, I can probably replicate that logic in a script language of my choice, but it is going to take some trial and error until the details are all sorted out.

Ok, so I do the web form login with my browser again and start to look at what requests it does and so on in order to be able to mimic them with curl instead. I then spot that when viewing that device info page, it makes a whole series of HTTP requests that aren’t for pictures and not for the main HTML… Hm, at a closer look it fetches data from a bunch of URLs ending with “.cgi”! And look, among those URLs there’s one in particular that is called “voip_line_state.cgi”. Let me try to get just that and see what that might offer and what funny auth dance I may need for it…

curl http://pingcom/voip_line_state.cgi

And what do you know? It returns a full XML of the voip status, entirely without any login or authentication required:

<LineStatus channel_count="2">
  <Channel index="0" enabled="1">
    <SIP state="Up">
      <Name>0123456789</Name>
      <Server>sip.example.org</Server>
    </SIP>
    <Call state="Idle"></Call>
  </Channel>
  <Channel index="1" enabled="0"></Channel>
</LineStatus>

Lovely! That ‘Idle’ string in there in the <Call> tag is the key. I now poll the status and check to see the state in order to mail myself when it looks wrong. Still needs to be proven to actually trigger during the problem but hey, why wouldn’t it work?
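
My actual monitoring is a small shell script wrapped around the curl tool, but the same poll-and-check idea can be sketched directly with libcurl for illustration. The URL is of course my local device, and the buffer handling is kept deliberately simple:

    #include <stdio.h>
    #include <string.h>
    #include <curl/curl.h>

    /* collect the response body into a fixed buffer, plenty for this tiny XML */
    struct buf {
      char data[65536];
      size_t len;
    };

    static size_t collect(char *ptr, size_t size, size_t nmemb, void *userp)
    {
      struct buf *b = userp;
      size_t bytes = size * nmemb;
      if(b->len + bytes >= sizeof(b->data))
        return 0;                          /* too big: abort the transfer */
      memcpy(b->data + b->len, ptr, bytes);
      b->len += bytes;
      b->data[b->len] = '\0';
      return bytes;
    }

    int main(void)
    {
      struct buf b = { "", 0 };
      CURL *curl = curl_easy_init();
      if(curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "http://pingcom/voip_line_state.cgi");
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &b);
        if(curl_easy_perform(curl) == CURLE_OK &&
           !strstr(b.data, "<Call state=\"Idle\">")) {
          /* a real script should of course also allow for an ongoing call,
             but this is where I'd send myself that email */
          puts("voip line is not idle - check the box!");
        }
        curl_easy_cleanup(curl);
      }
      return 0;
    }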

The final tip is probably the lovely tool xml2, which converts an XML input to a “flat” output. That output is perfect to use grep or sed on to properly catch the correct situation, and it keeps me from resorting to the error-prone concept of grepping or regexing actual XML. After xml2 the above XML looks like this:

/LineStatus/@channel_count=2
/LineStatus/Channel/@index=0
/LineStatus/Channel/@enabled=1
/LineStatus/Channel/SIP/@state=Up
/LineStatus/Channel/SIP/Name=0123456789
/LineStatus/Channel/SIP/Server=sip.example.org
/LineStatus/Channel/Call/@state=Idle
/LineStatus/Channel
/LineStatus/Channel/@index=1
/LineStatus/Channel/@enabled=0

Now I’ll just have to wait until the problem hits again to verify that my script actually works… Once it is proven to detect the situation, my next step will probably be to actually maneuver the web interface and reboot the box. I’ll get back to that later…

Better pipelining in libcurl 7.30.0

Back in October 2006, we added support for HTTP pipelining to libcurl. The implementation was naive and simple: it basically preferred to pipeline everything on the single connection to a given host if it could. It works only with the multi interface and if you do a second request to the same host it will try to pipeline that.

Over the years the feature was bugfixed and improved slightly, which proved that at least a couple of applications actually used it – but it was never any particularly big hit among libcurl’s vast set of features.

Related background information that gives details on some of the problems with pipelining in the wild can be found in Mark Nottingham’s Making HTTP Pipelining Usable on the Open Web internet-draft, Mozilla’s bug report “HTTP pipelining by default” and Chrome’s pipelining docs.

Now, more than six years later, Linus Nielsen Feltzing (a colleague and friend at Haxx) strikes back with a much improved and almost completely revamped HTTP pipelining support (merged into master just hours before the new-feature window closed for the pending 7.30.0 release). This time, the implementation features and provides:

  • a configurable number of connections and pipelines to each unique host name
  • a round-robin approach that favors starting new connections first, and then pipelining on existing connections once the maximum number of connections to the host is reached
  • a max-depth value that, when reached, makes the code not add any more requests to that connection/pipeline
  • a pipe penalization system that avoids adding new requests to pipes that are known to be receiving very large contents and thus possibly would stall subsequent requests for an extended period of time
  • a server blacklist that allows the application to specify a list of servers for which HTTP pipelining should not be attempted – real world tests have proven that some servers are too broken to be allowed to play the game

The code also adds a feature that helps applications do massive amounts of requests in a controlled manner:

A hard maximum number of connections which, when reached, makes libcurl queue up easy handles internally until a new connection can be created or a previous one re-used. This allows an application to, for example, set the limit to 50 and then add 400 handles to the multi handle: it will still use at most 50 connections, and as requests complete it starts new transfers on the handles waiting in line, shrinking the queue while keeping the connection count at the maximum until there are fewer than 50 transfers left to do…
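
In application terms, these knobs are CURLMOPT_* options set on the multi handle. A hedged sketch of how they might be used – the numbers and the blacklist entry are made up for illustration, and I believe all of these options arrive with 7.30.0:

    #include <curl/curl.h>

    int main(void)
    {
      /* server software names (matched against the Server: response header) for
         which pipelining should never be attempted; this entry is made up */
      static char *server_blacklist[] = { "Ancient-Broken-Server/1.0", NULL };
      CURLM *multi = curl_multi_init();

      if(multi) {
        curl_multi_setopt(multi, CURLMOPT_PIPELINING, 1L);             /* enable pipelining */
        curl_multi_setopt(multi, CURLMOPT_MAX_HOST_CONNECTIONS, 5L);   /* connections per host name */
        curl_multi_setopt(multi, CURLMOPT_MAX_PIPELINE_LENGTH, 3L);    /* max requests per pipeline */
        curl_multi_setopt(multi, CURLMOPT_MAX_TOTAL_CONNECTIONS, 50L); /* hard cap; excess handles queue up */
        curl_multi_setopt(multi, CURLMOPT_PIPELINING_SERVER_BL, server_blacklist);

        /* ... add easy handles (possibly many more than 50) and drive the multi
           handle as usual; libcurl decides when to pipeline, when to open a new
           connection and when to keep a handle waiting in its internal queue ... */

        curl_multi_cleanup(multi);
      }
      return 0;
    }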

Previously that kind of queuing had to be done by the application itself, but now with the much more extensive pipelining support it really isn’t easy for an application to know whether a new request will get pipelined or cause a new connection, so this logic is now provided by libcurl itself. It is likely going to be appreciated and used by non-pipelining applications as well…

This implementation is accompanied by a bunch of new test cases for libcurl (and even a new HTTP test server for the purpose), and it has been tested in the wild for a while with libcurl as the engine in a web browser implementation (the company doing that has requested to remain anonymous). We believe it is in a fairly decent state, but as this is a large step and this is the first release it ships in, I expect some hiccups along the way.

Two things to take note of:

  1. pipelining is only available for users of libcurl’s multi interface, and only if explicitly enabled with CURLMOPT_PIPELINING
  2. the curl command line tool does not use the multi interface and thus it will not use pipelining

Why no curl 8

In this little piece I’ll explain why there won’t be any version 8 of curl and libcurl for a long time. I won’t rule out that it might happen at some point in the future, just that it won’t happen anytime soon, and I’ll explain the reasons why.

Seven point twenty nine, really?

We’ve done 29 minor releases and many more patch releases since version seven was born on August 7, 2000. We did in fact bump the ABI number a couple of times, so we had the chance to bump the major version number as well, but we didn’t take it back then, and these days we have a much firmer commitment and determination to not break the ABI.

There’s really no particular downside to having minor version 29. Given our current speed and minor versioning rules, we’ll bump it 4-6 times/year and we won’t have any practical problems until we reach 256. (That particular limit exists because we provide the version number info through the API using 8 bits each for the major, minor and patch fields, and 8 bits can, as you know, only hold values up to 255.) Assuming we bump the minor number 6 times per year, we’ll reach the problematic limit in about 37 years, in the fine year 2050. Possibly we’ll find a reason to bump to version 8 before that.
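
For the curious, the 8-bit packing is visible in the public header, and the arithmetic is easy to check. A tiny sketch (the six-bumps-per-year rate is of course just an assumption):

    #include <stdio.h>
    #include <curl/curl.h>

    int main(void)
    {
      /* LIBCURL_VERSION_NUM packs the version as 0xXXYYZZ, 8 bits per field,
         so 7.29.0 is 0x071d00 */
      int major = (LIBCURL_VERSION_NUM >> 16) & 0xff;
      int minor = (LIBCURL_VERSION_NUM >> 8) & 0xff;

      printf("built with libcurl %d.%d - the minor field maxes out at 255\n",
             major, minor);
      /* assuming roughly 6 minor bumps per year */
      printf("that limit is about %d years away\n", (255 - minor) / 6);
      return 0;
    }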

Prepare yourself for seven point an-increasingly-higher-number for a number of years coming up!

Is bumping the ABI number that bad?

Yes!

We promise compatibility within an ABI number, so that a later version always works with a program built to use an older version. We have several hundred million users. That means an awful lot of programs are built to use this particular ABI number. Changing the number has a ripple effect, so that at some point in time a new version has to replace all the old ones and applications need to be rebuilt – and at worst also possibly be rewritten in parts to handle the ABI/API changes. The amount of work done “out there” on hundreds or thousands of applications for a single little libcurl tweak can be enormous. The last time we bumped the ABI, we got a serious amount of harsh words and critical feedback, and since then we’ve gotten many more users!

Don’t sensible systems handle multiple library versions?

Yes in theory they do, but in practice they don’t.

If you build an application, it records the ABI number of the lib it should use, so if you just keep the different versions of the libraries installed in the file system you’ll be fine. The older applications will keep using the old version and the ones you rebuild will be made to use the new version. Everything is fine and dandy, and over time all rebuilt applications will use the latest ABI and you can delete the older version from the system.

In reality, libraries are provided by distributions or OS vendors and they ship applications that link to a specific version of the underlying libraries. These distributions only want one version of the lib, so when an ABI bump is made all the applications that use the lib will be rebuilt and have to be updated.

Most importantly, there’s no pressing need!

If we found ourselves cornered, unable to continue development without a bump, then of course we would take the pain it involves. But as things stand right now, there are a few things we don’t really like about the current API and ABI, but in general it works fine and there are no major downsides or great pains involved. We simply do not have any particularly good reason to bump the version number or ABI version. Things work pretty well the current way.

The future is of course unknown and at some point we’ll face a true limitation in the API that we need to bridge over with a bump, but it can also take a long while until we hit that snag.

Update April 6th: this article has been read by many and I’ve read a lot of comments and some misunderstandings about it. Here are some additional clarifications:

  1. this isn’t stuff we’ve suddenly realized now. These are truths and facts we’ve learned over a long time, and this post just makes them more widely available and easier to find. We already worked with this knowledge. I decided to blog about it since it struck me we didn’t have it documented anywhere.
  2. not doing version 8 (for a long time) does not mean we’re done or that the pace of development slows down. We keep doing releases bimonthly and we keep doing an average of 30-something bugfixes in each release.

some missing github features

I think github is a lovely resource for collaborating on source code with my friends all over the globe. Among other things, we host the primary curl repository there and we’ve been doing so for almost three years now. This experience has led me to discover a bunch of things I miss in the service…

github is clearly aimed at repositories run by one person or a small set of persons, while in the projects I run I try to involve as many people as possible in wide collaboration, and I put effort into informing everyone to get the widest possible attention and feedback. I may have created the account and “own” the repository, but I want the work to be done by a large team and I want everything that happens to it to be seen by a large audience. This is not always possible to do easily with the existing github services.

To further this spirit and to widen cooperation more, I would like to see the following improvements:

  • pull requests can’t be disabled, nor can I control which email address the notification is sent to. In our project I want all patches posted to the mailing list for review, archiving and discussions before I get a pull request, and I don’t use GitHub’s merge feature since it is hardly ever good enough (I want fast-forward and I usually feel a need to edit the commit message ever so slightly etc). I want the pull request to get translated into a patch review submission to the mailing list.
  • similarly, I cannot redirect where notifications are sent when someone comments on a commit or a source line, and this is highly annoying since we merge a lot of outsiders’ patches, and as those contributors may still read the mailing list we want the discussion there! Many times the contributors don’t have GitHub accounts and of course we don’t want to require that.
  • after the death of the CIA.vc service, the current IRC notification service offered by GitHub is nothing but inferior. The stupid bot has to join, tell its message and leave again. That is not IRC-friendly behavior and I can’t make it announce exactly what I’d like it to say…
  • I wish it had much better email notification on commits that would allow me to customize what it sends out without forcing me to write a full blown replacement. I want a unified diff included!

I realize GitHub has features that let me create an “organization” to host a repository instead of having it owned by me as a person, but I don’t think that should be a requirement for getting this functionality. And I don’t know if GitHub truly offers better group functionality that way either.

Now 1K+ awesome contributors

On March 20 1998 the first curl release shipped – but as it was built on a previous project there was already a handful of contributors. It was then just a modest little project with but a few thousand lines of code in total.

As time has passed, the project has grown and development has been going on at a rapid speed in a never-ending cycle of releases and bug fixes. The first couple of years I didn’t keep track of every single contributor properly, but one day in 2005 I decided to go back and try to collect the names of all the helpers so far. Names that had been mentioned in the changelogs and comments etc. When looking at our history, it is therefore not really sensible to look at the numbers before that cut-off date as I didn’t keep the count and logs properly before then.

Ever since then, I have made an effort to properly give credit to all contributors: patch producers, bug reporters, documentation writers as well as people “just” providing good advice. I try to mention all contributors, to give credit where credit is due. In a volunteer-driven project, that is after all the best compensation I can offer.

Looking back over the years, it seems the pace of newcomers has been quite steady. The 437 names in 2005 have grown to 1005 in less than eight years: roughly 70 new contributors every year, or 6 per month.

Now, in February 2013 with the release of 7.29.0, we surpassed the magic limit of 1000 named contributors in the project. 1000 contributors in about 5437 days. On average, one new soul every 5th day over a period of almost 15 years. Fascinatingly enough, even if I only count a more recent period like the last 6 years, the pace is just a little faster than one new name every 5 days!

I originally planned to present the data as a graph in this post, but since the development has been so extremely linear it turned out so boring I scrapped it! Instead I’ll show you the top-10 most common first names among curl contributors:

  1. 23 David
  2. 14 Peter
  3. 14 John
  4. 13 Michael
  5. 11 Eric
  6. 10 Mark
  7. 9 Tom
  8. 9 Tim
  9. 9 Daniel
  10. 9 Chris

The names that didn’t quite make it and that all exist 8 times in the THANKS file are Steve, Robert, Richard, James, Dan, Christian, Andrew and Andreas. As I’ve mentioned before: not that many females in there.

(Yes, I’ve ignored that possibly some Daniels go by the name of Dan, some Chrises might be Christian and so on, I’ve only counted the actual names people have used.)

Why the latest security vulnerability in curl happened

At the end of January 2013 we got a fresh security vulnerability pointed out to us in the curl project (it was publicly announced on Feb 6). Another buffer overflow, this time in the SASL Digest-MD5 handling for POP3, IMAP and SMTP. It is the 16th security flaw during curl’s lifetime of almost 15 years, so it isn’t a disaster, but of course it is never fun when it happens. I put a lot of my own effort and pride into this project, so every time something like this floats to the surface my pride and self-esteem get damaged a bit.

Everyone who’s concerned about open source and security, and foremost about a reliable and secure libcurl, of course now wonders: how did this happen? How could this security problem get into libcurl, and what are we doing to make sure it doesn’t happen again?

Let me tell you the story. It is not as interesting nor full of conspiracies as you’d like. It is instead rather dull and boring but nevertheless the truth.

I’m the lead developer and maintainer of curl and libcurl. I personally have done some 65% of all commits in the project and I do the majority of all code reviews on the mailing list. Our code might be used by some 500 million users, but the number of regulars that can be considered the “core team” can still basically be counted on a single hand. Also, we all do this primarily on our spare time.

During intense development periods we get flooded by bug reports and patch submissions and my backlog grows. It’s really not possible to foresee when these periods come, but occasionally it seems the planets align in this way and work piles up.

In order to proceed the best way in the project, I try to focus on the architectural and “deep” matters that need me and my particular knowledge most. I try to leave the problems that are easier to work on to others, and I try to stay away from the issues that already seem to be under control by some of the existing regulars in the project. I also have to let other “elders” in the project push things with slightly less scrutiny, just to be able to plow through the work better. Unfortunately this leads to the occasional flaw getting through, and in this case it was even a security vulnerability which, when you look back at the code, you really cannot understand how we could miss.

We do take security seriously though and we make a big effort on handling all security reports swiftly and accurately. Even if this was the 16th time we let our guard down, I want to think that we at least react responsibly and in a good way when we realize our mistakes.

Please don’t judge us due to this. Please instead consider joining us and help us review code and help us find the next flaw before we merge it into mainline or at least before we do a public release with the code!
