Category Archives: Open Source

Open Source, Free Software, and similar

shorter HTTP requests for curl

Starting in curl 7.26.0 (due to be released at the end of May 2012), we will shrink the User-agent: header that curl sends by default in HTTP(S) requests to something much shorter! I suspect that this will raise some eyebrows out there so even though I’ve emailed about it to the curl-users list before I thought I’d better write it up and elaborate.

A default ‘curl localhost’ on Debian Linux makes 170 bytes get sent in that single request:

GET / HTTP/1.1
User-Agent: curl/7.24.0 (i486-pc-linux-gnu) libcurl/7.24.0 OpenSSL/1.0.0g zlib/1.2.6 libidn/1.23 libssh2/1.2.8 librtmp/2.3
Host: localhost
Accept: */*

As you can see, the user-agent description takes up a large portion of that request, and for no really good reason at all. Without sacrificing any functionality, I shrank the same request down to 71 bytes:

GET / HTTP/1.1
User-Agent: curl/7.24.0
Host: localhost
Accept: */*

That means we shrank it down to 41% of the original size. I’ll admit the example is a bit extreme and most other normal use cases will use longer host names and longer paths, but even for a URL like “https://daniel.haxx.se/docs/curl-vs-wget.html” we’re down to 50% of the original request size (100 vs 199).

Can we shrink it even more? Sure, we could leave out the version number too. I left it in there for now only to allow some kind of statistics to be extracted. We can’t remove the entire header though; we need to include a user-agent in requests since there are too many servers that won’t function properly otherwise.

And before anyone asks: this change is only for the curl command line tool and not for libcurl, the library. libcurl in fact doesn’t send any user-agent at all by default…
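
To illustrate what that means for applications (a minimal sketch, not code from curl itself, and the “myapp/1.0” string is just a made-up placeholder): an application that wants a User-Agent header sent has to set one itself, for example with the CURLOPT_USERAGENT option.

#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    /* libcurl sends no User-Agent header at all unless the application
       sets one; the string below is only a placeholder */
    curl_easy_setopt(curl, CURLOPT_USERAGENT, "myapp/1.0");
    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost/");
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
  }
  return 0;
}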

NFS has many meanings

Today I learned that Need for Speed World (I first had to google what “NFS-world” actually means) uses curl, when I received this email:

From: [removed]
Subject: NFS-world

I can not go into the game for 4 months my nickname “[removed]”. it writes the error “Login failed, please try again.” Please solve this problem. Support Group does not help.

But no, I don’t know why this guy emailed me…

I then went on to look for other Electronic Arts games using libcurl, and I stumbled over these forum posts that clearly indicate Game Face uses it, but I found no credits or other information page online.

Can you find any others?

Linux kernel code on TV

In one of the fast-moving early scenes in episode 16 of Person of Interest at roughly 2:05 into the thing I caught this snapshot:

[Screenshot from Person of Interest, s01e16]

It is only in sight for a fraction of a second. What is seen on the very narrow terminal screen on the right is source code scrolling by. Which source code, you say? Take a look again. That, my friends, is kernel/groups.c from around line 30 in a recent Linux kernel. I bet that source file never had so many viewers before, although perhaps not that many actually appreciated this insight! 😉

And before anyone asks: no, there’s absolutely no point or relevance in showing this particular source code in that scene. It is just a way for the guys to look techy. And to be fair, in my mind kernel code is fairly techy!

No summer of Rockbox 2012

For the first summer in many years, I’m not doing any admin or mentor work for any organization in Google’s Summer of Code program.

I’ve been mentoring, co-mentoring and doing admin work within the Rockbox project for the last… 4-5(?) summers, and as a result I now have a good collection of t-shirts. 🙂 This year, the project sadly came to the conclusion that it had not gathered enough mentors and project ideas to apply to become a mentor organization.

Taking care of a student doing full-time work for many weeks is not something to take lightly. To do it properly you need a dedicated and qualified mentor. To provide a good starting point for students to figure out and come up with a good project proposal, you need a really good and detailed list of ideas.

The gsoc task is hard enough as it is even with many mentors and many good ideas, so when there were signs of us not being able to fill both lists we thought it better not to waste anyone’s time or energy. We also value and treasure Google’s very fine help with open source over the years thanks to gsoc, and we would hate to end up looking like we’re just taking advantage of having been accepted as a mentor organization many years in a row in the past.

At the other end, I was very happy to see that my friends in the metalink project finally got accepted as a mentor organization, after having applied for many years. I’d like to think that perhaps we (as in the Rockbox project), by standing back this year, can let others get a chance to shine and join in the fun.

Nothing has been said or planned for Rockbox for next year. If people want to mentor and if we manage to get a good pile of ideas, I’m sure we will apply to be a mentor organization again. If not, well then I’m sure other organizations will still participate in the program and possibly I will find myself involved via another project. I am involved in a bunch of other open source projects, but none of the ones I’m very active in have applied for or participated in gsoc as a mentor org so far.

Travel for fun or profit

As a protocol geek I love working in my open source projects curl, libssh2, c-ares and spindly. I also participate in a few related IETF working groups around these protocols, and perhaps primarily I enjoy the HTTPbis crowd.

Meanwhile, I’m a consultant during the day and most of my projects and assignments involve embedded systems, primarily embedded Linux. The protocol part of my life tends to be left to be practiced during my “copious” amounts of spare time – you know, that time after work, after you’ve spent time with your family, played with your kids and done the things you need to do at home to keep the household in decent shape. That time when the rest of the family has gone to bed and you should too, but if you did, when would you ever get time to do the fun things you really want to do?

The IETF has these great gatherings every now and then, and they’re awesome places to just drown in protocol mumbo jumbo for several days. They’re hosted in various cities all over the world, so I often deem them too far away or too awkward to go to, largely because I rarely get any direct monetary gain or compensation for going; rather I’d have to do it as vacation and pay for it myself.

IETF 83 is going to be held in Paris during March 25-30, it is close enough for me to want to go, and HTTPbis and a few other interesting working groups have scheduled meetings. I really considered going, at least to meet up with HTTP friends.

Something very rare instead happened that prevents me from going there! My customer (for whom I’ve worked full-time for about six months and who shall remain nameless for now) asked me to join their team and visit the large embedded conference ESC in San Jose, California in the exact same week! It really wasn’t a hard choice for me, since this is my job and being asked to do something because I’m wanted is a nice feeling and position – and they’re paying me to go. It will also be my first time in California, even though I guess I won’t get time to actually see much of it.

I hope to write a follow-up post later on about what I’m currently working on, once it has gone public.

The updated web scraping howto

[Book cover: Webbots, Spiders, and Screen Scrapers]

Web scraping is a practice basically as old as the web itself. The desire to extract contents, or to machine-generate things from what was perhaps primarily intended to be presented to a browser and to humans, pops up all the time.

When I created the first tool that would later turn into curl back in 1997, it was for the purpose of scraping. When I added more protocols beyond the initial HTTP support, that too was to extend its ability to “scrape” contents for me.

I’ve not (yet!) met Michael Schrenk in person, although I’ve communicated with him back and forth over the years, and back in 2007 I got a copy of his book Webbots, Spiders and Screen Scrapers in its first edition. Already then I liked it enough to post this positive little review on the curl-and-php mailing list, saying:

this book is a rare exception and previously unmatched to my knowledge in how it covers PHP/CURL. It explains to great details on how to write web clients using PHP/CURL, what pitfalls there are, how to make your code behave well and much more.

Fast-forward to 2011. I was contacted by Mike and his publisher at No Starch Press, and asked to review the book with special regard to protocol facts and curl usage. I didn’t hesitate but gladly accepted, as I already liked the first edition and believed an updated version could be useful to people.

Now, in early 2012, Mike’s efforts have turned into a finished second edition of his book. With updated contents and a couple of new chapters, it is refreshed and extended. The web has changed since 2007 and so has this book! I hope my contributions didn’t only annoy Mike but that I possibly helped a little bit to make it even more accurate than the original version. If you find technical or factual errors in this edition, don’t be shy about telling me (and Mike of course)!

The first month of Spindly

Let me entertain you with some info and updates from the Spindly project. (Unfortunately we don’t have any logo yet so I don’t get to show it off here.)

Since I announced my intention to proceed and write the SPDY library on my own instead of waiting for libspdy to get back to life, I have worked on a number of infrastructure details.

I converted the build to use autotools and libtool to help us make it a truly portable library. I made all test cases run without memory leaks, which took a fair amount of changes to libspdy since it was clearly not written with careful memory checking in mind, and there were also a lot of unnecessarily small mallocs(). Anyone who does a malloc() of 8 bytes should reconsider what they’re doing.
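
To illustrate the small-malloc point with a generic sketch (this is not actual libspdy or Spindly code, and the struct and function names are made up for the example): rather than doing one tiny malloc() for a header and another for its few bytes of payload, you can grab both in a single allocation.

#include <stdlib.h>
#include <string.h>

/* hypothetical type: header and payload live in one allocation */
struct blob {
  size_t len;
  unsigned char data[1]; /* payload bytes follow right after the header */
};

/* one malloc() for header plus payload instead of two tiny ones */
static struct blob *blob_new(const unsigned char *payload, size_t len)
{
  struct blob *b = malloc(sizeof(struct blob) + len);
  if(!b)
    return NULL;
  b->len = len;
  memcpy(b->data, payload, len);
  return b;
}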

Since I’ve had to bugfix libspdy so much, change structs and APIs and add new functions that were missing, I decided there’s no point in trying to keep the original libspdy code or code style intact anymore, so I’ve re-indented the whole code base to a style I like better than the original.

I’ve started to write the fundamentals of a client and server demo application that is meant to use the Spindly API to implement both sides. They don’t really do much yet but the basics are in place. I’ve worked more on my idea of what the spindly API should look like. I’ve written the code for a few functions from that API and I’ve also added a few tests for them.

Most of this work has been done by me and me alone, with no particular feedback or help from others. I continue to push my changes to github without delay and I occasionally announce stuff on the mailing list to keep interested people up to date. Hopefully this will lead to someone else joining in sooner or later.

The progress has not been very fast, not only because I’ve had to do a lot of thinking about how the API should ideally work to be really useful, but also because I have quite a lot of commitments in other open source projects (primarily curl and libssh2) that require their share of time, not to mention that my day job of course needs proper attention.

We offer a daily snapshot of the code if you can’t use or don’t want to use git.

Upcoming

I intend to add more functions from the API document one by one, with test cases for each as I go along. In parallel I hope to get the demo client and server running, so that the API proves to actually work properly.

I also want the demo client and server to be able to run interop tests against other implementations, and to speak SPDY with SSL switched off – for debugging reasons. Later on, I hope to be able to use the demo server in the curl test suite so that I can test that the curl SPDY integration works correctly.

We need to either fix “check” (the unit test suite) to be C89 compatible or replace it with something else.

Want to help?

If you want to help, please subscribe to the mailing list, get familiar with the code base, study the API doc and see if it makes sense to you and then help me get that API turned into code…

Sloppily using SSL_OP_ALL

This story begins with a security flaw in OpenSSL. OpenSSL is truly a fundamental piece of software these days and I would go so far as to say that lots of our critical infrastructure today uses it and needs it. Flaws in OpenSSL literally affect entire societies, or at least risk doing so if the flaws can be exploited.

SSL/TLS is a rather old and well-used protocol with many different implementations, both client and server side. To improve how OpenSSL interoperates with older SSL implementations, or just ones that have different views on how to implement things, OpenSSL provides an API call to tweak behaviors: the SSL_CTX_set_options function. In the curl project we’ve found good use of it for this purpose, and we use the generic define SSL_OP_ALL to switch on all the “rather harmless” workarounds that OpenSSL offers. Rather harmless, that’s what the comment in the header file says.
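
For context, this is roughly what such code looks like in an application using OpenSSL (a generic sketch, not a copy of curl’s source; the function name is made up):

#include <openssl/ssl.h>

/* switch on all of OpenSSL's "rather harmless" protocol workarounds */
static void enable_workarounds(SSL_CTX *ctx)
{
  SSL_CTX_set_options(ctx, SSL_OP_ALL);
}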

Ok, enough background and dancing around the issue. The flaw that ignited my idea to write this blog post was a particular mistake made a long time ago in the SSL 3.0 and TLS 1.0 protocols, exploitable when speaking the protocol with a peer that can select the plain-text (see this explanation) – the problem is a generic one in the protocol, so different SSL libraries would approach it differently. Ok, so OpenSSL fixed the flaw back in the days of 0.9.6d (we’re talking May 9th 2002). As a user of a library such as OpenSSL, it always feels good to see them being on top of security problems and releasing fixes. It makes you feel that you’re being looked after to some extent.

Shortly thereafter, the OpenSSL developers discovered that some broken server implementations didn’t work with the work-around they had done…

So, on July 30th 2002 the OpenSSL team released version 0.9.6e, which offered a way for programs to disable this particular work-around. Switching it off would of course make the protocol less secure again, but it would inter-operate better with faulty servers. How do you switch off this security measure? By calling the SSL_CTX_set_options function with the bit SSL_OP_DONT_INSERT_EMPTY_FRAGMENTS set.

Ok, so far so good. But the next step is what changed everything from fine to not so fine anymore: they then added that new bit to the SSL_OP_ALL define.

Yes. In one blow, every single application out there that uses SSL_OP_ALL suddenly started switching off this security measure as soon as it was recompiled against this version of OpenSSL. This change was made in 2002 and it is still like this today. It fixed the security problem from OpenSSL’s point of view, but by adding the bit to the SSL_OP_ALL define the problem was instead transferred to affect many programs.

In curl’s case, we were alerted about this flaw on January 19th 2012 and it resulted in a security advisory. I did a quick search for SSL_OP_ALL on koders.com and it is obvious that there are hundreds of programs out there still using this bitmask as-is. In the curl project we enabled the SSL_OP_ALL approach for the first time in the 7.10.6 release we did in July 2003. It was already wrong at the time we started using it. It turns out we’ve been enabling this flaw for almost nine years.
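
The general remedy for applications is to keep the harmless workarounds but explicitly clear the dangerous bit before handing the mask to OpenSSL. Here is a sketch of that approach (not curl’s exact fix, and the function name is made up):

#include <openssl/ssl.h>

static void enable_workarounds_safely(SSL_CTX *ctx)
{
  long opts = SSL_OP_ALL;
#ifdef SSL_OP_DONT_INSERT_EMPTY_FRAGMENTS
  /* keep the empty-fragment countermeasure for SSL 3.0/TLS 1.0 enabled */
  opts &= ~SSL_OP_DONT_INSERT_EMPTY_FRAGMENTS;
#endif
  SSL_CTX_set_options(ctx, opts);
}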

In the GnuTLS camp, however, they simply stopped doing their work-around as soon as they started supporting TLS 1.1, due to the problems the work-around caused for some servers; TLS 1.1 isn’t vulnerable to the problem. The OpenSSL 1.0.1 beta was released on January 3 2012 and is the first OpenSSL version ever released to support TLS newer than 1.0… The browsers/NSS seem to have mitigated this problem in a different way, and there’s a patch available for OpenSSL to implement the same work-around, but there’s been no feedback on how or if it will be used.

News in curl 7.24.0

We continue doing curl releases roughly bi-monthly. This time we strike back with a release holding a few interesting new things that I think are worth highlighting a little extra!

The most important and most depressing news about this release is the two security problems that were fixed. Never before have we released two security advisories for the same release.

Security fixes

The “curl URL sanitization vulnerability” is about how curl trusts user-provided URL strings a little too much. By providing sneakily crafted URLs with embedded URL-encoded carriage returns and line feeds, users could trick curl into performing unintended actions when the POP3, SMTP or IMAP protocols were used.

The “curl SSL CBC IV vulnerability” is about how curl inadvertently disables a security measure in OpenSSL and thus weakens the security of some aspects of SSL 3.0 and TLS 1.0 connections.

Changes

We have a bunch of new changes added to curl and libcurl that some users might like:

  • curl has the ability to run a set of “extra commands” for a couple of protocols when doing a transfer – we call them “quote” operations. A while ago we introduced a way to mark commands within a series of quote commands as not being important if they fail, so that the rest of the commands get sent anyway. We mark such commands with a ‘*’-prefix. Starting now, we support that ‘*’-prefix for SFTP operations as well!
  • CURLOPT_DNS_SERVERS is a brand new option that allows programs to set which DNS server(s) libcurl should use to resolve host names. This option only works if libcurl was built to use a resolver backend that allows changing DNS servers, which currently means c-ares and nothing else. (Both this and the new accept-timeout option below are shown in a small sketch after this list.)
  • libcurl now supports nettle for crypto functions. libcurl has long supported both OpenSSL and gcrypt backends for some of the crypto functions it uses. gcrypt made perfect sense when libcurl was built to use a GnuTLS that was itself built to use gcrypt, but since GnuTLS recently changed to using nettle by default, the newly added nettle support removes the need for an extra crypto library to be linked in for some users.
  • CURLOPT_INTERFACE was modified to allow “magic prefixes” with which the application can tell whether it is specifying an interface or a host name. The previous behavior would always test for both, which could lead to accidental (and slow) name resolves when the interface name isn’t currently present, etc.
  • Active FTP sessions with the multi interface are now much more non-blocking than before. Previously the multi interface would block while waiting for the server to connect back, but it no longer does. A new option called CURLOPT_ACCEPTTIMEOUT_MS was added to let programs set how long libcurl should wait for the server to connect back before giving up.
  • Coming in from the Debian packaging guys, the configure script now features a new option called --enable-versioned-symbols that does exactly what its name says: it enables versioned symbols in the output libcurl.
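
Here is a minimal sketch of how an application could use a couple of these new options (the DNS server addresses, the timeout and the URL are just made-up placeholders):

#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    /* placeholder DNS servers; only honored when libcurl uses the c-ares
       resolver backend */
    curl_easy_setopt(curl, CURLOPT_DNS_SERVERS, "192.168.0.1,192.168.0.2:53");
    /* ask for an active FTP transfer and wait at most 10 seconds for the
       server to connect back */
    curl_easy_setopt(curl, CURLOPT_FTPPORT, "-");
    curl_easy_setopt(curl, CURLOPT_ACCEPTTIMEOUT_MS, 10000L);
    curl_easy_setopt(curl, CURLOPT_URL, "ftp://example.com/file.txt");
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
  }
  return 0;
}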

Join the SPDY library development

Back in October I posted about my intentions to work on getting curl support for SPDY to be based on libspdy. I also got in touch with Thomas, the primary author of libspdy and owner of libspdy.org.

Unfortunately, he was already ill then, and he was still ill when I communicated to him what I wanted to see happen and posted a patch etc. to him. He mentioned (in a private email) a lot of work they had done on the code in a private branch, and he invited me to get access to that code to speed up development and allow me to use their work.

I never got any response to my eager “yes, please let me in!” mail. I’ve since mailed him twice over the last few months, and as there have been no responses I’ve decided to slowly ramp up activities on my side while hoping he will soon get back.

I started today by setting up the spdy-library mailing list. I hope to attract fellow interested hackers to join me on this. The goal is quite simply to make a libspdy that works for us. It is to be portable C89 code with an API that “makes sense”. I don’t know yet whether we will work on libspdy as it currently looks, whether Thomas’ team will push their updated work soon, or whether going with my current spindly fork on github is the way. I hope to get help deciding this!

Join the effort by simply adding yourself to the mailing list and participating in the discussions: http://cool.haxx.se/cgi-bin/mailman/listinfo/spdy-library.

And a wiki on github.

Update: I’ve created a hub collecting all related info and pointers over at spindly.haxx.se.

Welcome!