The cookie RFC 6265

http://www.rfc-editor.org/rfc/rfc6265.txt is out!

Back when I was an HTTP rookie in the late 90s, I once expected that there was this fine RFC document somewhere describing how to do HTTP cookies. I was wrong. A lot of others have missed that document too, both before and after my initial search.

I was wrong in the sense that sure, there were RFCs for cookies. There were even two of them (RFC2109 and RFC2965)! The sad thing, however, was that both were pointless in practice: nobody (neither servers nor clients) implemented cookies that way, so they documented idealistic protocols that didn’t exist in the real world. This sad state has led people into cookie problems all the way into modern days, when they have implemented services according to those RFCs and then blamed their browser for failing.

It turned out that the only document in actual use was the original Netscape cookie document. It can’t even be called a specification, because it is so short and so lacking in detail that it leaves large holes open and forces implementers to guess about the missing pieces. A sweet irony is that even Netscape removed the document from their site, so the only places to find it are archive.org or copies like the one I link to above at the curl.haxx.se site. (For further and more detailed reading about the history of cookies and a bunch of the flaws in the protocol/design, I recommend Michal Zalewski’s excellent blog post HTTP cookies, or how not to design protocols.)

While HTTP kept increasing in popularity as a protocol during the 00s (and still is), with more and more stuff getting done in browsers and everything and everyone using cookies, the protocol was still not documented anywhere as it was actually used.

Somewhat modeled after the httpbis working group (which is working on updating and bugfixing the HTTP 1.1 spec), the IETF set up a mailing list named httpstate in early 2009 to start discussing what problems there are with cookies and all related matters. After lively discussions throughout the year, the working group with the same name as the mailing list was founded on December 11th 2009.

One of the initial sparks to get the httpstate group going came from Bill Corry who said this about the start:

In late 2008, Jim Manico and I connected to create a specification for
HTTPOnly — we saw the security issues arising from how the browser vendors
were implementing HTTPOnly in varying ways[1] due to a lack of a specification
and formed an ad-hoc working group to tackle the issue[2].
When I approached the IETF about forming a charter for an official working
group, I was told that I was <quote> “wasting my time” because cookies itself
did not have a proper specification, so it didn’t make sense to work on a spec
for HTTPOnly.  Soon after, we pursued reopening the IETF httpstate Working
Group to tackle the entire cookie spec, not just HTTPOnly.  Eventually Adam
Barth would become editor and Jeff Hodges our chair.

Since then Adam Barth has worked fiercely as author of the specification and lots of people have joined in and contributed their views, comments and experiences, and we have over time really nailed down how cookies work in the wild today. The current spec now actually describes how to send and receive cookies, the way it is done by existing browsers and clients. Of course, parts of this new spec say things I don’t think it should, like how it deals with the order of cookies in headers, but as with everything in life we needed to compromise and I seemed to be rather lonely on my side of that “fence”.
I must stress that the work has only involved documenting how things work today, not inventing or creating anything new. We don’t fix any of the many known problems with cookies, but we describe how to write your protocol implementation if you want it to interoperate with existing infrastructure.
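
To give a feel for the level of detail the spec pins down, here is a minimal sketch of the Cookie header ordering that RFC 6265 section 5.4 recommends: cookies with longer paths go first and, among equal-length paths, cookies with earlier creation times go first. The `Cookie` record and the `cookie_header` helper are hypothetical names made up for this example, and the sketch assumes the cookies have already been matched against the request.

```python
from typing import NamedTuple


class Cookie(NamedTuple):
    # Hypothetical minimal cookie record for this sketch.
    name: str
    value: str
    path: str
    created: float  # creation time, seconds since the epoch


def cookie_header(cookies):
    """Serialize cookies in the order RFC 6265 section 5.4 recommends."""
    # Longer paths first; ties are broken by earlier creation time.
    ordered = sorted(cookies, key=lambda c: (-len(c.path), c.created))
    return "; ".join(f"{c.name}={c.value}" for c in ordered)


jar = [
    Cookie("a", "1", "/", 30.0),
    Cookie("b", "2", "/docs/api", 20.0),
    Cookie("c", "3", "/", 10.0),
]
print("Cookie: " + cookie_header(jar))
```

Here `b` comes first because it has the longest path, and `c` precedes `a` because it was created earlier.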

The new spec explicitly obsoletes the older RFC2965, but doesn’t obsolete RFC2109. That was done already by RFC2965. (I updated this paragraph after my initial post.)

Oh, and yours truly is mentioned in the ending “acknowledgements” section. It’s actually the second RFC I get to be mentioned in, the first being RFC5854.

Future

I am convinced that I will have reason to get back to the cookie topic soon and describe what is being worked on for the future. Now that the existing cookies have been documented, there’s a desire among people to design something that overcomes the problems with the existing protocol. Adam’s CAKE proposal is one of the attempts and ideas in the pipe.

Another parallel IETF effort is the http-auth mailing list, in which lots of discussions around HTTP authentication are being held, and as they often involve cookies today there’s a lot of talk about them there as well. See for example Timothy D. Morgan’s document Weaning the Web off of Session Cookies.

I’ll certainly track the development. And possibly even participate in shaping how this will go. We’ll see.

libcurl’s name resolving

Recently we’ve put some effort into remodeling libcurl’s code that handles name resolves, in particular the two asynchronous name resolver backends that we support: c-ares and threaded.

Name resolving in general in libcurl

libcurl can be built to do name resolves using different means. The primary difference between them is that they are either synchronous or asynchronous. The synchronous way makes the operation block during name resolves, and there’s no “decent” way to abort a resolve if it takes longer than the program wants to allow (other than using signals, and that’s not what we consider a decent way).

Asynch resolving in libcurl

This is done in one of two ways: by building libcurl with c-ares support, or by building libcurl and telling it to use threads to solve the problem. libcurl can be built using either mechanism on just about all platforms, but on Windows the build defaults to using the threaded resolver.

The c-ares solution

c-ares’ primary benefit is that it is an asynchronous name resolver library, so it can do name resolves without blocking and without requiring a new thread. That makes it use fewer resources and remain a perfect choice even if you scale your application up to and beyond an insane number of simultaneous connections. Its primary drawback is that since it isn’t based on the system default name resolver functions, it doesn’t work exactly like them, and that causes trouble at times.

The threaded solution

By making sure the system functions are still used, this makes name resolving work exactly as with the synchronous solution, but thanks to the threading it doesn’t block. The downside here is that it uses a new thread for every name resolve, which in some cases can become quite a large number, and creating and killing threads at a high rate is much more costly than sticking with a single thread.
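
The threaded idea can be illustrated with a small sketch. This is not libcurl’s code, just the same concept in Python under stated assumptions: the blocking system resolver runs in its own thread so the caller can stop waiting whenever it likes. The names `start_resolve`, `wait_resolve` and `slow_resolver` are hypothetical, and the demo uses a fake slow resolver so that it needs no network.

```python
import socket
import threading
import time


def start_resolve(host, resolver=socket.getaddrinfo):
    """Start a blocking resolver call in its own thread and return a handle."""
    outcome = {}

    def worker():
        try:
            outcome["addresses"] = resolver(host, None)
        except OSError as exc:
            outcome["error"] = exc

    # One new thread per resolve, which is the cost the text describes.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, outcome


def wait_resolve(handle, timeout):
    """Wait up to `timeout` seconds; return the outcome, or None if still running."""
    thread, outcome = handle
    thread.join(timeout)
    return None if thread.is_alive() else outcome


def slow_resolver(host, port):
    # Stand-in for a slow DNS lookup (assumption: no real network in the demo).
    time.sleep(0.5)
    return [host]


handle = start_resolve("example.com", resolver=slow_resolver)
print(wait_resolve(handle, 0.05))  # we gave up waiting, the thread keeps running
print(wait_resolve(handle, 2.0))   # by now the lookup has had time to finish
```

The caller never blocks longer than it chooses to, while the lookup itself still goes through the ordinary system function.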

Pluggable

Now we’ve made sure that we have an internal API that both our asynchronous name resolvers implement, and all code internally uses this API. It makes the code a lot cleaner than the previous #ifdef maze for the different approaches, and it has the side effect that it should make it much easier to add pluggable backends in case someone would like to make libcurl support another asynchronous name resolver or system.
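
As a rough sketch of what “one internal API, several backends” means, consider the following. It is a generic illustration, not libcurl’s actual internal API: `ResolverBackend`, `StaticBackend` and `resolve` are hypothetical names, and the toy backend answers from a fixed table instead of doing real lookups.

```python
from abc import ABC, abstractmethod


class ResolverBackend(ABC):
    """Hypothetical interface that every resolver backend implements."""

    @abstractmethod
    def start(self, host):
        """Begin resolving `host` without blocking."""

    @abstractmethod
    def poll(self):
        """Return a list of addresses when done, or None while pending."""


class StaticBackend(ResolverBackend):
    """Toy backend answering from a fixed table, standing in for a real one."""

    def __init__(self, table):
        self._table = table
        self._host = None

    def start(self, host):
        self._host = host

    def poll(self):
        # Always finished immediately; unknown hosts give an empty list.
        return self._table.get(self._host, [])


def resolve(backend, host):
    # Calling code only uses the interface, never a concrete backend.
    backend.start(host)
    return backend.poll()


print(resolve(StaticBackend({"localhost": ["127.0.0.1"]}), "localhost"))
```

Adding another resolver then means writing one more class that implements the same two methods, with no changes to the calling code.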

This is all brand new in the master branch so please try it out and help us polish the initial quirks that may still exist in the code.

There is no current plan to allow this plugging to happen at run-time or using any kind of external plugins. I don’t see any particular benefit for us in doing that, and it would give us a lot more work and responsibilities.

HTTP transfer compression

HTTP is a protocol that looks simple in its simplest form and its readability can easily fool you into believing an implementation is straightforward and quickly done.

That’s not the reality though. HTTP is a very big protocol with lots of corners and twisting mazes that one can get lost in. Even after having been the primary author of curl for 13+ years, there are still lots of HTTP things I don’t master.

To name an example of an area with little known quirks, there’s a funny situation when it comes to how HTTP supports and doesn’t support compression of data and compression of data in transfer.

No header compression

A little flaw in HTTP with regard to compression is that there’s no way to compress headers, in either direction. No matter what we do, we must send the text as-is, and both requests and responses are sometimes very big these days, especially taking into account how cookies are always inserted in requests if they match. Anyway, this flaw is nothing we can do anything about in HTTP 1.1 so we need to live with it.

On the other side, compression of the response body is supported.

Compressing data

Compression of data can be done in two ways: either the actual transfer is compressed or the body data is compressed. The difference is subtle, but when the body data is compressed there’s really nothing that mandates that the client has to uncompress it for the end user, and if the transfer is compressed the receiver must uncompress it in order to deal with the transfer properly.

For reasons that are unknown to me, HTTP clients and servers started out supporting compression only using the Content-Encoding style. It means that the client tells the server what kind of content encodings it supports (using Accept-Encoding:) and the server then sends the response data using one of the supported encodings. The client then decides on its own that if it gets the content in one of the compressed formats that it said it can handle, it will automatically uncompress that on arrival.

The HTTP protocol designers however intended this kind of automatic compression and subsequent uncompress to be done using Transfer-Encoding, as the end result is completely transparent and the uncompress action is implied and intended by the protocol design. This is done by the client telling the server what transfer encodings it supports with the TE: header and the server adding a Transfer-Encoding: header in the response telling how the transfer is encoded.

HTTP 1.1 introduced a mandatory encoding that all servers can use whenever they feel like it: chunked encoding, so all HTTP 1.1 clients already have to deal with Transfer-Encoding to some degree.
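
For those who haven’t looked at chunked encoding on the wire, here is a minimal sketch of a decoder. It is an illustration only: the `decode_chunked` helper is a hypothetical name, it assumes the whole body is already in memory, and it skips chunk extensions and does not parse trailers.

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body held entirely in memory."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The chunk size is hexadecimal; anything after ';' is a chunk extension.
        size = int(data[pos:line_end].split(b";", 1)[0], 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk; optional trailers are not parsed here
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)


wire = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
print(decode_chunked(wire))
```

A real client has to do this incrementally as data arrives, which is where much of the actual complexity lives.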

Surely curl is better than all those other guys, right?

Not really. Not yet anyway.

curl has a long history of copying its behavior from what the browsers do, in order to allow users to basically script anything imaginable that is HTTP-like with curl. In this vein, we implemented compression support the same way as all the browsers did it: the content encoding style. (I have reason to believe that at least Opera actually supports or used to support compressed Transfer-Encoding.)

Starting now (code pushed to git repo just after the 7.21.5 release), we’ve taken steps to improve things. We’re changing gears and we’re introducing support for asking for and using compressed Transfer-Encoding. This will start out as an optional feature/flag (--tr-encoding / CURLOPT_TRANSFER_ENCODING) so that we can see how servers in the wild behave and make sure that we can deal with them properly. Then possibly we can switch the default in the future to always ask for compressed transfers. At least for the command line tool.

We know from the little testing we are aware of that there is at least one known little problem, or shall we call it a little detail to keep an eye on, with introducing compressed Transfer-Encoding. As was nicely reported several years ago in the Opera blog post Browser sniffing gone wrong (again): Cars.com, there are cases where this may cause the server to send data that gets compressed twice (using both Content-Encoding and Transfer-Encoding), and that needs to be taken care of properly by the client.
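
A minimal sketch of what handling the double-compress case could look like, in Python rather than curl’s C: the transfer coding is removed first, since it wraps the whole message body, and the content coding second. The `decode_response` helper is a hypothetical name, only gzip is handled, and the header values are simplified (a real Transfer-Encoding header would typically also list chunked).

```python
import gzip


def decode_response(headers, raw):
    """Undo Transfer-Encoding first, then Content-Encoding (gzip only)."""
    data = raw
    if headers.get("Transfer-Encoding", "").lower() == "gzip":
        data = gzip.decompress(data)  # outer layer, applied to the transfer
    if headers.get("Content-Encoding", "").lower() == "gzip":
        data = gzip.decompress(data)  # inner layer, applied to the content
    return data


original = b"hello " * 100
# Simulate a server that compressed the content and then the transfer as well.
wire = gzip.compress(gzip.compress(original))
headers = {"Content-Encoding": "gzip", "Transfer-Encoding": "gzip"}
print(decode_response(headers, wire) == original)
```

The point is simply that the two headers describe two independent layers, so a client that asks for both has to be prepared to peel off both.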

At the time of this writing, I’ve not yet taken care of the double-compress case in the code, but I intend to get on to it shortly.

I’m otherwise very interested in hearing what kind of experience people will have from this. What servers and sites will support this as documented and intended?

Shipping curl 7.21.5

I don’t usually post anything here when we do curl releases, pretty much because we do them bimonthly on a fairly steady schedule so there should be little surprise to anyone interested by the time they get public.

But hey, this is hard work and just to remind you all what’s going on I thought I’d throw in a mention of what we’ve spent the last two months doing. curl and libcurl 7.21.5 are released today.

The five notable changes introduced this time include:

The CURLOPT_SOCKOPTFUNCTION callback can now return information back to libcurl that the socket libcurl operates on is already connected. This is useful for applications that do a lot of fiddling on their own and possibly provide their own socket to start with using the CURLOPT_OPENSOCKETFUNCTION.

curl the tool got support for the --netrc-file option, that allows a user to point out a specific .netrc file instead of always forcing the user to use the fixed $HOME/.netrc one.

Brand new support for building libcurl with the cyassl library for SSL/TLS support. Previously curl only had support for the older OpenSSL emulation API that cyassl used to provide, but starting now we’re using cyassl directly and it is now a proper SSL citizen among the seven SSL libraries curl supports.

Since the previous release when we shipped the first support for TLS-SRP that required GnuTLS, the OpenSSL project accepted patches that introduced TLS-SRP into their official version as well and accordingly we have received patches that now allow users to use TLS-SRP with libcurl built against (a new enough) OpenSSL as well.

We have started to re-use two error codes a bit differently within libcurl, so that it now can return: CURLE_NOT_BUILT_IN (4) when an application tries to use a feature that was missing or was explicitly disabled at build-time and CURLE_UNKNOWN_OPTION (48) when the application has passed in an option that isn’t known or recognized.

And we’re counting more than 40 bugfixes worth mentioning. The most important ones are possibly:

If using the multi interface doing RTSP, libcurl could crash when trying to re-use a previous connection.

POP3 didn’t do TLS properly, it issued the wrong command to start TLS and it didn’t send the password correctly once it did switch to TLS!

When using the multi interface, there could be times when the timeout didn’t trigger so it wouldn’t close lingering connections even when asked to do so.

SFTP and SCP with the multi_socket interface were not working correctly and would very easily end up with stalled transfers due to the application being told to wait for the wrong action (or none at all).

If told to use the CCC command (which is used with FTP-SSL when the client asks the server to switch off from an SSL connection back to plain TCP again), curl would disable SSL on the connection but then use the wrong socket reader function and crash.

… but of course, if you’ve suffered from a particular bug in a previous release I’m sure you’ll consider the exact bug fix that corrects your problem to be the most important one!

Not to forget, the great people apart from yours truly that have contributed with code and insights since the previous release. Without them, the above list of changes and bugfixes just wouldn’t exist. The friends we have to thank are (in no particular order):

Mike Crowe, Kamil Dudka, Julien Chaffraix, Hoi-Ho Chan, Ben Noordhuis, Dan Fandrich, Henry Ludemann, Karl M, Manuel Massing, Marcus Sundberg, Stefan Krause, Todd A Ouska, Saqib Ali, Andre Guibert de Bruet, Tor Arntsen, Vincent Torri, Dave Reisner, Chris Smowton, Tinus van den Berg, Hongli Lai, Gisle Vanem, Andrei Benea, Mehmet Bozkurt

… and now back to working towards the next release. To be expected in roughly two months. Repeat.

Future transports, the video

The talk I did at FSCONS 2010 titled “Future Transports” has now been made available online and you can see the whole thing. It is split up in three separate video snippets. Click on the picture below to get started:

fscons2010-futuretransports

I originally embedded the videos here on my blog, but that turned out to be a really reliable way to kill Firefox, which was annoying. Now you’ll instead get handed over to the video on vimeo’s site.

IRC use is declining

I discovered IRC around 1993.

Back then, before EFnet split in two, the IRC channel I frequented was #amiga and we were a small bunch of people from all over the globe who got to know each other pretty well. In the 90s I participated in one of my first open source projects and we created the IRC bot we named Dancer. Dancer was a really talented “defence bot” back in the days of the “wild west” of IRC when channel takeovers, flood attacks and nick collisions were widespread and frequently occurring. Dancer helped us keep things calm. Later on, I was part of the team that created and set up the new IRC network called amiganet.

I’ve been using IRC on and off since those days in the early 1990s and still today I hang out on 5-6 channels on freenode every day.

IRC was launched to the world already in 1988, almost 23 years ago. I’ve been trying to document the basic history of IRC and when I updated that page the other day with some usage numbers for freenode, I decided to have a look around the net to see if there are any general numbers for IRC usage at large, and I found out that usage is decreasing all over and has been doing so for years. Without research, I figure IRC users are either old farts like myself or at least very tech oriented and geeky. Younger, newer and less techy people use other means of communication.

IRC never “took off” among the general public. In general, I find that people prefer various IM systems (something that I’ve never understood or adopted myself) and most “ordinary” humans I know don’t even know what IRC is. Possibly this is because the IRC protocol never got very good (there’s only that original spec from ’93), because there are a million completely separate IRC networks with no cross-network messages, or because all IRC networks still today suffer from netsplits and other artifacts due to deficiencies in how the IRC servers talk to each other.

5-6 years ago the four most popular networks were all over 100,000 users regularly. Quakenet was well over 200,000. Last year, only Quakenet reached over 100,000. It seems basically all of them have roughly half the numbers they had in 2004.

Graphs from irc.netsplit.de:

2004

IRC usage 2004

2010

IRC usage 2010

Email asking for my products

In my mini-series of strange mails I receive, here’s another one:

Subject: Product Request

Hello,
I am interested in purchasing some of your products, I will like to know
if youcan ship directly to SPAIN , I also want you to know my mode of
payment for this order is via Credit Card. Get back to me if you can ship
to that destination and also if you accept the payment type I indicated.
Kindly return this email with your price list of your products..

I assume I’ll never figure out what products he speaks of, or how on earth he ended up sending me this… I’ll admit I was tempted to make up some “interesting” products to offer.

Update: I was informed that this is probably “just” another online fraud attempt. How boring.

Haxxup – cheap remote backup

The pain and guilty conscience that come from lacking an established backup concept are very common. I honestly don’t know anyone (and I mean it) who can say with a straight face that they have their (home, private) backup covered. We all know we should back up both locally and remotely: locally so that we can quickly recover the easy things we mistakenly remove or ruin, and remotely in case we get burgled or the house burns down.

The importance of private computer backups has only increased over time, as these days most of us have vast amounts of family pictures and videos stored as well, things that in the old days were stored (and lost) separately.

A growing problem with remote backups is of course that we all have ridiculous amounts of data to back up. Getting a commercial remote backup deal for say 300GB (and growing) isn’t cheap. And we’re also very often at a loss when it comes to getting a solution that works on Linux.

At Haxx, we also recognized and suffered from these problems. We came up with a scheme to set up a distributed networked backup among ourselves! Getting large hard drives to use locally is fairly cheap. We all have fairly good fixed-fee no-bandwidth-limit internet connections (although admittedly the uplink speeds are lacking for us typical ADSL users).

We decided that among the 4 of us, each gets an account on two of our friends’ servers and can upload backups to those at their own pace to store whatever they want. We decided on getting two places for everyone to decrease the risk even further, especially if you for example urgently need to get something back and one of us has a network problem (not completely unheard of) or something else.

My current total backup is about 100GB and I have a 1 Mbit uplink. If I use the entire bandwidth for this, other things get a little sluggish so I’ve capped the rsync job to 90KB/sec… My first run thus completed in roughly 13 days. Luckily I don’t add contents at a very high pace so the ordinary sync jobs from then on should be much smaller and should be able to complete within hours. As long as I add less than ~3.5GB during a 24 hour period, it should be able to keep up syncing to two remote places.
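
The numbers above are easy to check with a few lines of arithmetic. This sketch assumes decimal units (1 KB = 1000 bytes, 1 GB = 10^9 bytes); with binary units the results shift by a few percent but the conclusions stay the same.

```python
KB = 1000          # assumption: decimal kilobytes
GB = 1000 ** 3     # assumption: decimal gigabytes

backup_bytes = 100 * GB
rate = 90 * KB                     # bytes per second, the rsync cap
seconds_per_day = 24 * 60 * 60

first_run_days = backup_bytes / rate / seconds_per_day
daily_capacity_gb = rate * seconds_per_day / GB
# The same new data has to be uploaded to two remote places.
per_destination_gb = daily_capacity_gb / 2

print(f"first run: {first_run_days:.1f} days")
print(f"upload capacity: {daily_capacity_gb:.2f} GB/day")
print(f"sustainable growth with two destinations: {per_destination_gb:.2f} GB/day")
```

That gives a first run of just under 13 days and room for a bit under 4GB of new data per day, so ~3.5GB leaves a small margin.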

curlyears plus equals one

Thirteen years ago I released the first version of curl to the world, on March 20 1998. curl is now a teenage project and there’s no slowdown or end in sight.

So what does a project like ours introduce after having existed for so long? The past year has been full of activities in the project, and here’s a rundown of some of the stuff that has been going on:

We switched source code versioning system from CVS to git

gopher support got back into curl

support for RTMP was added

Two additional SSL libraries are now supported: PolarSSL and axTLS, making it a total of seven

70 persons provided code into the git repository, making the THANKS document now list 854 names.

About 1000 commits were made – out of a total of almost 14000 (counted from Dec 1999) so it makes this year slightly under average in terms of commit rate.

6 releases were shipped with a total of 179 bugfixes

1 security flaw was found and fixed

More than 4900 mails were posted to the curl mailing lists

We introduced a unit test system

Over 1400GB of data was downloaded from the curl web site

During the end of this period, 45% of our web visitors used Firefox, 23% used Chrome and less than 20% used IE. 75% were on Windows, 13% on Linux and 10% on Mac.

First time for Steering Board

Some years ago in the Rockbox project (2008 to be exact), we started the Rockbox Steering Board (RSB): a board intended as a core group that would make the final hard decisions when consensus was not reached among developers, when conflicts arose, or whatever.

I was voted in as member of the board in the first RSB and I’ve been a member of it since. We have annual elections where we vote for 5 trusted persons to attend the board that potentially will make decisions for the project’s good.

But no real crises turned up. No discussion was so heated it wasn’t handled by the developers on the mailing list or over IRC. No decision was needed by the RSB. And time passed.

In February 2011, the first ever case for the RSB was brought to us by a member of the project who felt there was potentially some wrong-doing going on, or that something had been done against our established procedure.

The issue itself was not that easy to deal with, and it also quickly showed that all five of us RSB members are busy persons with lots of stuff going on at our own ends, so each round of discussion and decision-making took a really long time. In the end we really had to push ourselves to get a statement together and published before the pending release.

I think we did well in the end and I think we learned a little on how to do it better next time. But let’s hope it’ll take another few years until the RSB is brought out again… Thanks Jens, Marianne, Frank and Björn for a job well done!
