Category Archives: Web

web stuff

tailmatching them cookies

A brand new libcurl security advisory was announced on April 12th, which details how libcurl can leak cookies to domains with tailmatch. Let me explain the details.

(Did I mention that security is hard?)

cookiecurl first implemented cookie support way back in the early days in the late 90s. I participated in the IETF work that much later documented how cookies work in real life. I know how cookies work, and yet this flaw still existed in the curl cookie implementation for over 13 years. Until someone spotted it. And once again that sense of gaaaah, how come we never saw this before!! came over me.

A quick cookie 101

When cookies are used over HTTP, it is (if we simplify things a little) only a name = value pair that is set to be valid a certain domain and a path. But the path is only specifying the prefix, and the domain only specifies the tail part. This means that a site can set a cookie that is for the entire site that is under the path /members so that it will be sent by the brower even for /members/names/ as well as for /members/profile/me etc. The cookie will then not be sent to the same domain for pages under a different path, such as /logout or similar.

A domain for a cookie can set to be valid for example.org and then it will be sent by the browser also for www.example.org and www.sub.example.org but not at all for example.com or badexample.org.

Unless of course you have a bug in the cookie tailmatching function. The bug libcurl had until 7.30.0 was released made it send cookies for the domain example.org also to sites that would have the same tail but a different prefix. Like badexample.org.

Let me try a story on you

It might not be obvious at first glance how terrible this can become to users. Let me take you through an imaginary story, backed up by some facts:

Imagine that there’s a known web site out there on the internet that provides an email service to users. Users login on a form and they read email. Or perhaps it is a social site. Preferably for our story, the site is using HTTP only but this trick can be done for most HTTPS sites as well with only a mildly bigger effort.

This known and popular site runs its services on ‘site.com’. When you’re logged in to site.com, your session is a cookie that keeps getting sent to the server and the server sometimes updates the contents and sends it back to the browser. This is the way millions of sites work.

As an evil person, you now register a domain and setup an attack server. You register a domain that has the same ending as the legitimate site. You call your domain ‘fun-cat-and-food-pics-from-site.com’ (FCAFPFS among friends).

anattackMr evil person also knows that there are several web browsers, typically special purpose ones for different kinds of devices, that use libcurl as its base. (But it doesn’t have to be a browser, it could be other tools as well but for this story a browser fits fine.) Lets say you know a person or two who use one of those browsers on site.com.

You send a phishing email to these persons. Or post a funny picture on the social site. The idea is to have them click your link to follow through to your funny FCAFPFS site. A little social engineering, who on the internet can truly resist funny cats?

The visitor’s browser (which uses a vulnerable libcurl) does the wrong “tailmatch” on the domain for the session cookie and gladly hands it off to the attacker site. The attacker site could then use that cookie to access site.com and hijack the user’s session. Quite likely the attacker would immediately change password or something and logout/login so that the innocent user who’s off looking at cats will get a “you are logged-out” message when he/she returns to site.com…

The attacker could then use “password reminder” features on other sites to get emails sent to site.com to allow him to continue attacking the user’s other accounts on other services. Or if site.com was a social site, the attacker would post more cat links and harvest more accounts etc…

End of story.

Any process improvements?

For every security vulnerability a project gets, it should be a reason for scrutinizing what went wrong. I don’t mean in the actual code necessarily, but more what processes we lack that made the bug sneak in and remain in there for so long without being detected.

What didn’t we do that made this bug survive this long?

Obviously we didn’t review the code properly. But this is a tricky beast that was added a very long time ago, back in the days when the project was young and not that many developers were involved. Before we even had a test suite. I do believe that we have slightly better reviews these days, but I will also claim that it is far from sure that we would detect this flaw by a sheer code review.

Test cases! We clearly lacked the necessary test case setup that tested the limitations of how cookies are supposed to work and get sent back and forth. We’ve added a few new ones now that detect this particular flaw fine, but I think we have reasons to continue to search for various kinds of negative tests we should do. Involving cookies of course, but also generally in other areas of the curl project.

Of course, we’re all just working voluntarily here on spare time so we can’t expect miracles.

(an attack, picture by Andy Gardner)

Better pipelining in libcurl 7.30.0

Back in October 2006, we added support for HTTP pipelining to libcurl. The implementation was naive and simple: it basically preferred to pipeline everything on the single connection to a given host if it could. It works only with the multi interface and if you do a second request to the same host it will try to pipeline that.

pipelines

Over the years the feature was bugfixed and improved slightly, which proved that at least a couple of applications actually used it – but it was never any particularly big hit among libcurl’s vast amount of features.

Related background information that gives details on some of the problems with pipelining in the wild can be found in Mark Nottingham’s Making HTTP Pipelining Usable on the Open Web internet-draft, Mozilla’s bug report “HTTP pipelining by default” and Chrome’s pipelining docs.

Now, more than six years later, Linus Nielsen Feltzing (a colleague and friend at Haxx) strikes back with a much improved and almost completely revamped HTTP pipelining support (merged into master just hours before the new-feature window closed for the pending 7.30.0 release). This time, the implementation features and provides:

  • a configurable number of connections and pipelines to each unique host name
  • a round-robin approach that favors starting new connections first, and then pipeline on existing connections once the maximum number of connections to the host is reached
  • a max-depth value that when filled makes the code not add any more requests on that connection/pipeline
  • a pipe penalization system that avoids adding new requests to pipes that are known to be receiving very large contents and thus possibly would stall subsequent requests for an extended period of time
  • a server blacklist that allows the application to specify a list of servers for which HTTP pipelining should not be attempted – real world tests has proven that some servers are too broken to be allowed to play the game

The code also adds a feature that helps out applications to do massive amount of requests in a controlled manner:

A hard maximum amount of connections, that when reached makes libcurl queue up easy handles internally until they can create a new connection or re-use a previously used one. That allows an application to for example set the limit to 50 and then add 400 handles to the multi handle but it will still only use 50 connections as a maximum so over time when requests get completed it will start new transfers on the requests that are waiting in line and thus shrinking the queue and keeping the maximum amount of connections until there’s less than 50 left to do…

Previously that kind of queuing had to be done by the application itself, but now with the much more extensive pipelining support it really isn’t as easy for an application to know when the new request can get pipelined or create a new connection so this logic is now provided by libcurl itself. It is likely going to be appreciated and used also by non-pipelining applications…

This implementation is accompanied with a bunch of new test cases for libcurl (and even a new HTTP test server for the purpose), and it has been tested in the wild for a while with libcurl as the engine in a web browser implementation (the company doing that has requested to remain anonymous). We believe it is in fairly decent state, but as this is a large step and the first release it is shipped with I expect there to be some hiccups along the way.

Two things to take note of:

  1. pipelining is only available for users of libcurl’s multi interface, and only if explicitly enabled with CURLMOPT_PIPELINING
  2. the curl command line tool does not use the multi interface and thus it will not use pipelining

The curl year 2012

2012

So what did happen in the curl project during 2012?

First some basic stats

We shipped 6 releases with 199 identified bug fixes and some 40 other changes. That makes on average 33 bug fixes shipped every 61st day or a little over one bug fix done every second day. All this done with about 1000 commits to the git repository, which is roughly the same amount of git activity as 2010 and 2011. We merged commits from 72 different authors, which is a slight increase from the 62 in 2010 and 68 in 2011.

On our main development mailing list, the curl-library list, we now have 1300 subscribers and during 2012 it got about 3500 postings from almost 500 different From addresses. To no surprise, I posted by far the largest amount of mails there (847) with the number two poster being Günter Knauf who posted 151 times. Four more members posted more than 100 times: Steve Holme (145), Dan Fandrich (131), Marc Hoersken (130) and Yang Tse (107). Last year I sent 1175 mails to the same list…

Notable events

I’ve walked through the biggest changes and fixes and here are the particular ones I found stood out during this otherwise rather calm and laid back curl year. Possibly in a rough order of importance…

  1. We started the year with two security vulnerability announcements, regarding an SSL weakness and an injection flaw. They were reported in 2011 though and we didn’t get any further security alerts during 2012 which I think is good. Or a sign that nobody has been looking close enough…
  2. We got two interesting additions in the SSL backend department almost simultaneously. We got native Windows support with the use of the schannel subsystem and we got native Mac OS X support with the use of Darwin SSL. Thanks to these, we can now offer SSL-enabled libcurls on those operating systems without relying on third party SSL libraries.
  3. The VERIFYHOST debacle took off with “security researchers” throwing accusations and insults, ending with us releasing a curl release with the bug removed. It did however unfortunately lead to some follow-up problems in for example the PHP binding.
  4. During the autumn, the brokeness of WSApoll was identified, and we now build libcurl without it and as a result libcurl now works better on Windows!
  5. In an attempt to allow libcurl-using applications to avoid select() and its problems, we introduced the new public function curl_multi_wait. It avoids the FD_SETSIZE limit and makes it harder to screw up…
  6. The overly bloated User-Agent string for the curl tool was dramatically shortened when we cut out all the subsystems/libraries and their version numbers from the string. Now there’s only curl and its version number left. Nice and clean.
  7. In July we finally introduced metalink support in the curl tool with the curl 7.27.0 release. It’s been one of those things we’ve discussed for ages that finally came through and became reality.
  8. With the brand new HTTP CONNECT support in the test suite we suddenly could get much improved test cases that does SSL or just tunnel through an HTTP proxy with the CONNECT request. It of course helps us avoid regressions and otherwise improve curl and libcurl.

What didn’t happen

  1. I made an attempt to get the spindly hacking going, but I’ve mostly failed with that effort. I have personally not had enough time and energy to work on it, and the interest from the rest of the world seems luke warm at best.
  2. HTTP pipelining. Linus Nielsen Feltzing has a patch series in the works with a much improved pipelining support for libcurl. I’ll write a separate post about it once it gets in. Obviously we failed to merge it before the end of the year.
  3. Some of my friends like to mock me about curl not being completely IPv6 friendly due to its lack of support for Happy Eyeballs, and of course they’re right. Making curl just do two connects on IPv6-enabled machines should be a fairly small change but yet I haven’t yet managed to get into actually implementing it…
  4. DANE is SSL cert verification with records from DNS thanks to DNSSEC. Firefox has some experiments going and Chrome already supports it. This is a technology that truly can improve HTTPS going forwards and allow us to avoid the annoyingly weak and broken CA model…

I won’t promise that any of these will happen during 2013 but I can promise there will be efforts…

The Future

I wrote a separate post a short while ago about the HTTP2 progress, and I expect 2013 to bring much more details and discussions in that area. Will we get SRV record support soon? Or perhaps even URI records? Will some of the recent discussions about new HTTP auth schemes develop into something that will reach the internet in the coming year?

In libcurl we will switch to an internal design that is purely non-blocking with a lot of if-then-that-else source code removed for checks which interface that is used. I’ll make a follow-up post with details about that as well as soon as it actually happens.

Our Responsibility

curl and libcurl are considered pillars in the internet world by now. This year I’ve heard from several places by independent sources how people consider support by curl to be an important driver for internet technology. As long as we don’t have it, it hasn’t really reached everyone and that things won’t get adopted for real in the Internet community until curl has it supported. As father of the project it makes me proud and humble, but I also feel the responsibility of making sure that we continue to do the right thing the right way.

I also realize that this position of ours is not automatically glued to us, we need to keep up the good stuff to make it stick.

cURL

HTTP2, SPDY and spindly right now

SPDYOn November 28, the HTTPbis group within the IETF published the first draft for the upcoming HTTP2 protocol. What is being posted now is a start and a foundation for further discussions and changes. It is basically an import of the SPDY version 3 protocol draft.

There’s been a lot of resistance within the HTTPbis to the mandated TLS that SPDY has been promoting so far and it seems unlikely to reach a consensus as-is. There’s also been a lot of discussion and debate over the compression SPDY uses. Not only because of the pre-populated dictionary that might already be a little out of date or the fact that gzip compression consumes a notable amount of memory per stream, but also recently the security aspect to compression thanks to the CRIME attack.

Meanwhile, the discussions on the spdy development list have brought up several changes to the version 3 that are suggested and planned to become part of the version 4 that is work in progress. Including a new compression algorithm, shorter length fields (now 16bit) and more. Recently discussions have brought up a need for better flexibility when it comes to prioritization and especially changing prio run-time. For like when browser users switch tabs or simply scroll down the page and you rather have the images you have in sight to load before the images you no longer have in view…

I started my work on Spindly a little over a year ago to build a stand-alone library, primarily intended for libcurl so that we could soon offer SPDY downloads for it. We’re still only on SPDY protocol 2 there and I’ve failed to attract any fellow developers to the project and my own lack of time has basically made the project not evolve the way I wanted it to. I haven’t given up on it though. I hope to be able to get back to it eventually, very much also depending on how the HTTPbis talk goes. I certainly am determined to have libcurl be part of the upcoming HTTP2 experiments (even if that is not happening very soon) and spindly might very well be the infrastructure that powers libcurl then.

We’ll see…

this vs that and ssh through proxy

Taken from the web stats for daniel.haxx.se during September 2012. The top-10 search phrases used to end up on a page on this site:

  1. ssh proxy (198)
  2. curl vs wget (145)
  3. ftp vs http (92)
  4. wget vs curl (91)
  5. ssh through proxy (72)
  6. http vs ftp (67)
  7. curl wget (55)
  8. wget curl (53)
  9. http ftp (46)
  10. difference between ftp and http (45)

The top-3 most visited pages on my site during the same month were:

  1. SSH Through or Over Proxy (viewed 4800 times)
  2. curl vs Wget (viewed 3000 times)
  3. FTP vs HTTP (viewed 2300 times)

I guess this tells me something. I’m not sure what…

Swedish FOSS-magasin

foss-magasinClaes, our friend from foss-sthlm and several Open Source adventures, has just fired off a new initiative: FOSS-Magasin. The site launched for real on the evening November 19th.

Where there’s no real content on the site yet, Claes has set out a mission for himself and future contributors to create a site with technical content in Swedish that we geeks miss. This would be within areas such as FOSS, *nix, networking and more.

Tired of the poor state of technical and IT related media in Sweden that always seem to try to capture the really large audience and therefore always dumb down everything to a silly level, this is meant to be directed on more competent and interested readers.

The site is free and Claes is looking around for contributors to help hem get content to publish. I can only urge my Swedish friends to join up and help it get going, as I think it would be nice to get a proper Swedish tech site. For me, it will be especially interesting for things that actually happen in or otherwise is related to Sweden, as for all the rest I personally have no problems accessing English sites to get the info.

The cookie RFC 6265

http://www.rfc-editor.org/rfc/rfc6265.txt is out!

Back when I was a HTTP rookie in the late 90s, I once expected that there was this fine RFC document somewhere describing how to do HTTP cookies. I was wrong. A lot of others have missed that document too, both before and after my initial search.

I was wrong in the sense that sure there were RFCs for cookies. There were even two of them (RFC2109 and RFC2965)! The only sad thing was however that both of them were totally pointless as in effect nobody (servers nor clients) implemented cookies like that so they documented idealistic protocols that didn’t exist in the real world. This sad state has made people fall into cookie problems all the way into modern days when they’ve implemented services according to those RFCs and then blame their browser for failing.

cookie

It turned out that the only document that existed that were being used, was the original Netscape cookie document. It can’t even be called a specification because it is so short and is so lacking in details that it leaves large holes open and forces implementers to guess about the missing pieces. A sweet irony in itself is the fact that even Netscape removed the document from their site so the only place to find this document is at archive.org or copies like the one I link to above at the curl.haxx.se site. (For some further and more detailed reading about the history of cookies and a bunch of the flaws in the protocol/design, I recommend Michal Zalewski’s excellent blog post HTTP cookies, or how not to design protocols.)

While HTTP was increasing in popularity as a protocol during the 00s and still is, and more and more stuff get done in browsers and everything and everyone are using cookies, the protocol was still not documented anywhere as it was actually used.

Somewhat modeled after the httpbis working group (which is working on updating and bugfixing the HTTP 1.1 spec), IETF setup a mailing list named httpstate in the early 2009 to start discussing what problems there are with cookies and all related matters. After lively discussions throughout the year, the working group with the same name as the mailinglist was founded at December 11th 2009.

One of the initial sparks to get the httpstate group going came from Bill Corry who said this about the start:

In late 2008, Jim Manico and I connected to create a specification for
HTTPOnly — we saw the security issues arising from how the browser vendors
were implementing HTTPOnly in varying ways[1] due to a lack of a specification
and formed an ad-hoc working group to tackle the issue[2].
When I approached the IETF about forming a charter for an official working
group, I was told that I was <quote> “wasting my time” because cookies itself
did not have a proper specification, so it didn’t make sense to work on a spec
for HTTPOnly.  Soon after, we pursued reopening the IETF httpstate Working
Group to tackle the entire cookie spec, not just HTTPOnly.  Eventually Adam
Barth would become editor and Jeff Hodges our chair.

In late 2008, Jim Manico and I connected to create a specification for HTTPOnly — we saw the security issues arising from how the browser vendors were implementing HTTPOnly in varying ways[1] due to a lack of a specification and formed an ad-hoc working group to tackle the issue[2].

When I approached the IETF about forming a charter for an official working group, I was told that I was <quote> “wasting my time” because cookies itself did not have a proper specification, so it didn’t make sense to work on a spec for HTTPOnly.  Soon after, we pursued reopening the IETF httpstate Working Group to tackle the entire cookie spec, not just HTTPOnly. Eventually Adam Barth would become editor and Jeff Hodges our chair.

Since then Adam Barth has worked fiercely as author of the specification and lots of people have joined in and contributed their views, comments and experiences, and we have over time really nailed down how cookies work in the wild today. The current spec now actually describes how to send and receive cookies, the way it is done by existing browsers and clients. Of course, parts of this new spec say things I don’t think it should, like how it deals with the order of cookies in headers, but as everything in life we needed to compromise and I seemed to be rather lonely on my side of that “fence”.
I must stress that the work has only involved to document how things work today and not to invent or create anything new. We don’t fix any of the many known problems with cookies, but we describe how you write your protocol implementation if you want to interact fine with existing infrastructure.

The new spec explicitly obsoletes the older RFC2965, but doesn’t obsolete RFC2109. That was done already by RFC2965. (I updated this paragraph after my initial post.)

Oh, and yours truly is mentioned in the ending “acknowledgements” section. It’s actually the second RFC I get to be mentioned in, the first being RFC5854.

Future

I am convinced that I will get reason to get back to the cookie topic soon and describe what is being worked on for the future. Once the existing cookies have been documented, there’s a desire among people to design something that overcomes the problems with the existing protocol. Adam’s CAKE proposal being one of the attempts and ideas in the pipe.

Another parallel IETF effort is the http-auth mailing list in which lots of discussions around HTTP authentication is being held, and as they often today involve cookies there’s a lot of talk about them there as well. See for example Timothy D. Morgan’s document Weaning the Web off of Session Cookies.

I’ll certainly track the development. And possibly even participate in shaping how this will go. We’ll see.

(cookie image source)

HTTP transfer compression

HTTP is a protocol that looks simple in its simplest form and its readability can easily fool you into believing an implementation is straight forward and quickly done.

That’s not the reality though. HTTP is a very big protocol with lots of corners and twisting mazes that one can get lost in. Even after having been the primary author of curl for 13+ years, there are still lots of HTTP things I don’t master.

To name an example of an area with little known quirks, there’s a funny situation when it comes to how HTTP supports and doesn’t support compression of data and compression of data in transfer.

No header compression

A little flaw in HTTP in regards to compression is that there’s no way to compress headers, in either direction. No matter what we do, we must send the text as-is and both requests and responses are sometimes very big these days. Especially taken into account how cookies are always inserted in requests if they match. Anyway, this flaw is nothing we can do anything about in HTTP 1.1 so we need to live with it.

On the other side, compression of the response body is supported.

Compressing data

Compression of data can be done in two ways: either the actual transfer is compressed or the body data is compressed. The difference is subtle, but when the body data is compressed there’s really nothing that mandates that the client has to uncompress it for the end user, and if the transfer is compressed the receiver must uncompress it in order to deal with the transfer properly.

For reasons that are unknown to me, HTTP clients and servers started out supporting compression only using the Content-Encoding style. It means that the client tells the server what kind of content encodings it supports (using Accept-Encoding:) and the server then sends the response data using one of the supported encodings. The client then decides on its own that if it gets the content in one of the compressed formats that it said it can handle, it will automatically uncompress that on arrival.

The HTTP protocol designers however intended this kind of automatic compression and subsequent uncompress to be done using Transfer-Encoding, as the end result is the completely transparent and the uncompress action is implied and intended by the protocol design. This is done by the client telling the server what transfer encodings it supports with the TE: header and the server adds a Transfer-Encoding: header in the response telling how the transfer is encoded.

HTTP 1.1 introduced a mandatory encoding that all servers can use whenever they feel like it: chunked encoding, so all HTTP 1.1 clients already have to deal with Transfer-Encoding to some degree.

Surely curl is better than all those other guys, right?

Not really. Not yet anyway.

curl has a long history of copying its behavior from what the browsers do, in order to allow users to basically script anything imaginable that is HTTP-like with curl. In this vein, we implemented compression support the same way as all the browsers did it: the content encoding style. (I have reason to believe that at least Opera actually supports or used to support compressed Transfer-Encoding.)

Starting now (code pushed to git repo just after the 7.21.5 release), we’ve taken steps to improve things. We’re changing gears and we’re introducing support for asking for and using compressed Transfer-Encoding. This will start out as an optional feature/flag (–tr-encoding / CURLOPT_TRANSFER_ENCODING) so that we can start out and see how servers in the wild behave and that we can deal with them properly. Then possibly we can switch the default in the future to always ask for compressed transfers. At least for the command line tool.

We know from the little tests we are aware of, that there are at least one known little problem or shall we call it a little detail to keep on eye at, with introducing compressed Transfer-Encoding. As has been so fine reported several years ago in the opera blog Browser sniffing gone wrong (again): Cars.com, there are cases where this may cause the server to send data that gets compressed twice (using both Content and Transfer Encoding) and that needs to be taken care of properly by the client.

At the time of this writing, I’ve not yet taken care of the double-compress case in the code, but I intend to get on to it within shortly.

I’m otherwise very interested in hearing what kind of experience people will have from this. What servers and sites will support this as documented and intended?

Today I am Chinese

Google thinks I'm ChineseI’m going about my merry life and I use google every day.

Today Google decided I’m in China and redirects me to google.com.hk and it shows me all text in Chinese. It’s just another proof how silly it is trying to use the IP address to figure out location (or even worse trying to guess language based on IP address).

Click on the image to get it in its full glory.

I haven’t changed anything locally, but it seems Google has updated (broken) their database somehow.

Just to be perfectly sure my browser isn’t playing any tricks behind my back, I snooped up the headers sent in the HTTP request and there’s nothing notable:

GET /complete/search?output=firefox&client=firefox&hl=en-US&q=rockbox HTTP/1.1
Host: suggestqueries.google.com
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.6.13-1.fc13 Firefox/3.6.13
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: PREF=ID=dc410 [truncated]

Luckily, I know about the URL “google.com/ncr” (No Country Redirect) so I can still use it, but not through my browser’s search box…