Let me introduce you to what I consider one of the worst hacks we have in current and modern internet protocols: the Public Suffix List (PSL). This is a list (maintained by Mozilla) with domains that have some kind administrative setup or arrangement that makes sub-domains independent. For example, you can’t be allowed to set cookies for “*.com” because .com is a TLD that has independent domains. But the same thing goes for “*.co.uk” and there’s no hint anywhere about this – except for the Public Suffix List. Then, take that simple little example and extrapolate to a domain system that grows with several new TLDs every month and more. The PSL is now several thousands of entries long.
And cookies isn’t the only thing this is used for. Another really common and perhaps even more important use case is for wildcard matches in TLS server certificates. You should not be allowed to buy and use a cert for “*.co.uk” but you can for “*.yourcompany.co.uk”…
Not really official but still…
If you read the cookie RFC or the spec for how to do TLS wildcard certificate matching you won’t read any line putting it crystal clear that the Suffix List is what you must use and I’m sure different browser solve this slightly differently but in practice and most unfortunately (if you ask me) you must either use the list or make your own to be fully compliant with how the web works 2014.
curl, wget and the PSL
In curl and libcurl, we have so far not taken the PSL into account which is by choice since I’ve not had any decent way to handle it and there are lots of embedded and other use cases that simply won’t be able to cope with that large PSL chunk.
Wget hasn’t had any PSL awareness either, but the recent weeks this has been brought up on the wget list and more attention has been given to this. Work has been initiated to do something about it, which has lead to…
I’ve mostly cheered the effort so far and said that I wouldn’t mind building on this to enhance curl in the future if it just gets a suitable (liberal enough) license and it seems to go in that direction. For curl’s sake, I would like to get a conditional dependency on this so that people without particular size restrictions can use this, and people on more embedded and special-purpose situations can continue to build without PSL support.
If you’re interested in helping out in curl and libcurl in this area, feel most welcome!
Meanwhile, the IETF has set up a new mailing list called dbound for discussions around PSL and similar issues and it seems very timely!
While the first traces of http2 support in curl was added already back in September 2013 it hasn’t been until recently it actually was made useful. There’s been a lot of http2 related activities in the curl team recently and in the late January 2014 we could run our first command line inter-op tests against public http2 (draft-09) servers on the Internet.
There’s a lot to be said about http2 for those not into its nitty gritty details, but I’ll focus on the curl side of this universe in this blog post. I’ll do separate posts and presentations on http2 “internals” later.
A quick http2 overview
http2 (without the minor version, as per what the IETF work group has decided on) is a binary protocol that allows many logical streams multiplexed over the same physical TCP connection, it features compressed headers in both directions and it has stream priorities and more. It is being designed to maintain the user concepts and paradigms from HTTP 1.1 so web sites don’t have to change contents and web authors won’t need to relearn a lot. The web will not break because of http2, it will just magically work a little better, a little smoother and a little faster.
In libcurl we build http2 support with the help of the excellent library called nghttp2, which takes care of all the binary protocol details for us. You’ll also have to build it with a new enough version of the SSL library of your choice, as http2 over TLS will require use of some fairly recent TLS extensions that not many older releases have and several TLS libraries still completely lack!
The need for an extension is because with speaking TLS over port 443 which HTTPS implies, the current and former web infrastructure assumes that we will speak HTTP 1.1 over that, while we now want to be able to instead say we want to talk http2. When Google introduced SPDY then pushed for a new extension called NPN to do this, which when taken through the standardization in IETF has been forked, changed and renamed to ALPN with roughly the same characteristics (I don’t know the specific internals so I’ll stick to how they appear from the outside).
So, NPN and especially ALPN are fairly recent TLS extensions so you need a modern enough SSL library to get that support. OpenSSL and NSS both support NPN and ALPN with a recent enough version, while GnuTLS only supports ALPN. You can build libcurl to use any of these threes libraries to get it to talk http2 over TLS.
http2 using libcurl
(This still describes what’s in curl’s git repository, the first release to have this level of http2 support is the upcoming 7.36.0 release.)
Users of libcurl who want to enable http2 support will only have to set CURLOPT_HTTP_VERSION to CURL_HTTP_VERSION_2_0 and that’s it. It will make libcurl try to use http2 for the HTTP requests you do with that handle.
For HTTP URLs, this will make libcurl send a normal HTTP 1.1 request with an offer to the server to upgrade the connection to version 2 instead. If it does, libcurl will continue using http2 in the clear on the connection and if it doesn’t, it’ll continue using HTTP 1.1 on it. This mode is what Firefox and Chrome will not support.
For HTTPS URLs, libcurl will use NPN and ALPN as explained above and offer to speak http2 and if the server supports it. there will be http2 sweetness from than point onwards. Or it selects HTTP 1.1 and then that’s what will be used. The latter is also what will be picked if the server doesn’t support ALPN and NPN.
Alt-Svc and ALTSVC are new things planned to show up in time for http2 draft-11 so we haven’t really thought through how to best support them and provide their features in the libcurl API. Suggestions (and patches!) are of course welcome!
http2 with curl
Hardly surprising, the curl command line tool also has this power. You use the –http2 command line option to switch on the libcurl behavior as described above.
Translated into old-style
To reduce transition pains and problems and to work with the rest of the world to the highest possible degree, libcurl will (decompress and) translate received http2 headers into http 1.1 style headers so that applications and users will get a stream of headers that look very much the way you’re used to and it will produce an initial response line that says HTTP 2.0 blabla.
Building (lib)curl to support http2
See the README.http2 file in the lib/ directory.
This is still a draft version of http2!
I just want to make this perfectly clear: http2 is not out “for real” yet. We have tried our http2 support somewhat at the draft-09 level and Tatsuhiro has worked on the draft-10 support in nghttp2. I expect there to be at least one more draft, but perhaps even more, before http2 becomes an official RFC. We hope to be able to stay on the frontier of http2 and deliver support for the most recent draft going forward.
PS. If you try any of this and experience any sort of problems, please speak to us on the curl-library mailing list and help us smoothen out whatever problem you got!
I’m writing this just hours after the HTTPbis design team meeting in London 2014 has ended.
Around 30 people attended the meeting i Mozilla’s central London office. The fridge was filled up with drinks, the shelves were full of snacks and goodies. The day could begin. This is the Saturday after the IETF89 week so most people attending had already spent the whole or parts of the week before here in London doing other HTTP and network related work. The HTTPbis sessions at the IETF itself were productive and had already pushed us forward.
We started at 9:30 and we quickly got to work. Mark Nottingham guided us through the day with usual efficiency.
We all basically hang out in a huge room, some in chairs, some in sofas and a bunch of people on the floor or just standing up. We had mikes passed around and the http2 discussions were flowing back and forth depending on the topics and what people felt about them. Some of the issues that were nailed down this time and will end up detailed in the upcoming draft-11 are (strictly speaking, we only discussed the things and formed opinions, as by IETF guidelines we can’t decide things on an offline meeting like this):
- Priorities of streams will have a dependency graph or direction, making individual streams less or more important than other
- A client can send headers without compression and tell the proxy that the header shouldn’t be compressed – used a way to mitigate some of the compression security problems
- There will be no TLS renegotiation allowed mid-session. Basically a client will have to tear down the connection and negotiate again if suddenly a need to use a client certificate arises.
- Alt-Svc is the way forward so ALTSVC will appear a new frame in draft-11. This is the way to signal to an application that there is another “route” to the same content on the same server. This will allow for what is popularly known as “opportunistic encryption” or at least one sort of that. In short, you can do “plain-text” HTTP over a TLS connection using this…
- We decided that a server should support gzip contents from clients
There were some other things too handled, but I believe those are the main changes. When the afternoon started to turn long, beers and other beverages were brought out and we did enjoy a relaxing social finale of the day before we split up in smaller groups and headed out in the busy London night to get dinner…
Thanks everyone for a great day. I also appreciated meeting several people in real-life I never met before, only discussed with and read emails from online and of course some old friends I hadn’t seen in a long time!
Oh, there’s also a new rough time frame for http2 going forward. Nearest in time would be the draft-11 at the end of March and another interim in the beginning of June (Boston?).
Out of all people present today, I believe Mozilla was the company with the largest team (8 attendees) – funnily enough none of us Mozillians there actually work in this office or even in this country.
The HTTPbis working group of the IETF had an interim meeting in Zurich January 22nd to 24th. I participated from remote and I listened in on the discussions over webex and followed the jabber room while the meetings were going on, addressing HTTP2 protocol issues one by one ironing out quirks and progressing forward.
I won’t bore you with details why I wasn’t present in Zurich.
Here’s a couple of quick and brief reflections from my perspective:
Listening in from remote like this is not at all adequately compensating for not being there. A room full of people discussing something is really hard to follow from remote and completely impossible to interact with. It is better than not being able to listen in at all, but it is certainly not a replacement for being there.
It is amazing how much faster people can come to conclusions and fix issues when being in the same room. Issues that have been lingering in the tracker for a very long time could be dealt with and closed within minutes. Things like what to call the protocol in ALPN or to remove the ability to switched off flow control. Not all issues of course…
HTTP2 draft-09 that soon will become draft-10 to reflect the updates from this meeting and more, is from my perspective quite far in its process. It is clearly at a point that seems to be OK with most people and the discussions are now just about details. Of course the devil is in the details and I’m not saying it can’t take a long time to settle on them, but the structure and main concepts of the protocol are probably now established.
There were not very many proxy or server people at the interim. Most of the people seemed to be client-side oriented and some service oriented. I’m personally client-side biased myself but I hope this doesn’t lead to us deciding on things that the “other side” will have problems with down the line.
Firefox nightly supports HTTP2 draft-09 (for https:// URLs) and twitter supports it server side. Enable it in the browser by entering “about:config” in the URL bar and change the config entry called “network.http.spdy.enabled.http2draft” to true. Done.
Some of the biggest HTTP2 changes brought up compared to what draft-09 says include:
- no more ability to switch off flow control
- the prioritization field/concept “weighted dependency tree approach”
- >= TLS 1.2 with ephemeral ciphers
- MUST not use TLS compression
- tolerant to TLS false start or at least must accept/buffer application layer data
There was also a whole lot of discussions about TLS for http:// urls, proxys, MITM for SSL, opportunistic encryption and more but I believe most of those issues remained at the same position as before – I missed out on parts of the last afternoon so I may have missed some details. It’ll all be revealed in draft-10 and the mailing list I’m sure!
Update: the minutes from the interim meeting.
Allow as much as possible and only sanitize what’s absolutely necessary.
That has basically been the rule for the URL parser in curl and libcurl since the project was started in the 90s. The upside with this is that you can use curl to torture your web servers with tests and you can handicraft really imaginary stuff to send and thus subsequently to receive. It kind of assumes that the user truly gives curl a URL the user wants to use.
Why would you give curl a broken URL?
But of course life and internet protocols, and perhaps in particular HTTP, is more involved than that. It soon becomes more complicated.
Everyone who’s writing a web user-agent based on RFC 2616 soon faces the fact that redirects based on the Location: header is a source of fun and head-scratching. It is defined in the spec as only allowing “absolute URLs” but the reality is that they were also provided as relative ones by web servers already from the start so the browsers of course support that (and the pending HTTPbis document is already making this clear). curl thus also adopted support for relative URLs, meaning the ability to “merge” or “add” a relative URL onto a previously used absolute one had to be implemented. And even illegally constructed URLs are done this way and in the grand tradition of web browsers, they have not tried to stop users from doing bad things, they have instead adapted and now instead try to convert it to what the user could’ve meant. Like for example using a white space within the URL you send in a Location: header. Even curl has to sanitize that so that it works more like the browsers.
Relative path segments
The path part of URLs are truly to be seen as a path, in that it is a hierarchical scheme where each slash-separated part adds a piece. Like “/first/second/third.html”
As it turns out, you can also include modifiers in the path that have special meanings. Like the “..” (two dots or periods next to each other) known from shells and command lines to mean “one directory level up” can also be used in the path part of a URL like “/one/three/../two/three.html” which equals “/one/two/three.html” when the dotdot sequence is handled. This dot removal procedure is documented in the generic URL specification RFC 3986 (published January 2005) and is completely protocol agnostic. It works like this for HTTP, FTP and every other protocol you provide a path part for.
In its traditional spirit of just accepting and passing along, curl didn’t use to treat “dotdots” in any particular way but handed it over to the server to deal with. There probably aren’t that terribly many such occurrences either so it never really caused any problems or made any users hit any particular walls (or they were too shy to report it); until one day back in February this year… so we finally had to do something about this. Some 8 years after the spec saying it must be done was released.
Alas, libcurl 7.32.0 now features (once it gets released around August 12th) full traversal and handling of such sequences in the path part of URLs. It also includes single dot sequences like in “/one/./two”. libcurl will detect such uses and convert the path to a sequence without them and continue on. This of course will cause a limited altered behavior for the possible small portion of users out there in the world who would use dotdot sequences and actually want them to get sent as-is the way libcurl has been doing it. I decided against adding an option for disabling this behavior, but of course if someone would experience terrible pain and can reported about it convincingly to us we could possible reconsider that decision in the future.
I suspect (and hope) this will just be another little change along the way that will make libcurl act more standard and more like the browsers and thus cause less problems to users but without people much having to care about how or why.
Further reading: the dotdot.c file from the libcurl source tree!
A dot to dot surprise drawing for you and your kids (click for higher resolution)
On November 28, the HTTPbis group within the IETF published the first draft for the upcoming HTTP2 protocol. What is being posted now is a start and a foundation for further discussions and changes. It is basically an import of the SPDY version 3 protocol draft.
There’s been a lot of resistance within the HTTPbis to the mandated TLS that SPDY has been promoting so far and it seems unlikely to reach a consensus as-is. There’s also been a lot of discussion and debate over the compression SPDY uses. Not only because of the pre-populated dictionary that might already be a little out of date or the fact that gzip compression consumes a notable amount of memory per stream, but also recently the security aspect to compression thanks to the CRIME attack.
Meanwhile, the discussions on the spdy development list have brought up several changes to the version 3 that are suggested and planned to become part of the version 4 that is work in progress. Including a new compression algorithm, shorter length fields (now 16bit) and more. Recently discussions have brought up a need for better flexibility when it comes to prioritization and especially changing prio run-time. For like when browser users switch tabs or simply scroll down the page and you rather have the images you have in sight to load before the images you no longer have in view…
I started my work on Spindly a little over a year ago to build a stand-alone library, primarily intended for libcurl so that we could soon offer SPDY downloads for it. We’re still only on SPDY protocol 2 there and I’ve failed to attract any fellow developers to the project and my own lack of time has basically made the project not evolve the way I wanted it to. I haven’t given up on it though. I hope to be able to get back to it eventually, very much also depending on how the HTTPbis talk goes. I certainly am determined to have libcurl be part of the upcoming HTTP2 experiments (even if that is not happening very soon) and spindly might very well be the infrastructure that powers libcurl then.
For the readers of my blog, this is a copy of what I posted to the httpbis mailing list on July 12th 2012.
This is a response to the httpis call for expressions of interest
I am the project leader and maintainer of the curl project. We are the open source project that makes libcurl, the transfer library and curl the command line tool. It is among many things a client-side implementation of HTTP and HTTPS (and some dozen other application layer protocols). libcurl is very portable and there exist around 40 different bindings to libcurl for virtually all languages/enviornments imaginable. We estimate we might have upwards 500
million users or so. We’re entirely voluntary driven without any paid developers or particular company backing.
HTTP/1.1 problems I’d like to see adressed
Pipelining – I can see how something that better deals with increasing bandwidths with stagnated RTT can improve the end users’ experience. It is not easy to implement in a nice manner and provide in a library like ours.
Many connections – to avoid problems with pipelining and queueing on the connections, many connections are used and and it seems like a general waste that can be improved.
We’ve implemented HTTP/1.1 and we intend to continue to implement any and all widely deployed transport layer protocols for data transfers that appear on the Internet. This includes HTTP/2.0 and similar related protocols.
curl has not yet implemented SPDY support, but fully intends to do so. The current plan is to provide SPDY support with the help of spindly, a separate SPDY library project that I lead.
We’ve selected to support SPDY due to the momentum it has and the multiple existing implementaions that A) have multi-company backing and B) prove it to be a truly working concept. SPDY seems to address HTTP’s pipelining and many-connections problems in a decent way that appears to work in reality too. I believe SPDY keeps enough HTTP paradigms to be easily upgraded to for most parties, and yet the ones who can’t or won’t can remain with HTTP/1.1 without too much pain. Also, while Spindly is not production-ready, it has still given me the sense that implementing a SPDY protocol engine is not rocket science and that the existing protocol specs are good.
By relying on external libs for protocol and implementation details, my hopes is that we should be able to add support for other potentially coming HTTP/2.0-ish protocols that gets deployed and used in the wild. In the curl project we’re unfortunately rarely able to be very pro-active due to the nature of our contributors, which tends to make us follow the rest and implement and go with what others have already decided to go with.
I’m not aware of any competitors to SPDY that is deployed or used to any particular and notable extent on the public internet so therefore no other ”HTTP/2.0 protocol” has been considered by us. The two biggest protocol details people will keep mention that speak against SPDY is SSL and the compression requirements, yet I like both of them. I intend to continue to participate in dicussions and technical arguments on the ietf-http-wg mailing list on HTTP details for as long as I have time and energy.
curl currently supports Basic, Digest, NTLM and Negotiate for both host and proxy.
Similar to the HTTP protocol, we intend to support any widely adopted authentication protocols. The HOBA, SCRAM and Mutual auth suggestions all seem perfectly doable and fine in my perspective.
However, if there’s no proper logout mechanism provided for HTTP auth I don’t forsee any particular desire from browser vendor or web site creators to use any of these just like they don’t use the older ones either to any significant extent. And for automatic (non-browser) uses only, I’m not sure there’s motivation enough to add new auth protocols to HTTP as at least historically we seem to rarely be able to pull anything through that isn’t pushed for by at least one of the major browsers.
The “updated HTTP auth” work should be kept outside of the HTTP/2.0 work as far as possible and similar to how RFC2617 is separate from RFC2616 it should be this time around too. The auth mechnism should not be too tightly knit to the HTTP protocol.
The day we can clense connection-oriented authentications like NTLM from the HTTP world will be a happy day, as it’s state awareness is a pain to deal with in a generic HTTP library and code.
As a protocol geek I love working in my open source projects curl, libssh2, c-ares and spindly. I also participate in a few related IETF working groups around these protocols, and perhaps primarily I enjoy the HTTPbis crowd.
Meanwhile, I’m a consultant during the day and most of my projects and assignments involve embedded systems and primarily embedded Linux. The protocol part of my life tends to be left to get practiced during my “copious” amount of spare time – you know that time after your work, after you’ve spent time with your family and played with your kids and done the things you need to do at home to keep the household in a decent shape. That time when the rest of the family has gone to bed and you should too but if you did when would you ever get time to do that fun things you really want to do?
IETF has these great gatherings every now and then and they’re awesome places to just drown in protocol mumbo jumbo for several days. They’re being hosted by various cities all over the world so often I deem them too far away or too awkward to go to, also a lot because I rarely have any direct monetary gain or compensation for going but rather I’d have to do it as a vacation and pay for it myself.
IETF 83 is going to be held in Paris during March 25-30 and it is close enough for me to want to go and HTTPbis and a few other interesting work groups are having scheduled meetings. I really considered going, at least to meet up with HTTP friends.
Something very rare instead happened that prevents me from going there! My customer (for whom I work full-time since about six months and shall remain nameless for now) asked me to join their team and go visit the large embedded conference ESC in San Jose, California in the exact same week! It really wasn’ t a hard choice for me, since this is my job and being asked to do something because I’m wanted is a nice feeling and position – and they’re paying me to go there. It will also be my first time in California even though I guess I won’t get time to actually see much of it.
I hope to write a follow-up post later on about what I’m currently working with, once it has gone public.