http2 explained


I’m hereby offering you all the first version of my document explaining http2, the protocol. It features explanations of the background, basic fundamentals, details on the wire format and something about existing implementations and what to expect for the future.

The full PDF is 27 pages at version 1.0, but I plan to keep up with http2 development going forward, and since I’m hoping to get at least some user feedback I’ll do subsequent updates to improve and extend the document over time. Of course, time will tell how well that works.

The document is edited in LibreOffice and that file is available on github, but ODT is really not a format suitable for patches and merges, so I hope we can sort out changes by filing issues and sending emails.

Wireshark dissector work

Recently I cloned the Wireshark git repository and started updating the http2 dissector. That’s the piece of code that gets called to analyze a stream of data that Wireshark thinks is http2.

The existing http2 dissector was left in a draft-09 state, while the current draft at the time was number 11, and there have been several changes to the binary format since, so any reasonably updated client or server would send or receive byte streams that Wireshark couldn’t properly display.

I had never written any dissector code before, but I must say Wireshark didn’t disappoint. It was straightforward and mostly downright easy to fix most of the wrong details. I’m not pretending to be a master at this, nor is the dissector code anywhere near “finished” yet, but I still enjoyed the API and the way a thing like this gets written.

I’ve since dissected plain-text http2 streams generated with curl+nghttp2, and I’ve also used the SSLKEYLOGFILE trick with Firefox to automatically decrypt the TLS session and have the dissector figure out the underlying http2 parts.

If there’s any little snag to mention, it is the fact that they insist on getting patches submitted directly to gerrit instead of to a mailing list or similar. This required me to create a gerrit account and figure out how to push my changes from git to there, instead of the more traditional and simpler approach of just sending my patch to a mailing list, or possibly submitting it to a bug/patch tracker somewhere with my browser.

Call me old-style, but in fact the hip way of today with a github-style pull-request would also have been much easier. Here’s what my gerrit submission looks like. But I get it: gerrit pushes a little more work over to the submitter, and I figure that once a submitter such as myself has finally fixed all the nits in the patch, it is very easy for the project to actually merge it. I actually needed someone else to point out how to even find the link to view the code review after the first one was submitted on the site… (As I post this, my patch has not yet been accepted or merged into the Wireshark git repo.)

Here’s a basic screenshot showing a trace of Firefox requesting https://nghttp2.org using http2. Click it for the full thing.

[screenshot: Wireshark showing a dissected http2 stream]

… and what happens this morning, my time? There’s a brand new http2 draft-12 out, with more changes to the on-the-wire format! Well, to be honest, that really wasn’t a surprise. I’ll get the new stuff supported too, but I’ll do that in a separate patch, as I prefer to hold off until I see a live stream from at least one implementation to test against.

curl and proxy headers

Starting in the next curl release, 7.37.0, the curl tool supports the new command line option --proxy-header. (Completely merged at this commit.)

It works exactly like --header does, but will only include the headers in requests sent to a proxy, while the opposite is true for --header: those headers will only be sent in requests that go to the end server. Of course, if you use an HTTP proxy and do a normal GET for example, curl will include headers for both the proxy and the server in that one request. The bigger difference is when using CONNECT to a proxy, as the CONNECT request will then only use the proxy headers.
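As a quick illustration, a command line using both options could look something like this (the proxy address and the header names here are just made-up examples):

curl --proxy http://proxy.example.com:8080 --proxy-header "X-Example-Proxy: hello" --header "X-Example-Server: hello" http://example.com/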

libcurl

For libcurl, the story is slightly different and more complicated, since we keep things backwards compatible there. The new libcurl still works exactly like the former one by default.

CURLOPT_PROXYHEADER is the new option for specifying proxy headers, and it is set up exactly like CURLOPT_HTTPHEADER is.

CURLOPT_HEADEROPT is then what an application uses to set how libcurl should use the two header options. Again, by default libcurl will keep working like before and use the CURLOPT_HTTPHEADER list in all HTTP requests. To change that behavior and use the new functionality instead, set CURLOPT_HEADEROPT to CURLHEADER_SEPARATE.

The two header lists will then be handled separately. An application can switch back to the old behavior, with a unified header list, by setting CURLOPT_HEADEROPT to CURLHEADER_UNIFIED.
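Put together, using the new separated behavior could look something like this minimal sketch (the proxy address and the header names are just made-up examples):

#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  struct curl_slist *server_headers = NULL, *proxy_headers = NULL;

  /* headers meant only for the end server */
  server_headers = curl_slist_append(server_headers, "X-Example-Server: yes");
  /* headers meant only for the proxy */
  proxy_headers = curl_slist_append(proxy_headers, "X-Example-Proxy: yes");

  curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/");
  curl_easy_setopt(curl, CURLOPT_PROXY, "http://proxy.example.com:8080");
  curl_easy_setopt(curl, CURLOPT_HTTPHEADER, server_headers);
  curl_easy_setopt(curl, CURLOPT_PROXYHEADER, proxy_headers);

  /* opt in to the new behavior: keep the two lists separate */
  curl_easy_setopt(curl, CURLOPT_HEADEROPT, CURLHEADER_SEPARATE);

  curl_easy_perform(curl);

  curl_slist_free_all(server_headers);
  curl_slist_free_all(proxy_headers);
  curl_easy_cleanup(curl);
  return 0;
}

Since the URL is HTTPS, curl will issue a CONNECT to the proxy carrying only the proxy headers, while the tunneled request to the server carries only the server headers.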

curl and the road to IPv6

I’d like to comment on Paul Saab’s presentation from the other day at the World IPv6 Congress, titled “The Road To IPv6 – Bumpy”. Paul works for Facebook and in his talk he apparently mentioned curl (slide 24 of the PDF set).

Lots of my friends have since directed my attention to those slides and asked for my comment. I haven’t seen Paul’s actual presentation, only read the slides, but I have had a shorter twitter conversation with him about what he meant with his words.

The slide in question says exactly this:

Curl

  • Very hostile to the format of the IPv6 address
  • Wants everything bracket enclosed
  • Many IPv6 bugs that only recently were fixed

Let’s see what those mean. “Very hostile to the format of the IPv6 address” and “Wants everything bracket enclosed” are basically the same complaint.

Paul makes a big point about the fact that if you want to write a URL with an IP address instead of a host name, you have to put that IP address within [brackets] when the IP address is an IPv6 one, which you don’t do if it is an IPv4 one.

Right. Sure. You do. That’s certainly an obstacle when converting slightly naive applications from IPv4 to IPv6 environments. But this syntax is mandated by standards (RFC 3986 to be exact). curl follows the standards, and you’ll do it the same way in other tools and clients that use URLs. The problem manifests itself if you use curl for your task, but if you used something else instead, that something else would have the same issue as long as it follows the standards. The reason for the bracket requirement is of course that numerical IPv6 addresses contain colons, and colons already have a reserved meaning in the host part of URLs (they separate the host from the port number), so some way to tell them apart had to be invented.
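For example, fetching something from a literal IPv6 address looks like this (the 2001:db8:: prefix is reserved for documentation, so the address itself is made up). Since curl’s own URL globbing also gives brackets a special meaning, adding -g/--globoff is a good idea when passing IPv6 literals:

curl -g "http://[2001:db8::1]/"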

Then finally: “Many IPv6 bugs that only recently were fixed”, he said.

I’m the main developer and maintainer of the curl project. This is news to me. Sure, we always fix bugs and we always find stupid things to fix, so there’s no doubt that we’ve had IPv6-related bugs that we’ve fixed – and that we still have IPv6-related bugs we haven’t yet found – but saying that we fixed many such bugs recently? That isn’t something I’m aware of. My guess is that he’s talking about hiccups we’ve had after introducing happy eyeballs, a change we introduced in release 7.34.0 in December 2013.

curl has had IPv6 support since January 2001. We’re on that bumpy road to IPv6!

groups.google.com hates greylisting

Dear Google,

Here’s a Wikipedia article for you: Greylisting.

After you’ve read that, then consider the error message I always get for my groups.google.com account when you disable mail sending to me due to “bouncing”:

Bounce status: Your email address is currently flagged as bouncing. For additional information or to correct this, view your email status here [link].

Following that link I get to read the reason:

“Google tried to deliver your message, but it was rejected by the server for the recipient domain haxx.se by [mailserver]. The error that the other server returned was: 451 4.7.1 Greylisting in action, please come back later”

See, even the error message spells out what it is all about!

Thanks to this feature of Google groups, I cannot participate in any such lists/groups for as long as I keep my greylisting activated since it’ll keep disabling mail delivery to me.

Enabling greylisting decreased my spam flood to roughly a third of the previous volume (and now I’m at 500-1000 spam emails/day) so I’m not ready to disable it yet. I just have to not use google groups.

Update: I threw in the towel and I now whitelist google.com servers to get around this problem…

Reducing the Public Suffix pain?

Let me introduce you to what I consider one of the worst hacks we have in current and modern internet protocols: the Public Suffix List (PSL). This is a list (maintained by Mozilla) of domains that have some kind of administrative setup or arrangement that makes their sub-domains independent. For example, you can’t be allowed to set cookies for “*.com” because .com is a TLD with independent domains underneath it. But the same thing goes for “*.co.uk”, and there’s no hint anywhere about this – except for the Public Suffix List. Then take that simple little example and extrapolate to a domain system that grows with several new TLDs every month. The PSL is now several thousand entries long.

And cookies aren’t the only thing this is used for. Another really common, and perhaps even more important, use case is wildcard matching in TLS server certificates. You should not be allowed to buy and use a cert for “*.co.uk”, but you can for “*.yourcompany.co.uk”…

Not really official but still…

If you read the cookie RFC, or the spec for how to do TLS wildcard certificate matching, you won’t find any line making it crystal clear that the Public Suffix List is what you must use, and I’m sure different browsers solve this slightly differently. But in practice, and most unfortunately if you ask me, you must either use the list or make your own to be fully compliant with how the web works in 2014.

curl, wget and the PSL

In curl and libcurl, we have so far not taken the PSL into account. That is by choice, since I haven’t had any decent way to handle it and there are lots of embedded and other use cases that simply can’t cope with carrying around that large a PSL chunk.

Wget hasn’t had any PSL awareness either, but in recent weeks this has been brought up on the wget list and more attention has been given to it. Work has been initiated to do something about it, which has led to…

libpsl

Tim Rühsen took the baton and started the libpsl project and its associated mailing list, as a foundation for something for Wget to use to get PSL awareness.
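To give an idea of what a PSL check looks like with libpsl, here’s a minimal sketch. I’m assuming the psl_builtin() and psl_is_public_suffix() entry points here; the project is young so the exact API may of course still change:

#include <stdio.h>
#include <libpsl.h>

int main(void)
{
  /* use the public suffix list compiled into the library */
  const psl_ctx_t *psl = psl_builtin();

  /* "co.uk" is a public suffix: nobody should get to set cookies for it */
  printf("co.uk: %s\n",
         psl_is_public_suffix(psl, "co.uk") ? "public suffix" : "regular domain");
  printf("example.co.uk: %s\n",
         psl_is_public_suffix(psl, "example.co.uk") ? "public suffix" : "regular domain");
  return 0;
}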

I’ve mostly cheered the effort so far and said that I wouldn’t mind building on this to enhance curl in the future, if it just gets a suitable (liberal enough) license, and it seems to be going in that direction. For curl’s sake, I would like this to be a conditional dependency, so that people without particular size restrictions can use it, while people in more embedded and special-purpose situations can continue to build without PSL support.

If you’re interested in helping out in curl and libcurl in this area, feel most welcome!

dbound

Meanwhile, the IETF has set up a new mailing list called dbound for discussions around PSL and similar issues and it seems very timely!

what’s –next for curl

curl is finally getting support for doing multiple independent requests specified in the same command line, which allows users to make even better use of curl’s excellent persistent connection handling and more. I don’t know when I first got the question of how to do a GET and a POST in a single command line with curl, but I do know that we’ve had the TODO item about adding such a feature mentioned since 2004 – and I know it wasn’t added there right away…

Starting in curl 7.36.0, we can respond with a better answer: use the --next option!

curl has been able to work with multiple URLs on the command line virtually since day 1, but all the command line options would then mostly apply and be used for all specified URLs.

This new --next option introduces a “boundary”, or a wall if you like, between options on the command line. The options set before --next will be handled as one request, and the options set after --next will start adding up to another request. You then of course need to specify at least one URL per such section, and you can use any number of --next options on the command line. If the command line gets too long, the same logic and sequence is also supported in “config files”, which is how you can put command line arguments into a text file and have curl read them from there using -K or --config.

Here’s a somewhat silly example to illustrate. This first makes a POST and then a HEAD to two different pages on the same host:

curl -d FOO example.com/input.cgi --next --head example.com/robots.txt
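The same sequence in a config file for -K/--config could look something like this sketch, using the long option names:

data = "FOO"
url = "example.com/input.cgi"
next
head
url = "example.com/robots.txt"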

Thanks to Steve Holme for his hard work on implementing this!

http2 in curl

While the first traces of http2 support in curl were added already back in September 2013, it wasn’t until recently that it was actually made useful. There’s been a lot of http2-related activity in the curl team lately, and in late January 2014 we could run our first command line interop tests against public http2 (draft-09) servers on the Internet.

There’s a lot to be said about http2 for those not into its nitty gritty details, but I’ll focus on the curl side of this universe in this blog post. I’ll do separate posts and presentations on http2 “internals” later.

A quick http2 overview

http2 (without a minor version, as per what the IETF working group has decided) is a binary protocol that multiplexes many logical streams over the same physical TCP connection; it features compressed headers in both directions, stream priorities and more. It is being designed to maintain the user concepts and paradigms from HTTP 1.1, so web sites don’t have to change contents and web authors won’t need to relearn a lot. The web will not break because of http2, it will just magically work a little better, a little smoother and a little faster.

In libcurl we build http2 support with the help of the excellent library called nghttp2, which takes care of all the binary protocol details for us. You’ll also have to build it with a new enough version of the SSL library of your choice, as http2 over TLS will require use of some fairly recent TLS extensions that not many older releases have and several TLS libraries still completely lack!

The need for an extension comes from the fact that when speaking TLS over port 443, which is what HTTPS implies, the current and former web infrastructure assumes that we will speak HTTP 1.1 over that connection, while we now want to be able to say that we want to talk http2 instead. When Google introduced SPDY they pushed for a new extension called NPN to do this, which, when taken through standardization in the IETF, was forked, changed and renamed to ALPN with roughly the same characteristics (I don’t know the specific internals so I’ll stick to how they appear from the outside).

So, NPN and especially ALPN are fairly recent TLS extensions, and you need a modern enough SSL library to get that support. OpenSSL and NSS both support NPN and ALPN with a recent enough version, while GnuTLS only supports ALPN. You can build libcurl to use any of these three libraries to get it to talk http2 over TLS.
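If you want to check whether the libcurl you ended up with was actually built with http2 support, a small check with curl_version_info() along these lines should tell you:

#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
  curl_version_info_data *info = curl_version_info(CURLVERSION_NOW);

  if(info->features & CURL_VERSION_HTTP2)
    printf("this libcurl was built with http2 support\n");
  else
    printf("no http2 support in this libcurl\n");
  return 0;
}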

http2 using libcurl

(This still describes what’s in curl’s git repository, the first release to have this level of http2 support is the upcoming 7.36.0 release.)

Users of libcurl who want to enable http2 support will only have to set CURLOPT_HTTP_VERSION to CURL_HTTP_VERSION_2_0 and that’s it. It will make libcurl try to use http2 for the HTTP requests you do with that handle.
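In its most minimal form, that could look something like this sketch (using nghttp2.org, one of the public test servers):

#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();

  curl_easy_setopt(curl, CURLOPT_URL, "https://nghttp2.org/");
  /* ask libcurl to attempt http2 for this transfer */
  curl_easy_setopt(curl, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);

  curl_easy_perform(curl);
  curl_easy_cleanup(curl);
  return 0;
}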

For HTTP URLs, this will make libcurl send a normal HTTP 1.1 request with an offer to the server to upgrade the connection to version 2 instead. If the server agrees, libcurl will continue using http2 in the clear on the connection, and if it doesn’t, it’ll continue using HTTP 1.1 on it. This plain-text upgrade mode is the one Firefox and Chrome will not support.

For HTTPS URLs, libcurl will use NPN and ALPN as explained above and offer to speak http2, and if the server supports it, there will be http2 sweetness from that point onwards. Or the server selects HTTP 1.1 and then that’s what will be used. The latter is also what gets picked if the server doesn’t support ALPN or NPN at all.

Alt-Svc and ALTSVC are new things planned to show up in time for http2 draft-11 so we haven’t really thought through how to best support them and provide their features in the libcurl API. Suggestions (and patches!) are of course welcome!

http2 with curl

Hardly surprisingly, the curl command line tool also has this power. You use the --http2 command line option to switch on the libcurl behavior described above.
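For example, against one of the public test servers mentioned earlier:

curl --http2 -v https://nghttp2.org/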

Translated into old-style

To reduce transition pains and problems, and to work with the rest of the world to the highest possible degree, libcurl will (decompress and) translate received http2 headers into HTTP 1.1 style headers, so that applications and users get a stream of headers that look very much the way you’re used to, and it will produce an initial response line that says HTTP 2.0 blabla.

Building (lib)curl to support http2

See the README.http2 file in the lib/ directory.

This is still a draft version of http2!

I just want to make this perfectly clear: http2 is not out “for real” yet. We have tried our http2 support somewhat at the draft-09 level and Tatsuhiro has worked on the draft-10 support in nghttp2. I expect there to be at least one more draft, but perhaps even more, before http2 becomes an official RFC. We hope to be able to stay on the frontier of http2 and deliver support for the most recent draft going forward.

PS. If you try any of this and experience any sort of problems, please talk to us on the curl-library mailing list and help us smooth out whatever problems you hit!


HTTPbis design team meeting London

I’m writing this just hours after the HTTPbis design team meeting in London 2014 has ended.

Around 30 people attended the meeting in Mozilla’s central London office. The fridge was filled with drinks, the shelves were full of snacks and goodies. The day could begin. This is the Saturday after the IETF89 week, so most people attending had already spent the whole week, or parts of it, here in London doing other HTTP and network related work. The HTTPbis sessions at the IETF itself were productive and had already pushed us forward.

We started at 9:30 and we quickly got to work. Mark Nottingham guided us through the day with usual efficiency.

We all basically hung out in a huge room, some in chairs, some in sofas, and a bunch of people on the floor or just standing up. We had mikes passed around and the http2 discussions flowed back and forth depending on the topics and what people felt about them. Some of the issues that were nailed down this time and will end up detailed in the upcoming draft-11 are (strictly speaking, we only discussed the things and formed opinions, since per IETF guidelines we can’t decide things at an offline meeting like this):

  • Priorities of streams will have a dependency graph or direction, making individual streams less or more important than others
  • A client can send headers without compression and tell the proxy that the headers shouldn’t be compressed – used as a way to mitigate some of the compression security problems
  • There will be no TLS renegotiation allowed mid-session. Basically a client will have to tear down the connection and negotiate again if suddenly a need to use a client certificate arises.
  • Alt-Svc is the way forward, so ALTSVC will appear as a new frame in draft-11. This is the way to signal to an application that there is another “route” to the same content on the same server. This will allow for what is popularly known as “opportunistic encryption”, or at least one sort of that. In short, you can do “plain-text” HTTP over a TLS connection using this…
  • We decided that a server should support gzip contents from clients

There were some other things handled too, but I believe those are the main changes. When the afternoon started to turn long, beers and other beverages were brought out and we enjoyed a relaxing social finale of the day, before we split up into smaller groups and headed out into the busy London night to get dinner…

Thanks everyone for a great day. I also appreciated meeting several people in real-life I never met before, only discussed with and read emails from online and of course some old friends I hadn’t seen in a long time!

Oh, there’s also a new rough time frame for http2 going forward. Nearest in time would be the draft-11 at the end of March and another interim in the beginning of June (Boston?).

As a reminder, here’s what happened for draft-10, and here is http2 draft-10.

Out of all people present today, I believe Mozilla was the company with the largest team (8 attendees) – funnily enough none of us Mozillians there actually work in this office or even in this country.
