I sat down and talked curl, HTTP, HTTP/2, IETF, the web, Firefox and various internet subjects with Mattias Geniar on his podcast the syscast the other day.
There is no websockets for HTTP/2.
By this, I mean that there’s no way to negotiate or upgrade a connection to websockets over HTTP/2 like there is for HTTP/1.1 as expressed by RFC 6455. That spec details how a client can use Upgrade: in a HTTP/1.1 request to switch that connection into a websockets connection.
Note that websockets is not part of the HTTP/1 spec, it just uses a HTTP/1 protocol detail to switch an HTTP connection into a websockets connection. Websockets over HTTP/2 would similarly not be a part of the HTTP/2 specification but would be separate.
(As a side-note, that Upgrade: mechanism is the same mechanism a HTTP/1.1 connection can get upgraded to HTTP/2 if the server supports it – when not using HTTPS.)
There’s was once a draft submitted that describes how websockets over HTTP/2 could’ve been done. It didn’t get any particular interest in the IETF HTTP working group back then and as far as I’ve seen, there has been very little general interest in any group to pick up this dropped ball and continue running. It just didn’t go any further.
This is important: the lack of websockets over HTTP/2 is because nobody has produced a spec (and implementations) to do websockets over HTTP/2. Those things don’t happen by themselves, they actually require a bunch of people and implementers to believe in the cause and work for it.
Websockets over HTTP/2 could of course have the benefit that it would only be one stream over the connection that could serve regular non-websockets traffic at the same time in many other streams, while websockets upgraded on a HTTP/1 connection uses the entire connection exclusively.
So what do users do instead of using websockets over HTTP/2? Well, there are several options. You probably either stick to HTTP/2, upgrade from HTTP/1, use Web push or go the WebRTC route!
If you really need to stick to websockets, then you simply have to upgrade to that from a HTTP/1 connection – just like before. Most people I’ve talked to that are stuck really hard on using websockets are app developers that basically only use a single connection anyway so doing that HTTP/1 or HTTP/2 makes no meaningful difference.
Sticking to HTTP/2 pretty much allows you to go back and use the long-polling tricks of the past before websockets was created. They were once rather bad since they would waste a connection and be error-prone since you’d have a connection that would sit idle most of the time. Doing this over HTTP/2 is much less of a problem since it’ll just be a single stream that won’t be used that much so it isn’t that much of a waste. Plus, the connection may very well be used by other streams so it will be less of a problem with idle connections getting killed by NATs or firewalls.
The Web Push API was brought by W3C during 2015 and is in many ways a more “webby” way of doing push than the much more manual and “raw” method that websockets is. If you use websockets mostly for push notifications, then this might be a more convenient choice.
Also introduced after websockets, is WebRTC. This is a technique introduced for communication between browsers, but it certainly provides an alternative to some of the things websockets were once used for.
Websockets over HTTP/2 could still be done. The fact that it isn’t done just shows that there isn’t enough interest.
Recall how browsers only speak HTTP/2 over TLS, while websockets can also be done over plain TCP. In fact, the only way to upgrade a HTTP connection to websockets is using the HTTP/1 Upgrade: header trick, and not the ALPN method for TLS that HTTP/2 uses to reduce the number of round-trips required.
If anyone would introduce websockets over HTTP/2, they would then probably only be possible to be made over TLS from within browsers.
When I started the precursor to the curl project, httpget, back in 1996, I wrote my first URL parser. Back then, the universal address was still called URL: Uniform Resource Locators. That spec was published by the IETF in 1994. The term “URL” was then used as source for inspiration when naming the tool and project curl.
The term URL was later effectively changed to become URI, Uniform Resource Identifiers (published in 2005) but the basic point remained: a syntax for a string to specify a resource online and which protocol to use to get it. We claim curl accepts “URLs” as defined by this spec, the RFC 3986. I’ll explain below why it isn’t strictly true.
There was also a companion RFC posted for IRI: Internationalized Resource Identifiers. They are basically URIs but allowing non-ascii characters to be used.
The WHATWG consortium later produced their own URL spec, basically mixing formats and ideas from URIs and IRIs with a (not surprisingly) strong focus on browsers. One of their expressed goals is to “Align RFC 3986 and RFC 3987 with contemporary implementations and obsolete them in the process“. They want to go back and use the term “URL” as they rightfully state, the terms URI and IRI are just confusing and no humans ever really understood them (or often even knew they exist).
The WHATWG spec follows the good old browser mantra of being very liberal in what it accepts and trying to guess what the users mean and bending backwards trying to fulfill. (Even though we all know by now that Postel’s Law is the wrong way to go about this.) It means it’ll handle too many slashes, embedded white space as well as non-ASCII characters.
From my point of view, the spec is also very hard to read and follow due to it not describing the syntax or format very much but focuses far too much on mandating a parsing algorithm. To test my claim: figure out what their spec says about a trailing dot after the host name in a URL.
On top of all these standards and specs, browsers offer an “address bar” (a piece of UI that often goes under other names) that allows users to enter all sorts of fun strings and they get converted over to a URL. If you enter “http://localhost/%41” in the address bar, it’ll convert the percent encoded part to an ‘A’ there for you (since 41 in hex is a capital A in ASCII) but if you type “http://localhost/A A” it’ll actually send “/A%20A” (with a percent encoded space) in the outgoing HTTP GET request. I’m mentioning this since people will often think of what you can enter there as a “URL”.
The above is basically my (skewed) perspective of what specs and standards we have so far to work with. Now we add reality and let’s take a look at what sort of problems we get when my URL isn’t your URL.
So what is a URL?
Or more specifically, how do we write them. What syntax do we use.
I think one of the biggest mistakes the WHATWG spec has made (and why you will find me argue against their spec in its current form with fierce conviction that they are wrong), is that they seem to believe that URLs are theirs to define and work with and they limit their view of URLs for browsers, HTML and their address bars. Sure, they are the big companies behind the browsers almost everyone uses and URLs are widely used by browsers, but URLs are still much bigger than so.
The WHATWG view of a URL is not widely adopted outside of browsers.
If we ask users, ordinary people with no particular protocol or web expertise, what a URL is what would they answer? While it was probably more notable years ago when the browsers displayed it more prominently, the :// (colon-slash-slash) sequence will be high on the list. Seeing that marks the string as a URL.
Heck, going beyond users, there are email clients, terminal emulators, text editors, perl scripts and a bazillion other things out there in the world already that detects URLs for us and allows operations on that. It could be to open that URL in a browser, to convert it to a clickable link in generated HTML and more. A vast amount of said scripts and programs will use the colon-slash-slash sequence as a trigger.
The WHATWG spec says it has to be one slash and that a parser must accept an indefinite amount of slashes. “http:/example.com” and “http:////////////////////////////////////example.com” are both equally fine. RFC 3986 and many others would disagree. Heck, most people I’ve confronted the last few days, even people working with the web, seem to say, think and believe that a URL has two slashes. Just look closer at the google picture search screen shot at the top of this article, which shows the top images for “URL” google gave me.
We just know a URL has two slashes there (and yeah, file: URLs most have three but lets ignore that for now). Not one. Not three. Two. But the WHATWG doesn’t agree.
“Is there really any reason for accepting more than two slashes for non-file: URLs?” (my annoyed question to the WHATWG)
The spec says so because browsers have implemented the spec.
No better explanation has been provided, not even after I pointed out that the statement is wrong and far from all browsers do. You may find reading that thread educational.
In the curl project, we’ve just recently started debating how to deal with “URLs” having another amount of slashes than two because it turns out there are servers sending back such URLs in Location: headers, and some browsers are happy to oblige. curl is not and neither is a lot of other libraries and command line tools. Who do we stand up for?
A space character (the ASCII code 32, 0x20 in hex) cannot be part of a URL. If you want it sent, you percent encode it like you do with any other illegal character you want to be part of the URL. Percent encoding is the byte value in hexadecimal with a percent sign in front of it. %20 thus means space. It also means that a parser that for example scans for URLs in a text knows that it reaches the end of the URL when the parser encounters a character that isn’t allowed. Like space.
Browsers typically show the address in their address bars with all %20 instances converted to space for appearance. If you copy the address there into your clipboard and then paste it again in your text editor you still normally get the spaces as %20 like you want them.
I’m not sure if that is the reason, but browsers also accept spaces as part of URLs when for example receiving a redirect in a HTTP response. That’s passed from a server to a client using a Location: header with the URL in it. The browsers happily allow spaces in that URL, encode them as %20 and send out the next request. This forced curl into accepting spaces in redirected “URLs”.
Making URLs support non-ASCII languages is of course important, especially for non-western societies and I’ve understood that the IRI spec was never good enough. I personally am far from an expert on these internationalization (i18n) issues so I just go by what I’ve heard from others. But of course users of non-latin alphabets and typing systems need to be able to write their “internet addresses” to resources and use as links as well.
In an ideal world, we would have the i18n version shown to users and there would be the encoded ASCII based version below, to get sent over the wire.
For international domain names, the name gets converted over to “punycode” so that it can be resolved using the normal system name resolvers that know nothing about non-ascii names. URIs have no IDN names, IRIs do and WHATWG URLs do. curl supports IDN host names.
WHATWG states that URLs are specified as UTF-8 while URIs are just ASCII. curl gets confused by non-ASCII letters in the path part but percent encodes such byte values in the outgoing requests – which causes “interesting” side-effects when the non-ASCII characters are provided in other encodings than UTF-8 which for example is standard on Windows…
Similar to what I’ve written above, this leads to servers passing back non-ASCII byte codes in HTTP headers that browsers gladly accept, and non-browsers need to deal with…
No URL standard
I’ve not tried to write a conclusive list of problems or differences, just a bunch of things I’ve fallen over recently. A “URL” given in one place is certainly not certain to be accepted or understood as a “URL” in another place.
Not even curl follows any published spec very closely these days, as we’re slowly digressing for the sake of “web compatibility”.
There’s no unified URL standard and there’s no work in progress towards that. I don’t count WHATWG’s spec as a real effort either, as it is written by a closed group with no real attempts to get the wider community involved.
I’m employed by Mozilla and Mozilla is a member of WHATWG and I have colleagues working on the WHATWG URL spec and other work items of theirs but it makes absolutely no difference to what I’ve written here. I also participate in the IETF and I consider myself friends with authors of RFC 1738, RFC 3986 and others but that doesn’t matter here either. My opinions are my own and this is my personal blog.
On April 12 I had the pleasure of doing another talk in the Google Tech Talk series arranged in the Google Stockholm offices. I had given it the title “HTTP/2 is upon us, and here’s what you need to know about it.” in the invitation.
The room seated 70 persons but we had the amazing amount of over 300 people in the waiting line who unfortunately didn’t manage to get a seat. To those, and to anyone else who cares, here’s the video recording of the event.
If you’ve seen me talk about HTTP/2 before, you might notice that I’ve refreshed the material somewhat since before.
In July 2015, 40-something HTTP implementers and experts of the world gathered in the city of Münster, Germany, to discuss nitty gritty details about the HTTP protocol during four intense days. Representatives for major browsers, other well used HTTP tools and the most popular HTTP servers were present. We discussed topics like how HTTP/2 had done so far, what we thought we should fix going forward and even some early blue sky talk about what people could potentially see being subjects to address in a future HTTP/3 protocol.
The HTTP Workshop was much appreciated by the attendees and it is now about to be repeated. In the summer of 2016, the HTTP Workshop is again taking place in Europe, but this time as a three-day event slightly further up north: in the capital of Sweden and my home town: Stockholm. During 25-27 July 2016, we intend to again dig in deep.
If you feel this is something for you, then please head over to the workshop site and submit your proposal and show your willingness to attend. This year, I’m also joining the Program Committee and I’ve signed up for arranging some of the local stuff required for this to work out logistically.
The HTTP Workshop 2015 was one of my favorite events of last year. I’m now eagerly looking forward to this year’s version. It’ll be great to meet you here!
I find that many web minded people working client-side or even server-side have neglected to learn the subtle details of the redirects of today. Here’s my attempt at writing another text about it that the ones who should read it still won’t.
Nothing here, go there!
The “redirect” is a fundamental part of the HTTP protocol. The concept was present and is documented already in the first spec (RFC 1945), published in 1996, and it has remained well used ever since.
A redirect is exactly what it sounds like. It is the server sending back an instruction to the client – instead of giving back the contents the client wanted. The server basically says “go look over [here] instead for that thing you asked for“.
But not all redirects are alike. How permanent is the redirect? What request method should the client use in the next request?
All redirects also need to send back a Location: header with the new URI to ask for, which can be absolute or relative.
Permanent or Temporary
Is the redirect meant to last or just remain for now? If you want a GET to resource A permanently redirect users to resource B with another GET, send back a 301. It also means that the user-agent (browser) is meant to cache this and keep going to the new URI from now on when the original URI is requested.
The temporary alternative is 302. Right now the server wants the client to send a GET request to B, but it shouldn’t cache this but keep trying the original URI when directed to it.
Note that both 301 and 302 will make browsers do a GET in the next request, which possibly means changing method if it started with a POST (and only if POST). This changing of the HTTP method to GET for 301 and 302 responses is said to be “for historical reasons”, but that’s still what browsers do so most of the public web will behave this way.
In practice, the 303 code is very similar to 302. It will not be cached and it will make the client issue a GET in the next request. The differences between a 302 and 303 are subtle, but 303 seems to be more designed for an “indirect response” to the original request rather than just a redirect.
These three codes were the only redirect codes in the HTTP/1.0 spec.
GET or POST?
All three of these response codes, 301 and 302/303, will assume that the client sends a GET to get the new URI, even if the client might’ve sent a POST in the first request. This is very important, at least if you do something that doesn’t use GET.
If the server instead wants to redirect the client to a new URI and wants it to send the same method in the second request as it did in the first, like if it first sent POST it’d like it to send POST again in the next request, the server would use different response codes.
To tell the client “the URI you sent a POST to, is permanently redirected to B where you should instead send your POST now and in the future”, the server responds with a 308. And to complicate matters, the 308 code is only recently defined (the spec was published in June 2014) so older clients may not treat it correctly! If so, then the only response code left for you is…
The (older) response code to tell a client to send a POST also in the next request but temporarily is 307. This redirect will not be cached by the client though so it’ll again post to A if requested to. The 307 code was introduced in HTTP/1.1.
Oh, and redirects work the exact same way in HTTP/2 as they do in HTTP/1.1.
The helpful table version
|Switch to GET||301||302 and 303|
|Keep original method||308||307|
It’s a gap!
What about other HTTP methods?
I decided to simplify the explanation above. In all places where it says POST above, you can replace it with any non-GET method. They’re just slightly less common on the browser centric web.
curl and redirects
I couldn’t write a text like this without spicing it up with some curl details!
It turns out that there are web services out there in the world that want a POST sent, are responding with HTTP redirects that use a 301, 302 or 303 response code and still want the HTTP client to send the next request as a POST. As explained above, browsers won’t do that and neither will curl – by default.
Since these setups exist, and they’re actually not terribly rare, curl offers options to alter its behavior.
You can tell curl to not change the non-GET request method to GET after a 30x response by using the dedicated options for that:
–post301, –post302 and –post303. If you are instead writing a libcurl based application, you control that behavior with the CURLOPT_POSTREDIR option.
Here’s how a simple HTTP/1.1 redirect can look like. Note the 301, this is “permanent”:
When I asked my surrounding in March 2015 to guess the expected HTTP/2 adoption by now, we as a group ended up with about 10%. OK, the question was vaguely phrased and what does it really mean? Let’s take a look at some aspects of where we are now.
Perhaps the biggest flaw in the question was that it didn’t specify HTTPS. All the browsers of today only implement HTTP/2 over HTTPS so of course if every HTTPS site in the world would support HTTP/2 that would still be far away from all the HTTP requests. Admittedly, browsers aren’t the only HTTP clients…
During the fall of 2015, both nginx and Apache shipped release versions with HTTP/2 support. nginx made it slightly harder for people by forcing users to select either SPDY or HTTP/2 (which was a technical choice done by them, not really enforced by the protocols) and also still telling users that SPDY is the safer choice.
Let’s Encrypt‘s finally launching their public beta in the early December also helps HTTP/2 by removing one of the most annoying HTTPS obstacles: the cost and manual administration of server certs.
Amount of Firefox responses
This is the easiest metric since Mozilla offers public access to the metric data. It is skewed since it is opt-in data and we know that certain kinds of users are less likely to enable this (if you’re more privacy aware or if you’re using it in enterprise environments for example). This also then measures the share by volume of requests; making the popular sites get more weight.
Firefox 43 counts no less than 22% of all HTTP responses as HTTP/2 (based on data from Dec 8 to Dec 16, 2015).
Out of all HTTP traffic Firefox 43 generates, about 63% is HTTPS which then makes almost 35% of all Firefox HTTPS requests are HTTP/2!
Firefox 43 is also negotiating HTTP/2 four times as often as it ends up with SPDY.
Amount of browser traffic
One estimate of how large share of browsers that supports HTTP/2 is the caniuse.com number. Roughly 70% on a global level. Another metric is the one published by KeyCDN at the end of October 2015. When they enabled HTTP/2 by default for their HTTPS customers world wide, the average number of users negotiating HTTP/2 turned out to be 51%. More than half!
Cloudflare however, claims the share of supported browsers are at a mere 26%. That’s a really big difference and I personally don’t buy their numbers as they’re way too negative and give some popular browsers very small market share. For example: Chrome 41 – 49 at a mere 15% of the world market, really?
I think the key is rather that it all boils down to what you measure – as always.
Amount of the top-sites in the world
Netcraft bundles SPDY with HTTP/2 in their October report, but it says that “29% of SSL sites within the thousand most popular sites currently support SPDY or HTTP/2, while 8% of those within the top million sites do.” (note the “of SSL sites” in there)
That’s now slightly old data that came out almost exactly when Apache first release its HTTP/2 support in a public release and Nginx hadn’t even had it for a full month yet.
Facebook eventually enabled HTTP/2 in November 2015.
Amount of “regular” sites
There’s still no ideal service that scans a larger portion of the Internet to measure adoption level. The httparchive.org site is about to change to a chrome-based spider (from IE) and once that goes live I hope that we will get better data.
W3Tech’s report says 2.5% of web sites in early December – less than SPDY!
I like how isthewebhttp2yet.com looks so far and I’ve provided them with my personal opinions and feedback on what I think they should do to make that the preferred site for this sort of data.
Using the shodan search engine, we could see that mid December 2015 there were about 115,000 servers on the Internet using HTTP/2. That’s 20,000 (~24%) more than isthewebhttp2yet site says. It doesn’t really show percentages there, but it could be interpreted to say that slightly over 6% of HTTP/1.1 sites also support HTTP/2.
On Dec 3rd 2015, Cloudflare enabled HTTP/2 for all its customers and they claimed they doubled the number of HTTP/2 servers on the net in that single move. (The shodan numbers seem to disagree with that statement.)
Amount of system lib support
iOS 9 supports HTTP/2 in its native HTTP library. That’s so far the leader of HTTP/2 in system libraries department. Does Mac OS X have something similar?
I had expected Window’s wininet or other HTTP libs to be up there as well but I can’t find any details online about it. I hear the Android HTTP libs are not up to snuff either but since okhttp is now part of Android to some extent, I guess proper HTTP/2 in Android is not too far away?
Amount of HTTP API support
I hear very little about HTTP API providers accepting HTTP/2 in addition or even instead of HTTP/1.1. My perception is that this is basically not happening at all yet.
If you’re using a modern Chrome browser today against a Google service you’re already (mostly) using QUIC instead of HTTP/2, thus you aren’t really adding to the HTTP/2 client side numbers but you’re also not adding to the HTTP/1.1 numbers.
QUIC and other QUIC-like (UDP-based with the entire stack in user space) protocols are destined to grow and get used even more as we go forward. I’m convinced of this.
Everyone was right! It is mostly a matter of what you meant and how to measure it.
Recall the words on the Chromium blog: “We plan to remove support for SPDY in early 2016“. For Firefox we haven’t said anything that absolute, but I doubt that Firefox will support SPDY for very long after Chrome drops it.
curl shipped “proper” HTTP/2 support as it looks in the final specification both for the command line tool and the libcurl library before any browsers did in their release versions. (Firefox was the first browser to ship HTTP/2 in a release version, followed by Chrome. Both did this in the beginning of 2015.)
libcurl features an option that lets the application to select HTTP version to use, and that includes HTTP/2 since back then. The command line tool got a corresponding command line option (aptly named –http2) to switch on this protocol version.
This goes hand in hand with curl’s general philosophy that it just does the basics and you have to specifically switch on more features and tell it to enable things you want to use. This conservative approach makes it very reliable protocol-wise and provides applications a very large level of control. The downside is of course that fewer people switch on certain features since they’re just not aware of them. Or as in this case with HTTP/2, it also complicates matters that only a subset of users still have a HTTP/2 tool and library since they might still run outdated versions or they may run recent versions that were built without the necessary prerequisites present (basically the nghttp2 library).
libcurl is even more conservative that the curl tool so switching default for the library isn’t really on the agenda yet. We are very careful of modifying behavior so we’re saving that for later but what about upping the curl tool a notch?
We could switch the default to use HTTP/2 as soon as the tool has the powers built-in. But for regular clear text HTTP, using the Upgrade: header has a few drawbacks. It makes the requests larger, it complicates matter somewhat that most servers don’t do upgrades on HTTP POST requests (and a few others) so there might indeed be several requests before an upgrade is actually made even on a server that supports HTTP/2 and perhaps the strongest reason: we already found servers that (wrongly, I would claim) reject requests with Upgrade: headers they don’t understand. All this taken together, Upgrade over HTTP will break too many requests that work with HTTP 1.1. And then we haven’t even considered how the breakage situation will be when using explicit or transparent proxies…
To help users with this problem of HTTP upgrades not being feasible by default, we’ve just landed a new alternative to the “set HTTP version” that only sets HTTP/2 for HTTPS requests and leaves it to HTTP/1.1 for clear text HTTP requests. This option will ship in the next release, to be called 7.47.0, and can of course be tested out before that with git or daily snapshot builds.
Setting this option is next to risk-free, as the HTTP/2 negotiation in TLS is based on one or two TLS extensions (NPN and ALPN) that both have proper fallbacks to 1.1.
Said and done. The curl tool now sets this option. Using the curl tool on a HTTPS:// URL will from now on use HTTP/2 by default as soon as both the libcurl it uses and the server it connects to support HTTP/2!
We will of course keep our eyes and ears open to see if this causes any problems. Let us know what you see!
Using curl to perform an operation a user just managed to do with his or her browser is one of the more common requests and areas people ask for help about.
How do you get a curl command line to get a resource, just like the browser would get it, nice and easy? Both Chrome and Firefox have provided this feature for quite some time already!
You get the site shown with Firefox’s network tools. You then right-click on the specific request you want to repeat in the “Web Developer->Network” tool when you see the HTTP traffic, and in the menu that appears you select “Copy as cURL”. Like this screenshot below shows. The operation then generates a curl command line to your clipboard and you can then paste that into your favorite shell window. This feature is available by default in all Firefox installations.
When you pop up the More tools->Developer mode in Chrome, and you select the Network tab you see the HTTP traffic used to get the resources of the site. On the line of the specific resource you’re interested in, you right-click with the mouse and you select “Copy as cURL” and it’ll generate a command line for you in your clipboard. Paste that in a shell to get a curl command line that makes the transfer. This feature is available by default in all Chome and Chromium installations.
On Firefox, without using the devtools
If this is something you’d like to get done more often, you probably find using the developer tools a bit inconvenient and cumbersome to pop up just to get the command line copied. Then cliget is the perfect add-on for you as it gives you a new option in the right-click menu, so you can get a quick command line generated really quickly, like this example when I right-click an image in Firefox:
I’m the author of a brand new internet-draft that I submitted just the other day. The title is TCP Tuning for HTTP, and the intent is to gather a set of current best practices for HTTP implementers; to share and distribute knowledge we’ve gathered over the years. Clients, servers and intermediaries. For HTTP/1.1 as well as HTTP/2.
I’m now awaiting, expecting and looking forward to feedback, criticisms and additional content for this document so that it can become the resource I’d like it to be.
How to contribute to this?
- ideally, send your feedback to the HTTPbis mailing list,
- or submit an issue or pull-request on github for the draft.md
- or simply email me your comments: daniel <at> haxx.se
I’ve been participating first passively and more and more actively over the years within the IETF, mostly in the HTTPbis working group. I think open protocols and open standards are important and I like being part of making them reality. I have the utmost respect and admiration for those who are involved in putting the RFCs together and thus improve the world we live in, step by step.
For a long while I’ve been wanting to step up and “pull my weight” too, to become a better participant in this area, and I’m happy to now finally take this step. Hopefully this is just the first step of many more to come.
(Psssst: While gathering feedback and updating the git version, the current work in progress version of the draft is always visible here.)