Tag Archives: Firefox

One URL standard please

Following up on the problem with our current lack of a universal URL standard that I blogged about in May 2016: My URL isn’t your URL. I want a single, unified URL standard that we would all stand behind, support and adhere to.

What triggers me this time, is yet another issue. A friendly curl user sent me this URL:

http://user@example.com:80@daniel.haxx.se

… and pasting this URL into different tools and browsers show that there’s not a wide agreement on how this should work. Is the URL legal in the first place and if so, which host should a client contact?

  • curl treats the ‘@’-character as a separator between userinfo and host name so ‘example.com’ becomes the host name, the port number is 80 followed by rubbish that curl ignores. (wget2, the next-gen wget that’s in development works identically)
  • wget extracts the example.com host name but rejects the port number due to the rubbish after the zero.
  • Edge and Safari say the URL is invalid and don’t go anywhere
  • Firefox and Chrome allow ‘@’ as part of the userinfo, take the ’80’ as a password and the host name then becomes ‘daniel.haxx.se’

The only somewhat modern “spec” for URLs is the WHATWG URL specification. The other major, but now somewhat aged, URL spec is RFC 3986, made by the IETF and published in 2005.

In 2015, URL problem statement and directions was published as an Internet-draft by Masinter and Ruby and it brings up most of the current URL spec problems. Some of them are also discussed in Ruby’s WHATWG URL vs IETF URI post from 2014.

What I would like to see happen…

Which group? A group!

Friends I know in the WHATWG suggest that I should dig in there and help them improve their spec. That would be a good idea if fixing the WHATWG spec would be the ultimate goal. I don’t think it is enough.

The WHATWG is highly browser focused and my interactions with members of that group that I have had in the past, have shown that there is little sympathy there for non-browsers who want to deal with URLs and there is even less sympathy or interest for URL schemes that the popular browsers don’t even support or care about. URLs cover much more than HTTP(S).

I have the feeling that WHATWG people would not like this work to be done within the IETF and vice versa. Since I’d like buy-in from both camps, and any other camps that might have an interest in URLs, this would need to be handled somehow.

It would also be great to get other major URL “consumers” on board, like authors of popular URL parsing libraries, tools and components.

Such a URL group would of course have to agree on the goal and how to get there, but I’ll still provide some additional things I want to see.

Update: I want to emphasize that I do not consider the WHATWG’s job bad, wrong or lost. I think they’ve done a great job at unifying browsers’ treatment of URLs. I don’t mean to belittle that. I just know that this group is only a small subset of the people who probably should be involved in a unified URL standard.

A single fixed spec

I can’t see any compelling reasons why a URL specification couldn’t reach a stable state and get published as *the* URL standard. The “living standard” approach may be fine for certain things (and in particular browsers that update every six weeks), but URLs are supposed to be long-lived and inter-operate far into the future so they really really should not change. Therefore, I think the IETF documentation model could work well for this.

The WHATWG spec documents what browsers do, and browsers do what is documented. At least that’s the theory I’ve been told, and it causes a spinning and never-ending loop that goes against my wish.

Document the format

The WHATWG specification is written in a pseudo code style, describing how a parser would “walk” over the string with a state machine and all. I know some people like that, I find it utterly annoying and really hard to figure out what’s allowed or not. I much more prefer the regular RFC style of describing protocol syntax.

IDNA

Can we please just say that host names in URLs should be handled according to IDNA2008 (RFC 5895)? WHATWG URL doesn’t state any IDNA spec number at all.

Move out irrelevant sections

“Irrelevant” when it comes to documenting the URL format that is. The WHATWG details several things that are related to URL for browsers but are mostly irrelevant to other URL consumers or producers. Like section “5. application/x-www-form-urlencoded” and “6. API”.

They would be better placed in a “URL considerations for browsers” companion document.

Working doesn’t imply sensible

So browsers accept URLs written with thousands of forward slashes instead of two. That is not a good reason for the spec to say that a URL may legitimately contain a thousand slashes. I’m totally convinced there’s no critical content anywhere using such formatted URLs and no soul will be sad if we’d restricted the number to a single-digit. So we should. And yeah, then browsers should reject URLs using more.

The slashes are only an example. The browsers have used a “liberal in what you accept” policy for a lot of things since forever, but we must resist to use that as a basis when nailing down a standard.

The odds of this happening soon?

I know there are individuals interested in seeing the URL situation getting worked on. We’ve seen articles and internet-drafts posted on the issue several times the last few years. Any year now I think we will see some movement for real trying to fix this. I hope I will manage to participate and contribute a little from my end.

Lesser HTTPS for non-browsers

An HTTPS client needs to do a whole lot of checks to make sure that the remote host is fine to communicate with to maintain the proper high security levels.

In this blog post, I will explain why and how the entire HTTPS ecosystem relies on the browsers to be good and strict and thanks to that, the rest of the HTTPS clients can get away with being much more lenient. And in fact that is good, because the browsers don’t help the rest of the ecosystem very much to do good verification at that same level.

Let me me illustrate with some examples.

CA certs

The server’s certificate must have been signed by a trusted CA (Certificate Authority). A client then needs the certificates from all the CAs that are trusted. Who’s a trusted CA and how would a client get their certs to use for verification?

You can say that you trust the same set of CAs that your operating system vendor trusts (which I’ve always thought is a bit of a stretch but hey, I can very well understand the convenience in this). If you want to do this as an HTTPS client you need to use native APIs in Windows or macOS, or you need to figure out where the cert bundle is stored if you’re using Linux.

If you’re not using the native libraries on windows and macOS or if you can’t find the bundle in your Linux distribution, or you’re in one of a large amount of other setups where you can’t use someone else’s bundle, then you need to gather this list by yourself.

How on earth would you gather a list of hundreds of CA certs that are used for the popular web sites on the net of today? Stand on someone else’s shoulders and use what they’ve done? Yeps, and conveniently enough Mozilla has such a bundle that is licensed to allow others to use it…

Mozilla doesn’t offer the set of CA certs in a format that anyone else can use really, which is the primary reason why we offer Mozilla’s cert bundle converted to PEM format on the curl web site. The other parties that collect CA certs at scale (Microsoft for Windows, Apple for macOS, etc) do even less.

Before you ask, Google doesn’t maintain their own list for Chrome. They piggyback the CA store provided on the operating system it runs on. (Update: Google maintains its own list for Android/Chrome OS.)

Further constraints

But the browsers, including Firefox, Chrome, Edge and Safari all add additional constraints beyond that CA cert store, on what server certificates they consider to be fine and okay. They blacklist specific fingerprints, they set a last allowed date for certain CA providers to offer certificates for servers and more.

These additional constraints, or additional rules if you want, are never exported nor exposed to the world in ways that are easy for anyone to mimic (in other ways than that everyone of course can implement the same code logic in their ends). They’re done in code and they’re really hard for anyone not a browser to implement and keep up with.

This makes every non-browser HTTPS client susceptible to okaying certificates that have already been deemed not OK by security experts at the browser vendors. And in comparison, not many HTTPS clients can compare or stack up the amount of client-side TLS and security expertise that the browser developers can.

HSTS preload

HTTP Strict Transfer Security is a way for sites to tell clients that they are to be accessed over HTTPS only for a specified time into the future, and plain HTTP should then not be used for the duration of this rule. This setup removes the Man-In-The-Middle (MITM) risk on subsequent accesses for sites that may still get linked to via HTTP:// URLs or by users entering the web site names directly into the address bars and so on.

The browsers have a “HSTS preload list” which is a list of sites that people have submitted and they are HSTS sites that basically never time out and always will be accessed over HTTPS only. Forever. No risk for MITM even in the first access to these sites.

There are no such HSTS preload lists being provided for non-browser HTTPS clients so there’s no easy way for non-browsers to avoid the first access MITM even for these class of forever-on-HTTPS sites.

Update: The Chromium HSTS preload list is available in a JSON format.

SHA-1

I’m sure you’ve heard about the deprecation of SHA-1 as a certificate hashing algorithm and how the browsers won’t accept server certificates using this starting at some cut off date.

I’m not aware of any non-browser HTTPS client that makes this check. For services, API providers and others don’t serve “normal browsers” they can all continue to play SHA-1 certificates well into 2017 without tears or pain. Another ecosystem detail we rely on the browsers to fix for us since most of these providers want to work with browsers as well…

This isn’t really something that is magic or would be terribly hard for non-browsers to do, its just that it will make some users suddenly get errors for their otherwise working setups and that takes a firm attitude from the software provider that is hard to maintain. And you’d have to introduce your own cut-off date that you’d have to fight with your users about! 😉

TLS is hard to get right

TLS and HTTPS are full of tricky areas and dusty corners that are hard to get right. The more we can share tricks and rules the better it is for everyone.

I think the browser vendors could do much better to help the rest of the ecosystem. By making their meta data available to us in sensible formats mostly. For the good of the Internet.

Disclaimer

Yes I work for Mozilla which makes Firefox. A vendor and a browser that I write about above. I’ve been communicating internally about some of these issues already, but I’m otherwise not involved in those parts of Firefox.

HTTPS proxy with curl

Starting in version 7.52.0 (due to ship December 21, 2016), curl will support HTTPS proxies when doing network transfers, and by doing this it joins the small exclusive club of HTTP user-agents consisting of Firefox, Chrome and not too many others.

Yes you read this correctly. This is different than the good old HTTP proxy.

HTTPS proxy means that the client establishes a TLS connection to the proxy and then communicates over that, which is different to the normal and traditional HTTP proxy approach where the clients speak plain HTTP to the proxy.

Talking HTTPS to your proxy is a privacy improvement as it prevents people from snooping on your proxy communication. Even when using HTTPS over a standard HTTP proxy, there’s typically a setting up phase first that leaks information about where the connection is being made, user credentials and more. Not to mention that an HTTPS proxy makes HTTP traffic “safe” to and from the proxy. HTTPS to the proxy also enables clients to speak HTTP/2 more easily with proxies. (Even though HTTP/2 to the proxy is not yet supported in curl.)

In the case where a client wants to talk HTTPS to a remote server, when using a HTTPS proxy, it sends HTTPS through HTTPS.

Illustrating this concept with images. When using a traditional HTTP proxy, we connect initially to the proxy with HTTP in the clear, and then from then on the HTTPS makes it safe:

HTTP proxyto compare with the HTTPS proxy case where the connection is safe already in the first step:

HTTPS proxyThe access to the proxy is made over network A. That network has traditionally been a corporate network or within a LAN or something but we’re seeing more and more use cases where the proxy is somewhere on the Internet and then “Network A” is really huge. That includes use cases where the proxy for example compresses images or otherwise reduces bandwidth requirements.

Actual HTTPS connections from clients to servers are still done end to end encrypted even in the HTTP proxy case. HTTP traffic to and from the user to the web site however, will still be HTTPS protected to the proxy when a HTTPS proxy is used.

A complicated pull request

This awesome work was provided by Dmitry Kurochkin, Vasy Okhin, and Alex Rousskov. It was merged into master on November 24 in this commit.

Doing this sort of major change in the TLS area in curl code is a massive undertaking, much so because of the fact that curl supports getting built with one out of 11 or 12 different TLS libraries. Several of those are also system-specific so hardly any single developer can even build all these backends on his or hers own machines.

In addition to the TLS backend maze, curl and library also offers a huge amount of different options to control the TLS connection and handling. You can switch on and off features, provide certificates, CA bundles and more. Adding another layer of TLS pretty much doubles the amount of options since now you can tweak everything both in the TLS connection to the proxy as well as the one to the remote peer.

This new feature is supported with the OpenSSL, GnuTLS and NSS backends to start with.

Consider it experimental for now

By all means, go ahead and use it and torture the code and file issues for everything bad you see, but I think we make ourselves a service by considering this new feature set to be a bit experimental in this release.

New options

There’s a whole forest of new command line and libcurl options to control all the various aspects of the new TLS connection this introduces. Since it is a totally separate connection it gets a whole set of options that are basically identical to the server connection but with a –proxy prefix instead. Here’s a list:

  --proxy-cacert 
  --proxy-capath
  --proxy-cert
  --proxy-cert-type
  --proxy-ciphers
  --proxy-crlfile
  --proxy-insecure
  --proxy-key
  --proxy-key-type
  --proxy-pass
  --proxy-ssl-allow-beast
  --proxy-sslv2
  --proxy-sslv3
  --proxy-tlsv1
  --proxy-tlsuser
  --proxy-tlspassword
  --proxy-tlsauthtype

curl and TLS 1.3

Draft 18 of the TLS version 1.3 spec was publiSSL padlockshed at the end of October 2016.

Already now, both Firefox and Chrome have test versions out with TLS 1.3 enabled. Firefox 52 will have it by default, and while Chrome will ship it, I couldn’t figure out exactly when we can expect it to be there by default.

Over the last few days we’ve merged TLS 1.3 support to curl, primarily in this commit by Kamil Dudka. Both the command line tool and libcurl will negotiate TLS 1.3 in the next version (7.52.0 – planned release date at the end of December 2016) if built with a TLS library that supports it and told to do it by the user.

The two current TLS libraries that will speak TLS 1.3 when built with curl right now is NSS and BoringSSL. The plan is to gradually adjust curl over time as the other libraries start to support 1.3 as well. As always we will appreciate your help in making this happen!

Right now, there’s also the minor flux in that servers and clients may end up running implementations of different draft versions of the TLS spec which contributes to a layer of extra fun!

Three TLS current 1.3 test servers to play with: https://enabled.tls13.com/ , https://www.allizom.org/ and https://tls13.crypto.mozilla.org/. I doubt any of these will give you any guarantees of functionality.

TLS 1.3 offers a few new features that allow clients such as curl to do subsequent TLS connections much faster, with only 1 or even 0 RTTs, but curl has no code for any of those features yet.

1,000,000 sites run HTTP/2

… out of the top ten million sites that is. So there’s at least that many, quite likely a few more.

This is according to w3techs who runs checks daily. Over the last few months, there have been about 50,000 new sites per month switching it on.

ht2-10-percent

It also shows that the HTTP/2 ratio has increased from a little over 1% deployment a year ago to the 10% today.

HTTP/2 gets more used the more  popular site it is. Among the top 1,000 sites on the web, more than 20% of them use HTTP/2. HTTP/2 also just recently (September 9) overcame SPDY among the top-1000 most popular sites.

h2-sep28

On September 7, Amazon announced their CloudFront service having enabled HTTP/2, which could explain an adoption boost over the last few days. New CloudFront users get it enabled by default but existing users actually need to go in and click a checkbox to make it happen.

As the web traffic of the world is severely skewed toward the top ones, we can be sure that a significantly larger share than 10% of the world’s HTTPS traffic is using version 2.

Recent usage stats in Firefox shows that HTTP/2 is used in half of all its HTTPS requests!

http2

HTTP/2 connection coalescing

Section 9.1.1 in RFC7540 explains how HTTP/2 clients can reuse connections. This is my lengthy way of explaining how this works in reality.

Many connections in HTTP/1

With HTTP/1.1, browsers are typically using 6 connections per origin (host name + port). They do this to overcome the problems in HTTP/1 and how it uses TCP – as each connection will do a fair amount of waiting. Plus each connection is slow at start and therefore limited to how much data you can get and send quickly, you multiply that data amount with each additional connection. This makes the browser get more data faster (than just using one connection).

6 connections

Add sharding

Web sites with many objects also regularly invent new host names to trigger browsers to use even more connections. A practice known as “sharding”. 6 connections for each name. So if you instead make your site use 4 host names you suddenly get 4 x 6 = 24 connections instead. Mostly all those host names resolve to the same IP address in the end anyway, or the same set of IP addresses. In reality, some sites use many more than just 4 host names.

24 connections

The sad reality is that a very large percentage of connections used for HTTP/1.1 are only ever used for a single HTTP request, and a very large share of the connections made for HTTP/1 are so short-lived they actually never leave the slow start period before they’re killed off again. Not really ideal.

One connection in HTTP/2

With the introduction of HTTP/2, the HTTP clients of the world are going toward using a single TCP connection for each origin. The idea being that one connection is better in packet loss scenarios, it makes priorities/dependencies work and reusing that single connections for many more requests will be a net gain. And as you remember, HTTP/2 allows many logical streams in parallel over that single connection so the single connection doesn’t limit what the browsers can ask for.

Unsharding

The sites that created all those additional host names to make the HTTP/1 browsers use many connections now work against the HTTP/2 browsers’ desire to decrease the number of connections to a single one. Sites don’t want to switch back to using a single host name because that would be a significant architectural change and there are still a fair number of HTTP/1-only browsers still in use.

Enter “connection coalescing”, or “unsharding” as we sometimes like to call it. You won’t find either term used in RFC7540, as it merely describes this concept in terms of connection reuse.

Connection coalescing means that the browser tries to determine which of the remote hosts that it can reach over the same TCP connection. The different browsers have slightly different heuristics here and some don’t do it at all, but let me try to explain how they work – as far as I know and at this point in time.

Coalescing by example

Let’s say that this cool imaginary site “example.com” has two name entries in DNS: A.example.com and B.example.com. When resolving those names over DNS, the client gets a list of IP address back for each name. A list that very well may contain a mix of IPv4 and IPv6 addresses. One list for each name.

You must also remember that HTTP/2 is also only ever used over HTTPS by browsers, so for each origin speaking HTTP/2 there’s also a corresponding server certificate with a list of names or a wildcard pattern for which that server is authorized to respond for.

In our example we start out by connecting the browser to A. Let’s say resolving A returns the IPs 192.168.0.1 and 192.168.0.2 from DNS, so the browser goes on and connects to the first of those addresses, the one ending with “1”. The browser gets the server cert back in the TLS handshake and as a result of that, it also gets a list of host names the server can deal with: A.example.com and B.example.com. (it could also be a wildcard like “*.example.com”)

If the browser then wants to connect to B, it’ll resolve that host name too to a list of IPs. Let’s say 192.168.0.2 and 192.168.0.3 here.

Host A: 192.168.0.1 and 192.168.0.2
Host B: 192.168.0.2 and 192.168.0.3

Now hold it. Here it comes.

The Firefox way

Host A has two addresses, host B has two addresses. The lists of addresses are not the same, but there is an overlap – both lists contain 192.168.0.2. And the host A has already stated that it is authoritative for B as well. In this situation, Firefox will not make a second connect to host B. It will reuse the connection to host A and ask for host B’s content over that single shared connection. This is the most aggressive coalescing method in use.

one connection

The Chrome way

Chrome features a slightly less aggressive coalescing. In the example above, when the browser has connected to 192.168.0.1 for the first host name, Chrome will require that the IPs for host B contains that specific IP for it to reuse that connection.  If the returned IPs for host B really are 192.168.0.2 and 192.168.0.3, it clearly doesn’t contain 192.168.0.1 and so Chrome will create a new connection to host B.

Chrome will reuse the connection to host A if resolving host B returns a list that contains the specific IP of the connection host A is already using.

The Edge and Safari ways

They don’t do coalescing at all, so each host name will get its own single connection. Better than the 6 connections from HTTP/1 but for very sharded sites that means a lot of connections even in the HTTP/2 case.

curl also doesn’t coalesce anything (yet).

Surprises and a way to mitigate them

Given some comments in the Firefox bugzilla, the aggressive coalescing sometimes causes some surprises. Especially when you have for example one IPv6-only host A and a second host B with both IPv4 and IPv4 addresses. Asking for data on host A can then still use IPv4 when it reuses a connection to B (assuming that host A covers host B in its cert).

In the rare case where a server gets a resource request for an authority (or scheme) it can’t serve, there’s a dedicated error code 421 in HTTP/2 that it can respond with and the browser can then  go back and retry that request on another connection.

Starts out with 6 anyway

Before the browser knows that the server speaks HTTP/2, it may fire up 6 connection attempts so that it is prepared to get the remote site at full speed. Once it figures out that it doesn’t need all those connections, it will kill off the unnecessary unused ones and over time trickle down to one. Of course, on subsequent connections to the same origin the client may have the version information cached so that it doesn’t have to start off presuming HTTP/1.

A workshop Monday

http workshopI decided I’d show up a little early at the Sheraton as I’ve been handling the interactions with hotel locally here in Stockholm where the workshop will run for the coming three days. Things were on track, if we ignore how they got the wrong name of the workshop on the info screens in the lobby, instead saying “Haxx Ab”…

Mark welcomed us with a quick overview of what we’re here for and quick run-through of the rough planning for the days. Our schedule is deliberately loose and open to allow for changes and adaptations as we go along.

Patrick talked about the 1 1/2 years of HTTP/2 working in Firefox so far, and we discussed a lot around the numbers and telemetry. What do they mean and why do they look like this etc. HTTP/2 is now at 44% of all HTTPS requests and connections using HTTP/2 are used for more than 8 requests on median (compared to slightly over 1 in the HTTP/1 case). What’s almost not used at all? HTTP/2 server push, Alt-Svc and HTTP 308 responses. Patrick’s presentation triggered a lot of good discussions. His slides are here.

RTT distribution for Firefox running on desktop and mobile, from Patrick’s slide set:

rtt-dist

The lunch was lovely.

Vlad then continued to talk about experiences from implementing and providing server push at Cloudflare. It and the associated discussions helped emphasize that we need better help for users on how to use server push and there might be reasons for browsers to change how they are stored in the current “secondary cache”. Also, discussions around how to access pushed resources and get information about pushes from javascript were briefly touched on.

After a break with some sweets and coffee, Kazuho continued to describe cache digests and how this concept can help making servers do better or more accurate server pushes. Back to more discussions around push and what it actually solved, how much complexity it is worth and so on. I thought I could sense hesitation in the room on whether this is really something to proceed with.

We intend to have a set of lightning talks after lunch each day and we have already have twelve such suggested talks listed in the workshop wiki, but the discussions were so lively and extensive that we missed them today and we even had to postpone the last talk of today until tomorrow. I can already sense how these three days will not be enough for us to cover everything we have listed and planned…

We ended the evening with a great dinner sponsored by Mozilla. I’d say it was a great first day. I’m looking forward to day 2!

HTTP Workshop 2016, day -1

http workshop The HTTP Workshop 2016 will take place in Stockholm starting tomorrow Monday, as I’ve mentioned before. Today we’ll start off slowly by having a few pre workshop drinks and say hello to old and new friends.

I did a casual count, and out of the 40 attendees coming, I believe slightly less than half are newcomers that didn’t attend the workshop last year. We’ll see browser people come, more independent HTTP implementers, CDN representatives, server and intermediary developers as well as some friends from large HTTP operators/sites. I personally view my attendance to be primarily with my curl hat on rather than my Firefox one. Firmly standing in the client side trenches anyway.

Visitors to Stockholm these days are also lucky enough to arrive when the weather is possibly as good as it can get here with the warmest period through the summer so far with lots of sun and really long bright summer days.

News this year includes the @http_workshop twitter account. If you have questions or concerns for HTTP workshoppers, do send them that way and they might get addressed or at least noticed.

I’ll try to take notes and post summaries of each workshop day here. Of course I will fully respect our conference rules about what to reveal or not.

stockholm castle and ship

HTTP/2 in April 2016

On April 12 I had the pleasure of doing another talk in the Google Tech Talk series arranged in the Google Stockholm offices. I had given it the title “HTTP/2 is upon us, and here’s what you need to know about it.” in the invitation.

The room seated 70 persons but we had the amazing amount of over 300 people in the waiting line who unfortunately didn’t manage to get a seat. To those, and to anyone else who cares, here’s the video recording of the event.

If you’ve seen me talk about HTTP/2 before, you might notice that I’ve refreshed the material somewhat since before.