Category Archives: Work

Work stuff

How to DoH-only with Firefox

Firefox has supported DNS-over-HTTPS (aka DoH) since version 62.

You can instruct your Firefox to only use DoH and never fall back and try the native resolver; the mode we call trr-only. Without any other ability to resolve host names, this is a little tricky, so this guide is here to help you. (This situation might improve in the future.)

In trr-only mode, nobody on your local network or at your ISP can snoop on your name resolves. The SNI part of HTTPS connections is still clear text though, so eavesdroppers on the path can still figure out which hosts you connect to.

There's a name in my URI

A primary problem for trr-only is that we usually want to use a host name in the URI for the DoH server (we typically need it to be a name so that we can verify the server's certificate against it), but we can't resolve that host name until DoH is set up to work. A catch-22.

There are currently two ways around this problem:

  1. Tell Firefox the IP address of the name that you use in the URI. We call it the "bootstrapAddress". See further below.
  2. Use a DoH server that is provided on an IP-number URI. This is rather unusual. There's for example one at 1.1.1.1 (see the sketch below).
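
As a sketch of option 2 (this assumes you want Cloudflare's IP-based service, which answers on https://1.1.1.1/dns-query - verify the URI with your chosen provider), the two relevant prefs could then look like this and no bootstrap address is needed:

network.trr.mode = 3
network.trr.uri = https://1.1.1.1/dns-query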

Setup and use trr-only

There are three prefs to focus on (they're all explained elsewhere):

network.trr.mode - set this to the number 3.

network.trr.uri - set this to the URI of the DoH server you want to use. This should be a server you trust and want to hand over your name resolves to. The Cloudflare one we've previously used in DoH tests with Firefox is https://mozilla.cloudflare-dns.com/dns-query.

network.trr.bootstrapAddress - when you use a host name in the URI for the network.trr.uri pref, you must set this pref to an IP address that host name resolves to for you. It is important that you pick an IP address that the name you use actually would resolve to.

Example

Let's pretend you want to go full trr-only and use a DoH server at https://example.com/dns (it's a pretend URI; it doesn't work).

Figure out the bootstrapAddress with dig. Resolve the host name from the URI:

$ dig +short example.com
93.184.216.34

or if you prefer to be classy and use the IPv6 address (only do this if IPv6 is actually working for you):

$ dig -t AAAA +short example.com
2606:2800:220:1:248:1893:25c8:1946

dig might give you a whole list of addresses back, and then you can pick any one of them in the list. Only pick one address though.

Go to "about:config" and paste the copied IP address into the value field for network.trr.bootstrapAddress. Now TRR / DoH should be able to get going. When you can see web pages, you know it works!

DoH-only means only DoH

If you happen to start Firefox behind a captive portal while in trr-only mode, the connections to the DoH server will fail and no name resolves can be performed.

In those situations, normally Firefox's captive portal detector would trigger and show you the login page etc, but when no names can be resolved and the captive portal can't respond with a fake response to the name lookup and redirect you to the login, it won't get anywhere. It gets stuck. And currently, there's no good visual indication anywhere that this is what happens.

You simply can't get out of a captive portal with trr-only. You probably then have to temporarily switch mode, log in to the portal and then switch the mode back to 3 again.

If you "unlock" the captive portal with another browser/system, Firefox's regular retries while in trr-only will soon detect that and things should start working again.

administrative purgatory

 your case is still going through administrative processing and we don't know when that process will be completed.

Last year I was denied entry to the US when I was about to travel to San Francisco. My employer's legal team and I never got answers as to why this happened, so I've personally tried to convince myself it was all because of some human screw-up. Because why would they suddenly block me? I've traveled to the US almost a dozen times over the years.

The fact that there was no reason or explanation given makes any theory as likely as the next. Whatever we think or guess might have happened can be true. Or not. We will probably never know. And I've been told a lot of different theories.

Denied again

In early April 2018 I applied for ESTA again to go to San Francisco in mid-June for another Mozilla All Hands conference and... got denied. The craziness continues. This also ruled out some of the theories from last year that it was just some human error by the airline or similar...

As seen on the screenshot, this decision has no expiration date... While they don't provide any motivation for not accepting me, this result makes it perfectly clear that it wasn't just a mistake last year. It makes me view last year with different eyes.

Put in this situation, I activated plan B.

Plan B

I then applied for a "real" non-immigrant visa - even though it feels that having been denied ESTA probably puts me in a disadvantage for that as well. Applying for this visa means filling in a 10-something-page "DS-160" form online on a site that sometimes takes minutes just to display the next page in the form where they ask for a lot of personal details. After finally having conquered that obstacle, I paid the 160 USD fee and scheduled an appointment to appear physically at the US embassy in Sweden.

I acquired an "extraction of the population register" ("personbevis" in Swedish) from the Swedish tax authorities - as required (including personal details of my parents and siblings), I got myself a new mugshot printed on photo paper and was lucky enough to find a date for an appointment not too far into the future.

Appointment

I spent the better part of a fine Tuesday morning in different waiting lines at my local US embassy where I eventually was called up to a man at a counter behind a window. I was fingerprinted, handed over my papers and, when asked, told the clerk I have no idea why I was denied ESTA, and no, I have not been on vacation in Iraq, Iran or Sudan. The clerk gave me the impression that's the sort of thing that is the common reason for not getting ESTA.

When I answered the interviewer's question that I work for Mozilla, he responded "Aha, Firefox?" - which brightened up my moment a little.

Apparently the process is then supposed to take "several weeks" until I get to know anything more. I explained that I needed my passport in three weeks (for another trip) and he said he didn't expect them to be done that quickly. Therefore I got the passport back while they process my application and I'm expected to mail it to them when they ask for it.

The next form

When I got back home again, I got an email from "the visa unit" asking me to fill in another form (in the shape of a Word document). And what a form it is! It might be called "OMB 1405-0226" and has this fancy title:

"SUPPLEMENTAL QUESTIONS FOR VISA APPLICANTS"

Among other things it requires me to provide info about all trips abroad (with dates and duration) I've done over the last 15 years. What aliases I use on social media sites (hello mr US visa agent, how do you like this post so far?), every physical address I've lived at in the last 15 years, information about all my employers the last 15 years and every email address I've used during the last 5 years.

It took me many hours of digging through old calendars, archives and memories, and asking around, in order to fill this in properly. ("Hey, that company trip we did to Germany back in 2005, can you remember the dates?") As a side note: it turns out I've been in the US no less than nine times in the last fifteen years. In total I managed to list sixty-five different trips abroad for this period.

How do I submit my filled-in form, with all these specific and very private details from my life for the last 15 years, back to "the visa unit"? By email. Good old insecure, easy to snoop on, email! At least I'm using my own mail server (and it is configured to prefer TLS for connections) but that's a small comfort.

Is it worth it?

This is a very time- and energy-consuming process - I understand why it puts people off and simply makes them decide it's not worth it to go there. And of course I understand that I'm in a lucky position where I've not had to deal with this much in the past.

I have many friends and contacts in the US in both my personal and professional life. I would be sad if I couldn't go there ever again. It would give me grief personally since it'll limit where I can go on vacation and who out of my friends I can visit, but it will also limit my professional life as interesting Mozilla, Internet, open source and curl related events that I'd like to attend are frequently hosted there.

What's happening?

So the weeks came and went, and on May 29th, six weeks after I was interviewed at the embassy, I checked the online service that allows me to check my application progress. It said "Case Created: April 17" and the following useful addition: "Case Last Updated: April 17".

Wat? Did something go fatally wrong here? I emailed the embassy to double-check. I got this single sentence response back:

Dear Sir,

You don't have to do anything, your case is still going through administrative processing and we don't know when that process will be completed.

In my life I've visited a whole series of countries for which I've been required to apply for a visa. None of them have ever taken more than a few weeks, including countries with complicated bureaucracy like India and China. What are they doing all this time?

At the time of this writing, more than 100 days have passed and I have still not heard back from them. I know this is unusually long and I have a strong suspicion this means they will deny me visa, but for some reason they want to keep me unaware for a while more.

No All Hands in the US

I clearly underestimated the time this required so I missed our meeting in SF this year again...

Mozilla has since then announced that a number of the forthcoming All Hands conferences in the coming years will be held outside of the US. Unfortunately several of them are to be held in Canada, and there are indications that having been denied entry to the US means that Canada will deny me as well. But I have yet to test that!

Why would they deny me?

To my knowledge, I've never broken a law, rule or regulation that would explain this. Some speculations that I and others can think of include...

  1. I'm the main author of curl, a tool that is used in a lot of security research and proof of concept exploits of security vulnerabilities
  2. I'm the main author of libcurl, a transfer library that is one of the world's most widely used software components. It is consequently also used extensively by malware and other offensive and undesired software.
  3. I use the haxx.se domain name for many of my sites and email addresses etc. haxx or hacking could be interpreted by some, not as "To program a computer in a clever, virtuosic, and wizardly manner" but as the act of "gaining unauthorized access to data in a system or computer".
  4. It's been suggested that my presence at multiple conferences in the US over the years could've been a violation of the ESTA rules - but the rules explicitly allow this. I have not violated the ESTA rules.

Administrative Processing

It's been 102 days now. I'm not optimistic.

quic wg interim Kista

The IETF QUIC working group had its fifth interim meeting the other day, this time in Kista, Sweden hosted by Ericsson. For me as a Stockholm resident, this was ridiculously convenient. Not entirely coincidentally, this was also the first quic interim I attended in person.

We were thirty-something people gathered in a room without windows, with another dozen or so participants joining remotely. This being a meeting in a series, most people already knew each other from before, so the atmosphere was relaxed and friendly. Lots of the participants have also been involved in other protocol developments and standards before. Many familiar faces.

Schedule

As QUIC is supposed to be done "soon", the emphasis is now very much on closing issues, postponing some stuff to "QUICv2" and making sure to get decisions on outstanding question marks.

Kazuho did a quick run-through with some info from the interop days prior to the meeting.

After MT's initial explanation of where we're at for the upcoming draft-13, Ian took us on a deep dive into the Stream 0 Design Team report. This is a pretty radical change to the wire format of the quic protocol and to how the TLS is handled.

(The slides contrasted the existing draft-12 approach with what it is suggested to instead become.)

What's perhaps the most interesting takeaway here is that the new format doesn't use TLS records anymore - but it simplifies a lot of other things. Not using TLS records but still doing TLS means that a QUIC implementation needs to get data from the TLS layer using APIs that existing TLS libraries don't typically provide. PicoTLS, Minq, BoringSSL and NSS already have or will soon provide the necessary APIs. Slightly behind, OpenSSL should offer it in a nightly build soon, but the impression is that an actual OpenSSL release is still a bit away.

EKR continued the theme. He talked about the quic handshake flow and among other things explained how 0-RTT and early data work. Taken from that context, I consider this slide (shown below) fairly funny because it makes it look far from simple to me. But it shows communication in different layers, and how the acks go, etc.

HTTP

Mike then presented the state of HTTP over quic. The frames are no longer that similar to the HTTP/2 versions. Work is done to ensure that the HTTP layer doesn't need to refer or "grab" stream IDs from the transport layer.

There was a rather lengthy discussion around how to handle "placeholder streams" like the ones Firefox uses over HTTP/2 to create "anchors" on which to make dependencies but that are never actually used over the wire. The nature of the quic transport makes those impractical, and we talked about what alternatives there are that could still offer similar functionality.

The subject of priorities and dependencies and if the relative complexity of the h2 model should be replaced by something simpler came up (again) but was ultimately pushed aside.

QPACK

Alan presented the state of QPACK, the HTTP header compression algorithm for hq (HTTP over QUIC). It is not wire compatible with HPACK anymore and there have been some recent improvements and clarifications done.

Alan also did a great step-by-step walk-through of how QPACK works with adding headers to the dynamic table and how it works with its indices etc. It was very clarifying, I thought.

The discussion about the static table for the compression basically ended with us agreeing to go with a fairly small fixed table, without a way to negotiate the table. Mark said he'd try to get some updated header data from some server deployments to get another data set than just the one from WPT (which is from a single browser).

Interop-testing of QPACK implementations can be done by encoding + shuffling + decoding a HAR file and comparing the results with the source data. Just do it - and talk to Alan!

And the first day was over. A fully packed day.

ECN

Magnus started off with some heavy stuff, talking about Explicit Congestion Notification in QUIC, how it is intended to work and some remaining issues.

He also got into the subject of ACK frequency and how the current model isn't ideal in every situation, causing it to behave like the image below (from Magnus' slide set):

Interestingly, it turned out that several of the implementers already basically had implemented Magnus' proposal of changing the max delay to min(RTT/4, 25 ms) independently of each other!

mvfst deployment

Subodh took us on a journey with some great insights from Facebook's internal deployment of mvfst, their QUIC implementation. Getting some real-life feedback is useful, and with over 100 billion requests/day it seems they did give this a good run.

Since their usage and stack for this is a bit use-case specific, I'm not sure how relevant or universal their performance numbers are. They showed roughly the same CPU and memory use, with a 70% RPS rate compared to h2 over TLS 1.2.

He also entertained us with some "fun issues" from bugs and debugging sessions they've done and learned from. Awesome.

The story highlights the need for more tooling around QUIC to help developers and deployers.

Load balancers

Martin talked about load balancers and servers, and how they could or should communicate to work correctly with routing and connection IDs.

The room didn't seem overly thrilled about this work and mostly offered other ways to achieve the same results.

Implicit Open

During the last session of the day, and of the entire meeting, MT went through a few things that still needed discussion or closure: stateless reset and the rather big bikeshed issue, implicit open. The latter being the question of whether opening a stream with ID N + 1 implicitly also opens the stream with ID N. I believe we ended with a slight preference for the implicit approach and this will be taken to the list for a consensus call.

Frame type extensibility

How should the QUIC protocol allow extensibility? The oldest still open issue in the project can be solved or satisfied in numerous different ways, and the discussion went back and forth for a while, debating the merits and downsides of various approaches, until the group more or less agreed on a fairly simple and straightforward approach where extensions announce support for a feature which then may or may not involve one or more new frame types (to be kept in a registry).

We proceeded to discuss other issues right up until "closing time", which was set to 16:00 today. This was just two days of pushing forward, but it still felt quite intense and my personal impression is that a lot of good progress was made here that took the protocol a good step forward.

The facilities were lovely and Ericsson was a great host for us. The Thursday afternoon cakes were great! Thank you!

Coming up

There's an IETF meeting in Montreal in July and there's a planned next QUIC interim probably in New York in September.

Inside Firefox’s DOH engine

DNS over HTTPS (DOH) is a feature where a client shortcuts the standard native resolver and instead asks a dedicated DOH server to resolve names.

Compared to regular unprotected DNS lookups done over UDP or TCP, DOH increases privacy, security and sometimes even performance. It also makes it easy to use a name server of your choice for a particular application instead of the one configured globally (often by someone else) for your entire system.

DNS over HTTPS is quite simply the same regular DNS packets (RFC 1035 style) normally sent in clear-text over UDP or TCP but instead sent with HTTPS requests. Your typical DNS server provider (like your ISP) might not support this yet.

To get the finer details of this concept, check out Lin Clark's awesome cartoon explanation of DNS and DOH.

This new Firefox feature is planned to get ready and ship in Firefox release 62 (early September 2018). You can test it already now in Firefox Nightly by setting preferences manually as described below.

This article will explain some of the tweaks, inner details and the finer workings of the Firefox TRR implementation (TRR == Trusted Recursive Resolver) that speaks DOH.

Preferences

All preferences (go to "about:config") for this functionality are located under the "network.trr" prefix.

network.trr.mode - set which resolver mode you want.

0 - Off (default). Use standard native resolving only (don't use TRR at all).
1 - Race native against TRR. Do them both in parallel and go with the one that returns a result first.
2 - TRR first. Use TRR first, and only if the name resolve fails use the native resolver as a fallback.
3 - TRR only. Only use TRR. Never use the native (after the initial setup).
4 - Shadow mode. Runs the TRR resolves in parallel with the native for timing and measurements but uses only the native resolver results.
5 - Explicitly off. Also off, but selected off by choice and not default.

network.trr.uri - (default: none) set the URI for your DOH server. That's the URL Firefox will issue its HTTP requests to. It must be an HTTPS URL (non-HTTPS URIs will simply be ignored). If "useGET" is enabled, Firefox will append "?ct&dns=...." to the URI when it makes its HTTP requests. For the default POST requests, they will be issued to exactly the specified URI.

"mode" and "uri" are the only two prefs required to set to activate TRR. The rest of them listed below are for tweaking behavior.

We list some publicly known DOH servers here. If you prefer to, it is easy to set up and run your own.

network.trr.credentials - (default: none) set credentials that will be used in the HTTP requests to the DOH end-point. It is the right side content, the value, sent in the Authorization: request header. Handy if you for example want to run your own public server and yet limit who can use it.

network.trr.wait-for-portal - (default: true) this boolean tells Firefox to first wait for the captive portal detection to signal "okay" before TRR is used.

network.trr.allow-rfc1918 - (default: false) set this to true to allow RFC 1918 private addresses in TRR responses. When set to false, any such response will be considered a wrong response that won't be used.

network.trr.useGET - (default: false) When the browser issues a request to the DOH server to resolve host names, it can do that using POST or GET. By default Firefox will use POST, but by toggling this you can enforce GET to be used instead. The DOH spec says a server MUST support both methods.
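
To give a rough idea of the difference between the two methods, here's a sketch using curl. It assumes query.bin holds a raw binary DNS query (the dns2doh tool mentioned further down can generate one) and uses a made-up server URI:

# POST (the Firefox default): the binary DNS query goes in the request body
$ curl --data-binary @query.bin \
  -H "Content-Type: application/dns-udpwireformat" \
  https://example.com/dns-query

# GET: the same query, base64url-encoded into the "dns" URL parameter
$ curl "https://example.com/dns-query?ct&dns=$(base64 -w0 query.bin | tr '+/' '-_' | tr -d '=')"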

network.trr.confirmationNS - (default: example.com) At startup, Firefox will first check an NS entry to verify that TRR works, before it gets enabled for real and used for name resolves. This preference sets which domain to check. The verification only checks for a positive answer, it doesn't actually care what the response data says.

network.trr.bootstrapAddress - (default: none) by setting this field to the IP address of the host name used in "network.trr.uri", you can bypass using the system native resolver for it. This avoids that initial (native) name resolve for the host name mentioned in the network.trr.uri pref.

network.trr.blacklist-duration - (default: 60) is the number of seconds a name will be kept in the TRR blacklist until it expires and can be tried again. The default duration is one minute. (Update: this has been cut down from previous longer defaults.)

network.trr.request-timeout - (default: 3000) is the number of milliseconds a request to and corresponding response from the DOH server is allowed to spend until considered failed and discarded.

network.trr.early-AAAA - (default: false) For each normal name resolve, Firefox issues one HTTP request for A entries and another for AAAA entries. The responses come back separately and can come in any order. If the A records arrive first, Firefox will - as an optimization - continue and use those addresses without waiting for the second response. If the AAAA records arrive first, Firefox will only continue and use them immediately if this option is set to true.

Split-horizon and blacklist

With regular DNS, it is common to have clients in different places get different results back. This can be done since the servers know from where the request comes (which also enables quite a degree of spying) and they can then respond accordingly. When switching to another resolver with TRR, you may experience that you don't always get the same set of addresses back. At times, this causes problems.

As a precaution, Firefox features a system that detects if a name can't be resolved at all with TRR and can then fall back and try again with just the native resolver (in the so-called TRR-first mode). Ending up in this scenario is of course slower and leaks the name over clear-text UDP, but this safety mechanism exists to avoid users risking ending up in a black hole where certain sites can't be accessed. Names that cause such TRR failures are then put in an internal dynamic blacklist so that subsequent uses of that name automatically avoid DNS-over-HTTPS for a while (see the blacklist-duration pref to control that period). Of course this fall-back is not in use if TRR-only mode is selected.

In addition, if a host's address is retrieved via TRR and Firefox subsequently fails to connect to that host, it will redo the resolve without DOH and retry the connect again just to make sure that it wasn't a split-horizon situation that caused the problem.

When a host name is added to the TRR blacklist, its domain also gets checked in the background to see if that whole domain perhaps should be blacklisted to ensure a smoother ride going forward.

Additionally, "localhost" and all names in the ".local" TLD are sort of hard-coded as blacklisted and will never be resolved with TRR. (Unless you run TRR-only...)

TTL as a bonus!

With the implementation of DNS-over-HTTPS, Firefox now gets the TTL (Time To Live, how long a record is valid) value for each DNS address record and can store and use that for expiry time in its internal DNS cache. Having accurate lifetimes improves the cache as it then knows exactly how long the name is meant to work and means less guessing and heuristics.

When using the native name resolver functions, this time-to-live data is normally not provided and Firefox does in fact not use the TTL on platforms other than Windows, and on Windows it has to perform some rather awkward quirks to get the TTL from DNS for each record.

Server push

It is still left to see how useful this will become in real life, but DOH servers can push new or updated DNS records to Firefox. HTTP/2 server push consists of responses to requests the client didn't send, but that the server thinks the client might appreciate anyway, as if it had sent requests for those resources.

These pushed DNS records will be treated as regular name resolve responses and feed the Firefox in-memory DNS cache, making subsequent resolves of those names happen instantly.

Bootstrap

You specify the DOH service as a full URI with a name that needs to be resolved, and in a cold start Firefox won't know the IP address of that name and thus needs to resolve it first (or use the provided address you can set with network.trr.bootstrapAddress). Firefox will then use the native resolver for that, until TRR has proven itself to work by resolving the network.trr.confirmationNS test domain. Firefox will also by default wait for the captive portal check to signal "OK" before it uses TRR, unless you tell it otherwise.

As a result of this bootstrap procedure, and if you're not in TRR-only mode, you might still get a few native name resolves done at initial Firefox startups. Just telling you this so you don't panic if you see a few show up.

CNAME

The code is aware of CNAME records and will "chase" them down and use the final A/AAAA entry with its TTL as if there were no CNAMEs present and store that in the in-memory DNS cache. This initial approach, at least, does not cache the intermediate CNAMEs nor does it care about the CNAME TTL values.

Firefox currently allows no more than 64(!) levels of CNAME redirections.
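
As a made-up illustration of such a chase, shown the way dig would display it: Firefox stores only the final A record and its TTL (3600 here) in its cache, ignoring the intermediate CNAME and its TTL:

$ dig +noall +answer www.example.com
www.example.com.  300   IN  CNAME  example.com.
example.com.      3600  IN  A      93.184.216.34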

about:networking

Enter that address in the Firefox URL bar to reach the debug screen with a bunch of networking information. If you then click the DNS entry in the left menu, you'll get to see the contents of Firefox's in-memory DNS cache. The TRR column says true or false for each name if that was resolved using TRR or not. If it wasn't, the native resolver was used instead for that name.

Private Browsing

When in private browsing mode, DOH behaves similarly to regular name resolves: it keeps DNS cache entries separate from the regular ones and the TRR blacklist is then only kept in memory and not persisted to disk. The DNS cache is flushed when the last PB session is exited.

Tools

I wrote up dns2doh, a little tool for creating DOH requests and responses. It can be used to build your own toy server and to generate requests to send with curl or similar.

It allows you to manually issue a type A (regular IPv4 address) DOH request like this:

$ dns2doh --A --onlyq --raw daniel.haxx.se | \
curl --data-binary @- \
https://dns.cloudflare.com/.well-known/dns \
-H "Content-Type: application/dns-udpwireformat"

I also wrote doh, which is a small stand-alone tool (based on libcurl) that issues requests for the A and AAAA records of a given host name from the given DOH URI.

Why HTTPS

Some people giggle and think of this as a massive layer violation. Maybe it is, but doing DNS over HTTPS makes a lot of sense compared to for example using plain TLS:

  1. We get transparent and proxy support "for free"
  2. We get multiplexing and the use of persistent connections from the get go (this can be supported by DNS-over-TLS too, depending on the implementation)
  3. Server push is a potential real performance booster
  4. Browsers often already have a lot of existing HTTPS connections to the same CDNs that might offer DOH.

Further explained in Patrick McManus' The Benefits of HTTPS for DNS.

It still leaks the SNI!

Yes, the Server Name Indication field in the TLS handshake is still clear-text, but we hope to address that as well in the future with efforts like encrypted SNI.

Bugs?

File bug reports in Bugzilla! (in "Core->Networking:DNS" please)

If you want to enable HTTP logging and see what TRR is doing, set the MOZ_LOG environment variable's component and level to "nsHostResolver:5". The TRR implementation source code in Firefox lives in netwerk/dns.
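
On Linux, starting Firefox from a terminal could look something like this (MOZ_LOG_FILE is optional; leave it out to get the log output in the terminal instead):

$ MOZ_LOG=nsHostResolver:5 MOZ_LOG_FILE=/tmp/trr.log firefox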

Caveats

Credits

While I have written most of the Firefox TRR implementation, I've been greatly assisted by Patrick McManus, Valentin Gosu, Nick Hurley and others in the Firefox Necko team.

DOH in curl?

Since I am also the lead developer of curl, people have asked. The work on DOH for curl has not really started yet, but I've collected some thoughts on how DNS-over-HTTPS could be implemented in curl, and the doh tool I mentioned above has the basic function blocks already written.

Other efforts to enhance DNS security

There have been other DNS-over-HTTPS protocols and efforts. Recently there was one offered by at least Google that uses a JSON-style API. That's a different thing.

There's also DNS-over-TLS which shares some of the DOH characteristics, but lacks for example the nice ability to work through proxies, do multiplexing and share existing connections with standard web traffic.

DNScrypt is an older effort that encrypts regular DNS packets and sends them over UDP or TCP.

Firefox Quantum

Next week, Mozilla will release Firefox 57. It is also referred to as Firefox Quantum, from the project name we've used for all the work that has been put into making this the most awesome Firefox release ever. This is underscored by the fact that I've been mailed release swag for the first time during my four years so far as a Mozilla employee.

Firefox 57 is the major milestone hundreds of engineers have worked really hard toward during the last year or so, and most of the efforts have been focused on performance. Or perhaps perceived end-user snappiness. Early comments I've read and heard also hint that it is quite noticeable. I think every single Mozilla engineer (and most non-engineers as well) has contributed to at least some part of this, and of course many have done a lot. My personal contributions to 57 are not much to write home about, but are mostly a stream of minor things that combined at least move the notch forward.

[edited out some secrets I accidentally leaked here.] I'm a proud Mozillian and being part of a crowd that has put together something as grand as Firefox 57 is an honor and a privilege.

Releasing a product to hundreds of millions of end users across the world is interesting. People get accustomed to things, get emotional and don't particularly like change very much. I'm sure Firefox 57 will also get a fair share of sour feedback and comments written in uppercase. That's inevitable. But sometimes, in order to move forward and do good stuff, we have to make some tough decisions for the greater good that not everyone will agree with.

This is however not the end of anything. It is rather the beginning of a new Firefox. The work on future releases goes on, and we will continue to improve the web experience for users all over the world. Firefox 58 will have even more goodies, and I know there is much more good stuff planned for the releases coming in 2018 too...

Onwards and upwards!

(Update: as I feared in this text, I got a lot of negativism, vitriol and criticism in the comments to this post. So much that I decided to close down comments for this entry and delete the worst entries.)

Denied entry

 - Sorry, you're not allowed entry to the US on your ESTA.

The lady who delivered this message to me this early Monday morning worked behind the check-in counter at Arlanda airport. I was there, trying to check in for my two-leg trip to San Francisco to the Mozilla "all hands" meeting of the summer of 2017. My chance, for a good while ahead, to meet up with colleagues from all around the world.

This short message prevented me from embarking on one journey, but instead took me on another.

Returning home

I was in a bit of a shock over this treatment, really. I mean, I wasn't treated particularly badly or anything, but just the fact that they downright refused to take me on for unspecified reasons wasn't easy to swallow. I sat down for a few moments trying to gather my thoughts on what to do next. I then sent out a few tweets expressing my deep disappointment over what happened, emailed my manager and some others at Mozilla to say what had happened and that I couldn't come to the meeting, and then finally walked out the door again and traveled back home.

This tweet sums up what I felt at the time:

Then the flood

That Monday passed with some casual conversations with people about what I had experienced, and then...

Someone posted about me to Hacker News. That post quickly rose to the top position and it began. My twitter feed suddenly went all crazy with people following me and retweeting my rejection tweets from the day before. Several well-followed people retweeted me and that caused even more new followers and replies.

By the end of the Tuesday, I had about 2000 new followers and twitter notifications that literally were flying by at a high speed.

I was contacted by writers and reporters. The German Linux Magazine was first out to post about me, and then golem.de did the same. I talked to Kate Conger at Gizmodo who wrote Mozilla Employee Denied Entry to the United States. The Register wrote about me. I was for a moment considered for a TV interview, but I think they realized that we had too few facts to actually know why I was denied, so maybe it wasn't really that TV-newsworthy.

These articles of course helped boost my twitter traffic even more.

In the flood of responses, the vast majority were positive and supportive of me. Lots of people highlighted the role of curl and acknowledged that my role in that project has been beneficial for quite a number of internet related software in the world. A whole bunch of the responses offered to help me in various ways. The one most highlighted is probably this one from Microsoft's Chief Legal Officer Brad Smith:

I also received a bunch of emails. Some of them from people who offered help - and I must say I'm deeply humbled and grateful by the amount of friends I apparently have and the reach this got.

Some of the emails also echoed the spirit of some of the twitter replies I got: quite a few Americans feel guilty, ashamed or otherwise apologize for what happened to me. However, I personally do not at all think of this setback as something that my American friends are behind. And I have many.

Mozilla legal

Tuesday evening I had a phone call with our (Mozilla's) legal chief about my situation and I helped to clarify exactly what I had done, what I had been told and what had happened. There's a team working now to help me sort out what happened and why, and what I and we can do about it so that I don't get to experience this again the next time I want to travel to the US. People are involved both on the US side and on the Swedish side of things.

Personally I don't have any plans to travel to the US in the near future so there's no immediate rush. I had already given up attending this Mozilla all-hands.

Repercussions

Mark Nottingham sent an email on the QUIC working group's mailing list, and here follows two selected sections from it:

You may have seen reports that someone who participates in this work was recently refused entry to the US*, for unspecified reasons.

...

We won't hold any further interim meetings in the US, until there's a change in this situation. This means that we'll either need to find suitable hosts in Canada or Mexico, or our meeting rotation will need to change to be exclusively Europe and Asia.

I trust I don't actually need to point out that I am that "someone" and again I'm impressed and humbled by the support and actions in my community.

Now what?

I'm now (end of Wednesday, 60 hours since the check-in counter) at 3000 more twitter followers than what I started out with this Monday morning. This turned out to be a totally crazy week and it has severely impacted my productivity. I need to get back to writing code, I'm getting behind!

I hope we'll get some answers soon as to why I was denied and what I can do to fix this for the future. When I get that, I will share all the info I can with you all.

So, back to work!

Thanks again

Before I forget: thank you all. Again. With all my heart. The amount of love I've received these last two days is amazing.

HTTP Workshop s03e02

(Season three, episode two)

Previously, on the HTTP Workshop. Yesterday ended with a much appreciated group dinner and now we're back energized and eager to continue blabbing about HTTP frames, headers and similar things.

Martin from Mozilla talked on "connection management is hard". Parts of the discussion were around the HTTP/2 connection coalescing that I've blogged about before. The ORIGIN frame is a draft for a suggested way for servers to more clearly announce which origins they can answer for on that connection, which should reduce the frequency of 421 responses being needed. The ORIGIN frame overrides DNS and will allow coalescing even for origins that don't otherwise resolve to the same IP addresses. Other topics included the Alt-Svc header, a suggested CERTIFICATE frame, and how an HTTP/2 server knows which origins it can do PUSH for.

A lot of positive words were expressed about the ORIGIN frame. Wildcard support?

Willy from HA-proxy talked about his Memory and CPU efficient HPACK decoding algorithm. Personally, I think the award for the best slides of the day goes to Willy's hand-drawn notes.

Lucas from the BBC talked about usage data for iPlayer, how much data and how many requests they serve, and how their largest share of users are "non-browsers". Lucas mentioned their work on writing a libcurl adaptation to make gstreamer use it instead of libsoup. Lucas' talk triggered a lengthy discussion on what needs there are and how (if at all) you can divide clients into browsers and non-browsers.

Wenbo from Google spoke about Websockets and showed usage data from Chrome. The median websockets connection time is 20 seconds and something like 10% are shorter than 0.5 seconds. At the 97th percentile they live over an hour. The connection success rates for Websockets are depressingly low when done in the clear while the situation is better when done over HTTPS. For some reason the success rate on Mac seems to be extra low, and Firefox telemetry seems to agree. Websockets over HTTP/2 (or not) is an old hot topic that brought us back to reiterate issues we've debated a lot before. This time we also got a lovely and long side track into web push and how that works.

Roy talked about Waka, an HTTP replacement protocol idea and concept that Roy has been carrying around for a long time (he started this in 2001) and on which he is now coming back to do actual work. A big part of the discussion focused on the wakli compression ideas: what the idea is and how it could be done and evaluated. Also, Roy is not a fan of content negotiation and wants it done differently, so he's addressing that in Waka.

Vlad talked about his suggestion for how to do cross-stream compression in HTTP/2 to significantly enhance compression ratio when, for example, switching to many small resources over h2 compared to a single huge resource over h1. The security aspect of this feature is what catches most of people's attention and the following discussion. How can we make sure this doesn't leak sensitive information? What protocol mechanisms exist or can we invent to help out making this work in a way that is safer (by default)?

Trailers. This is again a favorite topic that we've discussed before and that has resurfaced. There are people around the table who'd like to see support for trailers, and we discussed the same topic at the HTTP Workshop in 2016 as well. The corresponding issue on trailers filed in the fetch github repo shows a lot of the concerns.

Julian brought up the subject of "7230bis" - when and how do we start the work. What do we want from such a revision? Fixing the bugs seems like the primary focus. "10 years is too long until update".

Kazuho talked about "HTTP/2 attack mitigation" and how to handle clients doing many parallel slow POST requests to a CDN and them having an origin server behind that runs a new separate process for each upload.

And with this, the day and the workshop 2017 was over. Thanks to Facebook for hosting us. Thanks to the members of the program committee for driving this event nicely! I had a great time. The topics, the discussions and the people - awesome!

HTTP Workshop – London edition. First day.

The HTTP workshop series is back for a third time this northern hemisphere summer. The selected location for the 2017 version is London and this time we're down to a two-day event (we seem to remove a day every year)...

Nothing in this blog entry is a quote to be attributed to a specific individual but they are my interpretations and paraphrasing of things said or presented. Any mistakes or errors are all mine.

At 9:30 this clear Monday morning, 35 persons sat down around a huge table in a room in the Facebook offices. Most of us are the same familiar faces that have already participated in one or two HTTP workshops, but we also have a set of people this year who haven't attended before. Getting fresh blood into these discussions is certainly valuable. Most major players are represented, including Mozilla, Google, Facebook, Apple, Cloudflare, Fastly, Akamai, HA-proxy, Squid, Varnish, BBC, Adobe and curl!

Mark (independent, co-chair of the HTTP working group as well as the QUIC working group) kicked it all off with a presentation on quic and where it is right now in terms of standardization and progress. The upcoming draft-04 is becoming the first implementation draft even though the goal for interop is set basically at handshake and some very basic data interaction. The quic transport protocol is still in a huge flux and things have not settled enough for it to be interoperable right now to a very high level.

Jana from Google presented on quic deployment over time and how it right now uses about 7% of internet traffic. The Android Youtube app's switch to QUIC last year showed a huge bump in usage numbers. Quic is a lot about reducing latency and numbers show that users really do get a reduction. By that nature, it improves the situation best for those who currently have the worst connections.

It doesn't solve first world problems, this solves third world connection issues.

The currently observed 2x CPU usage increase for QUIC connections as compared to h2+TLS is mostly blamed on the Linux kernel, which apparently is not at all as good at this job as it should be. Things have clearly been more optimized for TCP over the years, leaving room for improvement in the UDP areas going forward. "Making kernel bypassing an interesting choice".

Alan from Facebook talked header compression for quic and presented data, graphs and numbers on how HPACK(-for-quic), QPACK and QCRAM compare when used for quic in different networking conditions and scenarios. Those are the three current header compression alternatives that are open for quic and Alan first explained the basics behind them and then how they compare when run in his simulator. The current HPACK version (adapted to quic) seems to be out of the question for head-of-line-blocking reasons, while the QCRAM suggestion seems to run well but has two main flaws: it requires an awkward layering violation and an annoying possible reframing requirement on resends. Clearly some more experiments can be done, possibly with a hybrid where some QCRAM ideas are brought into QPACK. Alan hopes to get his simulator open sourced in the coming months which then will allow more people to experiment and reproduce his numbers.

Hooman from Fastly talked about problems and challenges with HTTP/2 server push, the 103 Early Hints HTTP response and cache digests. This took the discussions on push into the weeds and into the dark protocol corners we've been in before, and all sorts of ideas and suggestions were brought up. Some of them have been discussed before without having been resolved yet and some ideas were new, at least to me. The general consensus seems to be that push is fairly complicated and there are a lot of corner cases and murky areas that haven't been clearly documented, but it is a feature that is now being used and for the CDN use case it can help with a lot more than "just an RTT". But is perhaps the 103 response good enough for most of the cases?

The discussion on server push and how well it fares is something the QUIC working group is interested in, since the question was asked already this morning if a first version of quic could be considered to be made without push support. The jury is still out on that I think.

ekr from Mozilla spoke about TLS 1.3, 0-RTT, what the TLS 1.3 handshake looks like and how applications and servers can take advantage of the new 0-RTT and "0.5-RTT" features. TLS 1.3 has already passed WGLC and there are now "only" a few issues pending to get solved. Taking advantage of 0-RTT in an HTTP world opens up interesting questions and issues as HTTP request resends and retries become increasingly prevalent.

Next: day two.

A curl delivery network

I've run my own public web sites on hardware I've administered myself for over twenty years now. I've hosted the curl web site myself since its inception.

The curl web site at curl.haxx.se has recently been delivering roughly 1.5 terabytes of data to the world per month. The CA bundle we convert to PEM from the Mozilla source code is alone downloaded more than 100,000 times per day. Occasional entries I've posted here on my blog have climbed very fast on popular sites such as Hacker News and Reddit, and have resulted in intense visitor storms hitting this same server - sometimes reaching visitor counts above 200,000 "uniques" - most of them within the first few hours of publication. At times, those visitor spikes have effectively brought the server to its knees.

Yes, my personal web site and the curl web site are both sharing the same physical server. It also hosts more than a dozen other sites and numerous services for our own pleasures and fun, providing services for a handful of different open source projects. So when the server has to cease doing work because it runs out of memory or hits other resource restraints, that causes interruptions all over. Oh yes, and my email doesn't reach me.

Inconvenient and annoying.

The server

Haxx owns and runs this co-located server that we have a busload of web servers on - for the good of the projects and people that run things on it. This machine's worst bottleneck is available RAM and perhaps I/O performance. Every time the server slows down to a crawl due to network traffic overload we discuss how we should upgrade it. Installing a new machine and transferring over all the sites and services is work. Work that none of us at Haxx are very happy to volunteer to do. So it hasn't been done yet, and frankly the server handles the daily load just fine and without even a blink. Which is ninety-nine point something percent of the time...

Haxx pays for a certain amount of network traffic so as long as we're below some threshold we remain paying the same monthly fee. We don't want to increase the traffic by magnitudes as that would cost more.

The specific machine, which sits deep inside a server room in Stockholm, Sweden, is a five(?) year old Dell Poweredge E310, Intel Xeon X3440 2.53GHz with 8GB of RAM. This model is shown on the image at the top.

Alternatives that haven't helped

Why not a mirror system? We had a fair amount of curl site mirrors a few years ago, but it never worked well because they were always less reliable than the main site and they often turned stale and out of sync with the master site which eventually just hurt users.

They also trick visitors into bookmarking or otherwise going back to the mirror site instead of the real one, and there were always the annoying people who couldn't resist filling the mirror with ads and stuff. Plus, they didn't help much with the storms to the main site.

Why not a cloud server? Because with the amount of services, servers and various things we do on our server, it would be inconvenient and expensive. But perhaps even more because we started out like this so we have invested time and energy into the infrastructure as it works right now. And I enjoy rowing my own boat!

The CDN

Fastly reached out and graciously offered to help us handle the load. Both on the account of traffic amounts but also to save our machine from struggling this hard the next time I'll write something that tickles people's curiosity (or rage) to that level when several thousands of visitors want to read the same article at the same time.

Starting now, the curl.haxx.se and the daniel.haxx.se web sites are fronted by Fastly. It should give web site visitors from all over the world faster response times and it will make the site more reliable and less likely to have problems due to traffic load going forward.

In case you're not familiar with what a CDN is, a simplified explanation would say it is a globally distributed network of reverse proxy servers deployed in multiple data centers. These CDN servers front the Internet and will to the largest extent possible serve the visitors with the right content directly from their own caches instead of them reaching the actual lowly backend server I run that hosts the original content. Fastly has lots of servers across the globe for this purpose. Users who are a long way away from Sweden will probably be the ones who will notice this change the most, as you may suddenly find haxx.se content much closer (network round-trip wise) than before.

Standards

These new servers will host the sites over HTTPS just like before, and they will require TLS 1.2 and SNI. They will work over IPv6 and support HTTP/2. Network-standards-wise, there shouldn't be any step down - and honestly, I haven't exactly been on the cutting edge of these technologies myself for these sites in the past.

Editing the site

We will keep editing and maintaining the site like before. It is made up of an old system with templates and include files that generate mostly static web pages. The site is mostly available on github and using that, you can build a local version for development and trying out changes before they land.

Hopefully, this move to Fastly will only make the site faster and more reliable. If you notice any glitches or experience any problems with the site, please let us know!

New screen and new fuses

I got myself a new 27" 4K screen to my work setup, a Dell P2715Q, and replaced one of my old trusty twenty-four inch friends with it.

I now work with the "Thinkpad 13" on the left as my video conference machine (it does nothing else and it runs Windows!), the two middle screens are the 24" and the new 27", both connected to my primary dev machine, while the rightmost thing is my laptop for when I need to move.

Did everything run smoothly? Heck no.

When I first inserted the 4K screen without modifying anything else in the setup, it was immediately obvious that I really needed to upgrade my graphics card since it didn't have muscles enough to drive the screen at 4K so the screen would then instead upscale a 1920x1200 image in a slightly blurry fashion. I couldn't have that!

New graphics card

So when I was out and about later that day I more or less accidentally passed a Webhallen store, and I got myself a new card. I wanted to play it easy so I stayed with an AMD processor and went with an ASUS Dual-RX460-O2G. The key feature I wanted was the ability to drive one 4K screen and one at 1920x1200, and then I unfortunately had to give up on the ones with only passive cooling and instead pick what sounds like a gaming card. (I hate shopping for graphics cards.) As I was about to do surgery on the machine anyway, I checked and noticed that I could add more memory to the motherboard, so I bought 16 more GB for a total of 32GB.

Blow some fuses

Later that night, when the house was quiet and dark I shut down my machine, inserted the new card, the new memory DIMMs and powered it back up again.

At least that was the plan. When I fired it back up, it said click, the lamps around me all went dark and the machine didn't light up at all. The fuse had blown! Man, wasn't that totally unexpected?

I did some further research into what exactly caused the fuse to blow, and blew a few more in the process: even after I restored the former card and removed the memory DIMMs again, it still blew the fuse. Puzzled and slightly disappointed, I went to bed when I had no more spare fuses.

I hate leaving the machine dead in parts on the floor with an uncertain future, but what could I do?

A new PSU

Tuesday morning I went to get myself a PSU replacement (Plexgear PS-600 Bronze), and once I had that installed no more fuses blew and I could start the machine again!

I put the new memory back in and I could get into the BIOS config with both screens working on the new card (and it detected 32GB of RAM just fine). But as soon as I tried to boot Linux, the boot process halted after just 3-4 seconds and seemingly just froze. Hm, I tested a few different kernels and safe mode etc but they all acted like that. Weird!

BIOS update

A little googling on the messages that appeared just before it froze gave me the idea that maybe I should see if there was an update for my BIOS available. After all, I had never upgraded it and it was a while since I got my motherboard (more than 4 years).

I found a much newer BIOS image on the ASUS support site, put it on a FAT-formatted USB drive and upgraded.

Now it booted. Of course the error messages I had googled for are still present, and I suppose they were there before too, I just hadn't paid any attention to them when everything was working dandy!

Displayport vs HDMI

I had the wrong idea that I should use the DisplayPort output to get 4K working, but it just wouldn't work. DP + DVI showed up on only one screen and I even went as far as trying to download some Ubuntu Linux driver package for the Radeon RX460 that I found, but of course it failed miserably due to my Debian Unstable having a totally different kernel running and whatnot.

In a slightly desperate move (I had now wasted quite a few hours on this and my machine still wasn't working), I put back the old graphics card (with DVI + HDMI), only to note that it no longer worked like it did before (the DVI output didn't find the correct resolution anymore). Presumably the BIOS upgrade or something shook the balance?

Back on the new card I booted with DVI + HDMI, leaving DP entirely, and now suddenly both screens worked!

HiDPI + LoDPI

Once I had logged in, I could configure the 4K screen to show at its full 3840x2160 resolution glory. I was back.

Now I only had to start fiddling with getting the two screens to somehow co-exist next to each other, which is a challenge of its own. The large difference in DPI makes it hard to have one config that works across both screens. Like, I usually have terminals on both screens - which font size should I use? And I put browser windows on both screens...

So far I've settled for increasing the font DPI in KDE and I use two different terminal profiles depending on which screen I put the terminal on. Seems to work okayish. Some text on the 4K screen is still terribly small, so I guess it is good that I still have good eyesight!

24 + 27

So is it comfortable to combine a 24" with a 27"? Sure, the size difference really isn't that notable. The 27" one is really just a few centimeters taller and the difference in width isn't an inconvenience. The photo below shows how similar they look, size-wise: