I talked with Ed Hoover on the between screens podcast a while ago and that episode has now been published. It is a dense 12 minutes as the good Ed edited it massively.
Mozilla’s search for a new logo
I’m employed by Mozilla. The same Mozilla that has recently announced that it is looking for feedback on how to revamp its logo and graphical image.
It was with amusement I saw that one of the existing suggestions for a new logo uses “://” (colon slash slash) in the name:
… compared with the recently announced new curl logo:
Being on both teams and being a general Internet protocol enthusiast, I couldn’t be happier if Mozilla ended up using a design so clearly based on the same underlying thoughts. After all,
Imitation is the sincerest of flattery
as Charles Caleb Colton once so eloquently expressed it.
A workshop Monday
I decided I’d show up a little early at the Sheraton as I’ve been handling the interactions with the hotel locally here in Stockholm, where the workshop will run for the coming three days. Things were on track, if we ignore how they got the wrong name of the workshop on the info screens in the lobby, instead saying “Haxx Ab”…
Mark welcomed us with a quick overview of what we’re here for and a quick run-through of the rough planning for the days. Our schedule is deliberately loose and open to allow for changes and adaptations as we go along.
Patrick talked about the 1 1/2 years of HTTP/2 working in Firefox so far, and we discussed a lot around the numbers and telemetry. What do they mean and why do they look like this etc. HTTP/2 is now at 44% of all HTTPS requests and connections using HTTP/2 are used for more than 8 requests on median (compared to slightly over 1 in the HTTP/1 case). What’s almost not used at all? HTTP/2 server push, Alt-Svc and HTTP 308 responses. Patrick’s presentation triggered a lot of good discussions. His slides are here.
RTT distribution for Firefox running on desktop and mobile, from Patrick’s slide set:
The lunch was lovely.
Vlad then continued to talk about experiences from implementing and providing server push at Cloudflare. It and the associated discussions helped emphasize that we need better help for users on how to use server push, and there might be reasons for browsers to change how pushed resources are stored in the current “secondary cache”. Also, discussions around how to access pushed resources and get information about pushes from JavaScript were briefly touched on.
After a break with some sweets and coffee, Kazuho continued to describe cache digests and how this concept can help servers do better or more accurate server pushes. Back to more discussions around push and what it actually solves, how much complexity it is worth and so on. I thought I could sense hesitation in the room on whether this is really something to proceed with.
We intend to have a set of lightning talks after lunch each day and we already have twelve such suggested talks listed in the workshop wiki, but the discussions were so lively and extensive that we missed them today and we even had to postpone the last talk of today until tomorrow. I can already sense how these three days will not be enough for us to cover everything we have listed and planned…
We ended the evening with a great dinner sponsored by Mozilla. I’d say it was a great first day. I’m looking forward to day 2!
HTTP Workshop 2016, day -1
The HTTP Workshop 2016 will take place in Stockholm starting tomorrow, Monday, as I’ve mentioned before. Today we’ll start off slowly by having a few pre-workshop drinks and saying hello to old and new friends.
I did a casual count, and out of the 40 attendees coming, I believe slightly less than half are newcomers that didn’t attend the workshop last year. We’ll see browser people come, more independent HTTP implementers, CDN representatives, server and intermediary developers as well as some friends from large HTTP operators/sites. I personally view my attendance to be primarily with my curl hat on rather than my Firefox one. Firmly standing in the client side trenches anyway.
Visitors to Stockholm these days are also lucky enough to arrive when the weather is possibly as good as it can get here with the warmest period through the summer so far with lots of sun and really long bright summer days.
News this year includes the @http_workshop twitter account. If you have questions or concerns for HTTP workshoppers, do send them that way and they might get addressed or at least noticed.
I’ll try to take notes and post summaries of each workshop day here. Of course I will fully respect our conference rules about what to reveal or not.
syscast discussion on curl and life
I sat down and talked curl, HTTP, HTTP/2, IETF, the web, Firefox and various internet subjects with Mattias Geniar on his podcast the syscast the other day.
everybody runs this code all the time
I was invited to talk about curl at the recent FOSS North conference in Gothenburg on May 26th. It was the first time the conference ran, but I think it went smoothly and the ~110 visitors seemed to have a good time. It was a single track and there was a fairly good and interesting mix of speakers and subjects, I think. They’re already planning to make it return in spring 2017, so if you’re into FOSS and you’re in the Nordic region, consider this event next year…
I took on the subject of talking about my hacker ring^W^Wcurl project insights. Here’s my slide set:
At the event I sat down and had a chat with Simon Campanello, a reporter at IDG Techworld here in Sweden who subsequently posted this article about curl (in Swedish) and how our code has ended up getting used so widely.
My URL isn’t your URL
When I started the precursor to the curl project, httpget, back in 1996, I wrote my first URL parser. Back then, the universal address was still called URL: Uniform Resource Locators. That spec was published by the IETF in 1994. The term “URL” was then used as source for inspiration when naming the tool and project curl.
The term URL was later effectively changed to become URI, Uniform Resource Identifiers (published in 2005) but the basic point remained: a syntax for a string to specify a resource online and which protocol to use to get it. We claim curl accepts “URLs” as defined by this spec, the RFC 3986. I’ll explain below why it isn’t strictly true.
There was also a companion RFC posted for IRI: Internationalized Resource Identifiers. They are basically URIs but allowing non-ASCII characters to be used.
The WHATWG consortium later produced their own URL spec, basically mixing formats and ideas from URIs and IRIs with a (not surprisingly) strong focus on browsers. One of their expressed goals is to “Align RFC 3986 and RFC 3987 with contemporary implementations and obsolete them in the process”. They want to go back and use the term “URL” since, as they rightfully state, the terms URI and IRI are just confusing and no humans ever really understood them (or often even knew they exist).
The WHATWG spec follows the good old browser mantra of being very liberal in what it accepts, trying to guess what the users mean and bending over backwards trying to fulfill it. (Even though we all know by now that Postel’s Law is the wrong way to go about this.) It means it’ll handle too many slashes, embedded white space as well as non-ASCII characters.
From my point of view, the spec is also very hard to read and follow because it does not describe the syntax or format very much but focuses far too much on mandating a parsing algorithm. To test my claim: figure out what their spec says about a trailing dot after the host name in a URL.
On top of all these standards and specs, browsers offer an “address bar” (a piece of UI that often goes under other names) that allows users to enter all sorts of fun strings and they get converted over to a URL. If you enter “http://localhost/%41” in the address bar, it’ll convert the percent encoded part to an ‘A’ there for you (since 41 in hex is a capital A in ASCII) but if you type “http://localhost/A A” it’ll actually send “/A%20A” (with a percent encoded space) in the outgoing HTTP GET request. I’m mentioning this since people will often think of what you can enter there as a “URL”.
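The two conversions in that address bar example can be reproduced with a small Python sketch using the standard library’s urllib.parse. This only mimics the effect described above; browsers apply their own rules, and the paths are simply the ones from the example.

```python
from urllib.parse import quote, unquote

# "%41" is the percent-encoded form of byte 0x41, which is 'A' in ASCII.
decoded = unquote("/%41")

# A literal space is not allowed in a URL, so it is sent as "%20".
# quote() leaves "/" untouched by default.
encoded = quote("/A A")

print(decoded)  # /A
print(encoded)  # /A%20A
```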
The above is basically my (skewed) perspective of what specs and standards we have so far to work with. Now we add reality and let’s take a look at what sort of problems we get when my URL isn’t your URL.
So what is a URL?
Or more specifically, how do we write them? What syntax do we use?
I think one of the biggest mistakes the WHATWG spec has made (and why you will find me arguing against their spec in its current form with fierce conviction that they are wrong), is that they seem to believe that URLs are theirs to define and work with, and they limit their view of URLs to browsers, HTML and their address bars. Sure, they are the big companies behind the browsers almost everyone uses and URLs are widely used by browsers, but URLs are still much bigger than that.
The WHATWG view of a URL is not widely adopted outside of browsers.
colon-slash-slash
If we ask users, ordinary people with no particular protocol or web expertise, what a URL is, what would they answer? While it was probably more notable years ago when the browsers displayed it more prominently, the :// (colon-slash-slash) sequence will be high on the list. Seeing that marks the string as a URL.
Heck, going beyond users, there are email clients, terminal emulators, text editors, perl scripts and a bazillion other things out there in the world already that detect URLs for us and allow operations on them. It could be to open that URL in a browser, to convert it to a clickable link in generated HTML and more. A vast number of said scripts and programs will use the colon-slash-slash sequence as a trigger.
The WHATWG spec says it has to be one slash and that a parser must accept an indefinite number of slashes. “http:/example.com” and “http:////////////////////////////////////example.com” are both equally fine. RFC 3986 and many others would disagree. Heck, most people I’ve confronted the last few days, even people working with the web, seem to say, think and believe that a URL has two slashes. Just look closer at the google picture search screen shot at the top of this article, which shows the top images for “URL” google gave me.
We just know a URL has two slashes there (and yeah, file: URLs mostly have three but let’s ignore that for now). Not one. Not three. Two. But the WHATWG doesn’t agree.
“Is there really any reason for accepting more than two slashes for non-file: URLs?” (my annoyed question to the WHATWG)
“The fact that all browsers do.”
The spec says so because browsers have implemented the spec.
No better explanation has been provided, not even after I pointed out that the statement is wrong and far from all browsers do. You may find reading that thread educational.
In the curl project, we’ve just recently started debating how to deal with “URLs” having a number of slashes other than two, because it turns out there are servers sending back such URLs in Location: headers, and some browsers are happy to oblige. curl is not, and neither are a lot of other libraries and command line tools. Who do we stand up for?
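To illustrate what “being happy to oblige” could look like in a client, here is a minimal Python sketch of one lenient policy: collapse any run of slashes after the scheme into exactly two. The helper name normalize_slashes is made up for this example, and this is not what curl does; it is only one possible answer to the question above.

```python
import re

# A scheme per RFC 3986, followed by a colon and one or more slashes.
_SCHEME_SLASHES = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):/+")


def normalize_slashes(url: str) -> str:
    """Hypothetical helper: turn 'scheme:/x' or 'scheme:////x' into 'scheme://x'.

    file: URLs and strings without slashes after the scheme are returned as-is.
    """
    match = _SCHEME_SLASHES.match(url)
    if match is None or match.group(1).lower() == "file":
        return url
    return f"{match.group(1)}://{url[match.end():]}"


print(normalize_slashes("http:/example.com"))       # http://example.com
print(normalize_slashes("http://///example.com/"))  # http://example.com/
```

A strict client would instead reject anything that does not already have exactly two slashes, which is the RFC 3986 reading.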
Spaces
A space character (the ASCII code 32, 0x20 in hex) cannot be part of a URL. If you want it sent, you percent encode it like you do with any other illegal character you want to be part of the URL. Percent encoding is the byte value in hexadecimal with a percent sign in front of it. %20 thus means space. It also means that a parser that for example scans for URLs in a text knows that it reaches the end of the URL when the parser encounters a character that isn’t allowed. Like space.
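As a rough illustration of that last point, here is a small Python sketch of a scanner that picks URLs out of running text and stops at the first character that is not allowed, such as a space. The character class is my reading of the unreserved and reserved sets in RFC 3986 plus the percent sign, and find_urls is a name invented for this example; real-world detectors usually add more heuristics.

```python
import re

# scheme "://" followed by characters RFC 3986 allows in a URI
# (unreserved, reserved and '%'). Anything else, like a space, ends the match.
_URL = re.compile(
    r"[A-Za-z][A-Za-z0-9+.\-]*://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+"
)


def find_urls(text):
    """Return every substring of text that looks like a URL."""
    return _URL.findall(text)


# The percent-encoded space is part of the URL...
print(find_urls("see http://example.com/a%20b for more"))
# ...while a literal space terminates it.
print(find_urls("see http://example.com/a b for more"))
```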
Browsers typically show the address in their address bars with all %20 instances converted to space for appearance. If you copy the address there into your clipboard and then paste it again in your text editor you still normally get the spaces as %20 like you want them.
I’m not sure if that is the reason, but browsers also accept spaces as part of URLs when for example receiving a redirect in a HTTP response. That’s passed from a server to a client using a Location: header with the URL in it. The browsers happily allow spaces in that URL, encode them as %20 and send out the next request. This forced curl into accepting spaces in redirected “URLs”.
Non-ASCII
Making URLs support non-ASCII languages is of course important, especially for non-western societies and I’ve understood that the IRI spec was never good enough. I personally am far from an expert on these internationalization (i18n) issues so I just go by what I’ve heard from others. But of course users of non-latin alphabets and typing systems need to be able to write their “internet addresses” to resources and use as links as well.
In an ideal world, we would have the i18n version shown to users and there would be the encoded ASCII based version below, to get sent over the wire.
For international domain names, the name gets converted over to “punycode” so that it can be resolved using the normal system name resolvers that know nothing about non-ASCII names. URIs have no IDN names, IRIs do and WHATWG URLs do. curl supports IDN host names.
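Here is a minimal sketch of that conversion using Python’s built-in “idna” codec. The host name is a made-up example, and note that this codec implements the older IDNA 2003 rules, so it may differ from what a given browser or curl build does for some names.

```python
# Hypothetical host name with non-ASCII letters ("räksmörgås.example").
host = "r\u00e4ksm\u00f6rg\u00e5s.example"

# Each non-ASCII label becomes an ASCII "xn--" punycode label that
# ordinary name resolvers can handle.
ascii_host = host.encode("idna")

# Decoding turns the ASCII form back into the readable name.
readable_host = ascii_host.decode("idna")

print(ascii_host)
print(readable_host)
```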
WHATWG states that URLs are specified as UTF-8 while URIs are just ASCII. curl gets confused by non-ASCII letters in the path part but percent encodes such byte values in the outgoing requests – which causes “interesting” side-effects when the non-ASCII characters are provided in other encodings than UTF-8 which for example is standard on Windows…
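The side-effect is easy to show with a short Python sketch: the same letter percent-encodes differently depending on which byte encoding it arrived in. Latin-1 stands in here for a legacy Windows code page, which is an assumption made for the example.

```python
from urllib.parse import quote

path = "/\u00e5"  # "/å"

# The same character is a different byte sequence in each encoding,
# so the percent-encoded path on the wire differs too.
utf8_form = quote(path.encode("utf-8"))
latin1_form = quote(path.encode("latin-1"))

print(utf8_form)    # /%C3%A5
print(latin1_form)  # /%E5
```

A server expecting UTF-8 will not recognize the second form as the same resource.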
Similar to what I’ve written above, this leads to servers passing back non-ASCII byte codes in HTTP headers that browsers gladly accept, and non-browsers need to deal with…
No URL standard
I’ve not tried to write a conclusive list of problems or differences, just a bunch of things I’ve fallen over recently. A “URL” given in one place is certainly not certain to be accepted or understood as a “URL” in another place.
Not even curl follows any published spec very closely these days, as we’re slowly diverging from them for the sake of “web compatibility”.
There’s no unified URL standard and there’s no work in progress towards that. I don’t count WHATWG’s spec as a real effort either, as it is written by a closed group with no real attempts to get the wider community involved.
My affiliation
I’m employed by Mozilla and Mozilla is a member of WHATWG and I have colleagues working on the WHATWG URL spec and other work items of theirs but it makes absolutely no difference to what I’ve written here. I also participate in the IETF and I consider myself friends with authors of RFC 1738, RFC 3986 and others but that doesn’t matter here either. My opinions are my own and this is my personal blog.
HTTP/2 in April 2016
On April 12 I had the pleasure of doing another talk in the Google Tech Talk series arranged in the Google Stockholm offices. I had given it the title “HTTP/2 is upon us, and here’s what you need to know about it.” in the invitation.
The room seated 70 persons but we had the amazing amount of over 300 people in the waiting line who unfortunately didn’t manage to get a seat. To those, and to anyone else who cares, here’s the video recording of the event.
If you’ve seen me talk about HTTP/2 before, you might notice that I’ve refreshed the material somewhat since before.
My HTTP/2 slide updates
My first HTTP/2 talk of the year I did for OWASP Stockholm on January 27th, and I subsequently updated my public slide set:
Two years of Mozilla
Today marks my two year anniversary of being employed by one of the greatest companies I’m aware of.
I get to work with open source all day, every day. I get to work for a company that isn’t driven by handing over profits to its owners for some sort of return on investment. I get to work on curl as part of my job. I get to work with internetworking, which is awesomely fun, hard, thrilling and hair-tearing all at once. I get to work with protocol standards, such as within the IETF, and my employer can let me go to meetings. In the struggle for good, against evil and for the users of the world, I think I’m on the right side. For users, for privacy, for openness, for inclusiveness. I feel I’m a mozillian now.
So what did I achieve during my first two years with the dinosaur logo company? Not nearly as much as I’ve wanted or possibly initially thought I would. I’ve faced a lot of tough bugs and hard challenges and I’ve landed and backed out changes throughout this period. But I like to think that it is a net gain. Even running head first into a wall can be educational: we can learn from it, and when we take a few steps back and race forwards again we can use that knowledge to make better decisions for the future.
Future you say? Yeah, I’m heading on in the same style, without raising my focus point very much and continuously looking for my next thing very close in time. I grab issues to work on with as little foresight as possible but I completely assume they will continue to be tough nuts to crack and there will be new networking issues to conquer going forward as well. I’ll keep working on open source, open standards and a better internet for users. I really enjoy working for Mozilla!