Category Archives: Mozilla

A workshop Monday

I decided I’d show up a little early at the Sheraton, as I’ve been handling the interactions with the hotel locally here in Stockholm where the workshop will run for the coming three days. Things were on track, if we ignore that they got the name of the workshop wrong on the info screens in the lobby, which instead said “Haxx Ab”…

Mark welcomed us with a quick overview of what we’re here for and a quick run-through of the rough plan for the days. Our schedule is deliberately loose and open to allow for changes and adaptations as we go along.

Patrick talked about the 1.5 years of HTTP/2 work in Firefox so far, and we discussed the numbers and telemetry a lot: what do they mean, why do they look like this, etc. HTTP/2 is now at 44% of all HTTPS requests, and connections using HTTP/2 carry a median of more than 8 requests (compared to slightly over 1 in the HTTP/1 case). What’s almost not used at all? HTTP/2 server push, Alt-Svc and HTTP 308 responses. Patrick’s presentation triggered a lot of good discussions. His slides are here.

RTT distribution for Firefox running on desktop and mobile, from Patrick’s slide set.

The lunch was lovely.

Vlad then continued with experiences from implementing and providing server push at Cloudflare. His talk and the associated discussions helped emphasize that users need better guidance on how to use server push, and that there might be reasons for browsers to change how pushed resources are stored in the current “secondary cache”. How to access pushed resources and get information about pushes from javascript was also briefly touched on.
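As an aside for readers who haven’t tried it: one common way to trigger a push with Cloudflare and servers like H2O is a preload Link header on the response. The example below is only an illustration of the syntax; the asset name is made up and what actually gets pushed is of course up to each server.

    HTTP/1.1 200 OK
    Content-Type: text/html
    Link: </assets/style.css>; rel=preload; as=style

A server or CDN edge that understands this can then push /assets/style.css over the same HTTP/2 connection instead of waiting for the browser to discover and request it.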

After a break with some sweets and coffee, Kazuho continued by describing cache digests: the idea that the client sends the server a compact digest of what it already has cached, so the server can make better or more accurate decisions about what to push. Then it was back to more discussions around push, what it actually solves, how much complexity it is worth and so on. I thought I could sense hesitation in the room on whether this is really something to proceed with.

We intend to have a set of lightning talks after lunch each day, and we already have twelve such talks suggested in the workshop wiki, but the discussions were so lively and extensive that we missed them today and we even had to postpone the last talk of today until tomorrow. I can already sense that these three days will not be enough for us to cover everything we have listed and planned…

We ended the evening with a great dinner sponsored by Mozilla. I’d say it was a great first day. I’m looking forward to day 2!

HTTP Workshop 2016, day -1

The HTTP Workshop 2016 will take place in Stockholm starting tomorrow, Monday, as I’ve mentioned before. Today we’ll start off slowly by having a few pre-workshop drinks and saying hello to old and new friends.

I did a casual count, and out of the 40 attendees coming, I believe slightly fewer than half are newcomers who didn’t attend the workshop last year. We’ll see browser people, independent HTTP implementers, CDN representatives, server and intermediary developers, as well as some friends from large HTTP operators/sites. I personally view my attendance as primarily with my curl hat on rather than my Firefox one; either way, I’m firmly standing in the client-side trenches.

Visitors to Stockholm these days are also lucky enough to arrive when the weather is about as good as it gets here: the warmest period of the summer so far, with lots of sun and really long, bright summer days.

News this year includes the @http_workshop twitter account. If you have questions or concerns for HTTP workshoppers, do send them that way and they might get addressed or at least noticed.

I’ll try to take notes and post summaries of each workshop day here. Of course I will fully respect our conference rules about what to reveal or not.

Stockholm castle and ship

everybody runs this code all the time

I was invited to talk about curl at the recent FOSS North conference in Gothenburg on May 26th. It was the first time the conference ran, but I think it went smoothly and the ~110 visitors seemed to have a good time. It was a single track with a fairly good and interesting mix of speakers and subjects. They’re already planning to bring it back in spring 2017, so if you’re into FOSS and in the Nordic region, consider this event next year…

I took on the subject of talking about my hacker ring^W^Wcurl project insights. Here’s my slide set:

At the event I sat down and had a chat with Simon Campanello, a reporter at IDG Techworld here in Sweden who subsequently posted this article about curl (in Swedish) and how our code has ended up getting used so widely.

photo of me from the Techworld article

My URL isn’t your URL

Google image search results for “URL”

When I started the precursor to the curl project, httpget, back in 1996, I wrote my first URL parser. Back then, the universal address was still called URL: Uniform Resource Locator. That spec was published by the IETF in 1994. The term “URL” then served as inspiration when naming the tool and project curl.

The term URL was later effectively replaced by URI, Uniform Resource Identifier (published in 2005), but the basic point remained: a syntax for a string that specifies a resource online and which protocol to use to get it. We claim curl accepts “URLs” as defined by this spec, RFC 3986. I’ll explain below why that isn’t strictly true.

There was also a companion RFC posted for IRIs: Internationalized Resource Identifiers. They are basically URIs, but allow non-ASCII characters to be used.

The WHATWG consortium later produced their own URL spec, basically mixing formats and ideas from URIs and IRIs with a (not surprisingly) strong focus on browsers. One of their expressed goals is to “Align RFC 3986 and RFC 3987 with contemporary implementations and obsolete them in the process”. They want to go back to using the term “URL” since, as they rightfully state, the terms URI and IRI are just confusing and no humans ever really understood them (or often even knew they existed).

The WHATWG spec follows the good old browser mantra of being very liberal in what it accepts, trying to guess what users mean and bending over backwards to fulfill that. (Even though we all know by now that Postel’s Law is the wrong way to go about this.) It means it’ll handle too many slashes, embedded white space, as well as non-ASCII characters.

From my point of view, the spec is also very hard to read and follow, since it barely describes the syntax or format and instead focuses far too much on mandating a parsing algorithm. To test my claim: try to figure out what their spec says about a trailing dot after the host name in a URL.

On top of all these standards and specs, browsers offer an “address bar” (a piece of UI that often goes under other names) that allows users to enter all sorts of fun strings and they get converted over to a URL. If you enter “http://localhost/%41” in the address bar, it’ll convert the percent encoded part to an ‘A’ there for you (since 41 in hex is a capital A in ASCII) but if you type “http://localhost/A A” it’ll actually send “/A%20A” (with a percent encoded space) in the outgoing HTTP GET request. I’m mentioning this since people will often think of what you can enter there as a “URL”.
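For what it’s worth, you can play with the same conversions outside a browser. Here is a minimal sketch using libcurl’s curl_easy_unescape(), decoding the “%41” example from above; the surrounding program is of course just for illustration:

    #include <stdio.h>
    #include <curl/curl.h>

    int main(void)
    {
      CURL *curl = curl_easy_init();
      if(curl) {
        int outlen = 0;
        /* decode the percent-encoded "%41" back into the letter it represents */
        char *decoded = curl_easy_unescape(curl, "%41", 0, &outlen);
        if(decoded) {
          printf("%%41 decodes to '%s' (%d byte)\n", decoded, outlen);
          curl_free(decoded); /* strings returned by libcurl are freed with curl_free() */
        }
        curl_easy_cleanup(curl);
      }
      return 0;
    }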

The above is basically my (skewed) perspective of what specs and standards we have so far to work with. Now we add reality and let’s take a look at what sort of problems we get when my URL isn’t your URL.

So what is a URL?

Or more specifically: how do we write them? What syntax do we use?

I think one of the biggest mistakes the WHATWG spec has made (and why you will find me arguing against their spec in its current form with fierce conviction that they are wrong) is that they seem to believe that URLs are theirs to define and work with, and they limit their view of URLs to browsers, HTML and their address bars. Sure, they are the big companies behind the browsers almost everyone uses and URLs are widely used by browsers, but URLs are still much bigger than that.

The WHATWG view of a URL is not widely adopted outside of browsers.

colon-slash-slash

If we ask users, ordinary people with no particular protocol or web expertise, what a URL is, what would they answer? While it was probably more notable years ago when browsers displayed it more prominently, the :// (colon-slash-slash) sequence will be high on the list. Seeing that marks the string as a URL.

Heck, going beyond users, there are email clients, terminal emulators, text editors, perl scripts and a bazillion other things out there in the world already that detect URLs for us and allow operations on them. It could be to open that URL in a browser, to convert it to a clickable link in generated HTML, and more. A vast number of these scripts and programs use the colon-slash-slash sequence as a trigger.

The WHATWG spec says one slash is enough and that a parser must accept an indefinite number of slashes: “http:/example.com” and “http:////////////////////////////////////example.com” are both equally fine. RFC 3986 and many others would disagree. Heck, most people I’ve confronted over the last few days, even people working with the web, seem to say, think and believe that a URL has two slashes. Just look at the google picture search screen shot at the top of this article, which shows the top images google gave me for “URL”.

We just know a URL has two slashes there (and yeah, file: URLs usually have three, but let’s ignore that for now). Not one. Not three. Two. But the WHATWG doesn’t agree.

“Is there really any reason for accepting more than two slashes for non-file: URLs?” (my annoyed question to the WHATWG)

“The fact that all browsers do.”

In other words: the spec says so because the browsers have implemented the spec.

No better explanation has been provided, not even after I pointed out that the statement is wrong and far from all browsers do. You may find reading that thread educational.

In the curl project, we’ve just recently started debating how to deal with “URLs” that have some other number of slashes than two, because it turns out there are servers sending back such URLs in Location: headers, and some browsers are happy to oblige. curl is not, and neither are a lot of other libraries and command line tools. Who do we stand up for?
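To make the disagreement concrete, here is a rough sketch of the strict, two-slash take on non-file: URLs, written as a hypothetical helper rather than actual curl code; a WHATWG-style parser would instead skip over however many slashes it finds after the scheme:

    #include <stdbool.h>
    #include <string.h>

    /* strict interpretation: the scheme must be followed by exactly two slashes */
    static bool has_exactly_two_slashes(const char *url)
    {
      const char *colon = strchr(url, ':');
      if(!colon)
        return false; /* no scheme separator at all */
      /* require "://" followed by something that is not yet another slash */
      return colon[1] == '/' && colon[2] == '/' && colon[3] != '/';
    }

With this check, “http://example.com/” passes while “http:/example.com” and “http:///example.com” do not, which is roughly what RFC 3986 and most non-browser tools expect.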

Spaces

A space character (ASCII code 32, 0x20 in hex) cannot be part of a URL. If you want it sent, you percent-encode it, like you do with any other illegal character you want to be part of the URL. Percent encoding is the byte value in hexadecimal with a percent sign in front of it: %20 thus means space. It also means that a parser that, for example, scans for URLs in a text knows that it has reached the end of the URL when it encounters a character that isn’t allowed. Like space.
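Going the other direction from the earlier decoding example, libcurl’s curl_easy_escape() does exactly this kind of conversion. A minimal sketch, where the input string is just an example:

    #include <stdio.h>
    #include <curl/curl.h>

    int main(void)
    {
      CURL *curl = curl_easy_init();
      if(curl) {
        /* every byte not allowed in a URL, including the space, becomes %XX */
        char *encoded = curl_easy_escape(curl, "A A", 0);
        if(encoded) {
          printf("encoded: %s\n", encoded); /* prints "encoded: A%20A" */
          curl_free(encoded);
        }
        curl_easy_cleanup(curl);
      }
      return 0;
    }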

Browsers typically show the address in their address bars with all %20 instances converted to spaces, for appearance. If you copy the address from there to your clipboard and then paste it into your text editor, you still normally get the spaces as %20 like you want them.

I’m not sure if that is the reason, but browsers also accept spaces as part of URLs when, for example, receiving a redirect in an HTTP response. That’s passed from a server to a client using a Location: header with the URL in it. The browsers happily allow spaces in that URL, encode them as %20 and send out the next request. This forced curl into accepting spaces in redirected “URLs”.

Non-ASCII

Making URLs support non-ASCII languages is of course important, especially for non-western societies, and I’ve understood that the IRI spec was never good enough. I personally am far from an expert on these internationalization (i18n) issues, so I just go by what I’ve heard from others. But of course users of non-latin alphabets and writing systems need to be able to write their “internet addresses” to resources and use them as links as well.

In an ideal world, we would show users the i18n version, with the encoded, ASCII-based version underneath being what actually gets sent over the wire.

For international domain names, the host name gets converted to “punycode” so that it can be resolved using the normal system name resolvers, which know nothing about non-ASCII names. URIs don’t allow IDN names, while IRIs and WHATWG URLs do. curl supports IDN host names.
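A hypothetical command line example of this (the host name is made up, and the conversion of course only happens in a curl built with IDN support, such as with libidn2):

    # curl converts the non-ASCII host name to its ASCII "xn--" (punycode)
    # form before handing it to the resolver and putting it in the request
    curl 'http://exämple.se/'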

WHATWG states that URLs are specified as UTF-8 while URIs are just ASCII. curl gets confused by non-ASCII letters in the path part but percent-encodes such byte values in the outgoing requests – which causes “interesting” side effects when the non-ASCII characters are provided in an encoding other than UTF-8, as is for example the norm on Windows. The letter ‘ä’, for instance, becomes %C3%A4 when the input is UTF-8 but %E4 when it arrives as Latin-1/Windows-1252 bytes, so the same-looking URL can end up producing two different requests…

Similarly to what I’ve written above, this leads to servers passing back non-ASCII bytes in HTTP headers that browsers gladly accept, and that non-browsers then need to deal with…

No URL standard

I haven’t tried to write a conclusive list of problems or differences, just a bunch of things I’ve stumbled over recently. A “URL” given in one place is by no means certain to be accepted or understood as a “URL” in another place.

Not even curl follows any published spec very closely these days, as we’re slowly diverging for the sake of “web compatibility”.

There’s no unified URL standard and there’s no work in progress towards that. I don’t count WHATWG’s spec as a real effort either, as it is written by a closed group with no real attempts to get the wider community involved.

My affiliation

I’m employed by Mozilla, Mozilla is a member of the WHATWG, and I have colleagues working on the WHATWG URL spec and other work items of theirs, but that makes absolutely no difference to what I’ve written here. I also participate in the IETF and I consider myself friends with authors of RFC 1738, RFC 3986 and others, but that doesn’t matter here either. My opinions are my own and this is my personal blog.

HTTP/2 in April 2016

On April 12 I had the pleasure of doing another talk in the Google Tech Talk series arranged in the Google Stockholm offices. I had given it the title “HTTP/2 is upon us, and here’s what you need to know about it.” in the invitation.

The room seated 70 people, but we had the amazing number of over 300 people on the waiting list who unfortunately didn’t manage to get a seat. To those, and to anyone else who cares, here’s the video recording of the event.

If you’ve seen me talk about HTTP/2 before, you might notice that I’ve refreshed the material somewhat since before.

Two years of Mozilla

Today marks my two year anniversary of being employed by one of the greatest companies I’m aware of.

I get to work with open source all day, every day. I get to work for a company that isn’t driven by handing over profits to its owners for some sort of return on investment. I get to work on curl as part of my job. I get to work with internetworking, which is awesomely fun, hard, thrilling and hair-tearing all at once. I get to work with protocol standards like within the IETF and my employer can let me go to meetings. In the struggle for good, against evil and for the users of the world, I think I’m on the right side. For users, for privacy, for openness, for inclusiveness. I feel I’m a mozillian now.

So what did I achieve during my first two years with the dinosaur logo company? Not nearly as much as I had wanted or perhaps initially thought I would. I’ve faced a lot of tough bugs and hard challenges, and I’ve landed and backed out changes all throughout this period. But I like to think that it is a net gain, and even running head first into a wall can be educational: we learn from it, and when we take a few steps back and race forward again we can use that knowledge to make better decisions for the future.

Future, you say? Yeah, I’m heading on in the same style, without raising my focus point very much and continuously looking for my next thing close in time. I grab issues to work on with as little foresight as possible, but I fully expect they will continue to be tough nuts to crack and that there will be new networking issues to conquer going forward as well. I’ll keep working on open source, open standards and a better internet for users. I really enjoy working for Mozilla!

Mozilla dinosaur head logo

A 2015 retrospective

Wow, another year has passed. Summing up some things I did this year.

Commits

I don’t really have a good global commit count for the year, but github counts 1300 commits and I believe the vast majority of my commits are hosted there. Most of them are in curl and curl-oriented projects.

We did 8 curl releases during the year featuring a total of 575 bug fixes. The almost 1,200 commits were authored by 107 different individuals.

Books

I continued working on http2 explained during the year, and after having changed to markdown format it is now available in more languages than ever thanks to our awesome translators!

I started my second book project in the fall of 2015, using the working title everything curl. It is a much larger book effort than the HTTP/2 book, and having just passed 23,500 words, which make over 110 pages in the PDF version, almost half of the planned sections are still left to write…

Twitter

I almost doubled my number of twitter followers during this year, now at 2,850 something. While this is a pointless number, reaching out slightly further does have the advantage that I get better responses and that makes me appreciate and get more out of twitter.

Stackoverflow

I’ve continued to respond to questions there, and my total count is now at 550 answers, out of which I wrote about 80 this year. The top scored answer I wrote during 2015 is for a question that isn’t phrased like one: Apache and HTTP2.

Keyboard use

I’ve pressed a bit over 6.4 million keys on my primary keyboard during the year, and 10.7% of the keys were pressed on weekends.

During the 2900+ hours in which at least one key press was registered, I averaged 2206 key presses per hour.

The most excessive key-banging hour of the year started on September 21 at 14:00 and ended with me reaching 10,875 key presses.

The most excessive day was June 9, during which I pushed 63,757 keys.

Talks

These are all 16 of the occasions where I talked in front of an audience during 2015. As you will see, the list of topics was fairly limited…

Daniel talking at Apachecon 2015

This post was not bought

At times I post blog articles that drive the view counter up to and beyond 50,000 views. This puts me in a position where I get offers from companies to mention them or to “cooperate” on further blog posts that would somehow push their agenda or businesses.

I also get the simpler offers of adding random ads or “text only information” to specific individual pages on my sites that some SEO person out there has figured out could potentially attract an audience searching for specific terms.

I’ve even gotten offers from a company to sell off my server logs. Allegedly to help them work on anti-fraud so possibly for a good cause, but still…

This is by no means a “big” blog or site, yet I get a steady stream of individuals and companies offering me money to give up a piece of my soul. I can only imagine what more popular sites get, and it is clear that someone with a less strict standpoint than mine could easily make an extra income that way.

I turn down all those examples of “easy money”.

I want to be able to look you, my dear readers, straight in the eyes when I say that what’s written here are my own words and the opinions revealed are my own – even if of course you may not agree with me, and I may make mistakes and be completely wrong at times, or even many times. You can rest assured that I made those mistakes on my own and was not paid by anyone to make them.

I’ve also removed ads from most of my sites and I don’t run external analytics scripts, minimizing the privacy intrusions and optimizing the contents: the stuff downloaded from my sites is what your browser needs to render the pages. Not heaps of useless crap to show ads or to help anyone track you (in order to show more targeted ads).

I don’t judge others’ actions based on how I decide to run my blog. I’m in a fortunate position to take this stand, I realize that.

Still biased of course

This all said, I’m still employed by a company (Mozilla) that pays my salary and I work on several projects that are dear to me so of course I will show bias to some subjects. I don’t claim to have an objective view on things and I don’t even try to have that. When I write posts here, they come colored by my background and by what I am.