Archive for the ‘Open Source’ Category

curl on the NASDAQ tower

Friday, April 24th, 2015

Apigee posted this lovely picture over at twitter. A curl command line on the NASDAQ tower.

curl-nasdaq-cropped

Summing up the birthday festivities

Sunday, March 22nd, 2015

I blogged about curl’s 17th birthday on March 20th 2015. I’ve done similar posts in the past and they normally pass by mostly undetected and hardly discussed. This time, something else happened.

Primarily, the blog post quickly became the single most viewed blog entry I’ve ever written – and I’ve been doing it for many many years. Already in the first day it was up, I counted more than 65,000 views.

The blog post got more comments than on any other blog post I’ve ever done. Right now they have probably stopped but there are 60 of them now, almost everyone one of them saying congratulations and/or thanks.

The posting also got discussed on both hacker news and reddit, totaling in more than 260 comments. Most of those in positive spirit.

The initial tweet I made about my blog post is the most retweeted and stared tweet I’ve ever posted. At least 87 retweets and 49 favorites (it might even grow a bit more over time). Others subsequently also tweeted the link hundreds of times. I got numerous replies and friendly call-outs on twitter saying “congrats” and “thanks” in many variations.

Spontaneously (ie not initiated or requested by me but most probably because of a comment on hacker news), I also suddenly started to get donations from the curl web site’s donation web page (to paypal). Within 24 hours from my post, I had received 35 donations from friendly fans who donated a total sum of  445 USD. A quick count revealed that the total number of donations ever through the history of curl’s lifetime was 43 before this day. In one day we had basically gotten as many as we had gotten the first 17 years.

Interesting data from this donation “race”: I got donations varying from 1 USD (yes one dollar) to 50 USD and the average donation was then 12.7 USD.

Let me end this summary by thanking everyone who in various ways made the curl birthday extra fun by being nice and friendly and some even donating some of their hard earned money. I am honestly touched by the attention and all the warmth and positiveness. Thank you for proving internet comments can be this good!

curl, 17 years old today

Friday, March 20th, 2015

Today we celebrate the fact that it is exactly 17 years since the first public release of curl. I have always been the lead developer and maintainer of the project.

Birthdaycake

When I released that first version in the spring of 1998, we had only a handful of users and a handful of contributors. curl was just a little tool and we were still a few years out before libcurl would become a thing of its own.

The tool we had been working on for a while was still called urlget in the beginning of 1998 but as we just recently added FTP upload capabilities that name turned wrong and I decided cURL would be more suitable. I picked ‘cURL’ because the word contains URL and already then the tool worked primarily with URLs, and I thought that it was fun to partly make it a real English word “curl” but also that you could pronounce it “see URL” as the tool would display the contents of a URL.

Much later, someone (I forget who) came up with the “backronym” Curl URL Request Library which of course is totally awesome.

17 years are 6209 days. During this time we’ve done more than 150 public releases containing more than 2600 bug fixes!

We started out GPL licensed, switched to MPL and then landed in MIT. We started out using RCS for version control, switched to CVS and then git. But it has stayed written in good old C the entire time.

The term “Open Source” was coined 1998 when the Open Source Initiative was started just the month before curl was born, which was superseded with just a few days by the announcement from Netscape that they would free their browser code and make an open browser.

We’ve hosted parts of our project on servers run by the various companies I’ve worked for and we’ve been on and off various free services. Things come and go. Virtually nothing stays the same so we better just move with the rest of the world. These days we’re on github a lot. Who knows how long that will last…

We have grown to support a ridiculous amount of protocols and curl can be built to run on virtually every modern operating system and CPU architecture.

The list of helpful souls who have contributed to make curl into what it is now have grown at a steady pace all through the years and it now holds more than 1200 names.

Employments

In 1998, I was employed by a company named Frontec Tekniksystem. I would later leave that company and today there’s nothing left in Sweden using that name as it was sold and most employees later fled away to other places. After Frontec I joined Contactor for many years until I started working for my own company, Haxx (which we started on the side many years before that), during 2009. Today, I am employed by my forth company during curl’s life time: Mozilla. All through this project’s lifetime, I’ve kept my work situation separate and I believe I haven’t allowed it to disturb our project too much. Mozilla is however the first one that actually allows me to spend a part of my time on curl and still get paid for it!

The Netscape announcement which was made 2 months before curl was born later became Mozilla and the Firefox browser. Where I work now…

Future

I’m not one of those who spend time glazing toward the horizon dreaming of future grandness and making up plans on how to go there. I work on stuff right now to work tomorrow. I have no idea what we’ll do and work on a year from now. I know a bunch of things I want to work on next, but I’m not sure I’ll ever get to them or whether they will actually ship or if they perhaps will be replaced by other things in that list before I get to them.

The world, the Internet and transfers are all constantly changing and we’re adapting. No long-term dreams other than sticking to the very simple and single plan: we do file-oriented internet transfers using application layer protocols.

Rough estimates say we may have a billion users already. Chances are, if things don’t change too drastically without us being able to keep up, that we will have even more in the future.

1000 million users

It has to feel good, right?

I will of course point out that I did not take curl to this point on my own, but that aside the ego-boost this level of success brings is beyond imagination. Thinking about that my code has ended up in so many places, and is driving so many little pieces of modern network technology is truly mind-boggling. When I specifically sit down or get a reason to think about it at least.

Most of the days however, I tear my hair when fixing bugs, or I try to rephrase my emails to no sound old and bitter (even though I can very well be that) when I once again try to explain things to users who can be extremely unfriendly and whining. I spend late evenings on curl when my wife and kids are asleep. I escape my family and rob them of my company to improve curl even on weekends and vacations. Alone in the dark (mostly) with my text editor and debugger.

There’s no glory and there’s no eternal bright light shining down on me. I have not climbed up onto a level where I have a special status. I’m still the same old me, hacking away on code for the project I like and that I want to be as good as possible. Obviously I love working on curl so much I’ve been doing it for over seventeen years already and I don’t plan on stopping.

Celebrations!

Yeps. I’ll get myself an extra drink tonight and I hope you’ll join me. But only one, we’ll get back to work again afterward. There are bugs to fix, tests to write and features to add. Join in the fun! My backlog is only growing…

Video: My curl talk from FOSDEM 2015

Friday, March 13th, 2015

I mentioned the talk before, and now the video has been made available. About 25 minutes with me presenting curl.

Internet all the things

cURL

TLS in HTTP/2

Friday, March 6th, 2015

SSL padlockI’ve written the http2 explained document and I’ve done several talks about HTTP/2. I’ve gotten a lot of questions about TLS in association with HTTP/2 due to this, and I want to address some of them here.

TLS is not mandatory

In the HTTP/2 specification that has been approved and that is about to become an official RFC any day now, there is no language that mandates the use of TLS for securing the protocol. On the contrary, the spec clearly explains how to use it both in clear text (over plain TCP) as well as over TLS. TLS is not mandatory for HTTP/2.

TLS mandatory in effect

While the spec doesn’t force anyone to implement HTTP/2 over TLS but allows you to do it over clear text TCP, representatives from both the Firefox and the Chrome development teams have expressed their intents to only implement HTTP/2 over TLS. This means HTTPS:// URLs are the only ones that will enable HTTP/2 for these browsers. Internet Explorer people have expressed that they intend to also support the new protocol without TLS, but when they shipped their first test version as part of the Windows 10 tech preview, that browser also only supported HTTP/2 over TLS. As of this writing, there has been no browser released to the public that speaks clear text HTTP/2. Most existing servers only speak HTTP/2 over TLS.

The difference between what the spec allows and what browsers will provide is the key here, and browsers and all other user-agents are all allowed and expected to each select their own chosen path forward.

If you’re implementing and deploying a server for HTTP/2, you pretty much have to do it for HTTPS to get users. And your clear text implementation will not be as tested…

A valid remark would be that browsers are not the only HTTP/2 user-agents and there are several such non-browser implementations that implement the non-TLS version of the protocol, but I still believe that the browsers’ impact on this will be notable.

Stricter TLS

When opting to speak HTTP/2 over TLS, the spec mandates stricter TLS requirements than what most clients ever have enforced for normal HTTP 1.1 over TLS.

It says TLS 1.2 or later is a MUST. It forbids compression and renegotiation. It specifies fairly detailed “worst acceptable” key sizes and cipher suites. HTTP/2 will simply use safer TLS.

Another detail here is that HTTP/2 over TLS requires the use of ALPN which is a relatively new TLS extension, RFC 7301, which helps us negotiate the new HTTP version without losing valuable time or network packet round-trips.

TLS-only encourages more HTTPS

Since browsers only speak HTTP/2 over TLS (so far at least), sites that want HTTP/2 enabled must do it over HTTPS to get users. It provides a gentle pressure on sites to offer proper HTTPS. It pushes more people over to end-to-end TLS encrypted connections.

This (more HTTPS) is generally considered a good thing by me and us who are concerned about users and users’ right to privacy and right to avoid mass surveillance.

Why not mandatory TLS?

The fact that it didn’t get in the spec as mandatory was because quite simply there was never a consensus that it was a good idea for the protocol. A large enough part of the working group’s participants spoke up against the notion of mandatory TLS for HTTP/2. TLS was not mandatory before so the starting point was without mandatory TLS and we didn’t manage to get to another stand-point.

When I mention this in discussions with people the immediate follow-up question is…

No really, why not mandatory TLS?

The motivations why anyone would be against TLS for HTTP/2 are plentiful. Let me address the ones I hear most commonly, in an order that I think shows the importance of the arguments from those who argued them.

1. A desire to inspect HTTP traffic

looking-glassThere is a claimed “need” to inspect or intercept HTTP traffic for various reasons. Prisons, schools, anti-virus, IPR-protection, local law requirements, whatever are mentioned. The absolute requirement to cache things in a proxy is also often bundled with this, saying that you can never build a decent network on an airplane or with a satellite link etc without caching that has to be done with intercepts.

Of course, MITMing proxies that terminate SSL traffic are not even rare these days and HTTP/2 can’t do much about limiting the use of such mechanisms.

2. Think of the little ones

small-big-dogSmall devices cannot handle the extra TLS burden“. Either because of the extra CPU load that comes with TLS or because of the cert management in a billion printers/fridges/routers etc. Certificates also expire regularly and need to be updated in the field.

Of course there will be a least acceptable system performance required to do TLS decently and there will always be systems that fall below that threshold.

3. Certificates are too expensive

The price of certificates for servers are historically often brought up as an argument against TLS even it isn’t really HTTP/2 related and I don’t think it was ever an argument that was particularly strong against TLS within HTTP/2. Several CAs now offer zero-cost or very close to zero-cost certificates these days and with the upcoming efforts like letsencrypt.com, chances are it’ll become even better in the not so distant future.

pile-of-moneyRecently someone even claimed that HTTPS limits the freedom of users since you need to give personal information away (he said) in order to get a certificate for your server. This was not a price he was willing to pay apparently. This is however simply not true for the simplest kinds of certificates. For Domain Validated (DV) certificates you usually only have to prove that you “control” the domain in question in some way. Usually by being able to receive email to a specific receiver within the domain.

4. The CA system is broken

TLS of today requires a PKI system where there are trusted certificate authorities that sign certificates and this leads to a situation where all modern browsers trust several hundred CAs to do this right. I don’t think a lot of people are happy with this and believe this is the ultimate security solution. There’s a portion of the Internet that advocates for DANE (DNSSEC) to address parts of the problem, while others work on gradual band-aids like Certificate Transparency and OCSP stapling to make it suck less.

please trust me

My personal belief is that rejecting TLS on the grounds that it isn’t good enough or not perfect is a weak argument. TLS and HTTPS are the best way we currently have to secure web sites. I wouldn’t mind seeing it improved in all sorts of ways but I don’t believe running protocols clear text until we have designed and deployed the next generation secure protocol is a good idea – and I think it will take a long time (if ever) until we see a TLS replacement.

Who were against mandatory TLS?

Yeah, lots of people ask me this, but I will refrain from naming specific people or companies here since I have no plans on getting into debates with them about details and subtleties in the way I portrait their arguments. You can find them yourself if you just want to and you can most certainly make educated guesses without even doing so.

What about opportunistic security?

A text about TLS in HTTP/2 can’t be complete without mentioning this part. A lot of work in the IETF these days are going on around introducing and making sure opportunistic security is used for protocols. It was also included in the HTTP/2 draft for a while but was moved out from the core spec in the name of simplification and because it could be done anyway without being part of the spec. Also, far from everyone believes opportunistic security is a good idea. The opponents tend to say that it will hinder the adoption of “real” HTTPS for sites. I don’t believe that, but I respect that opinion because it is a guess as to how users will act just as well as my guess is they won’t act like that!

Opportunistic security for HTTP is now being pursued outside of the HTTP/2 spec and allows clients to upgrade plain TCP connections to instead do “unauthenticated TLS” connections. And yes, it should always be emphasized: with opportunistic security, there should never be a “padlock” symbol or anything that would suggest that the connection is “secure”.

Firefox supports opportunistic security for HTTP and it will be enabled by default from Firefox 37.

Translations

Пост доступен на сайте softdroid.net: Восстановление: TLS в HTTP/2. (Russian)

curl: embracing github more

Tuesday, March 3rd, 2015

Pull requests and issues filed on github are most welcome!

The curl project has been around for a long time by now and we’ve been through several different version control systems. The most recent switch was when we switched to git from CVS back in 2010. We were late switchers but then we’re conservative in several regards.

When we switched to git we also switched to github for the hosting, after having been self-hosted for many years before that. By using github we got a lot of services, goodies and reliable hosting at no cost. We’ve been enjoying that ever since.

cURLHowever, as we have been a traditional mailing list driving project for a long time, I have previously not properly embraced and appreciated pull requests and issues filed at github since they don’t really follow the old model very good.

Just very recently I decided to stop fighting those methods and instead go with them. A quick poll among my fellow team mates showed no strong opposition and we are now instead going full force ahead in a more github embracing style. I hope that this will lower the barrier and remove friction for newcomers and allow more people to contribute easier.

As an effect of this, I would also like to encourage each and everyone who is interested in this project as a user of libcurl or as a contributor to and hacker of libcurl, to skip over to the curl github home and press the ‘watch’ button to get notified when future requests and issues appear.

We also offer this helpful guide on how to contribute to the curl project!

More HTTP framing attempts

Monday, March 2nd, 2015

Previously, in my exciting series “improving the HTTP framing checks in Firefox” we learned that I landed a patch, got it backed out, struggled to improve the checks and finally landed the fixed version only to eventually get that one backed out as well.

And now I’ve landed my third version. The amendment I did this time:

When receiving HTTP content that is content-encoded and compressed I learned that when receiving deflate compression there is basically no good way for us to know if the content gets prematurely cut off. They seem to lack the footer too often for it to make any sense in checking for that. gzip streams however end with a footer so they are easier to reliably detect when they are incomplete. (As was discovered before, the Content-Length: is far too often not updated by the server so it is instead wrongly showing the uncompressed size.)

This (deflate vs gzip) knowledge is now used by the patch, meaning that deflate compressed downloads can be cut off without the browser noticing…

Will this version of the fix actually stick? I don’t know. There’s lots of bad voodoo out there in the HTTP world and I’m putting my finger right in the middle of some of it with this change. I’m pretty sure I’ve not written my last blog post on this topic just yet… If it sticks this time, it should show up in Firefox 39.

bolt-cutter

curl, smiley-URLs and libc

Tuesday, February 24th, 2015

Some interesting Unicode URLs have recently been seen used in the wild – like in this billboard ad campaign from Coca Cola, and a friend of mine asked me about curl in reference to these and how it deals with such URLs.

emojicoke-by-stevecoleuk-450

(Picture by stevencoleuk)

I ran some tests and decided to blog my observations since they are a bit curious. The exact URL I tried was ‘www.😃.ws’ (not the same smiley as shown on this billboard: 😂) – it is really hard to enter by hand so now is the time to appreciate your ability to cut and paste! It appears they registered several domains for a set of different smileys.

These smileys are not really allowed IDN (where IDN means International Domain Names) symbols which make these domains a bit different. They should not (see below for details) be converted to punycode before getting resolved but instead I assume that the pure UTF-8 sequence should or at least will be fed into the name resolver function. Well, either way it should either pass in punycode or the UTF-8 string.

If curl was built to use libidn, it still won’t convert this to punycode and the verbose output says “Failed to convert www.😃.ws to ACE; String preparation failed

curl (exact version doesn’t matter) using the stock threaded resolver

  • Debian Linux (glibc 2.19) – FAIL
  • Windows 7 - FAIL
  • Mac OS X 10.9 – SUCCESS

But then also perhaps to no surprise, the exact same results are shown if I try to ping those host names on these systems. It works on the mac, it fails on Linux and Windows. Wget 1.16 also fails on my Debian systems (just as a reference and I didn’t try it on any of the other platforms).

My curl build on Linux that uses c-ares for name resolving instead of glibc succeeds perfectly. host, nslookup and dig all work fine with it on Linux too (as well as nslookup on Windows):

$ host www.😃.ws
www.\240\159\152\131.ws has address 64.70.19.202
$ ping www.😃.ws
ping: unknown host www.😃.ws

While the same command sequence on the mac shows:

$ host www.😃.ws
www.\240\159\152\131.ws has address 64.70.19.202
$ ping www.😃.ws
PING www.😃.ws (64.70.19.202): 56 data bytes
64 bytes from 64.70.19.202: icmp_seq=0 ttl=44 time=191.689 ms
64 bytes from 64.70.19.202: icmp_seq=1 ttl=44 time=191.124 ms

Slightly interesting additional tidbit: if I rebuild curl to use gethostbyname_r() instead of getaddrinfo() it works just like on the mac, so clearly this is glibc having an opinion on how this should work when given this UTF-8 hostname.

Pasting in the URL into Firefox and Chrome works just fine. They both convert the name to punycode and use “www.xn--h28h.ws” which then resolves to the same IPv4 address.

Update: as was pointed out in a comment below, the “64.70.19.202″ IP address is not the correct IP for the site. It is just the registrar’s landing page so it sends back that response to any host or domain name in the .ws domain that doesn’t exist!

What do the IDN specs say?

The U-263A smileyThis is not my area of expertise. I had to consult Patrik Fältström here to get this straightened out (but please if I got something wrong here the mistake is still all mine). Apparently this smiley is allowed in RFC 3940 (IDNA2003), but that has been replaced by RFC 5890-5892 (IDNA2008) where this is DISALLOWED. If you read the spec, this is 263A.

So, depending on which spec you follow it was a valid IDN character or it isn’t anymore.

What does the libc docs say?

The POSIX docs for getaddrinfo doesn’t contain enough info to tell who’s right but it doesn’t forbid UTF-8 encoded strings. The regular glibc docs for getaddrinfo also doesn’t say anything and interestingly, the Apple Mac OS X version of the docs says just as little.

With this complete lack of guidance, it is hardly any additional surprise that the glibc gethostbyname docs also doesn’t mention what it does in this case but clearly it doesn’t do the same as getaddrinfo in the glibc case at least.

What’s on the actual site?

A redirect to www.emoticoke.com which shows a rather boring page.

emoticoke

Who’s right?

I don’t know. What do you think?

Bug finding is slow in spite of many eyeballs

Monday, February 23rd, 2015

“given enough eyeballs, all bugs are shallow”

The saying (also known as Linus’ law) doesn’t say that the bugs are found fast and neither does it say who finds them. My version of the law would be much more cynical, something like: “eventually, bugs are found“, emphasizing the ‘eventually’ part.

(Jim Zemlin apparently said the other day that it can work the Linus way, if we just fund the eyeballs to watch. I don’t think that’s the way the saying originally intended.)

Because in reality, many many bugs are never really found by all those given “eyeballs” in the first place. They are found when someone trips over a problem and is annoyed enough to go searching for the culprit, the reason for the malfunction. Even if the code is open and has been around for years it doesn’t necessarily mean that any of all the people who casually read the code or single-stepped over it will actually ever discover the flaws in the logic. The last few years several world-shaking bugs turned out to have existed for decades until discovered. In code that had been read by lots of people – over and over.

So sure, in the end the bugs were found and fixed. I would argue though that it wasn’t because the projects or problems were given enough eyeballs. Some of those problems were found in extremely popular and widely used projects. They were found because eventually someone accidentally ran into a problem and started digging for the reason.

Time until discovery in the curl project

I decided to see how it looks in the curl project. A project near and dear to me. To take it up a notch, we’ll look only at security flaws. Not only because they are the probably most important bugs we’ve had but also because those are the ones we have the most carefully noted meta-data for. Like when they were reported, when they were introduced and when they were fixed.

We have no less than 30 logged vulnerabilities for curl and libcurl so far through-out our history, spread out over the past 16 years. I’ve spent some time going through them to see if there’s a pattern or something that sticks out that we should put some extra attention to in order to improve our processes and code. While doing this I gathered some random info about what we’ve found so far.

On average, each security problem had been present in the code for 2100 days when fixed – that’s more than five and a half years. On average! That means they survived about 30 releases each. If bugs truly are shallow, it is still certainly not a fast processes.

Perhaps you think these 30 bugs are really tricky, deeply hidden and complicated logic monsters that would explain the time they took to get found? Nope, I would say that every single one of them are pretty obvious once you spot them and none of them take a very long time for a reviewer to understand.

Vulnerability ages

This first graph (click it for the large version) shows the period each problem remained in the code for the 30 different problems, in number of days. The leftmost bar is the most recent flaw and the bar on the right the oldest vulnerability. The red line shows the trend and the green is the average.

The trend is clearly that the bugs are around longer before they are found, but since the project is also growing older all the time it sort of comes naturally and isn’t necessarily a sign of us getting worse at finding them. The average age of flaws is aging slower than the project itself.

Reports per year

How have the reports been distributed over the years? We have a  fairly linear increase in number of lines of code but yet the reports were submitted like this (now it goes from oldest to the left and most recent on the right – click for the large version):

vuln-trend

Compare that to this chart below over lines of code added in the project (chart from openhub and shows blanks in green, comments in grey and code in blue, click it for the large version):

curl source code growth

We received twice as many security reports in 2014 as in 2013 and we got half of all our reports during the last two years. Clearly we have gotten more eyes on the code or perhaps users pay more attention to problems or are generally more likely to see the security angle of problems? It is hard to say but clearly the frequency of security reports has increased a lot lately. (Note that I here count the report year, not the year we announced the particular problems, as they sometimes were done on the following year if the report happened late in the year.)

On average, we publish information about a found flaw 19 days after it was reported to us. We seem to have became slightly worse at this over time, the last two years the average has been 25 days.

Did people find the problems by reading code?

In general, no. Sure people read code but the typical pattern seems to be that people run into some sort of problem first, then dive in to investigate the root of it and then eventually they spot or learn about the security problem.

(This conclusion is based on my understanding from how people have reported the problems, I have not explicitly asked them about these details.)

Common patterns among the problems?

I went over the bugs and marked them with a bunch of descriptive keywords for each flaw, and then I wrote up a script to see how the frequent the keywords are used. This turned out to describe the flaws more than how they ended up in the code. Out of the 30 flaws, the 10 most used keywords ended up like this, showing number of flaws and the keyword:

9 TLS
9 HTTP
8 cert-check
8 buffer-overflow

6 info-leak
3 URL-parsing
3 openssl
3 NTLM
3 http-headers
3 cookie

I don’t think it is surprising that TLS, HTTP or certificate checking are common areas of security problems. TLS and certs are complicated, HTTP is huge and not easy to get right. curl is mostly C so buffer overflows is a mistake that sneaks in, and I don’t think 27% of the problems tells us that this is a problem we need to handle better. Also, only 2 of the last 15 flaws (13%) were buffer overflows.

The discussion following this blog post is on hacker news.

Tightening Firefox’s HTTP framing – again

Thursday, February 12th, 2015

An old http1.1 frameCall me crazy, but I’m at it again. First a little resume from our previous episodes in this exciting saga:

Chapter 1: I closed the 10+ year old bug that made the Firefox download manager not detect failed downloads, simply because Firefox didn’t care if the HTTP 1.1 Content-Length was larger than what was actually saved – after the connection potentially was cut off for example. There were additional details, but that was the bigger part.

Chapter 2: After having been included all the way to public release, we got a whole slew of bug reports immediately when Firefox 33 shipped and we had to revert parts of the fix I did.

Chapter 3.

Will it land before it turns 11 years old? The bug was originally submitted 2004-03-16.

Since chapter two of this drama brought back the original bugs again we still have to do something about them. I fully understand if not that many readers of this can even keep up of all this back and forth and juggling of HTTP protocol details, but this time we’re putting back the stricter frame checks with a few extra conditions to allow a few violations to remain but detect and react on others!

Here’s how I addressed this issue. I wanted to make the checks stricter but still allow some common protocol violations.

In particular I needed to allow two particular flaws that have proven to be somewhat common in the wild and were the reasons for the previous fix being backed out again:

A – HTTP chunk-encoded responses that lack the final 0-sized chunk.

B – HTTP gzipped responses where the Content-Length is not the same as the actual contents.

So, in order to allow A + B and yet be able to detect prematurely cut off transfers I decided to:

  1. Detect incomplete chunks then the transfer has ended. So, if a chunk-encoded transfer ends on exactly a chunk boundary we consider that fine. Good: This will allow case (A) to be considered fine. Bad: It will make us not detect a certain amount of cut-offs.
  2. When receiving a gzipped response, we consider a gzip stream that doesn’t end fine according to the gzip decompressing state machine to be a partial transfer. IOW: if a gzipped transfer ends fine according to the decompressor, we do not check for size misalignment. This allows case (B) as long as the content could be decoded.
  3. When receiving HTTP that isn’t content-encoded/compressed (like in case 2) and not chunked (like in case 1), perform the size comparison between Content-Length: and the actual size received and consider a mismatch to mean a NS_ERROR_NET_PARTIAL_TRANSFER error.

Firefox BallPrefs

When my first fix was backed out, it was actually not removed but was just put behind a config string (pref as we call it) named “network.http.enforce-framing.http1“. If you set that to true, Firefox will behave as it did with my original fix applied. It makes the HTTP1.1 framing fairly strict and standard compliant. In order to not mess with that setting that now has been around for a while (and I’ve also had it set to true for a while in my browser and I have not seen any problems with doing it this way), I decided to introduce my new changes pref’ed behind a separate variable.

network.http.enforce-framing.soft” is the new pref that is set to true by default with my patch. It will make Firefox do the detections outlined in 1 – 3 and setting it to false will disable those checks again.

Now I only hope there won’t ever be any chapter 4 in this story… If things go well, this will appear in Firefox 38.

Chromium

But how do they solve these problems in the Chromium project? They have slightly different heuristics (with the small disclaimer that I haven’t read their code for this in a while so details may have changed). First of all, they do not allow a missing final 0-chunk. Then, they basically allow any sort of misaligned size when the content is gzipped.

Update: this patch was subsequently backed out again due to several bug reports about it. I have yet to analyze exactly what went wrong.