Less plain-text is better. Right?

Every connection and every user on the Internet is monitored and snooped on to at least some extent every now and then. Everything from the casual Firesheep user in your coffee shop, an admin at your ISP, your parents or kids on your wifi network, your employer on the company network, your country’s intelligence service in a national network hub, or just a random rogue person somewhere in the middle of all this.

My involvement in HTTP makes me view and participate in this discussion primarily with that protocol in mind, but the discussion goes well beyond HTTP and the concepts can (and will?) be applied to most Internet protocols in the future. You can follow some of these discussions in the httpbis group, the UTA group, the tcpcrypt list, on twitter and elsewhere.

The IETF just published RFC 7258, which states:

Pervasive Monitoring Is an Attack

Passive monitoring

Most network surveillance can be done entirely passively, by just running the right software and listening in on the right cable, because most Internet traffic is still plain text, readable by anyone who wants to read it as the bytes come flying by. Like your postman can read your postcards.

Opportunistic?

Recently there’s been a fierce discussion going on, both inside and outside of the IETF and other protocol and standards groups, about doing “opportunistic encryption” (OE) and its merits and drawbacks. The term, which is itself being debated and which many say is better called “opportunistic keying” (OK), is about having protocols transparently (invisibly to the user) upgrade from plain text to unauthenticated, TLS-encrypted versions of the protocols. I’m emphasizing the word unauthenticated there because that’s a key to the debate. Recently I’ve been told that “opportunistic security” is the term to use instead…

In the way of real security?

Basically, the argument against opportunistic approaches tends to go like this: by opportunistically upgrading plain text to unauthenticated encrypted communication, sysadmins and users will consider that good enough and will then not switch to proper, strong and secure authenticated encryption. The less good alternative will hamper adoption of the secure alternative. Server admins might just as well buy a cert for 10 USD and use proper HTTPS. Also, listeners can still listen in on, or man-in-the-middle, unauthenticated connections if they capture everything from the start of the connection, including the initial key exchange. Or the passive listener can simply turn into an active party, and the unauthenticated approach doesn’t detect that. OE doesn’t prevent snooping.

Isn’t it better than plain text?

The argument for opportunism here is that nothing will be shown to the user suggesting that the connection has been “upgraded” to something less bad than plain text. Browsers will not show the padlock and clients will not treat the connection as “secure”. It will just silently and transparently make passive monitoring of networks much harder, and it will force actors who truly want to snoop on specific traffic to up their game and probably switch to active monitoring in more cases – something that’s much more expensive for the listener. It isn’t about the cost of a cert. It is about setting up and keeping the cert up to date, about SNI not being widely enough adopted, and about the fact that only 30% of all sites on the Internet today use HTTPS – for these reasons and others.

HTTP:// over TLS

In the IETF httpbis working group, the outcome of this debate is that a way is being defined to do HTTP as specified with an HTTP:// URL – which we’ve learned means plain text – over TLS, as part of the http2 work. Alt-Svc is the way. (The header can also be used to just load balance HTTP and so on, but I’ll ignore that for now.)
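
As a rough sketch of how this looks on the wire (the Alt-Svc syntax was still a moving draft at the time, so take the exact parameters as illustration rather than gospel): a server answering a plain-text request can advertise an http2-over-TLS alternative on port 443, valid for a day, and a client that implements this can use that alternative for subsequent requests to the same origin:

  HTTP/1.1 200 OK
  Alt-Svc: h2=":443"; ma=86400
  Content-Type: text/html

The client still treats the resource as plain HTTP:// for all policy purposes; only the transport changes.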

Mozilla/Firefox is basically the only browser team that initially stands behind the idea of implementing this. HTTP:// done over TLS will not be shown or considered any more secure than ordinary HTTP, and users will not be aware of whether it happens or not. Only true HTTPS connections will get the padlock, secure cookies and the other goodies true HTTPS sites are known and expected to get and show.

HTTP:// over TLS will just silently send everything through TLS (assuming that it can actually negotiate such a connection), thus making passive monitoring of the network that much harder.

Ideally, future http2-capable servers will only require a single config entry set to TRUE to allow clients to do OE against them.

HTTPS is the secure protocol

HTTP:// over TLS is not secure. If you want security and privacy, you should use HTTPS. This said, MITMing HTTPS transfers is still a widespread practice in certain network setups…

TCPcrypt

I find this initiative rather interesting. If implemented, it removes the need for all these application level protocols to do anything about opportunistic approaches and it could instead be handled transparently on TCP level! It still has a long way to go though before we will see anything like this fly in real life.

The future will tell

Is this just a fad that will get no adoption and go away or is it the beginning of something that will change how we do protocols in the future? Time will tell. Many harsh words are being exchanged over this topic in many a debate right now…

(I’m trying to stick to “HTTP:// over TLS” here when referring to doing HTTP OE/OK over TLS, partly because RFC 2818, which describes how to do HTTPS, uses the phrase “HTTP over TLS”…)

licensed to get shared

As my http2 presentation is about to get its 16,000th viewer over at Slideshare, I just have to take a moment and reflect on that fact.

Sixteen thousand viewers. I’ve uploaded slides there before over the years but no other presentation has gotten even close to this amount of attention even though some of them have been collecting views for years by now.

http2 presentation screenshot

I wrote my http2 explained document largely due to the popularity of my presentation and the stream of questions and curiosity it brought to life. Within just a couple of days, that 27-page document had been downloaded more than 2,000 times, and by now over 5,000 times. It is almost 7MB of PDF, which I believe raises the bar enough that the ordinary casual browser won’t download it without an interest and an intention to at least glance through it. Of course, I realize a large portion of said downloads are never really read.

Someone suggested to me (possibly in jest) that I should convert these into ebooks and “charge 1 USD a piece to get some profit out of them”. I really won’t, and I would struggle to do that. It has been said before, but in my case it is indeed true: I stand on the shoulders of giants. I’ve just collected information and written down texts that mostly are ideas, suggestions and conclusions others have already made in various other forums, lists or documents. I wouldn’t feel right charging for that, nor depriving anyone of the rights and freedoms to create derivatives and continue building on what I’ve done. I’m just the curator and janitor here. Besides, I already have an awesome job at an awesome company that allows me to work full time on open source – every day.

The next phase started thanks to the open license. A friendly volunteer named Vladimir Lettiev showed up and translated the entire document into Russian, and now suddenly the reach of the text is vastly expanded into territory it previously just couldn’t penetrate. By using people’s native languages, information can really trickle down to a much larger audience. Especially in regions that aren’t very Englishified.

#MeraKrypto

A whole range of significant Swedish network organizations (ISOC, SNUS, DFRI and SUNET) organized a full-day event today, managed by the great Mr Olle E Johansson. The event, called “MeraKrypto” (MoreCrypto would be the exact translation), was a day of introductions to TLS and a lot of talks around TLS and other encryption and security related topics.

I was there and held a talk on the topic of “curl and TLS”. I basically talked some basics around what curl and libcurl are, how we do TLS, some common problems, and how skipping server certificate verification is a common usage mistake, and then I continued on to quickly mention how http2 and TLS relate. See my slides below, but please be aware that as usual you may not grasp the whole thing from the (English) slides alone. The event was fully booked, so there were around one hundred peeps in the audience, and there were a lot of interested minds that asked good questions, proving they really understood the topics.

The discussion almost got heated during the talk about how companies do MITMing of SSL sessions, when a guy from Blue Coat pretty much single-handedly argued for the need for this and how “it fills a useful purpose”.

It was a great afternoon!

The event was streamed live and recorded on video. I’ll post a link as soon as it gets available to me.

http2 explained

http2 front page

I’m hereby offering you all the first version of my document explaining http2, the protocol. It features explanations of the background, the basic fundamentals, details of the wire format, and something about existing implementations and what to expect in the future.

The full PDF currently boasts 27 pages at version 1.0, but I plan to keep up with the ongoing http2 development, and I’m also kind of expecting at least some user feedback, so I’ll do subsequent updates to improve and extend the document over time. Of course, time will tell how well that works.

The document is edited in LibreOffice and that file is available on github, but ODT is really not a format suitable for patches and merges, so I hope we can sort out changes by filing issues and sending emails.

Wireshark dissector work

Recently I cloned the Wireshark git repository and started updating the http2 dissector. That’s the piece of code that gets called to analyze a stream of data that Wireshark thinks is http2.

The http2 dissector had been left in a draft-09 state, while the current draft at the time was number 11, and there have been several changes to the binary format since, so any reasonably updated client or server would send or receive byte streams that Wireshark couldn’t properly display.

I had never written any dissector code before, but I must say Wireshark didn’t disappoint. It was straightforward and mostly downright easy to fix most of the wrong details. I’m not pretending to be a master at this, nor is the dissector code anywhere near “finished” yet, but I still enjoyed the API and writing a thing like this.

I’ve since dissected plain-text http2 streams that I’ve produced with curl+nghttp2, and I’ve also used the SSLKEYLOGFILE trick with Firefox to automatically decrypt the TLS session and have the dissector figure out the underlying http2 parts.
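
For the curious: the trick, as I used it, amounts to little more than pointing Firefox (via NSS) at a key log file and then telling Wireshark about the same file. A sketch, with the file name being just an example and the exact preference label possibly differing between Wireshark versions:

  $ export SSLKEYLOGFILE=$HOME/tls-keys.txt
  $ firefox &

  # in Wireshark: Edit -> Preferences -> Protocols -> SSL ->
  #   "(Pre)-Master-Secret log filename" = $HOME/tls-keys.txt

With that in place, Wireshark can decrypt the captured TLS records and hand the plain-text stream over to the http2 dissector.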

If there’s any little snag to mention, it is the fact that they insist on getting patches submitted directly to gerrit instead of to a mailing list or similar. This required me to create a gerrit account and really figure out how to push my stuff from git to gerrit, instead of the more traditional and simpler approach of just sending my patch to a mailing list, or possibly submitting it to a bug/patch tracker somewhere with my browser.

Call me old-style, but in fact today’s hip way of doing it, a github-style pull request, would also have been much easier. Here’s what my gerrit submission looks like. But I get it: gerrit pushes a little more work over to the submitter, and I figure that once a submitter such as myself has finally fixed all the nits in the patch, it is very easy for the project to actually merge it. I actually needed someone else to point out how to even find the link to view the code review after the first one had been submitted on the site… (when I post this, my patch has not yet been accepted or merged into the wireshark git repo)

Here’s a basic screenshot showing a trace of Firefox requesting https://nghttp2.org using http2. Click it for the full thing.

wireshark-screenshot

.. and what happens this morning, my time? There’s a brand new http2 draft-12 out with more changes to the on-the-wire format! Well, to be honest, that really wasn’t a surprise. I’ll get the new stuff supported too, but I’ll do that in a separate patch, as I prefer to hold off until I see a live stream from at least one implementation to test against.

curl and proxy headers

Starting in the next curl release, 7.37.0, the curl tool supports the new command line option --proxy-header. (Completely merged at this commit.)

It works exactly like --header does, but only includes the headers in requests sent to a proxy, while the opposite is true for --header: those headers are only sent in requests that go to the end server. But of course, if you use an HTTP proxy and do a normal GET for example, curl will include headers for both the proxy and the server in the request. The bigger difference shows when using CONNECT to a proxy, which will then only use proxy headers.
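
A hypothetical invocation (the proxy address and header values are made up for illustration) where the first header goes to the proxy in the CONNECT request and the second one goes to the end server:

  $ curl --proxy http://proxy.example.com:8080 \
         --proxy-header "X-Proxy-Auth: secret" \
         --header "Accept: application/json" \
         https://example.com/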

libcurl

For libcurl, the story is slightly different and more complicated, since we keep things backwards compatible there. The new libcurl still works exactly like the former one by default.

CURLOPT_PROXYHEADER is the new proxy header option, and it should be set up exactly like CURLOPT_HTTPHEADER is.

CURLOPT_HEADEROPT is then what an application uses to set how libcurl should use the two header options. Again, by default libcurl will keep working like before and use the CURLOPT_HTTPHEADER list in all HTTP requests. To change that behavior and use the new functionality instead, set CURLOPT_HEADEROPT to CURLHEADER_SEPARATE.

The two header lists will then be handled as separate. An application can switch back to the old behavior, with a unified header list, by setting CURLOPT_HEADEROPT to CURLHEADER_UNIFIED.
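
Here’s a minimal sketch of how an application could use the new options (error handling omitted, and the URL, proxy and header values are made up for illustration):

  #include <curl/curl.h>

  int main(void)
  {
    CURL *curl = curl_easy_init();
    struct curl_slist *server_headers = NULL;
    struct curl_slist *proxy_headers = NULL;

    /* headers meant for the end server */
    server_headers = curl_slist_append(server_headers, "Accept: application/json");

    /* headers meant for the proxy (sent in the CONNECT request) */
    proxy_headers = curl_slist_append(proxy_headers, "X-Proxy-Auth: secret");

    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/");
    curl_easy_setopt(curl, CURLOPT_PROXY, "http://proxy.example.com:8080");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, server_headers);
    curl_easy_setopt(curl, CURLOPT_PROXYHEADER, proxy_headers);

    /* opt in to the new behavior: keep the two lists separate */
    curl_easy_setopt(curl, CURLOPT_HEADEROPT, (long)CURLHEADER_SEPARATE);

    curl_easy_perform(curl);

    curl_slist_free_all(server_headers);
    curl_slist_free_all(proxy_headers);
    curl_easy_cleanup(curl);
    return 0;
  }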

curl and the road to IPv6

I’d like to comment on Paul Saab’s presentation from the other day at the World IPv6 Congress, titled “The Road To IPv6 – Bumpy”. Paul works for Facebook, and in his talk he apparently mentioned curl (slide 24 of the PDF set).

Lots of my friends have since directed my attention to those slides and asked for my comment. I haven’t seen Paul’s actual presentation, only read the slides, but I have had a short twitter conversation with him about what he meant by his words.

The slide in question says exactly this:

Curl

  • Very hostile to the format of the IPv6 address
  • Wants everything bracket enclosed
  • Many IPv6 bugs that only recently were fixed

Let’s see what those mean. “Very hostile to the format of the IPv6 address” and “Wants everything bracket enclosed” are basically the same thing.

Paul makes a big point about the fact that if you want to write a URL with an IP address instead of a host name, you have to put that IP address within [brackets] when the IP address is an IPv6 one, which you don’t do if it is an IPv4 one.

Right. Sure. You do. That’s certainly an obstacle when converting slightly naive applications from IPv4 to IPv6 environments. But this syntax is mandated by RFCs and standards (RFC 3986 to be exact). curl follows the standards, and you do it the same way in other tools and clients that use URLs. The problem manifests itself if you use curl for your task, but if you used something else instead, that something else would have the same issue if it follows the standards. The reason for the bracket requirement is of course that IPv6 numerical addresses contain colons, and colons already have a reserved meaning in the host part of URLs, so they had to come up with some way to handle that.
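
For example, using documentation-only addresses, the same request against an IPv4 and an IPv6 literal looks like this:

  $ curl "http://192.0.2.1:8080/path"
  $ curl "http://[2001:db8::1]:8080/path"

Without the brackets there would be no way to tell where the IPv6 address ends and the port number begins.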

Then finally, “Many IPv6 bugs that only recently were fixed”, he said.

I’m the main developer and maintainer of the curl project. This is news to me. Sure, we always fix bugs and we always find stupid things to fix, so there’s no doubt that we’ve had IPv6-related bugs that we’ve fixed – and that we still have IPv6-related bugs we haven’t yet found – but saying that we fixed many such bugs recently? That isn’t something I’m aware of. My guess is that he’s talking about hiccups we’ve had after introducing happy eyeballs, a change we introduced in release 7.34.0 in December 2013.

curl has had IPv6 support since January 2001. We’re on that bumpy road to IPv6!

groups.google.com hates greylisting

Dear Google,

Here’s a Wikipedia article for you: Greylisting.

After you’ve read that, then consider the error message I always get for my groups.google.com account when you disable mail sending to me due to “bouncing”:

“Bounce status: Your email address is currently flagged as bouncing. For additional information or to correct this, view your email status here [link].”

Following that link I get to read the reason:

“Google tried to deliver your message, but it was rejected by the server for the recipient domain haxx.se by [mailserver]. The error that the other server returned was: 451 4.7.1 Greylisting in action, please come back later”

See, even the error message spells out what it is all about!

Thanks to this feature of Google groups, I cannot participate in any such lists/groups for as long as I keep my greylisting activated since it’ll keep disabling mail delivery to me.

Enabling greylisting decreased my spam flood to roughly a third of the previous volume (and now I’m at 500-1000 spam emails/day) so I’m not ready to disable it yet. I just have to not use google groups.

Update: I threw in the towel and I now whitelist google.com servers to get around this problem…

Reducing the Public Suffix pain?

Let me introduce you to what I consider one of the worst hacks we have in current and modern internet protocols: the Public Suffix List (PSL). This is a list (maintained by Mozilla) of domains that have some kind of administrative setup or arrangement that makes their sub-domains independent. For example, you can’t be allowed to set cookies for “*.com” because .com is a TLD that has independent domains. But the same thing goes for “*.co.uk”, and there’s no hint anywhere about this – except for the Public Suffix List. Then take that simple little example and extrapolate to a domain system that grows with several new TLDs every month, and more. The PSL is now several thousand entries long.

And cookies aren’t the only thing this is used for. Another really common, and perhaps even more important, use case is wildcard matches in TLS server certificates. You should not be allowed to buy and use a cert for “*.co.uk”, but you can for “*.yourcompany.co.uk”…

Not really official but still…

If you read the cookie RFC or the spec for how to do TLS wildcard certificate matching, you won’t find any line making it crystal clear that the Public Suffix List is what you must use, and I’m sure different browsers solve this slightly differently. But in practice, and most unfortunately (if you ask me), you must either use the list or make your own to be fully compliant with how the web works in 2014.

curl, wget and the PSL

In curl and libcurl, we have so far not taken the PSL into account. That is by choice, since I’ve not had any decent way to handle it and there are lots of embedded and other use cases that simply won’t be able to cope with that large PSL chunk.

Wget hasn’t had any PSL awareness either, but in recent weeks this has been brought up on the wget list and more attention has been given to it. Work has been initiated to do something about it, which has led to…

libpsl

Tim Rühsen took the baton and started the libpsl project and its associated mailing list, as a foundation for something for Wget to use to get PSL awareness.

I’ve mostly cheered the effort so far and said that I wouldn’t mind building on this to enhance curl in the future if it just gets a suitable (liberal enough) license, and it seems to be going in that direction. For curl’s sake, I would like a conditional dependency on this, so that people without particular size restrictions can use it, while people in more embedded and special-purpose situations can continue to build without PSL support.
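
To give an idea of what such a check could look like, here is a rough sketch using libpsl’s built-in list (the library is young, so treat the exact API as my assumption rather than a promise):

  #include <stdio.h>
  #include <libpsl.h>

  int main(void)
  {
    /* the public suffix list compiled into the library itself */
    const psl_ctx_t *psl = psl_builtin();

    /* a cookie domain of "co.uk" should be refused,
       while "example.co.uk" is a regular registrable domain */
    printf("co.uk is a public suffix: %d\n",
           psl_is_public_suffix(psl, "co.uk"));
    printf("example.co.uk is a public suffix: %d\n",
           psl_is_public_suffix(psl, "example.co.uk"));
    return 0;
  }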

If you’re interested in helping out in curl and libcurl in this area, feel most welcome!

dbound

Meanwhile, the IETF has set up a new mailing list called dbound for discussions around PSL and similar issues and it seems very timely!
