daniel.haxx.se episode 8

Monday, October 27th, 2014

Today I hesitated to make my new weekly video episode. I looked at the viewer numbers and how they have basically dwindled over the last few weeks. I’m not making this video series interesting enough for a very large crowd of people. I’m re-evaluating whether I should do them at all, or if I can do something to spice them up…

… or perhaps just not look at the viewer numbers at all and just do what I think is fun?

I decided I’ll go with the latter for now. After all, I enjoy making these and they usually give me some interesting feedback and discussions even if the numbers are really low. What good is a number anyway?

This week’s episode:

Personal

Firefox

Fun

HTTP/2

Talks

  • I’m offering two talks for FOSDEM

curl

  • release next Wednesday
  • bug fixing period
  • security advisory is pending

wget

Reducing the Public Suffix pain?

Monday, March 24th, 2014

Let me introduce you to what I consider one of the worst hacks we have in current and modern internet protocols: the Public Suffix List (PSL). This is a list (maintained by Mozilla) of domains that have some kind of administrative setup or arrangement that makes their sub-domains independent. For example, you shouldn’t be allowed to set cookies for “*.com” because .com is a TLD that has independent domains. But the same thing goes for “*.co.uk”, and there’s no hint anywhere about this – except for the Public Suffix List. Then, take that simple little example and extrapolate to a domain system that grows with several new TLDs every month, and more. The PSL is now several thousand entries long.

And cookies aren’t the only thing this is used for. Another really common and perhaps even more important use case is wildcard matches in TLS server certificates. You should not be allowed to buy and use a cert for “*.co.uk”, but you can for “*.yourcompany.co.uk”…

Not really official but still…

If you read the cookie RFC or the spec for how to do TLS wildcard certificate matching, you won’t find any line stating crystal clear that the Public Suffix List is what you must use, and I’m sure different browsers solve this slightly differently. But in practice, and most unfortunately (if you ask me), you must either use the list or make your own to be fully compliant with how the web works in 2014.

curl, wget and the PSL

In curl and libcurl, we have so far not taken the PSL into account. This is by choice, since I haven’t had any decent way to handle it, and there are lots of embedded and other use cases that simply can’t cope with a chunk of data as large as the PSL.

Wget hasn’t had any PSL awareness either, but in recent weeks this has been brought up on the wget list and given more attention. Work has been initiated to do something about it, which has led to…

libpsl

Tim Rühsen took the baton and started the libpsl project and its associated mailing list, as a foundation for Wget to build PSL awareness on.
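
To give an idea of what using it might look like, here is a minimal sketch in C based on what I understand the young libpsl API to offer: psl_builtin() to get the list compiled into the library and psl_is_public_suffix() to query it. Treat the details as my assumption rather than a stable interface, as the project is still very young.

    #include <stdio.h>
    #include <libpsl.h>

    int main(void)
    {
      /* use the public suffix list compiled into the library */
      const psl_ctx_t *psl = psl_builtin();
      const char *domains[] = { "com", "co.uk", "yourcompany.co.uk" };

      for (int i = 0; i < 3; i++)
        printf("%s is %sa public suffix\n", domains[i],
               psl_is_public_suffix(psl, domains[i]) ? "" : "not ");
      return 0;
    }

A cookie engine built on top of a check like this would then refuse to set a cookie for a domain that is itself a public suffix.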

I’ve mostly cheered the effort so far and said that I wouldn’t mind building on this to enhance curl in the future, provided it gets a suitable (liberal enough) license, and it seems to be going in that direction. For curl’s sake, I would like a conditional dependency on it, so that people without particular size restrictions can use it, while people in more embedded and special-purpose situations can continue to build without PSL support.

If you’re interested in helping out in curl and libcurl in this area, feel most welcome!

dbound

Meanwhile, the IETF has set up a new mailing list called dbound for discussions around PSL and similar issues and it seems very timely!

Is there a case for a unified SSL front?

Wednesday, July 4th, 2012

There are many, sorry, very many, different SSL libraries today that various programs may want to use. In the open source world at least, it is more and more common that programs using SSL (and other crypto) offer options to build with at least OpenSSL or GnuTLS, and very often they also offer optional builds with NSS and possibly a few other SSL libraries.

In the curl project we just added support for library number nine. In the libcurl source code we have an internal API that each SSL library backend must provide, and all the libcurl source code internally uses only that single, fixed API to do SSL and crypto operations, without even knowing which backend library is actually providing the functionality. I talked about libcurl’s internal SSL API before, and I asked about this on the libcurl list back in February 2011.
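
I won’t paste the actual internal header here, but conceptually it is a vtable-style arrangement along these lines – the names below are made up for this illustration and are not the real libcurl symbols:

    /* hypothetical illustration of a per-backend vtable - not the real
       libcurl internals, just the general shape of the idea */
    #include <stddef.h>
    #include <sys/types.h>   /* ssize_t */

    struct connection;       /* opaque per-connection state */

    struct ssl_backend {
      const char *name;                          /* e.g. "OpenSSL", "GnuTLS", "NSS" */
      int  (*init)(void);                        /* global library init */
      void (*cleanup)(void);                     /* global library cleanup */
      int  (*connect)(struct connection *conn);  /* do the TLS handshake */
      ssize_t (*send)(struct connection *conn, const void *buf, size_t len);
      ssize_t (*recv)(struct connection *conn, void *buf, size_t len);
      void (*close)(struct connection *conn);    /* shut the TLS layer down */
    };

    /* the rest of the code only ever calls through the selected backend:
       backend->connect(conn), backend->send(conn, data, len), and so on */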

So, a common problem should be able to find a common solution. What if we fixed this in a way that would be possible for many projects to re-use? What if one project’s ability to select from nine different provider libraries could be leveraged by others? A single SSL library with a simplified API that still provides the functionality most “simple” SSL-using applications need?
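
To make that last point a bit more concrete, such a simplified API could sit at roughly this level of abstraction. Every name below is invented purely for the sake of the example; nothing has actually been designed yet:

    /* a purely hypothetical sketch of a "simple" unified SSL API -
       these names are invented for the example, no such library exists */
    #include <stddef.h>

    typedef struct ssl_session ssl_session;   /* opaque session handle */

    /* handshake on an already connected socket, verifying the peer cert
       against the given CA bundle and checking the name against 'hostname' */
    ssl_session *ssl_connect(int sockfd, const char *hostname,
                             const char *ca_bundle_path);

    int  ssl_write(ssl_session *s, const void *buf, size_t len);
    int  ssl_read(ssl_session *s, void *buf, size_t len);
    void ssl_close(ssl_session *s);            /* shut down and free */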

Marc Hörsken and I have discussed this a bit, and pidgin/libpurple came up as a possible contender that could use such a single SSL library. I’ve also talked about it with Claes Jakobsson and Magnus Hagander for PostgreSQL, and I know from before that wget certainly could use it. When I’ve given my talks on the seven SSL libraries of libcurl, I’ve been approached by several people who have expressed a desire to see such an externalized API, and I remember the guys from CyaSSL among them. I’m sure there will be a few other interested parties as well if this takes off.

What remains to be answered is if it is possible to make it reality in a decent way.

Upsides:

  1. it will remove code from the libcurl code base – in the order of 10K or more lines of C code – and it should mean a decreased amount of “own” code for every project that decides to use this single SSL library
  2. it will allow other projects to use one out of many SSL libraries, hopefully benefiting end users as a result
  3. it should get more people involved in the code as more projects would use it, hopefully resulting in better tested and more polished source code

There are several downsides with a unified library, from the angle of curl/libcurl and myself:

  1. it will undoubtedly lead to more code being added and implemented that curl/libcurl won’t need or use, thus grow the code base and the final binary
  2. the API of the unified library won’t be possible to integrate as tightly with the libcurl internals as the current one is, so there will be some added code and logic needed
  3. it will require that we produce a lot of documentation for all the functions and structs in the API which takes time and effort
  4. the above mentioned points will take time away from my other projects (hopefully to the benefit of others, but still)

Some problems that would have to be dealt with:

  1. how to deal with libraries that don’t provide functionality that the single SSL API does
  2. how to draw the line of what functionality to offer in the API, as the individual libraries will always provide richer and more complete APIs
  3. What is actually needed to get the work on this started for real? And if we do, what would we call the project…

PS, I know the term is more accurately “TLS” these days but somehow people (me included) seem to have gotten stuck with the word SSL to cover both SSL and TLS…

News flash! Tech terms used almost correctly!

Wednesday, February 2nd, 2011

Ok, The Social Network isn’t a new movie by any means at this time, but I happened to see it the other day. I’ll leave aside the entire story and whatever facts it did or didn’t portray in a correct manner.

But I did spot the use of several at least basic technical terms early on, and they struck me as amazingly correctly used! The movie character Mark actually used wget to download images (at about 10:05 into the movie), and as you can see on my first screenshot, the initial keystrokes we get to see on the command line actually resemble a correct wget command line. You can click on these images to get a slightly larger version of the pics. I’m sorry I couldn’t get any higher quality ones, but I figure the point is still the same!

[Image: the-social-network-wget-cmdline]

After he has invoked wget, it is explained that he downloads many pictures, and what do you know: the screen output actually looks like it could’ve come from a wget run that has downloaded a couple of files:

[Image: the-social-network-wget-output]

He also mentioned the terms ‘Apache’, ‘emacs’ and ‘perl scripts’ in complete and correct sentences.

Where is the world heading?!

Update: Hrvoje Niksic, the original author of wget, helped out with some additional observations:

The options looked right to me, something like -r -A.jpg …

I was wondering about the historical accuracy of the progress bar, but it checks out. The movie takes place about a year and a half after the release of Wget 1.8, which added the feature. The department that takes care of these things did a good job. :)

Bjarni got the award 2010

Sunday, November 7th, 2010

The Nordic Free Software Award 2010 was given to the Icelandic hacker Bjarni Rúnar Einarsson.

The formal handing over of the prize was done during the social event at FSCONS 2010, with hundreds of free software hackers attending and a lot of joy. Bjarni was also immediately invited to participate in the NFSA jury for next year, in an attempt to start a tradition of getting former winners on the jury.

[Image: NFSA-award]

I’m happy to say that I served on the jury for the award this year. We were a bunch of Nordic free software enthusiasts in there, including several previous winners. The winner this year, Bjarni Rúnar, was selected through a nomination process in which we received, I believe, 11 names, followed by a vote within the jury.

I did the press release draft and Karsten from FSFE polished it into something much better. I think that will go out early this week, and I am now even mentioned as the press contact for Sweden regarding the award. The FSFE posted their announcement, with my last name spelled wrongly…

The social event then went on with lots of free software talks with cool people from the entire Nordic region, and I certainly met a whole bunch of friendly hackers I didn’t know before. It was also great fun to run into Giuseppe, the current wget maintainer.

(The picture might just be a fake.)

cookie order

Wednesday, January 20th, 2010

I’ve previously mentioned the IETF httpstate working group several times, and here are some insights into one topic we’re currently discussing:

HTTP Cookie: sort order

The current httpstate draft for the updated cookie spec says that cookies that are sent to a server with the Cookie: header in an HTTP request need to be sorted. They must be sorted primarily on path length and secondarily on creation-time.

According to others on the http-state mailing list, that’s exactly what the major browsers already do, and they thus think we should document it that way since that’s how cookies work.

During these discussions I’ve learned that cookies with the same name that have been set with different paths will all be sent in the request, and thus they need to be sent in a particular order – in fact, even the original Netscape “spec” said so. I agree with this and I’ve also modified libcurl to act accordingly.

I hold the position that only cookies that have identical names need this treatment, and that’s what we must specify. Implementers, however, will most likely find it easier to sort all cookies at once.
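
For implementers going that route, a comparator along these lines would match the draft’s rule – a rough sketch with a made-up cookie struct, sorting longer paths first and older creation-times first among equal-length paths:

    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    /* hypothetical cookie representation, just for illustration */
    struct cookie {
      char *name;
      char *path;
      time_t creation_time;
    };

    /* qsort() comparator: longer paths first, then earlier creation-times */
    static int cookie_cmp(const void *a, const void *b)
    {
      const struct cookie *ca = a, *cb = b;
      size_t la = strlen(ca->path), lb = strlen(cb->path);
      if (la != lb)
        return (la > lb) ? -1 : 1;  /* longer path sorts first */
      if (ca->creation_time != cb->creation_time)
        return (ca->creation_time < cb->creation_time) ? -1 : 1; /* older first */
      return 0;
    }

    /* usage: qsort(cookies, num_cookies, sizeof(struct cookie), cookie_cmp); */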

The secondary sort key, the creation-time, is much more questionable. Why would the server care about that? How can they even rely on that?

Previously in this discussion (back in August 2009) I checked other open source cookie implementations to see how they deal with cookie sorting. I learned that perl’s LWP does sort them based on path length but on nothing else. The following tools did no sorting at all: wget, curl, libsoup, pavuk, lftp and aria2. I have no doubt that we could find more if we searched for more implementations in more languages and environments.

Specifying that sorting is a MUST on path length will still keep LWP and the next curl version compliant. Specifying that they MUST sort on creation-date as well will then make all of these projects non-conforming.

One problem I have in the cookie discussion in general is that the (5?) major browsers pretty much all behave the same and while the 5 major browsers have almost 100% of the web browser market for users, we cannot then automatically assume that they have 100% of the HTTP client or cookie-using client market. We just don’t know how many applications, tools and frameworks exist out there that aren’t actually browsers but still are using cookies.

Of course, I want to say that creation-time sorting is pointless, but I have nothing to back that up with. No numbers. The other side of the discussion has a bunch of browsers that already sort like this, but no numbers or evidence of whether servers rely on it, or how many do.

Can any reader of this find a site out there that depends on the cookie order being sorted on creation-date?

(Reminder: the charter for the HTTPSTATE working group is to document existing widely used practices. It is not to solve problems or to fix problems in the existing cookie protocol. We all know and acknowledge that cookies as they currently work are quite flawed and painful.)

Code re-use is fun

Monday, March 23rd, 2009

Back in 2003 I wrote up support for the HTTP NTLM authentication method for libcurl. Happy with my achievement, I later that year donated a GPL-licensed version of my code to the Wget project (which was also my first contact with the signed paper stuff with the GNU/FSF to waive my copyright claims and instead hand them over). What was perhaps not so amusing with this code was when both curl and Wget were discovered in 2005 to have the same security flaw, due to my mistakes in this code shared by both projects!

Just recently, the neon project seems interested in taking on the version I adjusted somewhat for them, so possibly a third HTTP project will soon be using this code. Yeah, I posted it on their mailing list back then, so it has been sitting there in the archives maturing for some six years by now…

I also happened to stumble over the SSH Tunnel Creator tool, which I’ve never used myself, that apparently snatched my neon donation (quite according to what the license allowed, of course) and used it in their tool to do NTLM!

It’s actually not until recent years that I discovered libntlm, and while I don’t know how good it was back in the days when I wrote my first NTLM stuff, I generally think using existing libs is the better idea…

murl for extended curlness

Monday, March 9th, 2009

I’m a firm believer in the old Unix mantra of letting each tool do its job and do it well, and pass the rest of the work on to the next tool. I’ve always stated that curl should remain this way: it should stay within its defined walls and not try to do everything.

But time passes and more and more ideas are thrown up in the air, or in some cases directly at me, and the list of things that we could do but don’t due to this philosophical limit of remaining focused has grown. It currently includes at least:

  • metalink support
  • recursive HTML downloads
  • recursive/wildcard FTP transfers
  • bittorrent support
  • automatic proxy configuration
  • simultaneous/parallel download support

Educated readers of course immediately detect that this list (if implemented) would make a tool that basically does what wget already does (and a lot more), and I’ve explicitly said for a decade that curl is not a wget clone. Maybe it is time for us (me?) to re-evaluate that sentiment – at least in some sense.

I don’t want to sacrifice the concepts that have worked so fine for curl under so many years, so I’m still firmly against stuffing all this into curl (or libcurl). That simply will not happen with me at the wheel.

A much more interesting alternative would be to instead start working on a second tool within the curl project: murl. A tool that does basically everything that curl already does, but also opens the doors for adding just about everything else we can cram in and that is still related to data transfers. That would include, but not be restricted to, all the fancy stuff mentioned in the list above!

No, the name murl is not set in stone, nor is this whole idea anything but plain and early thoughts thrown out at this point, so it may or may not actually take off. It will probably depend on whether I get support and help from fellow hackers to get it started and moving along.


Getting cacerts for your tools

Monday, September 1st, 2008

As the primary curl author, I’m finding the comments here interesting. That blog entry, “Teaching wget About Root Certificates”, is about how you can get cacerts for wget by downloading them from curl’s web site, and people quickly point out how getting cacerts from an untrusted third-party place is of course an ideal situation for an MITM “attack”.

Of course you can’t trust any files off an HTTP site, or an HTTPS site without a “trusted” certificate, but thinking that the curl project would run one of those just to let random people load PEM files from our site seems a bit weird. Thus, we also provide the scripts we do all this with, so that you can run them yourself with whatever input data you need, preferably something you trust. The more paranoid you are, the harder that gets, of course.

On Fedora, curl does come with CA certs (at least I’m told recent Fedoras do), and even if it doesn’t, you can actually point curl to use whatever cacert you like. Since most default installs of curl use OpenSSL, like wget does, you could tell curl to use the same cacert your wget install uses.
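
For the libcurl case, pointing it at a particular CA bundle is a single option, CURLOPT_CAINFO (the command line equivalent is --cacert). A minimal sketch follows, where the bundle path is just an example of where a distro might keep one:

    #include <stdio.h>
    #include <curl/curl.h>

    int main(void)
    {
      CURL *curl;
      curl_global_init(CURL_GLOBAL_DEFAULT);
      curl = curl_easy_init();
      if (curl) {
        CURLcode res;
        curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/");
        /* point libcurl at the CA bundle of your choice; the path below is
           only an example of where a distro might keep such a bundle */
        curl_easy_setopt(curl, CURLOPT_CAINFO, "/etc/ssl/certs/ca-bundle.crt");
        res = curl_easy_perform(curl);
        if (res != CURLE_OK)
          fprintf(stderr, "transfer failed: %s\n", curl_easy_strerror(res));
        curl_easy_cleanup(curl);
      }
      curl_global_cleanup();
      return 0;
    }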

This last thing gets a little more complicated when one of the two gets compiled with an SSL library that doesn’t easily support PEM (read: NSS), but in the case of curl in recent Fedora, they build it with NSS plus an additional patch that allows it to still read PEM files.

Fresh CA Cert Bundle Anyone?

Tuesday, November 13th, 2007

The popular CA extract service on the curl web site converts the Firefox CA certs into a PEM file suitable for use with curl, wget or anything else OpenSSL-based that likes PEM-formatted CA cert bundles.

The main script was fixed yesterday. It was previously fetching a nightly source code snapshot to get the “magic” file to convert from, but I noticed that the nightly source snapshots stopped being updated a good while ago, so the updates had stopped!

Now the script only gets the actually needed certdata file and converts it, so it downloads a lot less data in vain and thus also runs much faster. The PEM files offered on that page are once again up-to-date with the most recent Firefox.