curl and speced cookie order

I just posted this on the curl-library list, but I feel it suits to be mentioned here separately.

As I’ve mentioned before, I’m involved in the IETF http-state working group which is working to document how cookies are used in the wild today. The idea is to create a spec that new implementations can follow and that existing implementations can use to become more interoperable.

(If you’re interested in these matters, I can only urge you to join the http-state mailing list and participate in the discussions.)

The subject of how to order cookies in outgoing HTTP Cookie: headers have been much debated over the recent months and I’ve also blogged about it. Now, the issue has been closed and the decision went quite opposite to my standpoint and now the spec will say that while the servers SHOULD not rely on the order (yeah right, some obviously already do and with this specified like this even more will soon do the same) it will recommend clients to sort the cookies in a given way that is close to the way current Firefox does it[*].

This has the unfortunate side-effect that to become fully compatible with how the browsers do cookies, we will need to sort our cookies a bit more than what we just recently introduced. That in itself really isn’t very hard since once we introduced qsort() it is easy to sort on more/other keys.

The biggest problem we get with this, is that the sorting uses creation time of the cookies. libcurl and curl and others mostly use the Netscape cookie files to store cookies and keep state between invokes, and that file format doesn’t include creation time info! It is a simple text-based file format with TAB-separated columns and the last (7th) column is the cookie’s content.

In order to support the correct sorting between sessions, we need to invent a way to pass on the creation time. My thinking is that we do this in a way that allows older libcurls still understand the file but just not see/understand the creation time, while newer versions will be able to get it. This would be possible by extending the expires field (the 6th) as it is a numerical value that the existing code will parse as a number and it will stop at the first non-digit character. We could easily add a character separation and store the
creation time after. Like:

Old expire time:

2345678

New expire+creation time:

2345678/1234567

This format might even work with other readers of this file format if they do similar assumptions on the data, but the truth is that while we picked the format in the first place to be able to exchange cookies with a well known and well used browser, no current browser uses that format anymore. I assume there are still a bunch of other tools that do, like wget and friends.

Update: as my friend Micah Cowan explains: we can in fact use the order of the cookie file as “creation time” hint and use that as basis for sorting. Then we don’t need to modify the file format. We just need to make sure to store them in time-order… Internally we will need to keep a line number or something per cookie so that we can use that for sorting.

[*] – I believe it sorts on path length, domain length and time of creation, but as soon as the -06 draft goes online it will be easy to read the exact phrasing. The existing -05 draft exists at: http://tools.ietf.org/html/draft-ietf-httpstate-cookie-05

Apple – only 391 days behind

In the curl project, we take security seriously. We work hard to make sure we don’t open up for security problems of any kind and once we fail, we work hard at analyzing the problem and coming up with a proper fix as swiftly as possible to make our “customer” as little vulnerable as possible.

Recently I’ve been surprised and slightly shocked by the fact that a lot of open source operating systems didn’t release any security upgrades to our most recent security flaw until well over a month after we first publicized the flaw. I’m not sure why they all reacted so slowly. Possibly it is because vendor-sec isn’t quite working as they were informed prior to the notification, and of course I don’t really expect many security guys to be subscribed to the curl mailing lists. Slow distros include Debian and Mandriva while Redhat did great.

Today however, I got a mail from Apple (and no, I don’t know why they send these mails to me but I guess they think I need them or something) with the subject “APPLE-SA-2010-03-29-1 Security Update 2010-002 / Mac OS X v10.6.3“. Aha! Did Apple now also finally update their curl version you might think?

They did. But they did not fix this problem. They fixed two previous problems universally known as CVE-2009-0037 and CVE-2009-2417. Look at the date of that first one. March 3, 2009. Yes, a whopping 391 days after the problem was first made public, Apple sends out the security update. Cool. At least they eventually fixed the problem…

An FTP hash command

Anthony Bryan strikes again. This time his name is attached to a new standards draft for how to get a hash checksum of a given file when using the FTP protocol. draft-bryan-ftp-hash-00 was published just a few days ago.

The idea is basically to introduce a spec for a new command named ‘HASH’ that a client can issue to a server to get a hash checksum for a given file in order to know that the file has the exact same contents you want before you even start downloading it or otherwise consider it for actions.

The spec details how you can ask for different hash algorithms, how the server announces its support for this in its FEAT response etc.

I’ve already provided some initial feedback on this draft, and I’ll try to assist Anthony a bit more to get this draft pushed onwards.

curl goes git

Just a few days ago the curl project turned twelve years old, and I decided that it was time for us to ditch our trusty old CVS setup and switch over to use git instead for source code control.

Why Switch at All

I’ve been very content with CVS over the years and in our small project we don’t really have any particularly weird or high demands on the version control software.

Lately (like in recent years) I’ve dipped my toes into various projects that have been using git, and more and more over time I’ve learned to appreciate the little goodies that git does that CVS simply cannot. I’m then not even speaking about branches or merges etc that git does a whole lot better and easier than CVS, I’m in fact even more in love with git’s way to ease handling with diffs sent by email and its great way of keeping track of authors separately from the committer etc. git am and git commit –author are simply two very handy tools missing in CVS.

Why Git

So if we want to switch from CVS to another tool what would we chose? That wasn’t really the question in my case so I didn’t answer it. In my case, it was rather that I’ve been using git in several projects and it is used in some of the biggest projects I work with so it was some git’s features I wanted. I didn’t consider any of the other distributed version tools as quite frankly: they wouldn’t be much better for me than what CVS already is. I want to reduce the number of different tools I need, and I’m quite sure anyway that git is one of the top contenders even if I would do an actual comparison.

So the choice to go git was quite selfish and done by me, but I felt that quite a few guys in the curl community supported this decision and very few actually believe remaining with CVS was a better idea.

The fact that git itself uses libcurl for its HTTP access of course also proves its good taste! 🙂

How did the conversion go

Very easy and swiftly. First, as I mentioned above we never used branches much so we basically had a linear development with a set of tags. I did an rsync of the full repo to get a local copy to work with, then I ran ‘git cvsimport’ on that to created a new repo. I did run it a couple of times to make sure I had done a correct mapping of all CVS user names to their git equivalents. Converting >10 years of CVS commits took roughly 10 minutes on my desktop machine so it wasn’t that tedious even.

Once I had a local repo created with all authors looking good, I simply followed the instructions on github.com on how to add a remote origin to a local branch and when I pushed to that, git sent off all commits ever made to curl to the remote repo now exposed to the world from github.com.

cURL

When that part was done, I did a quick read on the ‘git help daemon’ docs and 30 seconds later I had a local repo setup that is a mirror of the github one, so that users can still opt to get the code from haxx.se.

Unchanged work flow

Git allows different ways of working with the code, but I’ve decided that at least as a start we won’t change the way we work. I’ll offer all committers push rights to the master branch on the repository and we will simply all push to that, as our head development branch.

We will prefer patches made with git format-patch sent to the mailing list, but as before you can still produce patches by diffing source code using extracted tarballs or whatever approach you prefer.

All details on how to get the code for curl using git is available online.

two competitors or one united

In a discussion on the libssh2 mailing list, the founder and maintainer of the libssh project got involved. The actual discussion is not what I want to talk about here, but something he touched on in one of his mails:

“I think a bit of competition in open source is fair and leads to innovation”

Recall that there are only two existing SSH libraries written in C that are free and open. These two libs have roughly the same features, the same goals and a lot of other similarities.

So, does it help open source projects to have an identified and known competitor to compete against and to try to steal users from and to try to outperform in other ways? Is there any research done that proves this theory?

Or is it just so that we have a number of developers with a certain amount of time, and if we divide those in two or more groups we therefore make sure that neither is advancing as fast as they could if all those persons would participate in the same project? Open Source projects are so often driven and developed by volunteers, and within a given area (say people interested in a SSH library) there is only a given amount of people and to make the best use of that limited set, wouldn’t it be better if they all would work together in a single project?

Would OpenBSD, NetBSD and FreeBSD  have been better off by not having split up? Would KDE and GNOME have reached further today if they had been a united project?

I don’t know the answers here, and most probably the answer isn’t very clear or binary applying to all cases anyway as I bet all situations are slightly different and thus should be considered separately.

In the history of FOSS, many forked projects end up getting merged again but we don’t often see two independently created projects merge. I guess the Compiz and Beryl fusion is an exception. I think the sense of “my pet project” is often too strong, not to mention that different licenses and different development cultures make mergers hard to take place.

Look, I’m not really advocating that libssh2 and libssh should merge at this point. I’m just playing with the idea and trying to see the issue from different angles.

There are of course several things that would speak against projects to merge: different views on what licenses that are suitable, religious things such as how to indent source code or what build system to use. Quite possibly also other social aspects: development and team “culture” and behavior and why not just the “Not Invented Here” syndrome – it isn’t always that easy to give up what you made yourself or to appreciate someone else’s work.

140 foss hackers

At last, the first meeting with our recently started foss-sthlm effort took place. The amount of attention and attendance we achieved by far surpassed our wildest assumptions, and around 130-140 persons interested in open source showed up (we don’t know the exact number). The facilities in Kista where we held the event, were graciously let to our use by Stockholm’s University (DSV) and they were very good. MSC and Nohup were our two sponsors who donated great sub sandwiches and drinks to all of us. Thank you!

I’m glad we manage to offer this event completely free of charge to the attendees, and hey we had quality talkers speaking up on really interesting subject that I think the audience appreciated on the subjects of PostgreSQL, Upstart, Open Source Sweden, Rockbox and Debian packaging.

I did a 20 minute talk about curl – in Swedish. The slides are available (they are thus in Swedish too), see below, and hopefully there will soon be video available online with my presentation.

I also hope that we will gather all the slides at one single point to offer on the foss-sthlm web site, so check that out later on if you want the lot!

And here are Björn‘s slides:

The Rockbox future is an app

We rarely discuss what the future of Rockbox looks like, as it rarely actually matters. We work on whatever we think is fun and all things are fine.

Now, I’d like to do something different and actually lift my nose and stare towards the horizon. What’s in the future for Rockbox?

Dedicated mp3 players vanish

I claim that the current world of portable mp3 and music players is dying. Within just a few years, there will only be a few low-end players left being manufactured and most portable music will be done on much more capable (CPU and OS wise) devices such as the current smartphones, be it Android, Maemo or iPhone based ones.

While maintaining Rockbox to work and prosper on the existing targets is not a bad thing, the end of the line as a stand-alone firmware is in sight. I say the only viable future for Rockbox when the players go very advanced, is not to make Rockbox handle networking etc, it is to make sure that it can run and function as an app on these new devices and to take advantage of their existing network stacks etc.

HTC Magic

An app for that?

Rockbox as an app has been a story we’ve told the kids around the campfires for a good while by now and yet we haven’t actually seen it take off in any significant way. I’m now building up my own interest in working on making this happen. In a chat after my Rockbox talk at Fosdem 2010, two other core Rockbox developers (Zagor and gevaerts) seemed to agree to the general view that a Rockbox future involves it running as an app.

Out of the existing systems mentioned above, I’d prefer to start this work focused on Android. It has the widest company backing combined with open source and it’s also the most used open phone OS. I don’t think there’s anything that will prevent us from working on all those platforms as the back-bone should be able to remain the same and portable code we already have and use. Heck, it could then also become more of a regular app for common desktops too.

Challenges

Changing Rockbox to become an application instead of a full operating system and application suite involves a lot of changes. Some of the things we need to solve include:

  • We need to make sure we can use the OS’ native threading without any particular performance loss. The simulator shows that there are might be problems with this right in the current code as it runs dog slow on a Nokia n900.
  • The full suite of plugins and games and so on doesn’t really fit that well when Rockbox is just an app in itself.
  • Different screen sizes. Lots of stuff in Rockbox assume a compile-time fixed screen size, while a good phone app would have to be able to work with whatever size the particular happens to feature. I don’t think we need to care about live resizing (yet).

Rockbox

Update, July 2010: The app part II.

My Rockbox talk at Fosdem 2010

As I’ve mentioned several times before in this blog, I did a talk about Rockbox and reverse engineering at Fosdem 2010 Feburary 6-7 in Brussels, and since there was no “pre-arranged” video recording of the talks in the embedded devroom, Peter D’Hoye stepped up and recorded the whole thing using his Nokia n900 phone.

I decided to not make the slides for this talk available separately, as they were more or less the same as the ones I used for my FSCONS 2009 talk, so you can go watch them instead if this video isn’t enough!

Rockbox-talk-Fosdem2010

To view it, I suggest you use VLC or similar and tell it to stream directly from one of these URLs, the file is a 1.1GB one with 848×480 resolution running for 51 minutes. Annoyingly, none of the usual free online video services allow this long ones.

http://www.qnapclub.be/rockbox/fosdem2010_rockbox.mp4

http://download.rockbox.org/movies/fosdem2010_rockbox.mp4

a big curl forward

We’re proudly presenting a major new release of curl and libcurl and we call it 7.20.0.

The primary reason we decided to bump the minor number this time was that we introduce a range of new protocols, but we also did some other rather big works. This is the biggest update to curl and libcurl that have been made in recent years. Let me mention some of the other noteworthy changes and bugfixes:

We fixed a potential security issue, that would occur if an application requested to download compressed HTTP content and told libcurl to automatically uncompress it (CURLOPT_ENCODING) as then libcurl could wrongly call the write callback (CURLOPT_WRITEFUNCTION) with a larger buffer than what is documented to be the maximum size.

TFTP was finally converted to a “proper” protocol internally. By that I mean that it can now be used with the multi interface in an asynchronous way and it has far less special treatments. It is now “just another protocol” basically and that is a good thing. Also, the BLKSIZE problem with TFTP that has haunted us for a while was fixed so I really think this is the best version ever for TFTP in libcurl.

In several different places in the code older versions of libcurl didn’t properly call the progress callback while waiting for some special event to happen. This made the curl tool’s progress meter less responding but perhaps more importantly it prevented apps that use libcurl to abort the transfer during those phases. The affected periods included the ftp connection phase (including the initial FTP commands and responses), waiting for the TCP connect to complete and resolving host names using c-ares.

The DNS cache was found to have at least two bugs that could make entries linger in the database eternally and in another case too long. For apps that use a lot of connections to a lot of hosts, these problems could result in some serious performance punishments when the DNS cache lookups got slower and slower over time.

Users of the funny ftp server drftpd will appreciate that (lib)curl now support the PRET command, which is needed when getting data off such servers in passive mode. It’s a bit of a hack, but what can we do? We didn’t invent it nor can we help that it’s a popular thing to use! 😉

cURL

curl, open source and networking