Category Archives: cURL and libcurl

curl and/or libcurl related

curl goes git

Just a few days ago the curl project turned twelve years old, and I decided that it was time for us to ditch our trusty old CVS setup and switch over to use git instead for source code control.

Why Switch at All

I’ve been very content with CVS over the years and in our small project we don’t really have any particularly weird or high demands on the version control software.

Lately (like in recent years) I’ve dipped my toes into various projects that have been using git, and more and more over time I’ve learned to appreciate the little goodies that git does that CVS simply cannot. I’m then not even speaking about branches or merges etc that git does a whole lot better and easier than CVS, I’m in fact even more in love with git’s way to ease handling with diffs sent by email and its great way of keeping track of authors separately from the committer etc. git am and git commit –author are simply two very handy tools missing in CVS.

Why Git

So if we want to switch from CVS to another tool what would we chose? That wasn’t really the question in my case so I didn’t answer it. In my case, it was rather that I’ve been using git in several projects and it is used in some of the biggest projects I work with so it was some git’s features I wanted. I didn’t consider any of the other distributed version tools as quite frankly: they wouldn’t be much better for me than what CVS already is. I want to reduce the number of different tools I need, and I’m quite sure anyway that git is one of the top contenders even if I would do an actual comparison.

So the choice to go git was quite selfish and done by me, but I felt that quite a few guys in the curl community supported this decision and very few actually believe remaining with CVS was a better idea.

The fact that git itself uses libcurl for its HTTP access of course also proves its good taste! 🙂

How did the conversion go

Very easy and swiftly. First, as I mentioned above we never used branches much so we basically had a linear development with a set of tags. I did an rsync of the full repo to get a local copy to work with, then I ran ‘git cvsimport’ on that to created a new repo. I did run it a couple of times to make sure I had done a correct mapping of all CVS user names to their git equivalents. Converting >10 years of CVS commits took roughly 10 minutes on my desktop machine so it wasn’t that tedious even.

Once I had a local repo created with all authors looking good, I simply followed the instructions on github.com on how to add a remote origin to a local branch and when I pushed to that, git sent off all commits ever made to curl to the remote repo now exposed to the world from github.com.

cURL

When that part was done, I did a quick read on the ‘git help daemon’ docs and 30 seconds later I had a local repo setup that is a mirror of the github one, so that users can still opt to get the code from haxx.se.

Unchanged work flow

Git allows different ways of working with the code, but I’ve decided that at least as a start we won’t change the way we work. I’ll offer all committers push rights to the master branch on the repository and we will simply all push to that, as our head development branch.

We will prefer patches made with git format-patch sent to the mailing list, but as before you can still produce patches by diffing source code using extracted tarballs or whatever approach you prefer.

All details on how to get the code for curl using git is available online.

140 foss hackers

At last, the first meeting with our recently started foss-sthlm effort took place. The amount of attention and attendance we achieved by far surpassed our wildest assumptions, and around 130-140 persons interested in open source showed up (we don’t know the exact number). The facilities in Kista where we held the event, were graciously let to our use by Stockholm’s University (DSV) and they were very good. MSC and Nohup were our two sponsors who donated great sub sandwiches and drinks to all of us. Thank you!

I’m glad we manage to offer this event completely free of charge to the attendees, and hey we had quality talkers speaking up on really interesting subject that I think the audience appreciated on the subjects of PostgreSQL, Upstart, Open Source Sweden, Rockbox and Debian packaging.

I did a 20 minute talk about curl – in Swedish. The slides are available (they are thus in Swedish too), see below, and hopefully there will soon be video available online with my presentation.

I also hope that we will gather all the slides at one single point to offer on the foss-sthlm web site, so check that out later on if you want the lot!

And here are Björn‘s slides:

a big curl forward

We’re proudly presenting a major new release of curl and libcurl and we call it 7.20.0.

The primary reason we decided to bump the minor number this time was that we introduce a range of new protocols, but we also did some other rather big works. This is the biggest update to curl and libcurl that have been made in recent years. Let me mention some of the other noteworthy changes and bugfixes:

We fixed a potential security issue, that would occur if an application requested to download compressed HTTP content and told libcurl to automatically uncompress it (CURLOPT_ENCODING) as then libcurl could wrongly call the write callback (CURLOPT_WRITEFUNCTION) with a larger buffer than what is documented to be the maximum size.

TFTP was finally converted to a “proper” protocol internally. By that I mean that it can now be used with the multi interface in an asynchronous way and it has far less special treatments. It is now “just another protocol” basically and that is a good thing. Also, the BLKSIZE problem with TFTP that has haunted us for a while was fixed so I really think this is the best version ever for TFTP in libcurl.

In several different places in the code older versions of libcurl didn’t properly call the progress callback while waiting for some special event to happen. This made the curl tool’s progress meter less responding but perhaps more importantly it prevented apps that use libcurl to abort the transfer during those phases. The affected periods included the ftp connection phase (including the initial FTP commands and responses), waiting for the TCP connect to complete and resolving host names using c-ares.

The DNS cache was found to have at least two bugs that could make entries linger in the database eternally and in another case too long. For apps that use a lot of connections to a lot of hosts, these problems could result in some serious performance punishments when the DNS cache lookups got slower and slower over time.

Users of the funny ftp server drftpd will appreciate that (lib)curl now support the PRET command, which is needed when getting data off such servers in passive mode. It’s a bit of a hack, but what can we do? We didn’t invent it nor can we help that it’s a popular thing to use! 😉

cURL

httpstate cookie domain pains

Back in 2008, I wrote about when Mozilla started to publish their effort the “public suffixes list” and I was a bit skeptic.

Well, the problem with domains in cookies of course didn’t suddenly go away and today when we’re working in the httpstate working group in the IETF with documenting how cookies work, we need to somehow describe how user agents tend to work with these and how they should work with them. It’s really painful.

The problem in short, is that a site that is called ‘www.example.com’ is allowed to set a cookie for ‘example.com’ but not for ‘com’ alone. That’s somewhat easy to understand. It gets more complicated at once when we consider the UK where ‘www.example.co.uk’ is fine but it cannot be allowed to set a cookie for ‘co.uk’.

So how does a user agent know that co.uk is magic?

Firefox (and Chrome I believe) uses the suffixes list mentioned above and IE is claimed to have its own (not published) version and Opera is using a trick where it tries to check if the domain name resolves to an IP address or not. (Although Yngve says Opera will soon do online lookups against the suffix list as well.) But if you want to avoid those tricks, Adam Barth’s description on the http-state mailing list really is creepy.

For me, being a libcurl hacker, I can’t of course but to think about Adam’s words in his last sentence: “If a user agent doesn’t care about security, then it can skip the public suffix check”.

Well. Ehm. We do care about security in the curl project but we still (currently) skip the public suffix check. I know, it is a bit of a contradiction but I guess I’m just too stuck in my opinion that the public suffixes list is a terrible solution but then I can’t figure anything that will work better and offer the same level of “safety”.

I’m thinking perhaps we should give in. Or what should we do? And how should we in the httpstate group document that existing and new user agents should behave to be optimally compliant and secure?

(in case you do take notice of details: yes the mailing list is named http-state while the working group is called httpstate without a dash!)

an application protocol monster

During the autumn 2009 I was sponsored by a company to work on some new protocols for libcurl to support – IMAP, POP3, SMTP and their TLS-powered versions. It took me a little while to get things working the way I’d like them to due work load elsewhere and other irrelevant distractions.

In the next curl and libcurl release, to be called version 7.20.0, which if everything runs fine and according to plan might happen at the end of January 2010 or possibly a little later, libcurl will truly have converted from a file transfer library into a full fledged application protocol layer monster.

The current incarnation of my development libcurl supports the following 18(!) protocols:

tftp ftp telnet dict ldap ldaps http file https ftps scp sftp imap imaps pop3 pop3s smtp smtps

While SSL versions of protocols are arguably not separate protocols, the 12 protocols in the list without SSL are still many in my view.

The fundamentals of libcurl remain the same though: specify a URL to operate against. Send or receive data. Sometimes both send and receive in the same request.

Internally in libcurl, I converted a big portion of the FTP-specific code into a more generic “pingpong”-generic code which is now designed to work similarly with all these new protocols that share many similarities with FTP. They are all sending commands to the server and get responses back, in similar ways.

As before, the ability to disable specific protocols when building libcurl remains so for those who don’t particularly care about these new ones and want to maintain building a library that is as small and lean as possible, there should be little extra weight due to this recent development. I’ve been pondering but I’ve not yet figured out the most perfect way to deal with such build options in the code so they are still a bit too #if and #ifdef intensive for my taste…

Meanwhile, Chris Conroy has been busy in his end implementing support for RTSP that also soon might be ripe for inclusion in the main code.

Not too surprisingly perhaps, the curl tool also then gets these protocol abilities so it becomes even more than before the Swiss army knife tool for internet protocols and then of course explicitly application layer ones.

cURL

curl talk at foss-sthlm

I’m now scheduled to do a short talk on curl and the project on the foss-sthlm meeting in Stockholm on February 24th! As you can see on the site, there’s also a set of other fun subjects around Free and Open Source Software.

The material on the site is all in Swedish and all talks are expected to be mostly in Swedish.

Our merry foss-sthlm effort has really taken off in a great way and more than 50 persons have already signed up to show up at the meeting and we have 5 other speakers apart from myself lined up. The program isn’t really fixed yet, but it certainly looks like it’ll end up at least mostly the way it currently looks.

If you are in the area and have an interest in FOSS, consider showing up!

Oh, and my brother Björn is scheduled to talk about Rockbox at the same event.

Adapting to being behind

For many years I’ve always kept up to speed with my commitments in my primary open source projects. I’ve managed to set aside enough time to close the bug reports as fast as they have poured in. This, while still having time to work on new features every now and then.

During this last year (or so) however, I’ve come to realize that I no longer can claim to be in that fortunate position and I now find myself seeing the pile of open bugs get bigger and bigger over time. I get more bug reports than I manage to close.

There are of course explanations for this. In both ends of the mix actually. I’ve got slightly less time due my recent decision to go working for Haxx full-time, and how I’ve decided to focus slightly more on paid work which thus leads to me having less time for the unpaid work I’m doing.

Also, I’ve seen activity raise in the curl project, in the libssh2 project and in the c-ares project. All of these projects have the same problem of various degrees: a lack of participating developers working on fixing bugs. Especially bugs reported by someone else.

Since this situation is still fairly new to me, I need to learn on how to adapt to it. How to deal with a stream of issues that is overwhelming and I must select what particular things I care about and what to “let through”. This of course isn’t ideal for the projects but I can’t do much more than proceed to the best of my ability, to try to make people aware of that this is happening and try to get more people involved to help out!

Don’t get fooled by my focus on “time” above. Sometimes I even plainly lack the energy necessary to pull through. It depends a lot on the tone or impression I get from the report or reporter how I feel, but when a reporter is rude or just too “demanding” (like constantly violating the mailing list etiquette or just leaving out details even when asked) I can’t but help to feel that at times working as a developer during my full-day paid hours can make it a bit hard to then work a couple of hours more in the late evening debugging further.

The upside, let’s try to see it as a positive thing, is that now I can actually “punish” those that clearly don’t deserve to get helped since I now focus on the nice people, the good reports, the ones which seem to be written by clever people with an actual interest to see their problems addressed. Those who don’t do their part I’ll just happily ignore until they shape up.

I will deliberately just let issues “slip through” and not get my attention and require that if they are important enough people will either report it again, someone else will step up and help fix them or perhaps someone will even consider paying for the fix.

curl and libcurl 7.19.7

Time again for a happy release event. Can you believe  this is in fact the 113th release?cURL

Run over to the curl download page to get it!

This time, we bring happiness with the best curl and libcurl release ever and it features four changes and a range of bug fixes. The changes to note this time include:

And a collection of bugs fixed since the previous release involves these issues:

  • The windows makefiles work again
  • libcurl-NSS acknowledges verifyhost
  • SIGSEGV when pipelined pipe unexpectedly breaks
  • data corruption issue with re-connected transfers
  • use after free if we’re completed but easy_conn not NULL (pipelined)
  • missing strdup() return code check
  • CURLOPT_PROXY_TRANSFER_MODE could pass along wrong syntax
  • configure –with-gnutls=PATH fixed
  • ftp response reader bug on failed control connections
  • improved NSS error message on failed host name verifications
  • ftp NOBODY on re-used connection hang
  • configure uses pkg-config for cross-compiles as well
  • improved NSS detection in configure
  • cookie expiry date at 1970-jan-1 00:00:00
  • libcurl-OpenSSL failed to verify some certs with Subject Alternative Name
  • libcurl-OpenSSL can load CRL files with more than one certificate inside
  • received cookies without explicit path got saved wrong if the URL had a query part
  • don’t shrink SO_SNDBUF on windows for those who have it set large already
  • connect next bug
  • invalid file name characters handling on Windows
  • double close() on the primary socket with libcurl-NSS
  • GSS negotiate infinite loop on bad credentials
  • memory leak in SCP/SFTP connections
  • use pkg-config to find out libssh2 installation details in configure
  • unparsable cookie expire dates make cookies get treated as session coookies
  • POST with Digest authentication and “Transfer-Encoding: chunked”
  • SCP connection re-use with wrong auth
  • CURLINFO_CONTENT_LENGTH_DOWNLOAD for 0 bytes transfers
  • CURLINFO_SIZE_DOWNLOAD for ldap transfers (-w size_download)

libcurl in version management

Already before, I’ve mentioned that libcurl is becoming popular within package management.

libcurllibcurl is a generic library for file transfers over a wide variety of protocols. Over the years, some of the recent ditributed version management softwares have learned about libcurl’s powers and they now use it:

darcs – was born in 2003 and is written in Haskell. I’m under the impression these guys wrote their own binding layer to interface libcurl from Haskell.

git – possibly best known for being created by Linus Torvalds and being used by the Linux kernel project, is using libcurl for HTTP(S) accesses.

bazaar – is written in Python and accordingly uses the pycurl binding for libcurl.

Anyone know of other version control systems using libcurl?

Ironies here include that libcurl itself is still kept within a CVS respository, and also quite possibly that the first version management project I myself participated is Subversion and that not only has two different HTTP dependencies, but none of those two are libcurl (they are neon and serf)…

Update: it seems that Mercurial is also using pycurl as an optional dependency.

How much for a bug?

no bugsWarning: blog post with no clear conclusion!

I offer support deals to companies that want to get help with Open Source programs I’ve contributed to. The deals I’ve made so far have primarily involved libcurl, c-ares or libssh2, but that’s basically because those are projects in which I participate a lot in (and maintain) so people find me easily in relation to those projects.

I wouldn’t mind accepting service and support deals for other projects or software products either, as long as they are products I know and am fairly familiar with already and I am not scared of digging in and fixing things under the hood when that is required.

In fact, I could very well consider to offer to fix bugs in any Open Source software. Like a general: if you have a bug in an open source project that you really want fixed and you can’t do it yourself I might be your man. Of course this would be limited to some certain kinds of projects and programs, but it could still include a wide range of software. A lot more than the ones I happen to be involved in at any particular point in time.

But while “a bug” is a fairly easily defined term to a user who can’t make something work in a given program it can be anything from dead simple to downright impossible for a developer to fix. The fact that users many times cannot determine if a “bug” is hard or easy, if it’s a bug or a feature not working on purpose, makes such a business deal very hard to provide.

How to pay to get a bug fixed?

Fixed price per bug? Presumably only tricky bugs would be considered for this so it would require a fairly high fixed price. But then it’ll also never be used for simple bugs either since the fixed price would scare away such use cases. I don’t think a fixed-price scheme works very well for this.One dollar bill

Then we only have a variable price approach left. A common way for a consultant like me is to charge for my time spent on a project: I set an hourly rate, I fix the issue in N hours. I charge hourly rate * N. For smallish projects, this is less attractive to customers. If we have no previous relationship, there’s a trust issue where the customer might not just blindly accept that I worked 10 hours on a task they think sounds easy so they feel overcharged. Also, there’s the risk that I estimate the job to be 2 hours but end up spending 12. My conclusion is that per-hour pricing doesn’t work for this either.

A variable price approach based on something else than number of hours it took for me to fix the problem is therefore needed.

A bug fix is of course worth whatever someone is willing to pay for it. But we don’t know what they are prepared to pay. On the other end, a bug fix can get done by someone for the price he/she is willing to accept to get the job done. So where is the cross section of those two unknown graphs?

I don’t have the answer here. I’m very interested in feedback and suggestions though. If you would pay for a bug fix, how would you like to get the price set?