Category Archives: cURL and libcurl

curl and/or libcurl related

Three out of one hundred

If I’m not part of the solution, I’m part of the problem and I don’t want to be part of the problem. More specifically, I’m talking about female presence in tech and in particular in open source projects.

3 out of 100I’ve been an open source and free software hacker, contributor and maintainer for almost 20 years. I’m the perfect stereo-type too: a white, hetero, 40+ years old male living in a suburb of a west European city. (I just lack a beard.) I’ve done more than 20,000 commits in public open source code repositories. In the projects I maintain, and have a leading role in, and for the sake of this argument I’ll limit the discussion to curl, libssh2, and c-ares, we’re certainly no better than the ordinary average male-dominated open source projects. We’re basically only men (boys?) talking to other men and virtually all the documentation, design and coding is done by male contributors (to a significant degree).

Sure, we have female contributors in all these projects, but for example in the curl case we have over 850 named contributors and while I’m certainly not sure who is a woman and who is not when I get contributions, there’s only like 10 names in the list that are typically western female names. Let’s say there are 20. or 30. Out of a total of 850 the proportions are devastating no matter what. It might be up to 3%. Three. THREE. I know women are under-represented in technology in general and in open source in particular, but I think 3% is even lower than the already low bad average open source number. (Although, some reports claim the number of female developers in foss is as low as just above 1%, geekfeminism says 1-5%).

Numbers

Three percent. (In a project that’s been alive and kicking for thirteen years…) At this level after this long time, there’s already a bad precedent and it of course doesn’t make it easier to change now. It is also three percent of the contributors when we consider all contributors alike. If we’d count the number of female persons in leading roles in these projects, the amount would be even less.

It could be worth noting that we don’t really have any recent reliable stats for “real world” female share either. Most sources that I find on the Internet and people have quoted in talks tend to repeat old numbers that were extracted using debatable means and questions. The comparisons I’ve seen repeated many times on female participation in FOSS vs commercial software, are very often based on stats that are really not comparable. If someone has reliable and somewhat fresh data, please point them out for me!

“Ghosh, R. A.; Glott, R.; Krieger, B.;
Robles, G. 2002. Free/Libre and Open Source Software: Survey and Study. Part
IV: Survey of Developers. Maastricht: International Institute of Infonomics
/Merit.

A design problem of “the system”

I would blame “the system”. I’m working in embedded systems professionally as a consultant and contract developer. I’ve worked as a professional developer for some 20 years. In my niche, there’s not even 10% female developers. A while ago I went through my past assignments in order to find the last female developer that I’ve worked with, in a project, physically located in the same office. The last time I met a fellow developer at work who was female was early 2007. I’ve worked in 17 (seventeen!) projects since then, without even once having had a single female developer colleague. I usually work in smaller projects with like 5-10 people. So one female in 18 projects makes it something like one out of 130 or so. I’m not saying this is a number that is anything to draw any conclusions from since it just me and my guesstimates. It does however hint that the problem is far beyond “just” FOSS. It is a tech problem. Engineering? Software? Embedded software? Software development? I don’t know, but I know it is present both in my professional life as well as in my volunteer open source work.

Geekfeminism says the share is 10-30% in the “tech industry”. My experience says the share gets smaller and smaller the closer to “the metal” and low level programming you get – but I don’t have any explanation for it.

Fixing the problems

What are we (I) doing wrong? Am I at fault? Is the the way I talk or the way we run these projects in some subtle – or obvious – ways not friendly enough or downright hostile to women? What can or should we change in these projects to make us less hostile? The sad reality is that I don’t think we have any such fatal flaws in our projects that create the obstacles. I don’t think many females ever show up near enough the projects to even get mistreated in the first place.

I have a son and I have a daughter – they’re both still young and unaware of this kind of differences and problems. I hope I will be able to motivate and push and raise them equally. I don’t want to live in a world where my daughter will have a hard time to get into tech just because she’s a girl.

libspdy

SPDY is a neat new protocol and possible contender to replace HTTP – at least in some areas and for some use cases. SPDY has been invented and developed mostly by Google engineers.

SPDY allows better usage of fewer TCP connections (since it sends multiple logical streams over a single physical TCP connection) and it helps clients overcome problems with TCP (like how a new connection starts slowly) while at the same time reducing latency and bandwidth requirements. Very similar to how channels are handled over an SSH connection.SPDY

Chrome of course already supports SPDY and Firefox has some early experimental support being worked on.

Of course there are also legitimate criticisms against SPDY as well, including subjects like how it makes caching proxies impossible (because everything goes over SSL), how it makes debugging a lot harder by using compressed headers, how it is impossible to extract just a single header from the stream due to its compression approach and how the compression state buffers make each individual stream use more memory than plain old HTTP (plain TCP) ones.

We can expect SPDY<=>HTTP gateways to appear so that nobody gets locked into either side of these protocols.

SPDY will provide faster transfers. libcurl is currently used for speed reasons in many cases. To me, it makes perfect sense to have libcurl use and try to use SPDY instead of HTTP exactly like how the browsers are starting to do it, so that the libcurl using applications will get their contents transferred faster.

My thinking is that we introduce some new magic option(s) that makes libcurl use SPDY, and for normal easy interface transfers it will remain to use a single connection for each new SPDY transfer, but if you use the multi interface and you enable pipelining you’ll instead make libcurl do multiple transfers over the same single SPDY connection (as long as you speak with the same server and port etc). From an application’s stand-point it shouldn’t make any difference, apart from being faster than otherwise. Just like we want it!

Implementation wise, I would like to use a reliable and efficient third-party library for the actual SPDY implementation. If there doesn’t exist any, we make one and run that one independently. I found libspdy, but I found some concerns about it (no mailing list, looks like one-man project, not C89 compliant, no API docs etc). I mailed the libspdy author, I hoping we’d sort out my doubts and then I’d base my continued work on that library.

After some time Thomas Roth, primary libspdy author, responded and during our subsequent email exchange I’ve gotten a restored faith and belief in this library and its direction. Not only did he fix the C89 compliance pretty quickly, he is also promising rather big changes that are pending to get committed within a week or so.

Comforted by what I’ve learned from Thomas, I’ll wait for his upcoming changes and I’ll join the soon to be created mailing list for the libspdy project and I’ll contribute some ideas and efforts to help shape it into the fine SPDY library we all want. I can only encourage other fellow SPDY library interested persons to do the same!

Updated: Join the SPDY library development

curl meetup at Fosdem 2012

The FOSDEM 2012 dates were recently revealed (4-5 February 2012).

A pint of guinness

I’d be happy to arrange a get-together for libcurl hackers at Fosdem this year. To me, Brussels, Belgium seems mid-europe enough to be able to attract a bunch of us:

  • libcurl application users/authors
  • libcurl binding hackers
  • libcurl contributors
  • … and everyone else who’s doing related activities or who just is interested

Potential subjects to discuss at such a meeting:

  • what’s the most important stuff libcurl still lacks?
  • what’s the least documented/understood parts of libcurl?
  • are there shared problems several/many libcurl bindings have to solve?
  • can we improve how we work/develop libcurl and bindings?
  • what kind of beer is best at a curl meetup?
  • [fill in your own curl related subject]

I would like at least 4-5 people voicing interest for this to be worthwhile for me to actually try to do anything. Please speak up on the libcurl mailing list, tweet me or mail me privately! The more people that are interested, the more planning and stuff we’ll do for it.

curl 7.22.0

Another release of curl and libcurl just happened. 7.22.0 is released.

Apart from the 28 something documented bug fixes, we introduce a range of changes that could be noteworthy:

  • Added CURLOPT_GSSAPI_DELEGATION – remember that we explicitly disabled GSSAPI delegation in our previous release due to a security problem. Now we introduce an option for the application to control exactly how to behave.
  • Added support for NTLM delegation to Samba’s winbind daemon helper ntlm_auth. This lets libcurl use the external helper program to do things like NTLM single-sign on.
  • Display notes from setup file in testcurl.pl – provides a way for test clients to provide more information back to the centralized test summary on the primary server.
  • BSD-style lwIP TCP/IP stack experimental support on Windows – there are still flaws in lwIP on windows that prevents it from working properly
  • OpenSSL: Use SSL_MODE_RELEASE_BUFFERS if available – this is basically a way to ask OpenSSL to use less memory
  • –delegation was added to set CURLOPT_GSSAPI_DELEGATION – simply the new option exported to the command line tool
  • nss: start with no database if the selected database is broken – a slightly modified behavior
  • telnet: allow programatic use on Windows – basically making the windows implementation in sync with how the non-windows version already has worked for quite some time

This release is this great thanks to 25 friendly contributors.

cURL

A libcurl postergirl?

google for libcurl

If you click the image you’ll see a full-resolution screendump for my recent search for “libcurl” on google. Where did that (image of a) girl come from? Judging from where it appears on the results page right next to the information about the cURL project you can’t but assume that she’s somehow related to the project.

That’s of course not true. When moving the mouse over the image I get a tooltip with a funny “hair curling” URL and that’s also where a click on the image takes me.

A mighty weird way of presenting a search result if you ask me!

US patent 6,098,180

(I am not a lawyer, this is not legal advice and these are not legal analyses, just my personal observations and ramblings. Please correct me where I’m wrong or add info if you have any!)

At 3:45 pm on March 18th 2011, the company Content Delivery Solutions LLC filed a complaint in a court in Texas, USA. The defendants are several bigwigs and the list includes several big and known names of the Internet:

  • Akamai
  • AOL
  • AT&T
  • CD Networks
  • Globalscape
  • Google
  • Limelight Networks
  • Peer 1 Network
  • Research In Motion
  • Savvis
  • Verizon
  • Yahoo!

The complaint was later amended with an additional patent (filed on April 18th), making it list three patents that these companies are claimed to violate (I can’t find the amended version online though). Two of the patents ( 6,393,471 and 6,058,418) are for marketing data and how to use client info to present ads basically. The third is about file transfer resumes.

I was contacted by a person involved in the case at one of the defendants’. This unspecified company makes one or more products that use “curl“. I don’t actually know if they use the command line tool or the library – but I figure that’s not too important here. curl gets all its superpowers from libcurl anyway.

This Patent Troll thus basically claims that curl violates a patent on resumed file transfers!

The patent in question that would be one that curl would violate is the US patent 6,098,180 which basically claims to protect this idea:

A system is provided for the safe transfer of large data files over an unreliable network link in which the connection can be interrupted for a long period of time.

The patent describes several ways in how it may detect how it should continue the transfer from such a break. As curl only does transfer resumes based on file name and an offset, as told by the user/application, that could be the only method that they can say curl would violate of their patent.

The patent goes into detail in how a client first sends a “signature” and after an interruption when the file transfer is about to continue, the client would ask the server about details of what to send in the continuation. With a very vivid imagination, that could possibly equal the response to a FTP SIZE command or the Content-Length: response in a HTTP GET or HEAD request.

A more normal reader would rather say that no modern file transfer protocol works as described in that patent and we should go with “defendant is not infringing, move on nothing to see here”.

But for the sake of the argument, let’s pretend that the patent actually describes a method of file transfer resuming that curl uses.

The ‘180 (it is referred to with that name within the court documents) patent was filed at February 18th 1997 (and issued on August 1, 2000). Apparently we need to find prior art that was around no later than February 17th 1996, that is to say one year before the filing of the stupid thing. (This I’ve been told, I had no idea it could work like this and it seems shockingly weird to me.)

What existing tools and protocols did resumed transfers in February 1996 based on a file name and a file offset?

Lots!

Thank you all friends for the pointers and references you’ve brought to me.

  • The FTP spec RFC 959 was published in October 1985. FTP has a REST command that tells at what offset to “restart” the transfer at. This was being used by FTP clients long before 1996, and an example is the known Kermit FTP client that did offset-based file resumed transfer in 1995.
  • The HTTP header Range: introduces this kind of offset-based resumed transfer, although with a slightly fancier twist. The Range: header was discussed before the magic date, as also can be seen on the internet already in this old mailing list post from December 1995.
  • One of the protocols from the old days that those of us who used modems and BBSes in the old days remember is zmodem. Zmodem was developed in 1986 and there’s this zmodem spec from 1988 describing how to do file transfer resumes.
  • A slightly more modern protocol that I’ve unfortunately found no history for before our cut-off date is rsync, as I could only find the release mail for rsync 1.0 from June 1996. Still long before the patent was filed obviously, and also clearly showing that the one year margin is silly as for all we know they could’ve come up with the patent idea after reading the rsync releases notes and still rsync can’t be counted as prior art.
  • Someone suggested GetRight as a client doing this, but GetRight wasn’t released in 1.0 until Febrary 1997 so unfortunately that didn’t help our case even if it seems to have done it at the time.
  • curl itself does not pre-date the patent filing. curl was first released in March 1998, and the predecessor was started around summer-time 1997. I don’t have any remaining proofs of that, and it still wasn’t before “the date” so I don’t think it matters much now.

At the time of this writing I don’t know where this will end up or what’s going to happen. Time will tell.

This Software patent obviously is a concern mostly to US-based companies and those selling products in the US. I am neither a US citizen nor do I have or run any companies based in the US. However, since curl and libcurl are widely used products that are being used by several hundred companies already, I want to help bring out as much light as possible onto this problem.

The patent itself is of course utterly stupid and silly and it should never have been accepted as it describes trivially thought out ideas and concepts that have been thought of and implemented already decades before this patent was filed or granted although I claim that the exact way explained in the patent is not frequently used. Possibly the protocol using a method that is closed to the description of the patent is zmodem.

I guess I don’t have to mention what I think about software patents.

I’m convinced that most or all download tools and browsers these days know how to resume a previously interrupted transfer this way. Why wouldn’t these guys also approach one of the big guys (with thick wallets) who also use this procedure? Surely we can think of a few additional major players with file tools that can resume file transfers and who weren’t targeted in this suit!

I don’t know why. Clearly they’ve not backed down from attacking some of the biggest tech and software companies.

patent drawing

(Illustration from the ‘180 patent.)

libcurl’s name resolving

Recently we’ve put in some efforts into remodeling libcurl’s code that handles name resolves, and then in particular the two asynchronous name resolver backends that we support: c-ares and threaded.

Name resolving in general in libcurl

libcurl can be built to do name resolves using different means. The primary difference between them is that they are either synchronous or asynchronous. The synchronous way makes the operation block during name resolves and there’s no “decent” way to abort the resolves if they take longer time than the program wants to allow it (other than using signals and that’s not what we consider a decent way).

Asynch resolving in libcurl

This is done using one of two ways: by building libcurl with c-ares support or by building libcurl and tell it to use threads to solve the problem. libcurl can be built using either mechanism on just about all platforms, but on Windows the build defaults to using the threaded resolver.

The c-ares solution

c-ares’ primary benefit is that it is an asynchronous name resolver library so it can do name resolves without blocking without requiring a new thread. It makes it use less resources and remain a perfect choice even if you’d scale up your application up to and beyond an insane number of simultaneous connections. Its primary drawback is that since it isn’t based on the system default name resolver functions, they don’t work exactly like the system name resolver functions and that causes trouble at times.

The threaded solution

By making sure the system functions are still used, this makes name resolving work exactly as with the synchronous solution, but thanks to the threading it doesn’t block. The downside here is of course that it uses a new thread for every name resolve, which in some cases can become quite a large number and of course creating and killing threads at a high rate is much more costly than sticking with the single thread.

Pluggable

Now we’ve made sure that we have an internal API that both our asynchronous name resolvers implement, and all code internally use this API. It makes the code a lot cleaner than the previous #ifdef maze for the different approaches, and it has the side-effect that it should allow much easier pluggable backends in case someone would like to make libcurl support another asynchronous name resolver or system.

This is all brand new in the master branch so please try it out and help us polish the initial quirks that may still exist in the code.

There is no current plan to allow this plugging to happen run-time or using any kind of external plugins. I don’t see any particular benefit for us to do that, but it would give us a lot more work and responsibilities.

cURL

HTTP transfer compression

HTTP is a protocol that looks simple in its simplest form and its readability can easily fool you into believing an implementation is straight forward and quickly done.

That’s not the reality though. HTTP is a very big protocol with lots of corners and twisting mazes that one can get lost in. Even after having been the primary author of curl for 13+ years, there are still lots of HTTP things I don’t master.

To name an example of an area with little known quirks, there’s a funny situation when it comes to how HTTP supports and doesn’t support compression of data and compression of data in transfer.

No header compression

A little flaw in HTTP in regards to compression is that there’s no way to compress headers, in either direction. No matter what we do, we must send the text as-is and both requests and responses are sometimes very big these days. Especially taken into account how cookies are always inserted in requests if they match. Anyway, this flaw is nothing we can do anything about in HTTP 1.1 so we need to live with it.

On the other side, compression of the response body is supported.

Compressing data

Compression of data can be done in two ways: either the actual transfer is compressed or the body data is compressed. The difference is subtle, but when the body data is compressed there’s really nothing that mandates that the client has to uncompress it for the end user, and if the transfer is compressed the receiver must uncompress it in order to deal with the transfer properly.

For reasons that are unknown to me, HTTP clients and servers started out supporting compression only using the Content-Encoding style. It means that the client tells the server what kind of content encodings it supports (using Accept-Encoding:) and the server then sends the response data using one of the supported encodings. The client then decides on its own that if it gets the content in one of the compressed formats that it said it can handle, it will automatically uncompress that on arrival.

The HTTP protocol designers however intended this kind of automatic compression and subsequent uncompress to be done using Transfer-Encoding, as the end result is the completely transparent and the uncompress action is implied and intended by the protocol design. This is done by the client telling the server what transfer encodings it supports with the TE: header and the server adds a Transfer-Encoding: header in the response telling how the transfer is encoded.

HTTP 1.1 introduced a mandatory encoding that all servers can use whenever they feel like it: chunked encoding, so all HTTP 1.1 clients already have to deal with Transfer-Encoding to some degree.

Surely curl is better than all those other guys, right?

Not really. Not yet anyway.

curl has a long history of copying its behavior from what the browsers do, in order to allow users to basically script anything imaginable that is HTTP-like with curl. In this vein, we implemented compression support the same way as all the browsers did it: the content encoding style. (I have reason to believe that at least Opera actually supports or used to support compressed Transfer-Encoding.)

Starting now (code pushed to git repo just after the 7.21.5 release), we’ve taken steps to improve things. We’re changing gears and we’re introducing support for asking for and using compressed Transfer-Encoding. This will start out as an optional feature/flag (–tr-encoding / CURLOPT_TRANSFER_ENCODING) so that we can start out and see how servers in the wild behave and that we can deal with them properly. Then possibly we can switch the default in the future to always ask for compressed transfers. At least for the command line tool.

We know from the little tests we are aware of, that there are at least one known little problem or shall we call it a little detail to keep on eye at, with introducing compressed Transfer-Encoding. As has been so fine reported several years ago in the opera blog Browser sniffing gone wrong (again): Cars.com, there are cases where this may cause the server to send data that gets compressed twice (using both Content and Transfer Encoding) and that needs to be taken care of properly by the client.

At the time of this writing, I’ve not yet taken care of the double-compress case in the code, but I intend to get on to it within shortly.

I’m otherwise very interested in hearing what kind of experience people will have from this. What servers and sites will support this as documented and intended?

Shipping curl 7.21.5

I don’t usually post anything here when we do curl releases, pretty much because we do them bimonthly on a fairly steady schedule so there should be little surprise to anyone interested by the time they get public.

But hey, this is hard work and just to remind you all what’s going on I thought I’d throw in a mention of what we’ve spent the last two months doing. curl and libcurl 7.21.5 is released today.

The five notable changes introduced this time include:

The CURLOPT_SOCKOPTFUNCTION callback can now return information back to libcurl that the socket libcurl operates on is already connected. This is useful for applications that do a lot of fiddling on their own and possibly provide its own socket to start with using the CURLOPT_OPENSOCKETFUNCTION.

curl the tool got support for the –netrc-file option, that allows a user to point out a specific .netrc file instead of always forcing the user to use the fixed $HOME/.netrc one.

Brand new support for building libcurl with the cyassl library for SSL/TLS support. Previously curl only had support for the older OpenSSL emulation API that cyassl used to provide, but starting now we’re using cyassl directly and it is now a proper SSL citizen among the seven SSL libraries curl supports.

Since the previous release when we shipped the first support for TLS-SRP that required GnuTLS, the OpenSSL project accepted patches that introduced TLS-SRP into their official version as well and accordingly we have received patches that now allow users to use TLS-SRP with libcurl built against (a new enough) OpenSSL as well.

We have started to re-use two error codes a bit differently within libcurl, so that it now can return: CURLE_NOT_BUILT_IN (4) when an application tries to use a feature that was missing or was explicitly disabled at build-time and CURLE_UNKNOWN_OPTION (48) when the application has passed in an option that isn’t known or recognized.

And we’re counting more than 40 bugfixes worth mentioning. The most important ones are possibly:

If using the multi interface doing RTSP, libcurl could crash when trying to re-use a previous connection.

POP3 didn’t do TLS properly, it issued the wrong command to start TLS and it didn’t send the password correctly once it did switch to TLS!

When using the multi interface, there could be times when the timeout didn’t trigger so it wouldn’t close lingering connections even when asked to do so.

SFTP and SFP with the multi_socket interface were not working correctly and would very easily end up with stalled transfers due to the application being told to wait for the wrong action (or none at all).

If told to use the CCC command (which is used with FTP-SSL when the client asks the server to switch off from an SSL connection back to plain TCP again), curl would disable SSL on the connection but then use the wrong socket reader function and crash.

… but of course, if you’ve suffered from a particular bug in a previous release I’m sure you’ll consider the exact bug fix that corrects your problem to be the most important one!

Not to forget, the great people apart from yours truly that have contributed with code and insights since the previous release. Without them, the above list of changes and bugfixes just wouldn’t exist. The friends we have to thank are (in no particular order):

Mike Crowe, Kamil Dudka, Julien Chaffraix, Hoi-Ho Chan, Ben Noordhuis, Dan Fandrich, Henry Ludemann, Karl M, Manuel Massing, Marcus Sundberg, Stefan Krause, Todd A Ouska, Saqib Ali, Andre Guibert de Bruet, Tor Arntsen, Vincent Torri, Dave Reisner, Chris Smowton, Tinus van den Berg, Hongli Lai, Gisle Vanem, Andrei Benea, Mehmet Bozkurt

… and now back to working towards the next release. To be expected in roughly two months. Repeat.

curlyears plus equals one

BirthdaycakeThirteen years ago I released the first version of curl to the world – on March 20 1998. curl is now a teenage project and there’s no slowdown or end in sight.

So what does a project like ours introduce after having existed for so long? The recent year has been full of activities in the project, and here’s a run down with some of the stuff that has been going on:

We switched source code versioning system from CVS to git

gopher support got back into curl

support for RTMP was added

Two additional SSL libraries are now supported: PolarSSL and axTLS, making it a total of seven

70 persons provided code into the git repository, making the THANKS document now list 854 names.

About 1000 commits were made – out of a total of almost 14000 (counted from Dec 1999) so it makes this year slightly under average in terms of commit rate.

6 releases were shipped with a total of 179 bugfixes

1 security flaw was found and fixed

More than 4900 mails were posted to the curl mailing lists

We introduced a unit test system

Over 1400GB of data was downloaded from the curl web site

During the end of this period, 45% of our web visitors used Firefox, 23% used Chrome and less than 20% used IE. 75% were on Windows, 13% on Linux and 10% on Mac.

cURL