Tag Archives: cURL and libcurl

Pointless respecifying FTP URI

There’s this person wiIETFthin IETF who seems to possess endless energy and a never-ending wish to clean up tiny details within the IETF processes. He continuously digs up specifications that need to be registered or submitted again somewhere due to some process. Often under loud protests from fellow IETFers since it steals time and energy from people on the lists for discussions and reviews – only to satisfy some administrative detail. Time and energy perhaps better spent on things like new protocols and interesting new technologies.

This time, he has deemed that the FTP a FILE URI specs need to be registered properly, and alas he has submitted his first suggested update of the FTP URI.

From my work with curl I have of course figured out a few problems with RFC1738 that I don’t think we should just repeat in a new version of the spec. It turns out I’m not alone in thinking this work isn’t really good like this, and I posted a second mail to clarify my points.

We’re not working on fixing the problems with FTP URIs that are present in RFC1738 so just rephrasing those into a new spec is a bad idea.

We could possibly start the work on fixing the problems, but so far I’ve seen no such will or efforts and I don’t plan on pushing for that myself either.

Please tell me or the ftpext2 group where I or the others are wrong or right or whatever!

11 years of me

On May 11th 2000 I posted by first blog entry that is still available online on advogato.org. No surprise but it was curl-related.

The full post was:

I was made aware of the fact that curl is not really dealing well with the directory part of an ftp URL.

I was gonna quote the appropriate text piece from RFC1738 (yes, it is obsoleted by RFC2396 although 1738 has more detailed info about particular protocols like ftp) to someone when I noticed that I had interpreted it wrong when I read it before.

The difference between getting a file relative the login directory or with absolute path. It turns out you have to get a path like ftp.site.com/%2etmp/ if you want have the absolute path “/tmp”. Oh well, I have it support my old way as well even if that isn’t following the RFC just to allow people using that way to be able to use the new one unmodifed…

… which I guess proves that even though lots of time has passed, I still occupy myself with the same kind of hobbies and side- projects…

US patent 6,098,180

(I am not a lawyer, this is not legal advice and these are not legal analyses, just my personal observations and ramblings. Please correct me where I’m wrong or add info if you have any!)

At 3:45 pm on March 18th 2011, the company Content Delivery Solutions LLC filed a complaint in a court in Texas, USA. The defendants are several bigwigs and the list includes several big and known names of the Internet:

  • Akamai
  • AOL
  • AT&T
  • CD Networks
  • Globalscape
  • Google
  • Limelight Networks
  • Peer 1 Network
  • Research In Motion
  • Savvis
  • Verizon
  • Yahoo!

The complaint was later amended with an additional patent (filed on April 18th), making it list three patents that these companies are claimed to violate (I can’t find the amended version online though). Two of the patents ( 6,393,471 and 6,058,418) are for marketing data and how to use client info to present ads basically. The third is about file transfer resumes.

I was contacted by a person involved in the case at one of the defendants’. This unspecified company makes one or more products that use “curl“. I don’t actually know if they use the command line tool or the library – but I figure that’s not too important here. curl gets all its superpowers from libcurl anyway.

This Patent Troll thus basically claims that curl violates a patent on resumed file transfers!

The patent in question that would be one that curl would violate is the US patent 6,098,180 which basically claims to protect this idea:

A system is provided for the safe transfer of large data files over an unreliable network link in which the connection can be interrupted for a long period of time.

The patent describes several ways in how it may detect how it should continue the transfer from such a break. As curl only does transfer resumes based on file name and an offset, as told by the user/application, that could be the only method that they can say curl would violate of their patent.

The patent goes into detail in how a client first sends a “signature” and after an interruption when the file transfer is about to continue, the client would ask the server about details of what to send in the continuation. With a very vivid imagination, that could possibly equal the response to a FTP SIZE command or the Content-Length: response in a HTTP GET or HEAD request.

A more normal reader would rather say that no modern file transfer protocol works as described in that patent and we should go with “defendant is not infringing, move on nothing to see here”.

But for the sake of the argument, let’s pretend that the patent actually describes a method of file transfer resuming that curl uses.

The ‘180 (it is referred to with that name within the court documents) patent was filed at February 18th 1997 (and issued on August 1, 2000). Apparently we need to find prior art that was around no later than February 17th 1996, that is to say one year before the filing of the stupid thing. (This I’ve been told, I had no idea it could work like this and it seems shockingly weird to me.)

What existing tools and protocols did resumed transfers in February 1996 based on a file name and a file offset?

Lots!

Thank you all friends for the pointers and references you’ve brought to me.

  • The FTP spec RFC 959 was published in October 1985. FTP has a REST command that tells at what offset to “restart” the transfer at. This was being used by FTP clients long before 1996, and an example is the known Kermit FTP client that did offset-based file resumed transfer in 1995.
  • The HTTP header Range: introduces this kind of offset-based resumed transfer, although with a slightly fancier twist. The Range: header was discussed before the magic date, as also can be seen on the internet already in this old mailing list post from December 1995.
  • One of the protocols from the old days that those of us who used modems and BBSes in the old days remember is zmodem. Zmodem was developed in 1986 and there’s this zmodem spec from 1988 describing how to do file transfer resumes.
  • A slightly more modern protocol that I’ve unfortunately found no history for before our cut-off date is rsync, as I could only find the release mail for rsync 1.0 from June 1996. Still long before the patent was filed obviously, and also clearly showing that the one year margin is silly as for all we know they could’ve come up with the patent idea after reading the rsync releases notes and still rsync can’t be counted as prior art.
  • Someone suggested GetRight as a client doing this, but GetRight wasn’t released in 1.0 until Febrary 1997 so unfortunately that didn’t help our case even if it seems to have done it at the time.
  • curl itself does not pre-date the patent filing. curl was first released in March 1998, and the predecessor was started around summer-time 1997. I don’t have any remaining proofs of that, and it still wasn’t before “the date” so I don’t think it matters much now.

At the time of this writing I don’t know where this will end up or what’s going to happen. Time will tell.

This Software patent obviously is a concern mostly to US-based companies and those selling products in the US. I am neither a US citizen nor do I have or run any companies based in the US. However, since curl and libcurl are widely used products that are being used by several hundred companies already, I want to help bring out as much light as possible onto this problem.

The patent itself is of course utterly stupid and silly and it should never have been accepted as it describes trivially thought out ideas and concepts that have been thought of and implemented already decades before this patent was filed or granted although I claim that the exact way explained in the patent is not frequently used. Possibly the protocol using a method that is closed to the description of the patent is zmodem.

I guess I don’t have to mention what I think about software patents.

I’m convinced that most or all download tools and browsers these days know how to resume a previously interrupted transfer this way. Why wouldn’t these guys also approach one of the big guys (with thick wallets) who also use this procedure? Surely we can think of a few additional major players with file tools that can resume file transfers and who weren’t targeted in this suit!

I don’t know why. Clearly they’ve not backed down from attacking some of the biggest tech and software companies.

patent drawing

(Illustration from the ‘180 patent.)

libcurl’s name resolving

Recently we’ve put in some efforts into remodeling libcurl’s code that handles name resolves, and then in particular the two asynchronous name resolver backends that we support: c-ares and threaded.

Name resolving in general in libcurl

libcurl can be built to do name resolves using different means. The primary difference between them is that they are either synchronous or asynchronous. The synchronous way makes the operation block during name resolves and there’s no “decent” way to abort the resolves if they take longer time than the program wants to allow it (other than using signals and that’s not what we consider a decent way).

Asynch resolving in libcurl

This is done using one of two ways: by building libcurl with c-ares support or by building libcurl and tell it to use threads to solve the problem. libcurl can be built using either mechanism on just about all platforms, but on Windows the build defaults to using the threaded resolver.

The c-ares solution

c-ares’ primary benefit is that it is an asynchronous name resolver library so it can do name resolves without blocking without requiring a new thread. It makes it use less resources and remain a perfect choice even if you’d scale up your application up to and beyond an insane number of simultaneous connections. Its primary drawback is that since it isn’t based on the system default name resolver functions, they don’t work exactly like the system name resolver functions and that causes trouble at times.

The threaded solution

By making sure the system functions are still used, this makes name resolving work exactly as with the synchronous solution, but thanks to the threading it doesn’t block. The downside here is of course that it uses a new thread for every name resolve, which in some cases can become quite a large number and of course creating and killing threads at a high rate is much more costly than sticking with the single thread.

Pluggable

Now we’ve made sure that we have an internal API that both our asynchronous name resolvers implement, and all code internally use this API. It makes the code a lot cleaner than the previous #ifdef maze for the different approaches, and it has the side-effect that it should allow much easier pluggable backends in case someone would like to make libcurl support another asynchronous name resolver or system.

This is all brand new in the master branch so please try it out and help us polish the initial quirks that may still exist in the code.

There is no current plan to allow this plugging to happen run-time or using any kind of external plugins. I don’t see any particular benefit for us to do that, but it would give us a lot more work and responsibilities.

cURL

HTTP transfer compression

HTTP is a protocol that looks simple in its simplest form and its readability can easily fool you into believing an implementation is straight forward and quickly done.

That’s not the reality though. HTTP is a very big protocol with lots of corners and twisting mazes that one can get lost in. Even after having been the primary author of curl for 13+ years, there are still lots of HTTP things I don’t master.

To name an example of an area with little known quirks, there’s a funny situation when it comes to how HTTP supports and doesn’t support compression of data and compression of data in transfer.

No header compression

A little flaw in HTTP in regards to compression is that there’s no way to compress headers, in either direction. No matter what we do, we must send the text as-is and both requests and responses are sometimes very big these days. Especially taken into account how cookies are always inserted in requests if they match. Anyway, this flaw is nothing we can do anything about in HTTP 1.1 so we need to live with it.

On the other side, compression of the response body is supported.

Compressing data

Compression of data can be done in two ways: either the actual transfer is compressed or the body data is compressed. The difference is subtle, but when the body data is compressed there’s really nothing that mandates that the client has to uncompress it for the end user, and if the transfer is compressed the receiver must uncompress it in order to deal with the transfer properly.

For reasons that are unknown to me, HTTP clients and servers started out supporting compression only using the Content-Encoding style. It means that the client tells the server what kind of content encodings it supports (using Accept-Encoding:) and the server then sends the response data using one of the supported encodings. The client then decides on its own that if it gets the content in one of the compressed formats that it said it can handle, it will automatically uncompress that on arrival.

The HTTP protocol designers however intended this kind of automatic compression and subsequent uncompress to be done using Transfer-Encoding, as the end result is the completely transparent and the uncompress action is implied and intended by the protocol design. This is done by the client telling the server what transfer encodings it supports with the TE: header and the server adds a Transfer-Encoding: header in the response telling how the transfer is encoded.

HTTP 1.1 introduced a mandatory encoding that all servers can use whenever they feel like it: chunked encoding, so all HTTP 1.1 clients already have to deal with Transfer-Encoding to some degree.

Surely curl is better than all those other guys, right?

Not really. Not yet anyway.

curl has a long history of copying its behavior from what the browsers do, in order to allow users to basically script anything imaginable that is HTTP-like with curl. In this vein, we implemented compression support the same way as all the browsers did it: the content encoding style. (I have reason to believe that at least Opera actually supports or used to support compressed Transfer-Encoding.)

Starting now (code pushed to git repo just after the 7.21.5 release), we’ve taken steps to improve things. We’re changing gears and we’re introducing support for asking for and using compressed Transfer-Encoding. This will start out as an optional feature/flag (–tr-encoding / CURLOPT_TRANSFER_ENCODING) so that we can start out and see how servers in the wild behave and that we can deal with them properly. Then possibly we can switch the default in the future to always ask for compressed transfers. At least for the command line tool.

We know from the little tests we are aware of, that there are at least one known little problem or shall we call it a little detail to keep on eye at, with introducing compressed Transfer-Encoding. As has been so fine reported several years ago in the opera blog Browser sniffing gone wrong (again): Cars.com, there are cases where this may cause the server to send data that gets compressed twice (using both Content and Transfer Encoding) and that needs to be taken care of properly by the client.

At the time of this writing, I’ve not yet taken care of the double-compress case in the code, but I intend to get on to it within shortly.

I’m otherwise very interested in hearing what kind of experience people will have from this. What servers and sites will support this as documented and intended?

Shipping curl 7.21.5

I don’t usually post anything here when we do curl releases, pretty much because we do them bimonthly on a fairly steady schedule so there should be little surprise to anyone interested by the time they get public.

But hey, this is hard work and just to remind you all what’s going on I thought I’d throw in a mention of what we’ve spent the last two months doing. curl and libcurl 7.21.5 is released today.

The five notable changes introduced this time include:

The CURLOPT_SOCKOPTFUNCTION callback can now return information back to libcurl that the socket libcurl operates on is already connected. This is useful for applications that do a lot of fiddling on their own and possibly provide its own socket to start with using the CURLOPT_OPENSOCKETFUNCTION.

curl the tool got support for the –netrc-file option, that allows a user to point out a specific .netrc file instead of always forcing the user to use the fixed $HOME/.netrc one.

Brand new support for building libcurl with the cyassl library for SSL/TLS support. Previously curl only had support for the older OpenSSL emulation API that cyassl used to provide, but starting now we’re using cyassl directly and it is now a proper SSL citizen among the seven SSL libraries curl supports.

Since the previous release when we shipped the first support for TLS-SRP that required GnuTLS, the OpenSSL project accepted patches that introduced TLS-SRP into their official version as well and accordingly we have received patches that now allow users to use TLS-SRP with libcurl built against (a new enough) OpenSSL as well.

We have started to re-use two error codes a bit differently within libcurl, so that it now can return: CURLE_NOT_BUILT_IN (4) when an application tries to use a feature that was missing or was explicitly disabled at build-time and CURLE_UNKNOWN_OPTION (48) when the application has passed in an option that isn’t known or recognized.

And we’re counting more than 40 bugfixes worth mentioning. The most important ones are possibly:

If using the multi interface doing RTSP, libcurl could crash when trying to re-use a previous connection.

POP3 didn’t do TLS properly, it issued the wrong command to start TLS and it didn’t send the password correctly once it did switch to TLS!

When using the multi interface, there could be times when the timeout didn’t trigger so it wouldn’t close lingering connections even when asked to do so.

SFTP and SFP with the multi_socket interface were not working correctly and would very easily end up with stalled transfers due to the application being told to wait for the wrong action (or none at all).

If told to use the CCC command (which is used with FTP-SSL when the client asks the server to switch off from an SSL connection back to plain TCP again), curl would disable SSL on the connection but then use the wrong socket reader function and crash.

… but of course, if you’ve suffered from a particular bug in a previous release I’m sure you’ll consider the exact bug fix that corrects your problem to be the most important one!

Not to forget, the great people apart from yours truly that have contributed with code and insights since the previous release. Without them, the above list of changes and bugfixes just wouldn’t exist. The friends we have to thank are (in no particular order):

Mike Crowe, Kamil Dudka, Julien Chaffraix, Hoi-Ho Chan, Ben Noordhuis, Dan Fandrich, Henry Ludemann, Karl M, Manuel Massing, Marcus Sundberg, Stefan Krause, Todd A Ouska, Saqib Ali, Andre Guibert de Bruet, Tor Arntsen, Vincent Torri, Dave Reisner, Chris Smowton, Tinus van den Berg, Hongli Lai, Gisle Vanem, Andrei Benea, Mehmet Bozkurt

… and now back to working towards the next release. To be expected in roughly two months. Repeat.

curlyears plus equals one

BirthdaycakeThirteen years ago I released the first version of curl to the world – on March 20 1998. curl is now a teenage project and there’s no slowdown or end in sight.

So what does a project like ours introduce after having existed for so long? The recent year has been full of activities in the project, and here’s a run down with some of the stuff that has been going on:

We switched source code versioning system from CVS to git

gopher support got back into curl

support for RTMP was added

Two additional SSL libraries are now supported: PolarSSL and axTLS, making it a total of seven

70 persons provided code into the git repository, making the THANKS document now list 854 names.

About 1000 commits were made – out of a total of almost 14000 (counted from Dec 1999) so it makes this year slightly under average in terms of commit rate.

6 releases were shipped with a total of 179 bugfixes

1 security flaw was found and fixed

More than 4900 mails were posted to the curl mailing lists

We introduced a unit test system

Over 1400GB of data was downloaded from the curl web site

During the end of this period, 45% of our web visitors used Firefox, 23% used Chrome and less than 20% used IE. 75% were on Windows, 13% on Linux and 10% on Mac.

cURL

localhost hack on Windows

There's no place like 127.0.0.1

Readers of my blog and friends in general know that I’m not really a Windows guy. I never use it and I never develop things explicitly for windows – but I do my best in making sure my portable code also builds and runs on windows. This blog post is about a new detail that I’ve just learned and that I think I could help shed the light on, to help my fellow hackers. The other day I was contacted by a user of libcurl because he was using it on Windows and he noticed that when wanting to transfer data from the loopback device (where he had a service of his own), and he accessed it using “localhost” in the URL passed to libcurl, he would spot a DNS request for the address of that host name while when he used regular windows tools he would not see that! After some mails back and forth, the details got clear:

Windows has a default /etc/hosts version (conveniently instead put at “c:\WINDOWS\system32\drivers\etc\hosts”) and that default  /etc/hosts alternative used to have an entry for “localhost” in it that would point to 127.0.0.1.

When Windows 7 was released, Microsoft had removed the localhost entry from the /etc/hosts file. Reading sources on the net, it might be related to them supporting IPv6 for real but it’s not at all clear what the connection between those two actions would be.

getaddrinfo() in Windows has since then, and it is unclear exactly at which point in time it started to do this, been made to know about the specific string “localhost” and is documented to always return “all loopback addresses on the local computer”.

So, a custom resolver such as c-ares that doesn’t use Windows’ functions to resolve names but does it all by itself, that has been made to look in the /etc/host file etc now suddenly no longer finds “localhost” in a local file but ends up asking the DNS server for info about it… A case that is far from ideal. Most servers won’t have an entry for it and others might simply provide the wrong address.

I think we’ll have to give in and provide this hack in c-ares as well, just the way Windows itself does.

Oh, and as a bonus there’s even an additional hack mentioned in the getaddrinfo docs: On Windows Server 2003 and later if the pNodeName parameter points to a string equal to “..localmachine”, all registered addresses on the local computer are returned.

Fosdem 2011: my libcurl talk on video

Kai Engert was good enough to capture all the talks in the security devroom at Fosdem 2011, and while I’m seeding the full torrent I’ve made my own talk available as a direct download from here:

Fosdem 2011: security-room at 14:15 by Daniel Stenberg

The thing is about 107MB big, 640×480 resolution and is roughly 26 minutes playing time. WebM format.

libcurl, seven SSL libs and one SSH lib

I did a talk today at Fosdem with this title. The room only had 48 seats and it was completely packed with people standing everywhere it was possible around the seated guys.

The English slides from my talk are below. It was also recorded on video so I hope I’ll be able to post once it becomes available online