Tag Archives: cURL and libcurl

libspdy

October 18, 2011 Daniel Stenberg 8 Comments

SPDY is a neat new protocol and possible contender to replace HTTP – at least in some areas and for some use cases. SPDY has been invented and developed mostly by Google engineers.

SPDY allows better usage of fewer TCP connections (since it sends multiple logical streams over a single physical TCP connection) and it helps clients overcome problems with TCP (like how a new connection starts slowly) while at the same time reducing latency and bandwidth requirements. Very similar to how channels are handled over an SSH connection.

Chrome of course already supports SPDY and Firefox has some early experimental support being worked on.

Of course there are also legitimate criticisms against SPDY as well, including subjects like how it makes caching proxies impossible (because everything goes over SSL), how it makes debugging a lot harder by using compressed headers, how it is impossible to extract just a single header from the stream due to its compression approach and how the compression state buffers make each individual stream use more memory than plain old HTTP (plain TCP) ones.

We can expect SPDY<=>HTTP gateways to appear so that nobody gets locked into either side of these protocols.

SPDY will provide faster transfers. libcurl is currently used for speed reasons in many cases. To me, it makes perfect sense to have libcurl use and try to use SPDY instead of HTTP exactly like how the browsers are starting to do it, so that the libcurl using applications will get their contents transferred faster.

My thinking is that we introduce some new magic option(s) that makes libcurl use SPDY, and for normal easy interface transfers it will remain to use a single connection for each new SPDY transfer, but if you use the multi interface and you enable pipelining you’ll instead make libcurl do multiple transfers over the same single SPDY connection (as long as you speak with the same server and port etc). From an application’s stand-point it shouldn’t make any difference, apart from being faster than otherwise. Just like we want it!

Implementation wise, I would like to use a reliable and efficient third-party library for the actual SPDY implementation. If there doesn’t exist any, we make one and run that one independently. I found libspdy, but I found some concerns about it (noÂ mailing list, looks like one-man project, not C89 compliant, no API docs etc). I mailed the libspdy author, I hoping we’d sort out my doubts and then I’d base my continued work on that library.

After some time Thomas Roth, primary libspdy author, responded and during our subsequent email exchange I’ve gotten a restored faith and belief in this library and its direction. Not only did he fix the C89 compliance pretty quickly, he is also promising rather big changes that are pending to get committed within a week or so.

Comforted by what I’ve learned from Thomas, I’ll wait for his upcoming changes and I’ll join the soon to be created mailing list for the libspdy project and I’ll contribute some ideas and efforts to help shape it into the fine SPDY library we all want. I can only encourage other fellow SPDY library interested persons to do the same!

Updated: Join the SPDY library development

cURL and libcurl

curl 7.22.0

September 13, 2011 Daniel Stenberg

Another release of curl and libcurl just happened. 7.22.0 is released.

Apart from the 28 something documented bug fixes, we introduce a range of changes that could be noteworthy:

Added CURLOPT_GSSAPI_DELEGATION – remember that we explicitly disabled GSSAPI delegation in our previous release due to a security problem. Now we introduce an option for the application to control exactly how to behave.
Added support for NTLM delegation to Samba’s winbind daemon helper ntlm_auth. This lets libcurl use the external helper program to do things like NTLM single-sign on.
Display notes from setup file in testcurl.pl – provides a way for test clients to provide more information back to the centralized test summary on the primary server.
BSD-style lwIP TCP/IP stack experimental support on Windows – there are still flaws in lwIP on windows that prevents it from working properly
OpenSSL: Use SSL_MODE_RELEASE_BUFFERS if available – this is basically a way to ask OpenSSL to use less memory
–delegation was added to set CURLOPT_GSSAPI_DELEGATION – simply the new option exported to the command line tool
nss: start with no database if the selected database is broken – a slightly modified behavior
telnet: allow programatic use on Windows – basically making the windows implementation in sync with how the non-windows version already has worked for quite some time

This release is this great thanks to 25 friendly contributors.

cURL and libcurl

A libcurl postergirl?

August 27, 2011 Daniel Stenberg 1 Comment

If you click the image you’ll see a full-resolution screendump for my recent search for “libcurl” on google. Where did that (image of a) girl come from? Judging from where it appears on the results page right next to the information about the cURL project you can’t but assume that she’s somehow related to the project.

That’s of course not true. When moving the mouse over the image I get a tooltip with a funny “hair curling” URL and that’s also where a click on the image takes me.

A mighty weird way of presenting a search result if you ask me!

Mail

Remove your software

May 27, 2011 Daniel Stenberg

Another one of those emails arrived in my inbox today:

Subject: Remove:

Your software needs to be removed from my work computer. I did not install it, I do not want it and did not request it.
[name redacted]

No mention what software or indication of what platform or what might’ve happened when my software allegedly ended up in the person’s computer. Not very friendly either.

Again I suspect that there’s some software that uses curl in some way, but I can’t tell for sure…

I replied to it, saying that I didn’t install anything on his computer.

Network

Pointless respecifying FTP URI

May 24, 2011 Daniel Stenberg

There’s this person within IETF who seems to possess endless energy and a never-ending wish to clean up tiny details within the IETF processes. He continuously digs up specifications that need to be registered or submitted again somewhere due to some process. Often under loud protests from fellow IETFers since it steals time and energy from people on the lists for discussions and reviews – only to satisfy some administrative detail. Time and energy perhaps better spent on things like new protocols and interesting new technologies.

This time, he has deemed that the FTP a FILE URI specs need to be registered properly, and alas he has submitted his first suggested update of the FTP URI.

From my work with curl I have of course figured out a few problems with RFC1738 that I don’t think we should just repeat in a new version of the spec. It turns out I’m not alone in thinking this work isn’t really good like this, and I posted a second mail to clarify my points.

We’re not working on fixing the problems with FTP URIs that are present in RFC1738 so just rephrasing those into a new spec is a bad idea.

We could possibly start the work on fixing the problems, but so far I’ve seen no such will or efforts and I don’t plan on pushing for that myself either.

Please tell me or the ftpext2 group where I or the others are wrong or right or whatever!

Blogging

11 years of me

May 15, 2011 Daniel Stenberg

On May 11th 2000 I posted by first blog entry that is still available online on advogato.org. No surprise but it was curl-related.

The full post was:

I was made aware of the fact that curl is not really dealing well with the directory part of an ftp URL.

I was gonna quote the appropriate text piece from RFC1738 (yes, it is obsoleted by RFC2396 although 1738 has more detailed info about particular protocols like ftp) to someone when I noticed that I had interpreted it wrong when I read it before.

The difference between getting a file relative the login directory or with absolute path. It turns out you have to get a path like ftp.site.com/%2etmp/ if you want have the absolute path “/tmp”. Oh well, I have it support my old way as well even if that isn’t following the RFC just to allow people using that way to be able to use the new one unmodifed…

… which I guess proves that even though lots of time has passed, I still occupy myself with the same kind of hobbies and side- projects…

cURL and libcurl, Technology

US patent 6,098,180

May 11, 2011 Daniel Stenberg 5 Comments

(I am not a lawyer, this is not legal advice and these are not legal analyses, just my personal observations and ramblings. Please correct me where I’m wrong or add info if you have any!)

At 3:45 pm on March 18th 2011, the company Content Delivery Solutions LLC filed a complaint in a court in Texas, USA. The defendants are several bigwigs and the list includes several big and known names of the Internet:

Akamai
AOL
AT&T
CD Networks
Globalscape
Google
Limelight Networks
Peer 1 Network
Research In Motion
Savvis
Verizon
Yahoo!

The complaint was later amended with an additional patent (filed on April 18th), making it list three patents that these companies are claimed to violate (I can’t find the amended version online though). Two of the patents ( 6,393,471 and 6,058,418) are for marketing data and how to use client info to present ads basically. The third is about file transfer resumes.

I was contacted by a person involved in the case at one of the defendants’. This unspecified company makes one or more products that use “curl“. I don’t actually know if they use the command line tool or the library – but I figure that’s not too important here. curl gets all its superpowers from libcurl anyway.

This Patent Troll thus basically claims that curl violates a patent on resumed file transfers!

The patent in question that would be one that curl would violate is the US patent 6,098,180 which basically claims to protect this idea:

A system is provided for the safe transfer of large data files over an unreliable network link in which the connection can be interrupted for a long period of time.

The patent describes several ways in how it may detect how it should continue the transfer from such a break. As curl only does transfer resumes based on file name and an offset, as told by the user/application, that could be the only method that they can say curl would violate of their patent.

The patent goes into detail in how a client first sends a “signature” and after an interruption when the file transfer is about to continue, the client would ask the server about details of what to send in the continuation. With a very vivid imagination, that could possibly equal the response to a FTP SIZE command or the Content-Length: response in a HTTP GET or HEAD request.

A more normal reader would rather say that no modern file transfer protocol works as described in that patent and we should go with “defendant is not infringing, move on nothing to see here”.

But for the sake of the argument, let’s pretend that the patent actually describes a method of file transfer resuming that curl uses.

The ‘180 (it is referred to with that name within the court documents) patent was filed at February 18th 1997 (and issued on August 1, 2000). Apparently we need to find prior art that was around no later than February 17th 1996, that is to say one year before the filing of the stupid thing. (This I’ve been told, I had no idea it could work like this and it seems shockingly weird to me.)

What existing tools and protocols did resumed transfers in February 1996 based on a file name and a file offset?

Lots!

Thank you all friends for the pointers and references you’ve brought to me.

The FTP spec RFC 959 was published in October 1985. FTP has a REST command that tells at what offset to “restart” the transfer at. This was being used by FTP clients long before 1996, and an example is the known Kermit FTP client that did offset-based file resumed transfer in 1995.
The HTTP header Range: introduces this kind of offset-based resumed transfer, although with a slightly fancier twist. The Range: header was discussed before the magic date, as also can be seen on the internet already in this old mailing list post from December 1995.
One of the protocols from the old days that those of us who used modems and BBSes in the old days remember is zmodem. Zmodem was developed in 1986 and there’s this zmodem spec from 1988 describing how to do file transfer resumes.
A slightly more modern protocol that I’ve unfortunately found no history for before our cut-off date is rsync, as I could only find the release mail for rsync 1.0 from June 1996. Still long before the patent was filed obviously, and also clearly showing that the one year margin is silly as for all we know they could’ve come up with the patent idea after reading the rsync releases notes and still rsync can’t be counted as prior art.
Someone suggested GetRight as a client doing this, but GetRight wasn’t released in 1.0 until Febrary 1997 so unfortunately that didn’t help our case even if it seems to have done it at the time.
curl itself does not pre-date the patent filing. curl was first released in March 1998, and the predecessor was started around summer-time 1997. I don’t have any remaining proofs of that, and it still wasn’t before “the date” so I don’t think it matters much now.

At the time of this writing I don’t know where this will end up or what’s going to happen. Time will tell.

This Software patent obviously is a concern mostly to US-based companies and those selling products in the US. I am neither a US citizen nor do I have or run any companies based in the US. However, since curl and libcurl are widely used products that are being used by several hundred companies already, I want to help bring out as much light as possible onto this problem.

The patent itself is of course utterly stupid and silly and it should never have been accepted as it describes trivially thought out ideas and concepts that have been thought of and implemented already decades before this patent was filed or granted although I claim that the exact way explained in the patent is not frequently used. Possibly the protocol using a method that is closed to the description of the patent is zmodem.

I guess I don’t have to mention what I think about software patents.

I’m convinced that most or all download tools and browsers these days know how to resume a previously interrupted transfer this way. Why wouldn’t these guys also approach one of the big guys (with thick wallets) who also use this procedure? Surely we can think of a few additional major players with file tools that can resume file transfers and who weren’t targeted in this suit!

I don’t know why. Clearly they’ve not backed down from attacking some of the biggest tech and software companies.

patent drawing

(Illustration from the ‘180 patent.)

c-ares, cURL and libcurl

libcurl’s name resolving

April 25, 2011 Daniel Stenberg

Recently we’ve put in some efforts into remodeling libcurl’s code that handles name resolves, and then in particular the two asynchronous name resolver backends that we support: c-ares and threaded.

Name resolving in general in libcurl

libcurl can be built to do name resolves using different means. The primary difference between them is that they are either synchronous or asynchronous. The synchronous way makes the operation block during name resolves and there’s no “decent” way to abort the resolves if they take longer time than the program wants to allow it (other than using signals and that’s not what we consider a decent way).

Asynch resolving in libcurl

This is done using one of two ways: by building libcurl with c-ares support or by building libcurl and tell it to use threads to solve the problem. libcurl can be built using either mechanism on just about all platforms, but on Windows the build defaults to using the threaded resolver.

The c-ares solution

c-ares’ primary benefit is that it is an asynchronous name resolver library so it can do name resolves without blocking without requiring a new thread. It makes it use less resources and remain a perfect choice even if you’d scale up your application up to and beyond an insane number of simultaneous connections. Its primary drawback is that since it isn’t based on the system default name resolver functions, they don’t work exactly like the system name resolver functions and that causes trouble at times.

The threaded solution

By making sure the system functions are still used, this makes name resolving work exactly as with the synchronous solution, but thanks to the threading it doesn’t block. The downside here is of course that it uses a new thread for every name resolve, which in some cases can become quite a large number and of course creating and killing threads at a high rate is much more costly than sticking with the single thread.

Pluggable

Now we’ve made sure that we have an internal API that both our asynchronous name resolvers implement, and all code internally use this API. It makes the code a lot cleaner than the previous #ifdef maze for the different approaches, and it has the side-effect that it should allow much easier pluggable backends in case someone would like to make libcurl support another asynchronous name resolver or system.

This is all brand new in the master branch so please try it out and help us polish the initial quirks that may still exist in the code.

There is no current plan to allow this plugging to happen run-time or using any kind of external plugins. I don’t see any particular benefit for us to do that, but it would give us a lot more work and responsibilities.

cURL and libcurl, Network, Web

HTTP transfer compression

April 18, 2011 Daniel Stenberg

HTTP is a protocol that looks simple in its simplest form and its readability can easily fool you into believing an implementation is straight forward and quickly done.

That’s not the reality though. HTTP is a very big protocol with lots of corners and twisting mazes that one can get lost in. Even after having been the primary author of curl for 13+ years, there are still lots of HTTP things I don’t master.

To name an example of an area with little known quirks, there’s a funny situation when it comes to how HTTP supports and doesn’t support compression of data and compression of data in transfer.

No header compression

A little flaw in HTTP in regards to compression is that there’s no way to compress headers, in either direction. No matter what we do, we must send the text as-is and both requests and responses are sometimes very big these days. Especially taken into account how cookies are always inserted in requests if they match. Anyway, this flaw is nothing we can do anything about in HTTP 1.1 so we need to live with it.

On the other side, compression of the response body is supported.

Compressing data

Compression of data can be done in two ways: either the actual transfer is compressed or the body data is compressed. The difference is subtle, but when the body data is compressed there’s really nothing that mandates that the client has to uncompress it for the end user, and if the transfer is compressed the receiver must uncompress it in order to deal with the transfer properly.

For reasons that are unknown to me, HTTP clients and servers started out supporting compression only using the Content-Encoding style. It means that the client tells the server what kind of content encodings it supports (using Accept-Encoding:) and the server then sends the response data using one of the supported encodings. The client then decides on its own that if it gets the content in one of the compressed formats that it said it can handle, it will automatically uncompress that on arrival.

The HTTP protocol designers however intended this kind of automatic compression and subsequent uncompress to be done using Transfer-Encoding, as the end result is the completely transparent and the uncompress action is implied and intended by the protocol design. This is done by the client telling the server what transfer encodings it supports with the TE: header and the server adds a Transfer-Encoding: header in the response telling how the transfer is encoded.

HTTP 1.1 introduced a mandatory encoding that all servers can use whenever they feel like it: chunked encoding, so all HTTP 1.1 clients already have to deal with Transfer-Encoding to some degree.

Surely curl is better than all those other guys, right?

Not really. Not yet anyway.

curl has a long history of copying its behavior from what the browsers do, in order to allow users to basically script anything imaginable that is HTTP-like with curl. In this vein, we implemented compression support the same way as all the browsers did it: the content encoding style. (I have reason to believe that at least Opera actually supports or used to support compressed Transfer-Encoding.)

Starting now (code pushed to git repo just after the 7.21.5 release), we’ve taken steps to improve things. We’re changing gears and we’re introducing support for asking for and using compressed Transfer-Encoding. This will start out as an optional feature/flag (–tr-encoding / CURLOPT_TRANSFER_ENCODING) so that we can start out and see how servers in the wild behave and that we can deal with them properly. Then possibly we can switch the default in the future to always ask for compressed transfers. At least for the command line tool.

We know from the little tests we are aware of, that there are at least one known little problem or shall we call it a little detail to keep on eye at, with introducing compressed Transfer-Encoding. As has been so fine reported several years ago in the opera blog Browser sniffing gone wrong (again): Cars.com, there are cases where this may cause the server to send data that gets compressed twice (using both Content and Transfer Encoding) and that needs to be taken care of properly by the client.

At the time of this writing, I’ve not yet taken care of the double-compress case in the code, but I intend to get on to it within shortly.

I’m otherwise very interested in hearing what kind of experience people will have from this. What servers and sites will support this as documented and intended?

daniel.haxx.se