Byte ranges for FTP

Monday, December 20th, 2010

In the IETF ftpext2 working group there has been some discussion about clients’ and servers’ ability to do and support “ranged” file transfers, that is, transferring only a piece of a given file. FTP has supported the REST command since the dawn of man (RFC765 – June 1980), and with that command a client can set the starting point for a transfer, but there is no way to set the end point. HTTP has supported the Range: header since the first HTTP 1.1 spec back in January 1997, and it supports both a start and an end point. The HTTP header does in fact support multiple ranges within the same header, but let’s not overdo it here!
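To make the difference concrete, here is a small sketch that builds what a client would send for a byte range in each protocol. The helper names are made up for this example. In the HTTP header both offsets are inclusive, while on the FTP side only the start offset can be expressed, through REST.

```python
def http_range_header(ranges):
    """Build an HTTP/1.1 Range header for one or more (first, last) byte pairs.

    Both offsets are inclusive, so (0, 99) asks for the first 100 bytes.
    """
    specs = []
    for first, last in ranges:
        if first < 0 or last < first:
            raise ValueError(f"invalid byte range {first}-{last}")
        specs.append(f"{first}-{last}")
    return "Range: bytes=" + ",".join(specs)


def ftp_rest_commands(path, first):
    """Build the FTP commands to retrieve `path` starting at byte `first`.

    REST only sets the start point; there is no way to say where to stop,
    so the server sends everything from `first` to the end of the file.
    """
    return [f"REST {first}", f"RETR {path}"]


print(http_range_header([(100, 199)]))           # Range: bytes=100-199
print(http_range_header([(0, 99), (500, 599)]))  # Range: bytes=0-99,500-599
print(ftp_rest_commands("pub/file.bin", 100))    # ['REST 100', 'RETR pub/file.bin']
```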

Currently, a client that doesn’t want an entire file simply closes the data connection once it has received all the data it wants. The unfortunate reality is that some servers don’t notice clients doing this, so for this to work reliably a client also has to send ABOR. After that command has been sent there is no way for the client to reliably figure out the state of the control connection, so it has to be closed as well (which is crap if more files are to be transferred to or from the same host). It becomes unreliable primarily because after ABOR is sent the client gets either one or two responses back, depending on a race between the closing of the data connection and the actual end of the transfer, and it cannot tell exactly how to continue.
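That workaround can be sketched with Python’s standard `ftplib`. `fetch_range_by_abort` is a hypothetical helper, and `ftp` is assumed to be an already connected and logged-in `ftplib.FTP` object. Note that the function has to give up the control connection at the end, which is exactly the cost described above.

```python
import ftplib


def fetch_range_by_abort(ftp, path, start, length, blocksize=8192):
    """Download `length` bytes of `path` starting at offset `start`.

    FTP can only set a start point (REST), so the end point is enforced on
    the client side: close the data connection early, send ABOR and then,
    since the control connection state is uncertain, close that too.
    """
    ftp.voidcmd("TYPE I")  # binary mode so byte offsets are meaningful
    # transfercmd() sets up the data connection, sends "REST start" and
    # then the RETR command, and returns the data socket.
    conn = ftp.transfercmd("RETR " + path, rest=start)
    data = bytearray()
    try:
        while len(data) < length:
            chunk = conn.recv(min(blocksize, length - len(data)))
            if not chunk:
                break  # the file ended before the wanted range did
            data += chunk
    finally:
        conn.close()  # stop the transfer from our side
    try:
        ftp.abort()  # some servers only notice the early close after ABOR
    except ftplib.all_errors:
        pass  # one or two replies may arrive; we can't tell which is which
    finally:
        ftp.close()  # state unknown, so the control connection is dropped
    return bytes(data)
```

Any further transfers from the same host then need a fresh connection and login.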

A solution for the future is being worked on. I’ve joined the effort to write a spec that will suggest a new FTP command that sets the end point for a transfer, in the same vein as REST sets the start point. For the moment, we’ve named our suggested command RANG (short for range). “We” in this context means Tatsuhiro Tsujikawa, Anthony Bryan and myself, but we of course hope to get further valuable feedback from the great ftpext2 people.
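Here is a rough sketch of what a ranged download could look like on the control connection. It assumes that RANG takes a single argument, the inclusive end offset, and is sent alongside REST. That syntax is an assumption for illustration only, not taken from the draft. The point is that with a known end point the server can finish the transfer normally, so no ABOR is needed and the control connection stays usable.

```python
def ranged_retr_commands(path, first, last):
    """Command sequence for a ranged FTP download using the proposed RANG.

    ASSUMPTION: RANG takes one argument, the inclusive end offset, and
    complements REST (the start offset). The real draft syntax may differ.
    """
    if first < 0 or last < first:
        raise ValueError(f"invalid byte range {first}-{last}")
    return [
        "TYPE I",         # binary mode so byte offsets are exact
        f"REST {first}",  # existing command: where to start
        f"RANG {last}",   # proposed command (hypothetical syntax): where to stop
        f"RETR {path}",   # the server would send bytes first..last and end normally
    ]


for line in ranged_retr_commands("pub/file.bin", 1000, 1999):
    print(line)
```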

There are already use cases that want range requests for FTP. The people behind metalink, for example, want to download the same file from many servers, and then it makes sense to be able to download little pieces from different sources.
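As an illustration of that use case, this sketch splits a file into fixed-size byte ranges and hands them out round-robin to a set of mirrors. `plan_segments` and the example.com/example.org mirror URLs are hypothetical. Each resulting (mirror, first, last) triple is what a ranged request to that server would need, which FTP can’t express today without the early-close workaround.

```python
def plan_segments(size, mirrors, segment_size):
    """Split a file of `size` bytes into inclusive (first, last) byte ranges
    and assign them round-robin to the given mirror URLs.
    """
    if size < 0 or segment_size <= 0 or not mirrors:
        raise ValueError("need a non-negative size, a positive segment size and a mirror")
    plan = []
    for index, first in enumerate(range(0, size, segment_size)):
        last = min(first + segment_size, size) - 1  # inclusive end offset
        plan.append((mirrors[index % len(mirrors)], first, last))
    return plan


mirrors = ["ftp://ftp.example.com/f.iso", "ftp://ftp.example.org/f.iso"]
for mirror, first, last in plan_segments(2500, mirrors, 1000):
    print(mirror, first, last)
```

A real downloader would also retry failed segments on another mirror; this sketch only shows the range planning.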

The people who found the libcurl bugs I linked to above use libcurl as part of Anaconda, the Fedora/Red Hat installer, and if I understand things right they use this feature to get just the beginning of some files, to check them out and avoid downloading the full file before they know they truly want it. That saves lots of bandwidth.
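Over HTTP, the same “peek at the start of a file” trick can be sketched with the standard library’s `urllib.request`. `prefix_request` and `fetch_prefix` are hypothetical helpers, and the URL and 1024-byte prefix in the usage are arbitrary. Since a server is free to ignore Range and send the whole body, the read is capped on the client side too.

```python
import urllib.request


def prefix_request(url, nbytes):
    """Build a request for only the first `nbytes` bytes of `url`
    (inclusive HTTP byte range 0..nbytes-1)."""
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    return urllib.request.Request(url, headers={"Range": f"bytes=0-{nbytes - 1}"})


def fetch_prefix(url, nbytes, timeout=30):
    """Fetch the start of a file. A server that ignores Range replies 200
    with the full body, so read at most `nbytes` either way."""
    with urllib.request.urlopen(prefix_request(url, nbytes), timeout=timeout) as resp:
        return resp.read(nbytes)


# Usage (needs network access):
# head = fetch_prefix("https://example.com/big-file.bin", 1024)
```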

In short, the use cases for ranged FTP retrievals are most likely pretty much the same as they are for HTTP!

The first RANG draft is now available.

murl for extended curlness

Monday, March 9th, 2009

I’m a firm believer in the old Unix mantra of letting each tool do its job and do it well, and pass the rest of the work on to the next tool. I’ve always stated that curl should remain this way: stay within its defined walls and not try to do everything.

But time passes and more and more ideas are thrown up in the air, or in some cases directly at me, and the list of things that we could do but don’t due to this philosophical limit of remaining focused has grown. It currently includes at least:

  • metalink support
  • recursive HTML downloads
  • recursive/wildcard FTP transfers
  • bittorrent support
  • automatic proxy configuration
  • simultaneous/parallel download support

Educated readers will of course immediately notice that this list (if implemented) would make a tool that basically does what wget already does (and a lot more), and I’ve explicitly said for a decade that curl is not a wget clone. Maybe it is time for us (me?) to reevaluate that sentiment – at least in some sense.

I don’t want to sacrifice the concepts that have worked so well for curl for so many years, so I’m still firmly against stuffing all this into curl (or libcurl). That simply will not happen with me at the wheel.

A much more interesting alternative would be to instead start working on a second tool within the curl project: murl. A tool that does basically everything that curl already does, but also opens the doors for adding just about everything else we can cram in and that is still related to data transfers. That would include, but not be restricted to, all the fancy stuff mentioned in the list above!

No, the name murl is not set in stone, nor is this whole idea anything but early thoughts thrown out at this point, so it may or may not actually take off. It will probably depend on whether I get support and help from fellow hackers to get started and keep moving.


Metalink in curl bounty

Sunday, November 9th, 2008

The Metalink guys host a list of project ideas, and one of those ideas is to add metalink support to curl. I recently raised the stakes a bit by adding another 200 USD to the bounty, so the offer is now 500 USD for the person or team that delivers the feature as described.

My primary motivation for doing this is that I like the metalink idea and I’d like to help make sure it gets used more widely.

Metalink Internet Draft

Friday, September 5th, 2008

Anthony Bryan seems to have been working hard lately: we’ve seen him submit his Internet Draft for the Metalink XML Download Description Format on the http-wg mailing list, and now version 02 (zero two) is up for public browsing and commenting at

http://tools.ietf.org/html/draft-bryan-metalink-02

… and his interim versions are also browsable here.


Interview with me

Friday, July 18th, 2008

I spoke to Anthony Bryan from the metalink project over Skype the other day, and the 16-minute recorded interview was recently posted, so I thought I’d announce my local copy of the 14MB file.

The topics should be of no surprise to readers of my blog: me, curl, Rockbox and metalink basically.

No metalink in libcurl

Tuesday, April 8th, 2008

It’s been a while since we last had this discussion, so I figure it is about time to reiterate it, and this time I thought I’d write a little blog post to shed some light on my standpoint regarding this issue:

metalink support in libcurl

I’ve had this discussion at length with Anthony Bryan (the main man behind the metalink format) privately in the past, and I’ve bounced back a lot of feedback on the actual XML format to him. I believe some of it was taken into account and led to changes in the format. Of course this was before it “settled” and started to get adopted. I think metalink is a great idea, and the file format mostly makes sense (at least it did the last time I checked; I can’t seem to find the docs now).

I have little to no understanding of the idea that libcurl should support this natively. metalink is just an XML format that tells an application where and how it can download files, and libcurl does indeed support most of the protocols that such URLs can use. libcurl is a data transfer library oriented around a given URL: the URL in question maps 1:1 to the protocol used, and the transfer is always content-agnostic.
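To show why this is application-layer work, here is what even the simplest consumer of a metalink file has to do before any transfer starts: parse the XML and rank the URLs. This sketch uses Python’s `xml.etree.ElementTree` and follows the element and attribute names of the format as later published in RFC 5854 (`file`, `url`, `priority`, lower numbers preferred); the 2008 drafts may differ in detail. Sorting URLs without a priority last is a choice made for this sketch.

```python
import xml.etree.ElementTree as ET

# Namespace of the Metalink format as published in RFC 5854; earlier
# drafts and the older Metalink 3.0 format used different namespaces.
NS = {"ml": "urn:ietf:params:xml:ns:metalink"}
NO_PRIORITY = 1_000_000  # sorts after the highest allowed value (999999)


def parse_metalink(xml_text):
    """Return {file name: [(priority, url), ...]} with preferred URLs first."""
    root = ET.fromstring(xml_text)
    files = {}
    for f in root.findall("ml:file", NS):
        urls = []
        for u in f.findall("ml:url", NS):
            priority = int(u.get("priority", NO_PRIORITY))
            urls.append((priority, (u.text or "").strip()))
        files[f.get("name")] = sorted(urls)
    return files


SAMPLE = """<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="example.ext">
    <url priority="2">http://example.com/example.ext</url>
    <url priority="1">ftp://ftp.example.com/example.ext</url>
  </file>
</metalink>"""

print(parse_metalink(SAMPLE))
```

Only after this step does any actual transfer begin, and each URL it yields is an ordinary URL that a content-agnostic library can fetch.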

metalink is application layer, not transport. Adding metalink to libcurl would mean that all of a sudden libcurl would transfer a file, actually parse the (XML!) contents of that file and then get (possibly) multiple streams using multiple protocols based on what that parsing gave. That is simply too many new things, and too many violations of key libcurl concepts, for me to see it happen.

Metalink isn’t even a standard, so we would more or less open the gates for further random efforts to introduce similar ideas, and then where would we draw the line? Currently I think we have a pretty solid line drawn in the sand, and we don’t cross it (on purpose).

And frankly, there is only one reason (mentioned, or that I can think of) for libcurl to support this feature: since libcurl is already widely adopted, it would be easier for metalink to conquer the world by sneaking in through the back door with libcurl, as a large number of applications would then support it with no additional effort at all. But sorry, I don’t think that’s a good enough reason to break or change these important key concepts/limits of libcurl. (Actually, I think it is a bit foolish to think that adding metalink to libcurl would make all these applications automatically support metalink, as there would be several arguments against that too.)

As I’ve said before, I think one of our biggest challenges in this project is to limit what libcurl does, to not allow it to grow in all directions, to keep the scope and to maintain focus.

A metalink file transfer library could be made as a layer on top of libcurl, and I think that is the only logical and sensible way.
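Such a layer can be sketched in a few lines. The metalink-aware code holds the ranked candidate URLs and an optional SHA-256 hash taken from the metalink file, and hands one plain URL at a time to a transfer function. Here `transfer` is a hypothetical stand-in for a libcurl-based download (through whatever binding an application uses); the transfer library itself never sees any metalink data.

```python
import hashlib


def download_with_fallback(candidates, transfer, expected_sha256=None):
    """Try (priority, url) candidates in order until one succeeds.

    `transfer` takes a single URL and returns the downloaded bytes; it stays
    content-agnostic, while mirror fallback and hash checking live here.
    """
    errors = []
    for _priority, url in candidates:
        try:
            data = transfer(url)
        except OSError as exc:  # network/protocol failure: try the next mirror
            errors.append((url, repr(exc)))
            continue
        if expected_sha256 and hashlib.sha256(data).hexdigest() != expected_sha256:
            errors.append((url, "hash mismatch"))
            continue
        return data
    raise RuntimeError(f"all {len(errors)} mirrors failed: {errors}")
```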

Adding metalink support to the curl tool however, seems like a good idea to me…