Tag Archives: metalink

curling the metalink

Back in 2005 Anthony Bryan started to work on his metalink idea, as can be read in this early 2006 article. Very simplified, Metalink is a way to tell a client how to download the same identical file from many places, potentially in parallel. Anthony tells me he had the idea much earlier than that, going back to a bad experience trying to download a Fedora ISO from a download mirror...

Anthony's and my discussions about metalink started in September 2006 and we've bounced countless mails and ideas back and forth since then. Even more, we've become friends and we've worked together on several related subjects as well, including several Internet Drafts within the IETF.

We had a metalink discussion on the libcurl mailing list back in April 2008 about whether to have libcurl support it natively or not, but we (I) ended up with the conclusion that it wasn't fit for libcurl, basically because metalink is a layer on top of the application protocols that libcurl supports.

I wasn't quite prepared at that time to accept the patches for the curl tool either, since I didn't like all the XML stuff it would bring in and, as I recall, I felt that I wasn't prepared to deal with that extra workload at the time. I think I told the guys I wanted to wait and see and try it more at a later point.

In September that same year I blogged about Anthony's work on getting an internet draft done for metalink. That would later, in 2010, get released as RFC5854, and a year later RFC6249 came out with a way to provide all the info in HTTP headers instead of in XML as the previous document specified. (Both RFCs contain acknowledgement to yours truly as contributor.)

Today

While I said metalink wasn't really fit for libcurl, it was always fit for curl - the command line client that uses libcurl but is more of a transfer tool. During the spring of 2012 Anthony and super-hacker Tatsuhiro Tsujikawa approached me and asked if perhaps we were ready for metalink in curl this time?

Yes!

Since the last time, metalink has developed as a standard and there's now a libmetalink project to use, and I felt it was a good time development-wise as well. Tatsuhiro whipped up a refreshed patch in no time and soon we were polishing off the last little edges around the corners and the metalink patch set was merged into curl 7.27.0! Anthony's and Tatsuhiro's persistence and patience over the years are impressive. Thanks a lot my friends! That's a little over five and a half years from the first approach until it got merged into the mainline sources. That's nothing but pure dedication.

Usage

So, starting with curl 7.27.0 and assuming you built curl with the correct set of prereqs installed, this is how you use it:

curl --metalink [URL]

Where the URL is a URL that points to a metalink file, and then curl will download the file from one of the URLs mentioned in it. If multiple URLs are specified, curl will at this point try them serially, one after the other, and not in parallel. Room for future improvements.
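To illustrate the serial approach, here is a minimal Python sketch. It is not how curl implements it. It assumes a metalink document in the RFC5854 XML format; the sample document, the example.org mirror URLs and the helper names mirrors_by_priority and fetch_serially are made up for illustration, and the actual transfer is left to a caller-supplied fetch function.

```python
import xml.etree.ElementTree as ET

# XML namespace used by RFC 5854 metalink documents.
NS = {"ml": "urn:ietf:params:xml:ns:metalink"}

# Hypothetical sample document; the mirrors do not exist.
SAMPLE = """<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="example.iso">
    <url priority="2">http://mirror-b.example.org/example.iso</url>
    <url priority="1">ftp://mirror-a.example.org/example.iso</url>
  </file>
</metalink>
"""


def mirrors_by_priority(metalink_xml):
    """Map each file name to its URLs, lowest priority number first."""
    root = ET.fromstring(metalink_xml)
    result = {}
    for file_el in root.findall("ml:file", NS):
        urls = file_el.findall("ml:url", NS)
        # A missing priority attribute counts as the lowest priority (999999).
        urls.sort(key=lambda u: int(u.get("priority", "999999")))
        result[file_el.get("name")] = [u.text.strip() for u in urls]
    return result


def fetch_serially(urls, fetch):
    """Try each URL in order and return (url, data) for the first success."""
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:
            # This mirror failed; remember why and move on to the next one.
            errors.append((url, exc))
    raise OSError(f"all mirrors failed: {errors}")
```

Passing in the fetch function keeps the sketch independent of any particular transfer library; in practice it could wrap whatever HTTP or FTP client is at hand.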

curl 7.27.0 will probably be released at the end of July 2012, but you can already get an early test version as a daily snapshot. We'll appreciate all feedback you can give us!

No summer of Rockbox 2012

For the first summer in many years I'm not doing any admin or mentor work for an organization in Google's Summer of Code program.

I've been mentoring, co-mentoring and doing admin work within the Rockbox project the last... 4-5(?) summers and as a result I now have a good collection of t-shirts. 🙂 This year, the project sadly came to the conclusion that there was not a good enough number of mentors and project ideas gathered for it to apply to become a mentor organization.

Taking care of a student for full-time work during many weeks is not something to take lightly. To do it properly you need a dedicated and qualified mentor. To provide a good starting point for students to figure out and come up with a good project proposal you need a really good and detailed list of ideas.

The gsoc task is hard enough as it is with many mentors and many good ideas, so when there's a sign of us not being able to fill up both lists we thought it better not to waste anyone's time or energy. We also value and treasure Google's very fine help with open source over the years thanks to gsoc, and we would hate to end up looking like we try to just take advantage of our role of having been accepted as mentor organization for many years in a row in the past.

On the other hand, I was very happy to see that my friends in the metalink project, after having applied for many years, finally got accepted as a mentor organization. I'd like to think that perhaps we (as in the Rockbox project) by standing back this year can let others get the chance to shine and join in the fun.

There is nothing said or planned for Rockbox for next year. If people want to mentor and if we manage to get a good pile of ideas I'm sure we will apply to be a mentor organization again. If not, well then I'm sure other organizations will still participate in the program and possibly I will find myself involved in there via another project. I am involved in a bunch of other open source projects, but none of the ones I'm very active in have applied for or participated in gsoc as a mentor org so far.

Byte ranges for FTP

In the IETF ftpext2 working group there have been some talks around clients' and servers' ability to do and support "ranged" file transfers, that is transferring only a piece of any given file. FTP supports the REST command and has done so since the dawn of man (RFC765 - June 1980), and using that command, a client can set the starting point for a transfer but there is no way to set the end point. HTTP has supported the Range: header since the first HTTP 1.1 spec back in January 1997, and that supports both a start and an end point. The HTTP header does in fact support multiple ranges within the same header, but let's not overdo it here!
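The difference between the two protocols can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper names http_range_header and ftp_restart_commands are made up, and the functions just build the strings a client would send rather than doing any transfer.

```python
def http_range_header(start, end=None):
    """Build an HTTP Range header value; both offsets are inclusive."""
    if start < 0 or (end is not None and end < start):
        raise ValueError("invalid byte range")
    if end is None:
        # Open-ended range: from start to the end of the file.
        return f"bytes={start}-"
    return f"bytes={start}-{end}"


def ftp_restart_commands(path, start):
    """FTP can only express the start point: REST <offset>, then RETR."""
    if start < 0:
        raise ValueError("invalid offset")
    # There is no standard command here to say where the transfer should stop.
    return [f"REST {start}", f"RETR {path}"]
```

For example, http_range_header(100, 199) asks for exactly 100 bytes, while the FTP command pair can only say "start at byte 100" and leaves the end point to the client.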

Currently, to avoid getting an entire file a client would simply close the data connection when it has got all the data it wants. The unfortunate reality is that some servers don't notice clients doing this, so in order for this to work reliably a client also has to send ABOR, and after this command has been sent there is no way for the client to reliably figure out the state of the control connection so it has to get closed as well (which is crap in case more files are to be transferred to or from the same host). It primarily becomes unreliable because when ABOR is sent, the client gets one or two responses back due to a race condition between the closing and the actual end of transfer etc, and it isn't possible to tell exactly how to continue.
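A rough Python sketch of that workaround, using the standard ftplib module, could look like the following. It is only an illustration of the sequence described above and not how libcurl does it; the function name fetch_prefix is made up, and it assumes an already logged-in ftplib.FTP object.

```python
import ftplib


def fetch_prefix(ftp, path, start, length):
    """Read up to `length` bytes from offset `start`, then abandon the transfer.

    `ftp` is assumed to be a connected and logged-in ftplib.FTP instance.
    The control connection is closed afterwards, since its state is unknown.
    """
    ftp.voidcmd("TYPE I")  # binary mode, so byte offsets are meaningful
    # transfercmd() sends "REST <start>" before the RETR command.
    conn = ftp.transfercmd(f"RETR {path}", rest=start)
    chunks = []
    remaining = length
    try:
        while remaining > 0:
            block = conn.recv(min(8192, remaining))
            if not block:
                break  # the server reached the end of the file first
            chunks.append(block)
            remaining -= len(block)
    finally:
        # Stop early by simply closing the data connection.
        conn.close()
    try:
        # Some servers do not notice the close, so also send ABOR.
        ftp.abort()
    except ftplib.all_errors:
        # One or two replies may arrive; we cannot tell which, so ignore them.
        pass
    # The control connection cannot be trusted after ABOR, so drop it too.
    ftp.close()
    return b"".join(chunks)
```

The cost is visible in the last lines: every partial fetch ends with a dropped control connection, so the next transfer needs a fresh login.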

A solution for the future is being worked on. I've joined the effort to write a spec that will suggest a new FTP command that sets the end point for a transfer in the same vein REST sets the start point. For the moment, we've named our suggested command RANG (as short for range). "We" in this context means Tatsuhiro Tsujikawa, Anthony Bryan and myself but we of course hope to get further valuable feedback from the great ftpext2 people.

There already are use cases that want range requests for FTP. The people behind metalinks for example want to download the same file from many servers, and then it makes sense to be able to download little pieces from different sources.
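As a small illustration of that use case, this Python sketch splits a file of known size into contiguous byte ranges, one per mirror. The function name split_ranges and the mirror host names are my own assumptions, and real clients may well pick piece sizes differently.

```python
def split_ranges(size, parts):
    """Split `size` bytes into at most `parts` contiguous inclusive ranges."""
    if size <= 0 or parts <= 0:
        raise ValueError("size and parts must be positive")
    parts = min(parts, size)  # never create empty ranges
    base, extra = divmod(size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` pieces get one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


# Hypothetical mirrors that all serve the same 10-byte file.
mirrors = ["ftp://a.example.org", "ftp://b.example.org", "ftp://c.example.org"]
plan = list(zip(mirrors, split_ranges(10, len(mirrors))))
```

Each (start, end) pair maps directly onto an HTTP Range request; for FTP only the start can be expressed today, which is exactly the gap discussed here.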

The people who found the libcurl bugs I linked to above use libcurl as part of the Fedora/Redhat installer Anaconda, and if I understand things right they use this feature to get just the beginning of some files to check them out, and so avoid having to download a full file before they know they truly want it. Thus it saves lots of bandwidth.

In short, the use-cases for ranged FTP retrievals are quite likely pretty much the same ones as they are for HTTP!

The first RANG draft is now available.

murl for extended curlness

I'm a firm believer in the old unix mantra of letting each tool do its job and do it well, and pass on the rest of the work to the next tool. I've always stated that curl should remain this way and that it should remain within its defined walls and not try to do everything.

But time passes and more and more ideas are thrown up in the air, or in some cases directly at me, and the list of things that we could do but don't due to this philosophical limit of remaining focused has grown. It currently includes at least:

  • metalink support
  • recursive HTML downloads
  • recursive/wildcard FTP transfers
  • bittorrent support
  • automatic proxy configuration
  • simultaneous/parallel download support

Educated readers of course immediately detect that this list (if implemented) would make a tool that basically does what wget already does (and a lot more) and I've explicitly said for a decade that curl is not a wget clone. Maybe it is time for us (me?) to reevaluate that sentiment - at least in some sense.

I don't want to sacrifice the concepts that have worked so well for curl for so many years, so I'm still firmly against stuffing all this into curl (or libcurl). That simply will not happen with me at the wheel.

A much more interesting alternative would be to instead start working on a second tool within the curl project: murl. A tool that does basically everything that curl already does, but also opens the doors for adding just about everything else we can cram in and that is still related to data transfers. That would include, but not be restricted to, all the fancy stuff mentioned in the list above!

No, the name murl is not set in stone, nor is this whole idea anything but plain and early thoughts thrown out at this point so it may or may not actually take off. It will probably depend on whether I get support and help from fellow hackers to get started and moving along.

Metalink in curl bounty

The Metalink guys host a list of project ideas and one of those ideas is to add metalink support to curl. I recently bumped the stakes a bit by raising the bounty by an additional 200 USD, so that the offer is now 500 USD for the person or team that brings the feature as described.

My primary motivation for doing this is that I like the metalink idea and I'd like to help make sure it gets used more widely.

No metalink in libcurl

It's been a while since we had this discussion so I figure it is about time to reiterate it, and this time I thought I'd do a little blog post to shed some light on my standpoint regarding this issue:

metalink support in libcurl

I've had this discussion at length with Anthony Bryan (the main man behind the metalink format) privately in the past and I've bounced back a lot of feedback on the actual XML format to him, and I believe some of that was taken into account and changed the format. Of course this was before it "settled" and started to get adopted. I think metalink is a great idea and the file format is (the last time I checked it out, I can't seem to find the docs now) mostly making sense.

I have little to no understanding for the idea that libcurl should add support for this natively. metalink is just an XML format that describes to an application where and how it can download files, and libcurl does indeed support most of the protocols that such URLs can use. libcurl is a data transfer library that is oriented around a given URL, the URL in question has a 1:1 relationship to what protocol it is, and libcurl is always content-agnostic.

metalink is application layer, not transport. Adding metalink to libcurl would mean that all of a sudden libcurl would transfer a file, actually parse the (XML!) contents of that file and then get (possibly) multiple streams using multiple protocols based on what that parsing gave. That is so many new things and violations of key libcurl concepts that I cannot see this done.

Metalink isn't even a standard so we would then more or less open the gates for further random efforts to introduce similar ideas and whatnot and where would we draw the line? Currently I think we have a pretty solid border drawn in the sand and we don't cross that line (on purpose).

And frankly, there is one reason only (that has been mentioned and that I can think of) for libcurl to support this feature, and that is that because libcurl is already widely adopted, it would be easier for metalink to conquer the world by sneaking in the back-door with libcurl, as then a large amount of applications would support it with no additional effort at all. But sorry, I don't think that's a good enough reason to break or change these important key concepts/limits of libcurl. (Actually, I think it is a bit foolish to think that adding metalink to libcurl would make all these applications automatically support metalink as there would be several arguments against that too.)

As I've said before, I think one of our biggest challenges in this project is to limit what libcurl does, to not allow it to grow in all directions, to keep the scope and to maintain focus.

A metalink file transfer library could be made as a layer on top of libcurl, and I think that is the only logical and sensible way.

Adding metalink support to the curl tool however, seems like a good idea to me...