Tag Archives: FTP

My first 20 years of HTTP

During the autumn 1996 I took my first swim in the ocean known as HTTP. Twenty years ago now.

I had previously worked with writing an IRC bot in C, and IRC is a pretty simple text based protocol over TCP so I could use some experiences from that when I started to look into HTTP. That IRC bot was my first real application distributed to the world that was using TCP/IP. It was portable to most unixes and Amiga and it was open source.

1996 was the year the movie Independence Day premiered and the single hit song that plagued the world more than others that year was called Macarena. AOL, Webcrawler and Netscape were the most popular websites on the Internet. There were less than 300,000 web sites on the Internet (compared to some 900 million today).

I decided I should spice up the bot and make it offer a currency exchange rate service so that people who were chatting could ask the bot what 200 SEK is when converted to USD or what 50 AUD might be in DEM. - Right, there was no Euro currency yet back then!

I simply had to fetch the currency rates at a regular interval and keep them in the same server that ran the bot. I just needed a little tool to download the rates over HTTP. How hard can that be? I googled around (this was before Google existed so that was not the search engine I could use!) and found a tool named 'httpget' that made pretty much what I wanted. It truly was tiny - a few hundred nokia-1610lines of code.

I don't have an exact date saved or recorded for when this happened, only the general time frame. You know, we had no smart phones, no Google calendar and no digital cameras. I sported my first mobile phone back then, the sexy Nokia 1610 - viewed in the picture on the right here.

The HTTP/1.0 RFC had just recently came out - which was the first ever real spec published for HTTP. RFC 1945 was published in May 1996, but I was blissfully unaware of the youth of the standard and I plunged into my little project. This was the first published HTTP spec and it says:

HTTP has been in use by the World-Wide Web global information initiative since 1990. This specification reflects common usage of the protocol referred too as "HTTP/1.0". This specification describes the features that seem to be consistently implemented in most HTTP/1.0 clients and servers.

Many years after that point in time, I have learned that already at this time when I first searched for a HTTP tool to use, wget already existed. I can't recall that I found that in my searches, and if I had found it maybe history would've made a different turn for me. Or maybe I found it and discarded for a reason I can't remember now.

I wasn't the original author of httpget; Rafael Sagula was. But I started contributing fixes and changes and soon I was the maintainer of it. Unfortunately I've lost my emails and source code history from those earliest years so I cannot easily show my first steps. Even the oldest changelogs show that we very soon got help and contributions from users.

The earliest saved code archive I have from those days, is from after we had added support for Gopher and FTP and renamed the tool 'urlget'. urlget-3.5.zip was released on January 20 1998 which thus was more than a year later my involvement in httpget started.

The original httpget/urlget/curl code was stored in CVS and it was licensed under the GPL. I did most of the early development on SunOS and Solaris machines as my first experiments with Linux didn't start until 97/98 something.

sparcstation-ipc

The first web page I know we have saved on archive.org is from December 1998 and by then the project had been renamed to curl already. Roughly two years after the start of the journey.

RFC 2068 was the first HTTP/1.1 spec. It was released already in January 1997, so not that long after the 1.0 spec shipped. In our project however we stuck with doing HTTP 1.0 for a few years longer and it wasn't until February 2001 we first started doing HTTP/1.1 requests. First shipped in curl 7.7. By then the follow-up spec to HTTP/1.1, RFC 2616, had already been published as well.

The IETF working group called HTTPbis was started in 2007 to once again refresh the HTTP/1.1 spec, but it took me a while until someone pointed out this to me and I realized that I too could join in there and do my part. Up until this point, I had not really considered that little me could actually participate in the protocol doings and bring my views and ideas to the table. At this point, I learned about IETF and how it works.

I posted my first emails on that list in the spring 2008. The 75th IETF meeting in the summer of 2009 was held in Stockholm, so for me still working  on HTTP only as a spare time project it was very fortunate and good timing. I could meet a lot of my HTTP heroes and HTTPbis participants in real life for the first time.

I have participated in the HTTPbis group ever since then, trying to uphold the views and standpoints of a command line tool and HTTP library - which often is not the same as the web browsers representatives' way of looking at things. Since I was employed by Mozilla in 2014, I am of course now also in the "web browser camp" to some extent, but I remain a protocol puritan as curl remains my first "child".

I go Mozilla

Mozilla dinosaur head logo

In January 2014, I start working for Mozilla

I've worked in open source projects for some 20 years and I've maintained curl and libcurl for over 15 years. I'm an internet protocol geek at heart and Mozilla seems like a perfect place for me to continue to explore this interest of mine and combine it with real open source in its purest form.

I plan to use my experiences from all my years of protocol fiddling and making stuff work on different platforms against random server implementations into the networking team at Mozilla and work on improving Firefox and more.

I'm putting my current embedded Linux focus to the side and I plunge into a worldwide known company with worldwide known brands to do open source within the internet protocols I enjoy so much. I'll be working out of my home, just outside Stockholm Sweden. Mozilla has no office in my country and I have no immediate plans of moving anywhere (with a family, kids and all established here).

I intend to bring my mindset on protocols and how to do things well into the Mozilla networking stack and world and I hope and expect that I will get inspiration and input from Mozilla and take that back and further improve curl over time. My agreement with Mozilla also gives me a perfect opportunity to increase my commitment to curl and curl development. I want to maintain and possibly increase my involvement in IETF and the httpbis work with http2 and related stuff. With one foot in Firefox and one in curl going forward, I think I may have a somewhat unique position and attitude toward especially HTTP.

I've not yet met another Swedish Mozillian but I know I'm not the only one located in Sweden. I guess I now have a reason to look them up and say hello when suitable.

Björn and Linus will continue to drive and run Haxx with me taking a step back into the shadows (Haxx-wise). I'll still be part of the collective Haxx just as I was for many years before I started working full-time for Haxx in 2009. My email address, my sites etc will remain on haxx.se.

I'm looking forward to 2014!

Pointless respecifying FTP URI

There's this person wiIETFthin IETF who seems to possess endless energy and a never-ending wish to clean up tiny details within the IETF processes. He continuously digs up specifications that need to be registered or submitted again somewhere due to some process. Often under loud protests from fellow IETFers since it steals time and energy from people on the lists for discussions and reviews - only to satisfy some administrative detail. Time and energy perhaps better spent on things like new protocols and interesting new technologies.

This time, he has deemed that the FTP a FILE URI specs need to be registered properly, and alas he has submitted his first suggested update of the FTP URI.

From my work with curl I have of course figured out a few problems with RFC1738 that I don't think we should just repeat in a new version of the spec. It turns out I'm not alone in thinking this work isn't really good like this, and I posted a second mail to clarify my points.

We're not working on fixing the problems with FTP URIs that are present in RFC1738 so just rephrasing those into a new spec is a bad idea.

We could possibly start the work on fixing the problems, but so far I've seen no such will or efforts and I don't plan on pushing for that myself either.

Please tell me or the ftpext2 group where I or the others are wrong or right or whatever!

US patent 6,098,180

(I am not a lawyer, this is not legal advice and these are not legal analyses, just my personal observations and ramblings. Please correct me where I'm wrong or add info if you have any!)

At 3:45 pm on March 18th 2011, the company Content Delivery Solutions LLC filed a complaint in a court in Texas, USA. The defendants are several bigwigs and the list includes several big and known names of the Internet:

  • Akamai
  • AOL
  • AT&T
  • CD Networks
  • Globalscape
  • Google
  • Limelight Networks
  • Peer 1 Network
  • Research In Motion
  • Savvis
  • Verizon
  • Yahoo!

The complaint was later amended with an additional patent (filed on April 18th), making it list three patents that these companies are claimed to violate (I can't find the amended version online though). Two of the patents ( 6,393,471 and 6,058,418) are for marketing data and how to use client info to present ads basically. The third is about file transfer resumes.

I was contacted by a person involved in the case at one of the defendants'. This unspecified company makes one or more products that use "curl". I don't actually know if they use the command line tool or the library - but I figure that's not too important here. curl gets all its superpowers from libcurl anyway.

This Patent Troll thus basically claims that curl violates a patent on resumed file transfers!

The patent in question that would be one that curl would violate is the US patent 6,098,180 which basically claims to protect this idea:

A system is provided for the safe transfer of large data files over an unreliable network link in which the connection can be interrupted for a long period of time.

The patent describes several ways in how it may detect how it should continue the transfer from such a break. As curl only does transfer resumes based on file name and an offset, as told by the user/application, that could be the only method that they can say curl would violate of their patent.

The patent goes into detail in how a client first sends a "signature" and after an interruption when the file transfer is about to continue, the client would ask the server about details of what to send in the continuation. With a very vivid imagination, that could possibly equal the response to a FTP SIZE command or the Content-Length: response in a HTTP GET or HEAD request.

A more normal reader would rather say that no modern file transfer protocol works as described in that patent and we should go with "defendant is not infringing, move on nothing to see here".

But for the sake of the argument, let's pretend that the patent actually describes a method of file transfer resuming that curl uses.

The '180 (it is referred to with that name within the court documents) patent was filed at February 18th 1997 (and issued on August 1, 2000). Apparently we need to find prior art that was around no later than February 17th 1996, that is to say one year before the filing of the stupid thing. (This I've been told, I had no idea it could work like this and it seems shockingly weird to me.)

What existing tools and protocols did resumed transfers in February 1996 based on a file name and a file offset?

Lots!

Thank you all friends for the pointers and references you've brought to me.

  • The FTP spec RFC 959 was published in October 1985. FTP has a REST command that tells at what offset to "restart" the transfer at. This was being used by FTP clients long before 1996, and an example is the known Kermit FTP client that did offset-based file resumed transfer in 1995.
  • The HTTP header Range: introduces this kind of offset-based resumed transfer, although with a slightly fancier twist. The Range: header was discussed before the magic date, as also can be seen on the internet already in this old mailing list post from December 1995.
  • One of the protocols from the old days that those of us who used modems and BBSes in the old days remember is zmodem. Zmodem was developed in 1986 and there's this zmodem spec from 1988 describing how to do file transfer resumes.
  • A slightly more modern protocol that I've unfortunately found no history for before our cut-off date is rsync, as I could only find the release mail for rsync 1.0 from June 1996. Still long before the patent was filed obviously, and also clearly showing that the one year margin is silly as for all we know they could've come up with the patent idea after reading the rsync releases notes and still rsync can't be counted as prior art.
  • Someone suggested GetRight as a client doing this, but GetRight wasn't released in 1.0 until Febrary 1997 so unfortunately that didn't help our case even if it seems to have done it at the time.
  • curl itself does not pre-date the patent filing. curl was first released in March 1998, and the predecessor was started around summer-time 1997. I don't have any remaining proofs of that, and it still wasn't before "the date" so I don't think it matters much now.

At the time of this writing I don't know where this will end up or what's going to happen. Time will tell.

This Software patent obviously is a concern mostly to US-based companies and those selling products in the US. I am neither a US citizen nor do I have or run any companies based in the US. However, since curl and libcurl are widely used products that are being used by several hundred companies already, I want to help bring out as much light as possible onto this problem.

The patent itself is of course utterly stupid and silly and it should never have been accepted as it describes trivially thought out ideas and concepts that have been thought of and implemented already decades before this patent was filed or granted although I claim that the exact way explained in the patent is not frequently used. Possibly the protocol using a method that is closed to the description of the patent is zmodem.

I guess I don't have to mention what I think about software patents.

I'm convinced that most or all download tools and browsers these days know how to resume a previously interrupted transfer this way. Why wouldn't these guys also approach one of the big guys (with thick wallets) who also use this procedure? Surely we can think of a few additional major players with file tools that can resume file transfers and who weren't targeted in this suit!

I don't know why. Clearly they've not backed down from attacking some of the biggest tech and software companies.

patent drawing

(Illustration from the '180 patent.)

“Hacking me”

If you ever wonder how clever it was of me to make an FTP tool that used the default anonymous password "curl_by_daniel@..." once upon a time and you want to know why I changed that to ftp@example.com instead? Here's a golden snippet to just absorb and enjoy:

Date: Thu, 23 Dec 2010 22:56:00
From: iHack3r <miithehacker@gmail.com>
To: info@haxx.se
Subject: Hacking me
Parts/Attachments:
1 Shown    8 lines  Text (charset: ISO-8859-1)
2   OK    ~7 lines  Text (charset: ISO-8859-1)
----------------------------------------
To the idiot named Daniel, Please stop brute force attacking my FTP client.
I do not appreciate it, i have an anonymous account set up for the general
public to access my files that i want them to access, QUIT trying to hack
the admin because 1. DISABLED unless i am leaving to go somewhere without my
computer 2: THE PASSWORD is random letters and numbers.
-iHack3r
Date: Thu, 23 Dec 2010 22:56:00 From: iHack3r <hidden> To: info@[my company] Subject: Hacking me To the idiot named Daniel, Please stop brute force attacking my FTP client. I do not appreciate it, i have an anonymous account set up for the general public to access my files that i want them to access, QUIT trying to hack the admin because 1. DISABLED unless i am leaving to go somewhere without my computer 2: THE PASSWORD is random letters and numbers. -iHack3r

The password was changed at Feb 13 2007 in the curl version 7.16.2, but there are a surprisingly large amount of older curls still around out there...

Update: as the person responded again after having read this blog post and still didn't get it, I felt the urge to speak up in even more clear terms:

I didn't have anything to do with any "hacker attack" on any site. Not yours, and not anyone else's. The fact that almost-my-email address appeared in your logs is because I wrote the FTP client. It is a general FTP client that is being used by a very very large amount of people all over the world. If I ever would attack a site, why on earth would I send along my real name or email address?

Byte ranges for FTP

In the IETF ftpext2 working group there have been some talks around clients' and servers' ability to do and support "ranged" file transfers, that is transferring only a piece of any given file. FTP supports the REST command and has done so since the dawn of man (RFC765 - June 1980), and using that command, a client can set the starting point for a transfer but there is no way to set the end point. HTTP has supported the Range: header since the first HTTP 1.1 spec back in January 1997, and that supports both a start and an end point. The HTTP header does in fact support multiple ranges within the same header, but let's not overdo it here!

Currently, to avoid getting an entire file a client would simply close the data connection when it has got all the data it wants. The unfortunate reality is that some servers don't notice clients doing this, so in order for this to work reliably a client also has to send ABOR, and after this command has been sent there is no way for the client to reliably figure out the state of the control connection so it has to get closed as well (which is crap in case more files are to be transferred to or from the same host). It primarily becomes unreliable because when ABOR is sent, the client gets one or two responses back due to a race condition between the closing and the actual end of transfer etc, and it isn't possible to tell exactly how to continue.

A solution for the future is being worked on. I've joined up the effort to write a spec that will suggest a new FTP command that sets the end point for a transfer in the same vein REST sets the start point. For the moment, we've named our suggested command RANG (as short for range). "We" in this context means Tatsuhiro Tsujikawa, Anthony Bryan and myself but we of course hope to get further valuable feedback by the great ftpext2 people.

There already are use cases that want range request for FTP. The people behind metalinks for example want to download the same file from many servers, and then it makes sense to be able to download little pieces from different sources.

The people who found the libcurl bugs I linked to above use libcurl as part of the Fedora/Redhat installer Anaconda, and if I understand things right they use this feature to just get the beginning of some files to check them out and avoid having to download the full file before it knows it truly wants it. Thus it saves lots of bandwidth.

In short, the use-cases for ranged FTP retrievals are quite likely pretty much the same ones as they are for HTTP!

The first RANG draft is now available.

An FTP hash command

Anthony Bryan strikes again. This time his name is attached to a new standards draft for how to get a hash checksum of a given file when using the FTP protocol. draft-bryan-ftp-hash-00 was published just a few days ago.

The idea is basically to introduce a spec for a new command named 'HASH' that a client can issue to a server to get a hash checksum for a given file in order to know that the file has the exact same contents you want before you even start downloading it or otherwise consider it for actions.

The spec details how you can ask for different hash algorithms, how the server announces its support for this in its FEAT response etc.

I've already provided some initial feedback on this draft, and I'll try to assist Anthony a bit more to get this draft pushed onwards.

an application protocol monster

During the autumn 2009 I was sponsored by a company to work on some new protocols for libcurl to support - IMAP, POP3, SMTP and their TLS-powered versions. It took me a little while to get things working the way I'd like them to due work load elsewhere and other irrelevant distractions.

In the next curl and libcurl release, to be called version 7.20.0, which if everything runs fine and according to plan might happen at the end of January 2010 or possibly a little later, libcurl will truly have converted from a file transfer library into a full fledged application protocol layer monster.

The current incarnation of my development libcurl supports the following 18(!) protocols:

tftp ftp telnet dict ldap ldaps http file https ftps scp sftp imap imaps pop3 pop3s smtp smtps

While SSL versions of protocols are arguably not separate protocols, the 12 protocols in the list without SSL are still many in my view.

The fundamentals of libcurl remain the same though: specify a URL to operate against. Send or receive data. Sometimes both send and receive in the same request.

Internally in libcurl, I converted a big portion of the FTP-specific code into a more generic "pingpong"-generic code which is now designed to work similarly with all these new protocols that share many similarities with FTP. They are all sending commands to the server and get responses back, in similar ways.

As before, the ability to disable specific protocols when building libcurl remains so for those who don't particularly care about these new ones and want to maintain building a library that is as small and lean as possible, there should be little extra weight due to this recent development. I've been pondering but I've not yet figured out the most perfect way to deal with such build options in the code so they are still a bit too #if and #ifdef intensive for my taste...

Meanwhile, Chris Conroy has been busy in his end implementing support for RTSP that also soon might be ripe for inclusion in the main code.

Not too surprisingly perhaps, the curl tool also then gets these protocol abilities so it becomes even more than before the Swiss army knife tool for internet protocols and then of course explicitly application layer ones.

cURL

FTP vs HTTP, really!

Since I'm doing my share of both FTP and HTTP hacking in the curl project, I quite often see and sometimes get the questions about what the actual differences are between FTP and HTTP, which is the "best" and isn't it so that ... is the faster one?

FTP vs HTTP is my attempt at a write-up covering most differences to users of the protocols without going into too technical details. If you find flaws or have additional info you think should be included, please let me know!

The document includes comparisons between the protocols in these areas:

  • Age
  • Upload
  • ASCII/binary
  • Headers
  • Pipelining
  • FTP Command/Response
  • Two Connections
  • Active and Passive
  • Firewalls
  • Encrypted Control Connections
  • Authentications
  • Download
  • Ranges/resume
  • Persistent Connections
  • Chunked Encoding
  • Compression
  • FXP
  • IPv6
  • Name based virtual hosting
  • Proxy Support
  • Transfer Speed

With your help it could become a good resource to point curious minds to in the future...