Category Archives: Network

Internet. Networking.

HTTPbis at IETF75

Mark, one of the editors of the ongoing HTTPbis effort, first mentioned that there wasn’t going to be any HTTPbis meeting at the upcoming IETF75 meeting in Stockholm July 26-31, 2009. I felt a bit sorry about that, since I live in Stockholm, I’m somewhat involved in the HTTPbis work and I’ve never been to an IETF meeting.

It simply must have been due to my almighty powers, but apparently two of the editors are coming here anyway and there has now been a request for an HTTPbis session during the meeting.

I’m looking forward to this! Hopefully it’ll bring some fun talks on tech we care about, and a chance to meet cool people in real life whom I’ve never met before.

Oh, and am I the only one who can’t find the dates anywhere on ietf75.se?

Linux on eee s101

I got myself a new toy the other day: an Eee PC S101 with 16GB SSD, an extra 32GB SDHC and 2GB ram.

There’s already a bazillion instructions on how to install and run Linux on your Eee PC out there, but they all seemed to miss one (for me) crucial little detail:

In order to boot from the SD card, you need to press Escape when the BIOS start-up screen shows.

But now: to get the BIOS screen to show at all, you need some extra magic: press F2 immediately at start-up to enter the BIOS setup screen, then disable “boot booster” as otherwise it’ll skip the Escape check entirely!

Using this trick, I’ve now installed easypeasy on it and I’ll dual-boot with XP for a while since it came factory installed with that.

I’ve fallen for the commercials and also subscribed to 3G broadband now (you know, the blatantly dishonest “up to 7.2mbit” that in reality never comes close, even if I were sitting alone on top of a base station). I warmed up my toy and connection the other day (still running XP at the time) by working on curl code and making a few commits, while sitting on a wooden bench next to the field where my daughter was having her soccer practice.

In fact, SSHing to my primary servers and editing code with emacs or reading email with alpine turned out to be a much better experience than I anticipated, given what I’ve read about how terrible the round-trip times can be over 3G. It actually didn’t feel much different from my regular SSHing from home over wifi.

bittorrent vs HTTP

A while ago I put together my document FTP vs HTTP, which compares data transfers done using those two protocols: similarities and differences.

Today I’m taking the next step in this little series and I offer you Bittorrent vs HTTP! This document discusses differences in areas such as:

  • Transfer Speed
  • Streaming
  • Uplink
  • Firewalls
  • Redundancy
  • Server Load
  • Encryption
  • Protocol Standards

As usual, I’m all ears for your valuable input and help in making it more accurate and more detailed than I manage on my own. Point out my mistakes, my weird use of words or whatever. Post a comment here or email me.

HTTP Status Report

Mark Nottingham held a very interesting one-hour talk on the status of HTTP and the work on HTTPbis at a QCon conference recently, and luckily for us HTTP geeks there’s a great video/presentation from it.

curl is mentioned at least twice in the slides; unfortunately the second mention contains an incorrect fact, saying that curl uses “Pragma: no-cache”. That isn’t true anymore: it used to do that, but we stopped doing it in curl a while ago.

I’m a subscriber to the httpbis mailing list and a casual contributor, but his summary and overview of the state were refreshing nonetheless, as I haven’t been able to keep up with all the details and I haven’t tracked the working group from its start either.

libssh2 upped a notch

There has been some well-founded criticism of libssh2 for a long time over its bad transfer performance when doing SCP and SFTP based transfers. Tests have shown it to be significantly slower than the openssh based alternatives in comparisons done under similar conditions. We’re talking down to a tenth(!) of the speed for SFTP.

Luckily I have an unnamed (by agreement) sponsor who pays me to improve this.

Giving it some love

I basically started out by reading the SFTP code, cleaning it up as I went over it and adding some clarifying comments. I found and fixed a few irregularities. Soon I could see an obvious performance boost, perhaps 3-4 times the previous speed, but since SFTP was painfully slow to begin with, this was still very crappy compared to openssh.

I then switched over to plain SCP tests. SCP is basically just an “scp” command sent over SSH, with the data then streamed over a plain SSH “channel”, while SFTP is a whole additional protocol layer on top. SCP is thus more low-level, at the actual SSH level, and it is the foundation that SFTP runs on anyway, so getting SCP faster was fundamental.

Make it speedier

My initial tests with libssh2 1.0 showed it downloading data at roughly 25% of openssh’s speed when SCPing a 1GB file from an openssh server running on localhost. The openssh client reaches roughly 40MB/sec on my test box.

Also, checking my CPU load meter while doing the libssh2 transfers showed that it certainly wasn’t hitting the roof or anything. It was barely even noticeable! Clearly something was really wrong, but what?

SSH has a lower protocol layer that does the entire encryption thing, the transport layer, but on top of that sits the “channel layer”, which is packet based and sends data back and forth over the transport layer. This channel layer has a receive window concept, much like TCP itself, which tells the remote side how much data it is allowed to send until it gets further notice.

libssh2 1.0 had very conservative windowing logic. It started with a default window size of 64KB and increased it at every read by the same amount that was read (which could be anywhere from 1KB to 16KB or so, depending on the application).

My remake of this was to simplify the logic, read data from the network more evenly distributed over time, update the window size much less frequently and increase the window size by orders of magnitude! I found that with a window size of 38MB (600 times the previous default size!!) things started flying.
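
To illustrate the idea, here is a minimal sketch of that kind of windowing logic. This is not the actual libssh2 code; send_window_adjust() is a hypothetical stand-in for sending an SSH_MSG_CHANNEL_WINDOW_ADJUST packet to the peer, and the numbers are just examples.

/* A simplified sketch of the windowing idea, NOT the actual libssh2 code.
 * send_window_adjust() is a hypothetical stand-in for sending an
 * SSH_MSG_CHANNEL_WINDOW_ADJUST packet to the peer. */
#include <stddef.h>

#define RECV_WINDOW_SIZE (38 * 1024 * 1024)  /* a large window, ~38MB */

struct channel {
    size_t window_remaining;  /* bytes the peer may still send to us */
};

/* hypothetical: tell the peer it may send 'bytes' more data */
void send_window_adjust(struct channel *ch, size_t bytes);

/* Call this after the application has consumed 'nread' bytes. Instead of
 * growing the window a little on every single read, only top it up once a
 * large part of it has been used up; this cuts the number of adjust
 * packets drastically and keeps the sender busy. */
void consumed(struct channel *ch, size_t nread)
{
    ch->window_remaining -= nread;

    if(ch->window_remaining < RECV_WINDOW_SIZE / 2) {
        size_t grow = RECV_WINDOW_SIZE - ch->window_remaining;
        send_window_adjust(ch, grow);
        ch->window_remaining += grow;
    }
}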

Improved

With these modifications, libssh2 transfers SCP at close to 40MB/sec! SFTP is still left behind at a “mere” 14MB/sec on the same test setup, but it has its own set of problems and solutions. The discussion on the libssh2 list is now more about how to size the window sensibly so that it works well in different situations.

SFTP is a protocol that works in terms of file operations. The client sends OPEN, READ and CLOSE requests and the server replies with status and data. A READ request asks for N bytes starting at offset Z, so a simple implementation like libssh2’s asks for chunk after chunk in a serial manner, increasing the offset as it loops over the range. This causes a back-and-forth effect that certainly does not make optimal use of the network bandwidth.
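
Here is a rough sketch of that serial pattern, just to make the ping-pong visible. sftp_send_read() and sftp_recv_data() are hypothetical stand-ins for sending an SSH_FXP_READ request and reading the matching SSH_FXP_DATA reply; they are not real libssh2 functions.

/* The naive, serial SFTP download pattern: one READ request at a time,
 * each one waiting for its full round-trip before the next is sent. */
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define CHUNK 32768  /* bytes asked for per READ request */

/* hypothetical helpers, NOT real libssh2 API */
ssize_t sftp_send_read(void *handle, uint64_t offset, size_t len);
ssize_t sftp_recv_data(void *handle, char *buf, size_t len);

void download_serial(void *handle, uint64_t filesize, char *buf)
{
    uint64_t offset = 0;

    while(offset < filesize) {
        ssize_t got;

        /* ask for the next chunk ... */
        sftp_send_read(handle, offset, CHUNK);
        /* ... then sit idle for a full round-trip waiting for the reply
           (buf is simply reused for every chunk in this sketch) */
        got = sftp_recv_data(handle, buf, CHUNK);
        if(got <= 0)
            break;
        offset += (uint64_t)got;  /* next chunk starts where this one ended */
    }
}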

SFTP ping pong

openssh has a nifty approach to enhance SFTP throughput: it sends off and handles multiple outstanding READ requests in parallel so that it can keep the pipe busy (and the reverse when doing uploads). That concept is slightly harder to do with an API like the one libssh2 offers, but it is of course still quite doable. I suspect we might get results faster by simply using multiple connections, as then we could keep this simplistic approach and still use the full bandwidth. (Yes, I realize multiple connections may not be feasible for all applications.)
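
As a rough illustration of the multiple-connections idea, here is a minimal per-connection worker sketch. It assumes each worker already has its own connected and authenticated SFTP session (setting that up is omitted), and it downloads only its assigned slice of the file. The struct and the function are made up for this example, while the libssh2_sftp_* calls are the actual API.

/* One worker per connection: open the file, seek to this worker's slice
 * and read just that byte range. Splitting the file into N such slices,
 * one per connection, keeps the simple serial read loop but still fills
 * the pipe. */
#include <libssh2.h>
#include <libssh2_sftp.h>
#include <string.h>

struct slice {
    LIBSSH2_SFTP *sftp;        /* this worker's own SFTP session */
    const char *remotepath;    /* file to download */
    libssh2_uint64_t offset;   /* where this worker starts reading */
    libssh2_uint64_t length;   /* how many bytes it is responsible for */
    char *dest;                /* where to store the slice locally */
};

int download_slice(struct slice *s)
{
    LIBSSH2_SFTP_HANDLE *fh;
    libssh2_uint64_t done = 0;

    fh = libssh2_sftp_open(s->sftp, s->remotepath, LIBSSH2_FXF_READ, 0);
    if(!fh)
        return -1;

    /* position this handle at the start of our slice */
    libssh2_sftp_seek64(fh, s->offset);

    while(done < s->length) {
        char buf[32768];
        size_t want = sizeof(buf);
        ssize_t got;

        if(s->length - done < want)
            want = (size_t)(s->length - done);

        got = libssh2_sftp_read(fh, buf, want);
        if(got <= 0)
            break;

        memcpy(s->dest + done, buf, (size_t)got);
        done += (libssh2_uint64_t)got;
    }

    libssh2_sftp_close(fh);
    return done == s->length ? 0 : -1;
}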

Previous tests we’ve done with SFTP uploads using multiple connections have shown libssh2 to be on par with or even better than competitors on both Windows and Mac.

Please test

I’ll leave it like this for now. I’ll be very happy if people could test this version and report their findings, so that we can make sure it is working and stable enough to release soonish. We’ll need to offer some kind of window size control to applications, but we’ll discuss that further on the mailing list. Join in!

murl for extended curlness

I’m a firm believer in the old unix mantra of letting each tool do its job and do it well, and passing the rest of the work on to the next tool. I’ve always stated that curl should remain this way: it should stay within its defined walls and not try to do everything.

But time passes and more and more ideas are thrown up in the air, or in some cases directly at me, and the list of things that we could do but don’t due to this philosophical limit of remaining focused has grown. It currently includes at least:

  • metalink support
  • recursive HTML downloads
  • recursive/wildcard FTP transfers
  • bittorrent support
  • automatic proxy configuration
  • simultaneous/parallel download support

Educated readers will of course immediately notice that this list (if implemented) would make a tool that basically does what wget already does (and a lot more), and I’ve explicitly said for a decade that curl is not a wget clone. Maybe it is time for us (me?) to re-evaluate that sentiment – at least in some sense.

I don’t want to sacrifice the concepts that have worked so fine for curl under so many years, so I’m still firmly against stuffing all this into curl (or libcurl). That simply will not happen with me at the wheel.

A much more interesting alternative would be to instead start working on a second tool within the curl project: murl. A tool that does basically everything curl already does, but that also opens the door to adding just about everything else we can cram in that is still related to data transfers. That would include, but not be restricted to, all the fancy stuff mentioned in the list above!

No, the name murl is not set in stone, nor is this whole idea anything but plain and early thoughts thrown out at this point, so it may or may not actually take off. It will probably depend on whether I get support and help from fellow hackers to get it started and moving along.

Windows localhost slowness

A client of mine and I ran a bunch of tests doing FTP and SFTP transfers against localhost to measure how fast our custom solution is compared to a set of existing solutions.

The specific results aren’t what caught my eye, mostly because they’re currently only used for comparisons and to measure relative improvements; it was instead the relative speed differences between the tests run on Mac OS X 10.5.5, on Windows XP SP3 and on Linux 2.6.26.

Some of the Windows transfers took an order of magnitude more time than the others: ten times longer. Since we could see this across multiple tests, each run multiple times, and it was also visible with third party tools, the only conclusion I can draw is that Windows for some reason has a much slower localhost.

Does any reader of this have any further knowledge or details to share on this topic? Anyone knows if more recent Windows versions do this any better?

It should be noted that on Windows the ssh server used was running in cygwin, which may account for some of the slowness, as cygwin isn’t really known for being blazingly fast…

Update:

Three friends responded to this question:

The first mentioned that he’d had problems on Windows in the past where 127.0.0.1 worked but ‘localhost’ didn’t, which might indicate that localhost is for some reason treated differently.

The second said that Windows Vista is reported to have significant TCP improvements compared to older versions, since the TCP/IP stack was rewritten completely for that release.

Pierre (at Microsoft) pointed out that on Vista localhost resolves to ::1 (IPv6) first, which may explain why some people experience quirks at least on Vista. This test was however done on XP…
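
If you want to check what “localhost” resolves to on your own box, and in which order the addresses come back, a small getaddrinfo() test like this does the job (plain POSIX shown here; a Windows build needs the usual winsock setup):

/* print every address "localhost" resolves to, in resolver order */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <arpa/inet.h>

int main(void)
{
    struct addrinfo hints, *res, *ai;
    char addr[INET6_ADDRSTRLEN];

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* ask for both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;

    if(getaddrinfo("localhost", NULL, &hints, &res) != 0)
        return 1;

    /* the order here is the order in which connects would be attempted */
    for(ai = res; ai; ai = ai->ai_next) {
        void *in = (ai->ai_family == AF_INET)
            ? (void *)&((struct sockaddr_in *)ai->ai_addr)->sin_addr
            : (void *)&((struct sockaddr_in6 *)ai->ai_addr)->sin6_addr;
        inet_ntop(ai->ai_family, in, addr, sizeof(addr));
        printf("%s\n", addr);
    }

    freeaddrinfo(res);
    return 0;
}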

More suggested HTTP fun

I’ve already previously expressed my deep dislike of where the HTML5 work is going, and just yesterday two new internet-drafts appeared on ietf.org that spurred discussions all around. They’re claimed to be “part of our effort to remove from HTML5 sections that are more appropriate elsewhere” but I’m thinking they’re rather inappropriate everywhere…

The first one, named Content-Type Processing Model, hits a subject that I’ve been over before, namely the stupidity of having web browsers guess the content based on what it looks like. IE introduced the “I really mean it” property; the HTML5 team wants to standardize the guessing. Personally, I think the web would become a better place if browsers instead became stricter and followed more closely what the servers actually say the content is; users would then complain to the site admins when things are wrong, and things would get fixed.

Guessing content types allows for sloppy behavior, it makes browsers harder to write, and it still carries a significant risk of guessing wrong.

The second draft advocates the new HTTP header “Origin”, which according to the authors would help guard servers against CSRF (“Cross-Site Request Forgery”). The main author says 3% of users on the Internet get their Referer header stripped while virtually none get Origin stripped. I claim this is a bogus argument, since Referer gets stripped because it is a known and established header and Origin is not. I also completely fail to see the benefit of this, and based on several of the other responses on the ietf-http-wg mailing list I am not alone…

IETF http-state group created

Over at the IETF another group was just created named http-state (with an associated mailing list) with the specific goal:

Ultimately, the purpose of this group is to create an updated HTTP State Management Mechanism RFC (aka cookies) that will supersede the Netscape spec, RFCs 2109, 2964, 2965 then add in real-world usage (e.g. HTTPOnly), and possibly add in additional features and possibly merge in draft-broyer-http-cookie-auth-00.txt and draft-pettersen-cookie-v2-03.txt.

I’ve joined the list and I hope to follow and participate in this, as I believe the current state of HTTP cookies is a rather sorry mess and the Netscape spec is still what most closely describes how cookies work in the wild. Of course I’ll do it with my libcurl experience in my luggage.

While it would perhaps be cool to join the group in a more formal way, there’s no way for me to participate in the IETF meeting in San Francisco in March.

Fun with executable extensions in viewvc

A few years ago I wrote up a silly little perl script (let’s call it script.pl) that would fetch a page from a site that returns a “random URL off the internet”. I needed a range of URLs for a test program of mine, and just making up a thousand or so URLs is tricky. Thus I wrote this script to grab a batch of URLs on each invocation, so that I could run it again later and append to the log file. It wasn’t a fancy script, but it solved my task.

The script was part of a project I was funded to work on, improving libcurl back in 2005/2006, so adding and committing the script to CVS felt only natural and served a good purpose: it allowed others to repeat what I did.

Fast forward to late 2008. The script is now browsable via viewvc on a site that… eh, doesn’t have “.pl” disabled as a cgi extension in its config! The result of course is that each time someone tries to view the script using the web interface, the web server invokes the script locally!

All of a sudden I got a mail from someone, apparently the admin or something of the site this old script was using, mentioning that a machine on our network was hammering his site with many requests per second (38 requests/second apparently) and asking me to stop. It turned out a search engine crawler had indexed the viewvc output several times, and now some 8 processes or so were running script.pl, all looping around getting a page, outputting the URL, getting another page…

While I think 38 requests per second is a bit low to even be considered a DoS, it certainly wasn’t intended nor friendly, and I was greatly surprised when I slowly realized how it all came to end up like this! Man, I suck! It reminds me of my other extension mess from just a few months ago…

Maybe I’ll learn how to do things right in the future when I grow up!