Wei-Hsin Lee of Google posted about their effort to create a dictionary-based compression scheme for HTTP. I find the idea rather interesting, and it’ll be fun to see what the actual browser and server vendors will say about this.
The idea is basically to use “cookie rules” (domain, path, port number, max-age etc) to make sure a client gets a dictionary, and then the server can deliver responses that are diffs computed against the dictionary it has previously delivered to that client. For repeatedly served, similar contents it should be able to achieve much better compression ratios than any existing HTTP compression in use.
I figure it should be seen as a relative of the “Delta encoding in HTTP” idea, although the SDCH approach seems somewhat more generically applicable.
Since they seem to be using the VCDIFF algorithm for SDCH, the recent open-vcdiff announcement is of course interesting too.
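SDCH itself builds on VCDIFF, but the underlying shared-dictionary principle is easy to demonstrate with zlib's preset-dictionary feature instead. A little perl sketch (this is of course not the SDCH wire format, and the page contents are made up):

#!/usr/bin/perl
# Conceptual sketch only: this uses zlib preset dictionaries, not
# VCDIFF/SDCH, but it shows the same principle: contents resembling
# a dictionary the receiver already has compress much better.
use strict;
use warnings;
use Compress::Raw::Zlib;

# pretend this is the dictionary the server handed out earlier
my $dict = "<html><head><title>Example Site</title></head><body>";

# a response that closely resembles the dictionary contents
my $page = "<html><head><title>Example Site</title></head><body>" .
           "hello there</body></html>";

# compress without any dictionary
my ($plaindef, $err1) = Compress::Raw::Zlib::Deflate->new(-AppendOutput => 1);
my $plain = "";
$plaindef->deflate($page, $plain);
$plaindef->flush($plain);

# compress against the dictionary the client already has
my ($dictdef, $err2) = Compress::Raw::Zlib::Deflate->new(-AppendOutput => 1,
                                                         -Dictionary => $dict);
my $small = "";
$dictdef->deflate($page, $small);
$dictdef->flush($small);

printf "without dictionary: %d bytes, with dictionary: %d bytes\n",
       length($plain), length($small);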
So I wrote a little perl script to perform lots of repeated binary Rockbox builds. It runs something like 35 builds, zips up the results and gives them proper names in a dedicated output directory. Perfect for things such as release builds.
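In spirit, the script does little more than this stripped-down sketch (the target names and the configure/make invocations here are simplified stand-ins, not the real thing):

#!/usr/bin/perl
# Sketch of a build-everything script. The target list and the
# configure call are hypothetical stand-ins for the real ones.
use strict;
use warnings;

my @targets = qw(h100 h300 ipodvideo sansae200);  # example subset
my $outdir = "output";
mkdir $outdir unless -d $outdir;

for my $target (@targets) {
    my $builddir = "build-$target";
    mkdir $builddir or die "mkdir $builddir: $!";
    chdir $builddir or die "chdir $builddir: $!";
    # hypothetical non-interactive configure invocation
    system("../tools/configure --target=$target") == 0
        or warn "configure failed for $target";
    system("make zip") == 0
        or warn "build failed for $target";
    rename "rockbox.zip", "../$outdir/rockbox-$target.zip"
        or warn "no zip produced for $target";
    chdir ".." or die "chdir ..: $!";
}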
Then I wrote a similar one to build manuals and offer them too. I then made the results available on my Rockbox 3.0RC (release candidate) page.
Cool, methinks, and since I'll be away for a week starting Wednesday, I figured I should make the scripts available in case someone else wants to play with them and possibly make a release while I'm gone.
mv buildall.pl webdirectory/buildall.pl.txt
… thinking that I didn't want it to try to execute as a perl script on the server, so I renamed it to a .txt extension. But did this work? No. Did it cause total havoc? Yes.
First, Apache apparently still treats these files as perl scripts (== cgi scripts) on my server, even with the added extension. I really, really didn't expect that.
Then, my scripts run a command chain similar to “mkdir dir; cd dir; rm -rf *”. It works great when invoked in the correct directory. It works less fine when the web server invokes it because someone clicked on the file I had just made available to the world.
Recursive deletion of all files the web server user was allowed to erase.
Did I immediately suspect foul play and evil doings by outsiders? Yes. Did it take quite a while to restore things from backups? Yes. Did it feel painful to realize that I myself was to blame for this entire incident, and not any outside evil perpetrator? Yes yes yes.
But honestly, in the end I felt good that it wasn't a security hole that caused it, since I hate spending all that time tracking such things down and fixing them. And thanks to a very fine backup system, I had most of the site back up and running after roughly one hour of down-time.
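For the record, the root cause in perl terms is a pattern like this, and the fix is equally simple (a sketch of the principle, not my exact script):

# dangerous: if mkdir or cd fails, "rm -rf *" runs in whatever
# directory the script happened to be started in
system("mkdir dir; cd dir; rm -rf *");

# safer: do each step in perl and abort the moment one fails
mkdir "dir" or die "mkdir failed: $!";
chdir "dir" or die "chdir failed: $!";
system("rm -rf ./*");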
Anthony Bryan seems to have worked hard lately: we've seen him submit his Internet Draft for the Metalink XML Download Description Format on the http-wg mailing list, and now the 02 (zero two) version is up for public browsing and commenting at
… and his interim versions are also browsable here.
As the primary curl author, I find the comments here interesting. The blog entry “Teaching wget About Root Certificates” is about how you can get cacerts for wget by downloading them from curl's web site, and people quickly point out that getting cacerts from an untrusted third party is of course an ideal situation for a MITM “attack”.
Of course you can't trust any files off an HTTP site, or an HTTPS site without a “trusted” certificate, but thinking that the curl project would run one of those just to let random people load PEM files off our site seems a bit weird. That's why we also provide the scripts we do all this with, so that you can run them yourself on whatever input data you need, preferably something you trust. The more paranoid you are, the harder that gets, of course.
On Fedora, curl does come with ca certs (at least I'm told recent Fedoras do), and even if it doesn't, you can point curl to use whatever cacert you like. Since most default installs of curl use OpenSSL, like wget does, you could simply tell curl to use the same cacert your wget install uses.
This last part gets a little more complicated when one of the two is built with an SSL library that doesn't easily support PEM (read: NSS), but in the case of curl in recent Fedoras, they build it with NSS plus an additional patch that still allows it to read PEM files.
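Pointing libcurl at a specific PEM bundle is a one-liner anyway. A sketch using the WWW::Curl perl binding (the bundle path is just an example; use whatever file your wget install reads):

#!/usr/bin/perl
# Sketch: make libcurl verify the peer against a specific CA bundle.
# The path below is an example only.
use strict;
use warnings;
use WWW::Curl::Easy;

my $curl = WWW::Curl::Easy->new;
$curl->setopt(CURLOPT_URL, "https://example.org/");
$curl->setopt(CURLOPT_CAINFO, "/etc/ssl/certs/ca-bundle.crt");
$curl->setopt(CURLOPT_SSL_VERIFYPEER, 1);
my $retcode = $curl->perform;
die "transfer failed with code $retcode\n" if $retcode != 0;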
Anthony Bryan and I had a talk the other day regarding FTP vs HTTP etc, and the outcome is available as this podcast.
Since I do my share of both FTP and HTTP hacking in the curl project, I quite often see, and sometimes get, the questions about what the actual differences between FTP and HTTP are, which one is the “best”, and isn't it so that … is the faster one?
FTP vs HTTP is my attempt at a write-up covering most of the differences for users of the protocols, without going into overly technical detail. If you find flaws or have additional info you think should be included, please let me know!
The document includes comparisons between the protocols in these areas:
- FTP Command/Response
- Two Connections
- Active and Passive
- Encrypted Control Connections
- Persistent Connections
- Chunked Encoding
- Name-based virtual hosting
- Proxy Support
- Transfer Speed
With your help it could become a good resource to point curious minds to in the future…
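To give a taste of the command/response difference, here is roughly what fetching a single file looks like from perl over the two protocols (hosts and paths are made up). Note how FTP requires a login and uses a separate data connection behind the scenes, while HTTP is a single request/response on one connection:

#!/usr/bin/perl
# Example hosts and paths are made up.
use strict;
use warnings;
use Net::FTP;
use LWP::Simple;

# FTP: control connection, login, then a data connection per transfer
my $ftp = Net::FTP->new("ftp.example.com", Passive => 1)
    or die "cannot connect: $@";
$ftp->login("anonymous", 'user@example.com') or die "login failed";
$ftp->binary;
$ftp->get("pub/file.zip", "file-via-ftp.zip") or die "get failed";
$ftp->quit;

# HTTP: one request, one response, on a single connection
my $status = getstore("http://www.example.com/file.zip", "file-via-http.zip");
die "download failed: $status\n" unless is_success($status);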
When I got to work this morning, I immediately noticed that one of the servers that host a lot of services for open source projects I tend to play around with (curl, Rockbox and more) had died. It responded to pings but didn't allow my usual login via ssh. It also hosts this blog.
I called our sysadmin guy who works next to the server and he reported that the screen mentioned inode problems on an ext3 filesystem on sda1. Power-cycling the machine did nothing good; it simply didn't even see the hard drive anymore…
In the meantime I changed our slave DNS for rockbox.org to point to a backup web server, just to make people aware of the situation.
Some 12 hours after the discovery of the situation, Linus Nielsen Feltzing had the system back up again, looking more or less identical to how it was yesterday. The backup procedure proved to work flawlessly. Linus inserted a new disk, partitioned it similarly to the previous one, restored the whole backup, fixed the boot loader (lilo) and wham (ignoring some minor additional fiddling), the server was up and running again.
David M. Kristol is one of the authors of RFC2109 and RFC2965, “HTTP State Management Mechanism”. RFC2109 is also known as the first attempt to standardize how cookies should be sent and received. Prior to that document, the only cookie spec was the very brief one Netscape released in the old days, and it certainly left many loose ends.
Mr Kristol has published a great and long document, HTTP Cookies: Standards, Privacy, and Politics, about the slow and dwindling story of how the cookie standardization work within the IETF took place and how it proceeded.
Still today, none of those documents are used very much. The original Netscape way is still how cookies are done, and even though a lot of good will and great effort went into doing things right in these RFCs, I honestly can't see anything on the horizon that will push the web world towards making cookies compliant.
In Adobe’s Penguin.SWF blog, we can learn some details about the upcoming version 10 of the Adobe flash player for Linux:
They'll rely on more libraries being present on the system rather than shipping them all in their own installer. This apparently includes libcurl.
So if you get the RPM of the pre-release player, you'll notice that it requires “libcurl.so.3”, the old SONAME for libcurl (libcurl 7.15.5 was the last release to use number 3), which no up-to-date distribution should provide anymore. Since October 2006 we've shipped libcurl.so.4.
Apparently this made the Fedora people first implement a work-around that re-introduces SONAME 3, built from the same source the SONAME 4 is made from, only to revert that decision a short while later…
An interesting side-note is how the Fedora people repeat over and over in those threads that libcurl with SONAME 3 and SONAME 4 use the same ABI, although that is not true (at least not by my definition of what an ABI is). The bump was not made by accident.
Update: it seems some blame this 3 == 4 thing on Debian…
So someone pointed out this IEBlog entry to me, and I found it so hilarious I felt a need to share the fun. See the “MIME-Handling: Sniffing Opt-Out” paragraph towards the end.
Apparently Internet Explorer 7 and earlier just don't care much for the Content-Type: header that servers send in their replies; instead they scan the body and guess what type it is, thus “knowing better” than the content provider what the content truly is.
So in IE8 they're (according to that blog entry) introducing a new attribute in the Content-Type header: if the site also sets “authoritative=true”, it really means the type it states, and the browser will then actually believe the site. I can't stop giggling.
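Going by the blog entry's description, opting out of the sniffing from a cgi script would then look something like this (the exact syntax is theirs and hardly final):

#!/usr/bin/perl
# Per the IEBlog description: flag the Content-Type as authoritative
# so IE8 skips its content sniffing. Syntax as described there.
print "Content-Type: text/plain; authoritative=true;\r\n";
print "\r\n";
print "This really is plain text. Trust me.\n";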
And yeah, some of the other craziness on that page makes good reading too, and truly makes you wonder what they are smoking during their brainstorm meetings.