Tag Archives: Web

Shared Dictionary Compression over HTTP

Wei-Hsin Lee of Google posted about their effort to create a dictionary-based compression scheme for HTTP. I find the idea rather interesting, and it’ll be fun to see what the actual browser and server vendors will say about this.

The idea is basically to use “cookie rules” (domain, path, port number, max-age etc) to make sure a client gets a dictionary and then the server can deliver responses that are diffs computed against the dictionary it has delivered before to the client. For repeated similar contents it should be able to achieve a lot better compression ratios than any other existing HTTP compression in use.

I figure it should be seen as a relative to the “Delta encoding in HTTP” idea, although the SDCH idea seems somewhat more generically applicable.

Since they seem to be using the VCDIFF algorithm for SDCH, the recent open-vcdiff announcement of course is interesting too.

A bad move. A really bad move.

So I wrote this little perl script to perform a lot of repeated binary Rockbox builds. It builds something like 35 builds and zips them up and gives them proper names in a dedicated output directory. Perfect to do things such as release builds.

Then I wrote a similar one to build manuals and offer them too. I then made the results available on the Rockbox 3.0RC (release candidate) page of mine.

Cool, me thinks, and since I’ll be away now for a week starting Wednesday I think I should make the scripts available in case someone else wants to play with them and possibly make a release while I’m gone.

I did

mv buildall.pl webdirectory/buildall.pl.txt

… thinking that I don’t want it to try to execute as a perl script on the server so I rename it to a .txt extension. But did this work? No. Did it cause total havoc? Yes.

First, Apache apparently still thinks these files are perl scripts (== cgi scripts) on my server, even if they got an additional extension. I really really didn’t expect this.

Then, my scripts are doing a command chain similar to “mkdir dir; cd dir; rm -rf *”. It works great when invoked in the correct directory. It works less fine when the web server invokes this because someone clicked on the file I just made available to the world.

Recursive deletion of all files the web server user was allowed to erase.

Did I immediately suspect foul play and evil doings by outsiders? Yes. Did it take quite a while to restore the damages from backups? Yes. Did it feel painful to realize that I myself was to blame for this entire incident and not at all any outside or evil perpetrator? Yes yes yes.

But honestly, in the end I felt good that it wasn’t a security hole somewhere that caused it since I hate spending all that time to track it down and fix it. And thanks to a very fine backup system, I had most of the site and things back up and running after roughly one hour off-line time.

Rockbox

Standardized cookies never took off

David M. Kristol is one of the authors of RFC2109 and RFC2965, “HTTP State Management Mechanism”. RFC2109 is also known as the first attempt to standardize how cookies should be sent and received. Prior to that document, the only cookie spec was the very brief document released by Netscape in the old days and it certainly left many loose ends.

Mr Kristol has published a great and long document, HTTP Cookies: Standards, Privacy, and Politics, about the slow and dwindling story of how the work on the IETF with the cookie standard took place and how it proceeded.

Still today, none of those documents are used very much. The original Netscape way is still how cookies are done and even if a lot of good will and great efforts were spent on doing things right in these RFCs, I can’t honestly say that I can see anything on the horizon that will push the web world towards changing the cookies to become compliant.

Flash 10 uses native libcurl

In Adobe’s Penguin.SWF blog, we can learn some details about the upcoming version 10 of the Adobe flash player for Linux:

They’ll rely on more libraries to be present in the system rather than provide them all by themselves in their own install. This apparently includes libcurl.

So if you get the RPM of the pre-release player, you’ll notice that it requires “libcurl.so.3” which is the old SONAME for libcurl (libcurl 7.15.5 was the last release which used the number 3) which no up-to-date distribution should provide anymore. Since october 2006 we’ve shipped libcurl.so.4.

Apparently, this made the Fedora people first implement a work-around for this that re-introduces the SONAME 3 from the same source the SONAME 4 is made from, only to a short while afterwards revert that decision

An interesting side-note is how the Fedora people repeat over and over in those threads that libcurl with SONAME 3 and SONAME 4 use the same ABI, although that is not true (at least not by my definition of what an ABI is). The bump was not accidentally made.

Update: it seems some blame this 3 == 4 thing on Debian

This is the type and I mean it

So someone pointed out this IEBlog entry for me, and I find it so hilarious I felt a need to share the fun. See the “MIME-Handling: Sniffing Opt-Out” paragraph towards the end.

Apparently Internet Explorer 7 and earlier just don’t care much for the Content-Type: header that servers reply, but they instead scan the body and guess what type it is. Thus “knowing better” than the content provider what content it truly is.

So in IE8 they’re (according to that blog entry) introducing a new attribute to the Content-Type header. If the site also sets “authoritative=true” it means it really means the type and the browser will then actually believe the site. I can’t stop giggling.

And yeah, some of the other craziness on that page is also good reading and they truly make you wonder what they are smoking during their brain storm meetings.

Tracking Swedish officials’ web use

I thought I’d just mention Patrik Wallström’s interesting project named Creeper. He basically offers a URL for an image to display on the site you run/admin, and then his server checks which IP addresses that gets the image and then cross-checks those addresses against a list of IP ranges of known swedish organizations, government agencies, political party headquarters and more.

An ingenious way to get some degree of knowledge about who looks at what and when. At least if a large enough amount of people display the Creeper icon: Creeper icon

HTML is all downhill from here

Ok, so we’re now officially over the hill. HTML4 was as good as we could make HTML, and the proof of this is the mighty insane and crazy mess W3C shows in their HTML 5 draft from January 22nd 2008.W3C logo

I mean, you can possibly (I mean if you hold on to your chair hard and really really try to extend your mind and be open) show sympathy and understanding for far-fetched stuff like “ping” and “prefetch” (or for that matter “author” and “contact”), but when they come to editing, drag-and-drop and how to deal with undo they certainly lost me. My favorite chapter in the draft is however this:

Broadcasting over Bluetooth

It takes a while just to suck up the concept of a HTML draft discussing broadcasting data. Over bluetooth. Man, these guys must have the best meetings!