Tag Archives: Web

Future transports

On Sunday morning during FSCONS 2010, in the room “Torg 4 South”, I did a 30-minute talk about a few potential future network transport protocols: a quick look at the current state, some of today’s problems and four different technologies that have been and are being developed to solve them.

I got a fair amount of questions, and several people approached me afterwards to make sure they got a copy of my slides.

The video recording will hopefully be made available later on, but until then you can read the slides below and imagine my Swedish accent talking about these matters!


You can also download the slides directly as a PDF.

Testing 2-digit year numbers in cookies

In the current work of the IETF http-state working group, we’re documenting how cookies work. The question came up of how browsers and other clients treat the year in ‘expires’ strings when it is specified with only two digits. More precisely: is 69 in the future or in the past?

I decided to figure that out. I set up a little CGI script that you can use to check what your browser thinks:

http://daniel.haxx.se/cookie.cgi

It sends a single cookie header that looks like:

Set-Cookie: testme=yesyes; expires=Wed Sep  1 22:01:55 69;

The CGI script looks like this:

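#!/usr/bin/perl
use strict;
use warnings;
print "Content-Type: text/plain\n";
print "Set-Cookie: testme=yesyes; expires=Wed Sep  1 22:01:55 69;\n";
# An empty line ends the headers; the body echoes any Cookie: header sent back
print "\nempty?\n";
print $ENV{'HTTP_COOKIE'} // '';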

As you can see, it prints the Cookie: header, so if you reload that URL you should see “testme=yesyes” in the output if the cookie is still there. If it is, your browser of choice treats the date above as a date in the future.
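The question boils down to how a client expands a two-digit year into a full one. Below is a minimal sketch of the fixed-pivot approach; the `expand_year` name and the default pivot of 70 are illustrative assumptions, not taken from any of the browsers tested. With that pivot, 00 – 69 become 2000 – 2069 and 70 – 99 become 1970 – 1999, which is the same split RFC 6265 specifies, so 69 lands in the future.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map a two-digit cookie year to a four-digit year using a fixed pivot.
# Years below the pivot are taken as 20xx, the rest as 19xx.
# The default pivot of 70 gives 00-69 => 2000-2069 and 70-99 => 1970-1999.
sub expand_year {
    my ($yy, $pivot) = @_;
    $pivot = 70 unless defined $pivot;
    die "not a two-digit year: $yy\n" unless $yy =~ /^\d{1,2}$/;
    return $yy < $pivot ? 2000 + $yy : 1900 + $yy;
}

for my $yy (qw(0 10 69 70 99)) {
    printf "%02d => %d\n", $yy, expand_year($yy);
}
```

Moving the pivot moves the boundary: with a pivot of 50, for example, 69 would mean 1969 and the cookie would already have expired.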

So, which browsers think 69 is in the future and which think it is in the past? Feel free to try out more browsers and tell me the results. This is the list we have so far:

Future:

Firefox v3 and v4 (year 2069)
curl (year 2038)
IE 7 (year 2069)
Opera (year 2036)
Konqueror 4.5
Android

Past:

Chrome (both v4 and v5)
Gnome Epiphany-Webkit

Thanks to my friends in #rockbox-community who helped me out!

(this info was originally posted to the httpstate mailing list)

Beyond just “69”

(this section was added after my first post)

After the basic tests above, I went on to write a slightly more involved test that sets 100 cookies in this format:

Set-Cookie: test$yy=set; expires=Wed Oct  1 22:01:55 $yy;

When the user reloads the page, it prints all “test$yy” cookies that get sent back to the server. The results from the various browsers are very interesting. These are the ranges the different browsers consider to be in the future:

  • Firefox: 21 – 69 (Safari and Fennec and MicroB on n900) [*]
  • Chrome: 10 – 68
  • Konqueror: 00 – 99 (and IE3, Links, Netsurf, Voyager)
  • curl: 10 – 70
  • Opera: 41 – 69 (and Opera Mobile) [*]
  • IE8: 31 – 79 (and slimbrowser)
  • IE4: 61 – 79 (and IE5, IE6)
  • Midori: 10 – 69 (and IBrowse)
  • w3m: 10 – 37
  • AWeb: 10 – 77
  • Nokia 6300: [none]

[*] = Firefox has a default limit of 50 cookies per host, which explains this odd range. When I changed the config option ‘network.cookie.maxPerHost’ to 200 instead (thanks to Dan Witte), I got the more sensible and expected range 10 – 69. Opera has a similar limit of 30 cookies by default, which explains its 41 – 69 range here; otherwise it would get 10 – 69 as well (thanks to Stanislaw Adrabinski). I guess that the IE8 range is similarly restricted, due to it using a limit of 50 cookies per host and an epoch at 1980.

I couldn’t resist trying to interpret what this means. The ranges can roughly be summarized like this:

0-9: mostly in the past
10-20: almost always in the future except Firefox
21-30: even more likely to be in the future except IE8
31-37: everyone but Opera thinks this is the future
38-40: now w3m and Opera think this is the past
41-68: everyone but w3m thinks this is the future
69: Chrome and w3m say past
70: curl, IE8, Konqueror say future
71-79: IE8 and Konqueror say future, everyone else says past
80-99: Konqueror says future, everyone else says past
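The summary above was done by eye. A small sketch can derive it mechanically from the reported ranges, listing for each two-digit year which browsers treat it as the future. The data is copied from the list of ranges, using the first browser named on each line and the ranges as reported (so Firefox and Opera are still capped by their default cookie limits); `future_for` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "Future" ranges as reported in the list (inclusive, two-digit years).
# Nokia 6300 reported no future years, so it has an empty list of ranges.
my %future = (
    'Firefox'    => [ [21, 69] ],
    'Chrome'     => [ [10, 68] ],
    'Konqueror'  => [ [ 0, 99] ],
    'curl'       => [ [10, 70] ],
    'Opera'      => [ [41, 69] ],
    'IE8'        => [ [31, 79] ],
    'IE4'        => [ [61, 79] ],
    'Midori'     => [ [10, 69] ],
    'w3m'        => [ [10, 37] ],
    'AWeb'       => [ [10, 77] ],
    'Nokia 6300' => [],
);

# Return the (sorted) names of the browsers that consider year $yy future.
sub future_for {
    my ($yy) = @_;
    my @names = grep {
        my $name = $_;
        # true if any of this browser's ranges contains $yy
        grep { $yy >= $_->[0] && $yy <= $_->[1] } @{ $future{$name} };
    } keys %future;
    return sort @names;
}

for my $yy (0 .. 99) {
    my @who = future_for($yy);
    printf "%02d: %2d/%d future (%s)\n", $yy, scalar(@who),
        scalar(keys %future), join(', ', @who);
}
```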

How to test a browser near you:

  1. go to http://daniel.haxx.se/cookie2.cgi
  2. reload once
  3. the numbers shown on the screen are the year numbers your browser considers
    to be in the future, as described above
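The page behind these steps could look roughly like the following sketch. It is a hypothetical reconstruction in plain Perl, not the actual cookie2.cgi source: it sends one Set-Cookie header per two-digit year in the “test$yy” format and, in the body, lists the years whose cookies the browser sent back. The helper names `cookie_headers` and `returned_years` are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one Set-Cookie header per two-digit year 00..99, using the same
# asctime-style date as the single-cookie test with only the year varied.
sub cookie_headers {
    return map {
        my $yy = sprintf "%02d", $_;
        "Set-Cookie: test$yy=set; expires=Wed Oct  1 22:01:55 $yy;\n";
    } 0 .. 99;
}

# Given a Cookie: header value, return the sorted years whose test cookie
# came back, i.e. the years this client considered to be in the future.
sub returned_years {
    my ($cookie_header) = @_;
    my @years = ($cookie_header // '') =~ /\btest(\d\d)=set\b/g;
    return sort { $a <=> $b } @years;
}

# Emit the CGI response only when run as a script, not when loaded.
unless (caller) {
    print "Content-Type: text/plain\n";
    print cookie_headers();
    print "\n";
    print join(' ', returned_years($ENV{'HTTP_COOKIE'})), "\n";
}
```

A browser that caps the number of cookies per host will only send back a subset, so the printed range can be narrower than what its date parsing alone would give.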

Webscrapers unite!

The term web scraping is here to stay. It refers to getting stuff off web sites in an automated fashion. Other terms occasionally used for it include spidering, web harvesting and more. I’ve used the term HTTP scripting for it.

The art is, in many cases, to act more or less like a browser, with the single purpose of keeping the server from detecting that there is a program at the other end.
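As a hedged illustration of what “acting like a browser” can mean at the HTTP level, here is a minimal Perl sketch using the core HTTP::Tiny module to send a browser-style User-Agent and a couple of typical request headers. The header values, including the Firefox-style agent string, are examples chosen for illustration, and `browser_like_client` is a hypothetical helper; real scraping often also needs cookie handling, which this sketch leaves out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: an HTTP::Tiny client that presents itself like a
# desktop browser instead of with its default "HTTP-Tiny/x.y" agent string.
sub browser_like_client {
    return HTTP::Tiny->new(
        # Example browser-style string; no trailing space, so it is used as-is.
        agent => 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6',
        default_headers => {
            'Accept'          => 'text/html,application/xhtml+xml;q=0.9,*/*;q=0.8',
            'Accept-Language' => 'en-US,en;q=0.5',
        },
        timeout => 30,
    );
}

# Fetch and print a page only when a URL is given on the command line.
if (@ARGV) {
    my $res = browser_like_client()->get($ARGV[0]);
    die "$res->{status} $res->{reason}\n" unless $res->{success};
    print $res->{content};
}
```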

This is something I’ve done a lot over the years while developing curl, and I’ve answered countless questions on the matter. There are several good books on the subject (all of them covering curl at least in part).

Today, we’ve set up a little web site over at webscrapers.haxx.se and an associated mailing list, in an attempt to gather people who work in this field. Whether you do it for fun or for profit doesn’t matter, and while we can certainly discuss which tools you should or shouldn’t use for scraping, we don’t limit which ones can be discussed on the list as long as the subject is somewhat relevant.

So, if you’re into scraping or you want to learn more on how to automate your web bots, come on over and join the list and let’s see where this might take us!

Daniel’s currency exchange is no more

For quite a number of years I maintained a little web service that provided currency exchange rates in a handy format that was also friendly to machines and other machine-exchangers. My personal favorite feature was the “easy conversion” helper, which provided an “easy to calculate in your head” formula for converting back and forth between two currencies based on their current rates, like “multiply by 5 and divide by 2”.
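The original helper is gone, so this is only a sketch of one way such an “easy conversion” could work: try every numerator and denominator up to a small bound (9 here, an assumption) and keep the fraction closest to the rate in relative terms. `easy_ratio` is a hypothetical name and the rate in the example is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Find small integers (num, den) with num/den as close as possible to $rate,
# so "multiply by num and divide by den" is easy to do in your head.
# Ties keep the first fraction found, which favours smaller denominators.
sub easy_ratio {
    my ($rate, $max) = @_;
    $max = 9 unless defined $max;
    die "rate must be positive\n" unless $rate > 0;
    my ($best_num, $best_den, $best_err);
    for my $den (1 .. $max) {
        for my $num (1 .. $max) {
            my $err = abs($num / $den - $rate) / $rate;    # relative error
            if (!defined $best_err || $err < $best_err) {
                ($best_num, $best_den, $best_err) = ($num, $den, $err);
            }
        }
    }
    return ($best_num, $best_den, $best_err);
}

# Example with a made-up rate of 2.47.
my ($n, $d, $e) = easy_ratio(2.47);
printf "multiply by %d and divide by %d (%.1f%% off)\n", $n, $d, 100 * $e;
```

For the made-up rate of 2.47 this prints “multiply by 5 and divide by 2 (1.2% off)”. The reverse direction is simply the same fraction turned upside down.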

This service goes all the way back to 1997, when I started to work on downloading exchange rates as a service for the IRC bot I ran in #amiga on efnet (even before the split when ircnet was created). Back then I was primarily working on the IRC bot named Dancer, and in 1997 I started working on a tool to fetch the rates. That tool would become curl, and the web site for accessing the rates was initially hosted by the company Frontec, which I worked for at the time.

The URL changed a few times, but the service has been available at http://daniel.haxx.se/currency for the last few years, until a few weeks ago. Well, technically the URL still works, but the service does not.

A few weeks ago, the primary site I scraped for this info changed its format and I decided not to play cat and mouse anymore. I was already bending the rules by not reading their terms of service, as I feared I wouldn’t be allowed to use their data like this. Also, I really don’t have any use for this service myself, so I decided to do myself a favor and stop wasting spare time on one of those projects that don’t give me enough personal satisfaction. I’m sure that if there is demand for a service like the one I just closed down, someone else out there will be ready to fire one up and serve users.

So long, and thanks for all the currency exchange fun.

roffit lives!

Many moons ago I created a little tool I named roffit. It is just a tiny perl script that converts a man page written in the nroff format to good-looking HTML. I should perhaps add that I didn’t find any decent alternatives at the time, so I wrote my own. I’ve been using it ever since in projects such as curl, c-ares and libssh2 to produce web versions of the docs.
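The following is not roffit’s code, just a minimal Perl sketch of the kind of line-by-line translation such a converter does. It handles only `.SH` section headers, the single-line `.B` and `.I` macros, the `\fB`, `\fI`, `\fR` and `\fP` font escapes and the `\-` escape; comments, other macros and fill mode are left out. `man_line_to_html` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in HTML.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Convert one line of nroff man page source to HTML. Only a few constructs
# are handled; any other line is simply wrapped in <p>.
sub man_line_to_html {
    my ($line) = @_;
    chomp $line;
    my $text = html_escape($line);
    $text =~ s/\\-/-/g;    # \- is a literal minus sign
    # Font escapes: \fB opens <b>, \fI opens <i>; \fR and \fP close the
    # current tag (\fP really means "previous font", simplified here).
    my $open = '';
    $text =~ s{\\f([BIRP])}{
        my $f   = $1;
        my $out = $open ? "</$open>" : '';
        $open   = $f eq 'B' ? 'b' : $f eq 'I' ? 'i' : '';
        $out . ($open ? "<$open>" : '');
    }ge;
    $text .= "</$open>" if $open;    # close a tag left open at end of line
    return "<h2>$1</h2>" if $text =~ /^\.SH\s+"?(.*?)"?$/;
    return "<b>$1</b>"   if $text =~ /^\.B\s+(.*)$/;
    return "<i>$1</i>"   if $text =~ /^\.I\s+(.*)$/;
    return "<p>$text</p>";
}

# Convert the files named on the command line, one output line per input line.
if (@ARGV) {
    while (my $line = <>) {
        print man_line_to_html($line), "\n";
    }
}
```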

It has simply done its job and I haven’t had any need to fiddle with it. The project page lists it as last modified in 2004, even though I actually moved it to a sourceforge CVS repo back in 2007.

Just a few days ago, I was notified by email that Debian includes it as a package in the distribution and that someone was annoyed by some particular flaws.

This resulted in a bunch of bugs getting submitted to the Debian bug tracker. I started the brand new roffit-devel mailing list to make it easier to host roffit discussions, and I switched the CVS repo over to a git one on github.

If you like seeing man pages turned into web pages, consider joining in and helping us improve this thing!

My talk at Optimera Sthlm

30 minutes is a tricky period to fill with content when you do a talk, and yesterday I did my best at confusing/informing the audience at the OPTIMERA STHLM conference about transport layer performance: where time is spent or lost in TCP today, what to think about to make things behave faster, the fact that RTT is not getting better even though bandwidth is growing really fast these days, and a little about some future technologies like WebSockets, SPDY, SCTP and MPTCP.
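To illustrate why a stagnant RTT matters more than raw bandwidth for small transfers, here is a back-of-the-envelope sketch. The model (round trips times RTT plus size over bandwidth) and all the numbers in it (a 20 KB resource, 4 round trips, 50 ms RTT) are simplifying assumptions for illustration, not figures from the talk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified model: transfer time = round trips * RTT + size / bandwidth.
# Ignores slow start details, packet loss and processing time.
sub transfer_time {
    my (%p) = @_;    # rtt (seconds), bytes, bits_per_sec, round_trips
    return $p{round_trips} * $p{rtt} + $p{bytes} * 8 / $p{bits_per_sec};
}

# Hypothetical numbers: a 20 KB resource needing 4 round trips at 50 ms RTT.
for my $mbit (10, 100) {
    my $t = transfer_time(rtt => 0.050, bytes => 20_000,
                          bits_per_sec => $mbit * 1e6, round_trips => 4);
    printf "%3d Mbit/s: %.1f ms\n", $mbit, 1000 * $t;
}
```

With these numbers, ten times the bandwidth (216 ms down to 201.6 ms) saves under 15 ms, while halving the RTT at 10 Mbit/s would save 100 ms.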

Note: this talk is entirely in Swedish.

My slides for this talk are also viewable on slideshare.net.

50 hours offline

Several sites in the haxx.se domain and other stuff related to me and my fellows were completely offline for almost 50 hours between August 24th 19:00 UTC and August 26th 20:30 UTC.

The sites affected included the main web sites for the following projects: curl, c-ares, trio, libssh2 and Rockbox. It also affected mailing lists, CVS repositories etc. for some of those.

The ISP (Black Internet) has explained that the outage was caused by some kind of sabotage. Their explanation so far (first in Swedish):

Strax efter kl 20 i måndags drabbades Black Internet och Black Internets kunder av ett mycket allvarligt sabotage. Sabotaget gjordes mot flera av våra core-switchar, våra knutpunkter. Detta resulterade i ett mer eller mindre totalt avbrott för oss och våra kunder. Vi har polisanmält händelsen och har ett bra samarbete med dem.

Translated to English (by me) it becomes:

Soon after 8pm on Monday, Black Internet and its customers were struck by a very serious act of sabotage. The sabotage was directed at several of our core switches, our hub points. This resulted in a more or less total disruption of service for us and our customers. We have reported the incident to the police and are cooperating well with them.

Do note that you could keep track of this situation by following me on twitter.

It’s good to be back. Let’s hope it’ll take ages until we go away like that again!

Update: according to my sources, someone erased/cleared Black Internet’s core routers, and they then learned that they had no working backups, so they had to restore everything by hand.

I host www.libssh2.org

Sara Golemon, the founder and former maintainer of libssh2, pointed the main site www.libssh2.org over to my server the other day, and now my previously unofficial libssh2 web site has suddenly become the only and official one.

The plan is now to get the web content pushed into a separate git repo to allow all libssh2’ers to modify it.

I’m also open to and interested in feedback and ideas on how to improve the web site in whatever way you can think of. Consider the current site mostly a placeholder for the info we have. How can we make it better?


The web shop timeout mystery

Another one of the things in the modern world I’ve not yet understood:

Why on earth do some web-based shops time out your shopping and automatically clear your “shopping cart” if you just leave it around for a few hours or days? Why why why? What harm does it do them if I don’t hurry on to purchase?

I love being able to press ‘buy’ on lots of stuff (which then gets added to the “cart”) and then ponder for a few days whether I want more stuff or selected the right models, alter a few things and similar. So when they time out on me like this, it’s like a slap in the face and I need to start over again. It’s simply crazy that I have to back up my list of things to buy just in case they flush it before I’m done!

Yes, I’m aware that some sites offer “save lists” etc if you’re registered and logged in and all, but I don’t want to have to do that.

I can imagine that at times things run out of stock, or they even change the prices of merchandise that’s in my cart, but they could still solve that in ways other than just clearing everything.