Category Archives: Technology

Really everything related to technology

Websockets right now

Lots of sites today run JavaScript that connects back to the site and keeps the connection open for a long time (or simply checks very frequently whether there is updated info to get). This happens to such a wide extent that people have started working on a better way: a way for scripts in browsers to connect to a site in a TCP-like manner and exchange messages back and forth, something that doesn’t suffer from the problems HTTP brings when (ab)used for this purpose.

Websockets is the name of the technology that has been lifted out of the HTML5 spec and taken to the IETF, where it is being worked on to produce a network protocol: basically a message-based, low-level protocol over TCP, designed to allow browsers to hold long-lived connections to servers instead of using “long-polling HTTP”, Ajax polling or other more or less ugly tricks.

Unfortunately, the name is used for both the on-the-wire protocol as well as the JavaScript API so there’s room for confusion when you read or hear this term, but I’m completely clueless about JavaScript so you can rest assured that I will only refer to the actual network protocol when I speak of Websockets!

I’m tracking the Hybi mailing list closely, and in this post I’m summarizing some of the issues that have been debated lately. It is not an attempt to cover it all; there is a lot more to be said, both now and in the future.

The first version of Websockets to be implemented by a browser (Chrome) was the -75 version, written by Ian Hickson as editor and the main man behind it, as it came directly from the WHATWG team (the group of browser people who work on the HTML5 spec).

He also published a -76 version (implemented by Firefox) before the work was taken over by the IETF’s Hybi working group, which published that same document as its -00 draft.

There are representatives of a lot of server software and of browsers like Opera, Firefox, Safari and Chrome on the list (no Internet Explorer people seem to be around, though).

Some Problems

The 76/00 version broke compatibility with the previous version, and it also isn’t properly HTTP compatible, which many want and need it to be.

The way Ian and the WHATWG have previously worked on this (and the HTML5 spec?) is mostly by Ian leading and being the benevolent dictator of what’s good, what’s right and what goes into the spec. The collision with the IETF’s way of working, which includes “everyone can have a say” and “rough consensus”, has been harsh and a source of many arguments and long threads on the hybi mailing list, and I will throw out a guess that it will continue to be a source of “flames” in the future as well.

Ian maintains – for example – that a hard requirement on the protocol is that it needs to be “trivial to implement for amateurs”, while basically everyone else has come to the conclusion that this requirement is mostly silly and should be reworded to “be simple” or similar: something that avoids the word “amateur” completely and doesn’t imply that amateurs can’t follow specs.

Right now

(This is early August 2010, things and facts in this protocol are likely to change, potentially a lot, over time so if you read this later on you need to take the date of this writing into account.)

There was a Hybi meeting at the recent IETF 78 meeting in Maastricht where a range of issues got discussed and some of the controversial subjects did actually get quite convincing consensus – and some of those didn’t at all match the previous requirements or the existing -00 spec.

Almost in parallel, Ian suggested a schedule for the work ahead that most people expressed a liking for: produce a version of the protocol within 4 weeks for browsers to adjust to right now, and then work on an updated version to be shipped in 6 months with more of the unknowns nailed down.

There’s a very lively debate going on about the need for chunking and the need for multiplexing multiple streams over a single TCP connection, and there was a very, very long debate on whether the protocol should send data length-prefixed or mark the end of the data with a sentinel. The length-prefix approach (favored by yours truly) seems to have finally won. Exactly how the framing is to be done, which parts are going to be core protocol, what to leave for extensions and how extensions should work are not yet settled. I think everyone wants the protocol to be “as simple as possible”, but everyone has their own mind made up about which set of features the “simplest possible” form of the protocol includes.
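
To illustrate what the framing debate is about, here is a little sketch of my own (not taken from any draft) of the length-prefix idea in C. The sender puts the payload size first, so the receiver knows exactly how many bytes to read; a sentinel-based framing would instead append a marker byte, forcing the receiver to scan every payload byte for it and forcing the sender to escape it inside the data:

  #include <stdint.h>
  #include <string.h>
  #include <arpa/inet.h> /* htonl */

  /* hypothetical length-prefixed framing, NOT from any actual draft:
     a 32-bit big-endian length followed by the raw payload */
  static size_t frame_message(unsigned char *out, const char *msg)
  {
    uint32_t nlen = htonl((uint32_t)strlen(msg));
    memcpy(out, &nlen, 4);             /* the length prefix */
    memcpy(out + 4, msg, strlen(msg)); /* payload needs no escaping */
    return 4 + strlen(msg);
  }

The receiver reads the 4-byte prefix and then reads exactly that many bytes; binary data passes through untouched, which is the main argument for prefixing.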

The subject of exactly how the handshake is going to be done isn’t quite agreed on either, by my reading, even if there’s a way defined in the -00 spec. The primary connection method will be using an HTTP Upgrade: header: connect to the server’s port 80 and then upgrade the protocol from HTTP over to Websockets. That procedure is documented and supported by HTTP, but there’s been discussion on how exactly it should be done so that cross-protocol requests can be avoided to the furthest possible extent.
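
For illustration only, the general shape of such a handshake looks something like this (the exact header and challenge fields vary between the drafts, so don’t take this as the final wire format):

  GET /demo HTTP/1.1
  Host: example.com
  Upgrade: WebSocket
  Connection: Upgrade
  Origin: http://example.com

  HTTP/1.1 101 Switching Protocols
  Upgrade: WebSocket
  Connection: Upgrade

The client asks to switch protocols within an ordinary HTTP request, and once the 101 response is complete, both ends speak the new protocol over the same TCP connection.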

We’ve seen upwards of 50 mails per day on the hybi mailing list during the busiest days of the last couple of weeks. I don’t see any signs of it cooling down short-term, so I think the 4 week goal seems a bit too ambitious at this point, but I’m not ruling it out. Especially since the working procedure with Ian at the wheel is not “IETF-ish”, but it’s not controlled entirely by Ian either.

Future

We are likely to see a future world with a lot of client-side applications connecting back to the server. The number of TCP connections for a single browser is likely to increase, a lot. In fact, even a single web application within a single tab is likely to make multiple websocket connections when it re-uses components and widgets from several other sources.

It is also likely that libcurl will get a websocket implementation in the future, to do raw websocket transfers with, as it will most likely be needed to properly emulate a browser/client.

I hope to keep tracking the development, occasionally express my own opinion on the matter (on the list and here), but mostly stay on top of what’s going on so that I can feed that knowledge to my friends and make sure that the curl project keeps up with the time.


(The pictured socket might not be a websocket, but it is a pretty remarkably designed power socket that is commonly seen in China, and as you can see it accepts and works with many of the world’s different power plug designs.)

I’ll talk at FSCONS 2010

Recently I was informed that I got two talks accepted to the FSCONS 2010 conference, to be held in the beginning of November 2010.

My talks will be about the Future and current state of internet transport protocols (TCP, HTTP, SPDY, WebSockets, SCTP and more) and on High performance multi-protocol applications with libcurl, which will educate the audience on how to use libcurl when writing high performance clients with a potentially very large number of simultaneous transfers. A somewhat clueful reader will of course spot that these two talks have a lot in common, and yeah, they do reveal a lot of what I do, what I like and what I poke on these days. I hope I’ll be able to shed light on some things not everyone is already perfectly aware of.

The talks will be held in English, and if past FSCONS conferences are anything to go by, my talks will be filmed and made available online afterwards for the world to see, in case you have a funeral or something to attend that prevents you from actually attending in person.

If you have thoughts, questions or anything on these topics that you would like to get answered in my talk, feel free to bring them up and I’ll see what I can do.

(If those fine guys and gals at FSCONS ever settled for a logo, or had one I could link to, I would’ve shown one of them right here.)

C-ares, now and ahead!

The project c-ares started many years ago (March 2004) when I decided to fork the existing ares project to get the changes done that I deemed necessary – and the original project owner didn’t want them.

I did my original work on c-ares back then primarily to get a good asynchronous name resolver for libcurl, so that we could get around the limitation of having to do name resolves totally synchronously as the libc interfaces mandate. Of course, c-ares was and is more than just name resolving, and not too surprisingly, other projects have popped up that now use c-ares.
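
For those who haven’t seen it, here is roughly what the asynchronous style looks like with c-ares (a minimal sketch from memory, error handling omitted): you fire off a lookup, get a callback when it completes, and drive the socket activity from your own loop instead of blocking inside libc.

  #include <ares.h>
  #include <netdb.h>
  #include <stdio.h>
  #include <sys/select.h>
  #include <sys/socket.h>

  static void resolved(void *arg, int status, int timeouts,
                       struct hostent *host)
  {
    if(status == ARES_SUCCESS)
      printf("resolved: %s\n", host->h_name);
  }

  int main(void)
  {
    ares_channel channel;
    ares_library_init(ARES_LIB_INIT_ALL);
    ares_init(&channel);
    ares_gethostbyname(channel, "example.com", AF_INET, resolved, NULL);

    /* drive the resolve from our own event loop; nothing here blocks
       longer than the timeout c-ares asks for */
    for(;;) {
      fd_set readers, writers;
      struct timeval tv, *tvp;
      int nfds;
      FD_ZERO(&readers);
      FD_ZERO(&writers);
      nfds = ares_fds(channel, &readers, &writers);
      if(nfds == 0)
        break; /* no pending queries left */
      tvp = ares_timeout(channel, NULL, &tv);
      select(nfds, &readers, &writers, NULL, tvp);
      ares_process(channel, &readers, &writers);
    }
    ares_destroy(channel);
    ares_library_cleanup();
    return 0;
  }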

I’m maintaining a bunch of open source projects, and c-ares was never one that I felt a lot of love for; it was mostly a project that I needed to get done, and when things worked the way I wanted I found myself having ended up as maintainer of yet another project. I’ve repeatedly mentioned on the c-ares mailing list that I don’t really have time to maintain it and that I’d rather step down and let someone else “take over”.

After having said this for over 4 years, I’ve come to accept that even though c-ares has many users out there, and even seems to be appreciated by companies and open source projects, there just isn’t any particularly big desire to help out in our project. I find it very hard to just “give up” a functional project, so I linger and do my best to give it the effort and love it needs. I very much need and want help to maintain and develop c-ares. I’m not doing a very good job with it right now.

Threaded name resolving competes

I once thought we would be able to make c-ares capable of becoming a true drop-in replacement for the native system name resolver functions, but over the years with c-ares I’ve learned that the dusty corners of name resolving in unix and Linux have so many features and so much fancy stuff that c-ares is still a long way from that. It has also made me turn around somewhat, and I’ve come to reconsider: perhaps using a threaded native resolver is the better way for libcurl to do asynchronous name resolves. That way we don’t need any half-baked implementations of the resolver. Of course it comes at the price of a new thread for each name resolve, which turns really nasty if you grow the number of connections just a tad bit, but still, most libcurl-using applications today hardly use more than a few (say, less than a hundred) simultaneous transfers.
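
A minimal sketch of the threaded approach (my illustration, not actual libcurl code): hand the blocking getaddrinfo() call to a thread of its own and let the rest of the program carry on until the result is ready.

  #include <pthread.h>
  #include <netdb.h>
  #include <string.h>
  #include <sys/socket.h>

  struct resolve_job {
    const char *hostname;
    struct addrinfo *result; /* filled in by the thread */
    int status;
  };

  static void *resolve_thread(void *arg)
  {
    struct resolve_job *job = arg;
    struct addrinfo hints;
    memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = SOCK_STREAM;
    /* the fully featured but blocking libc resolver runs here,
       off the main thread */
    job->status = getaddrinfo(job->hostname, NULL, &hints, &job->result);
    return NULL;
  }

  int main(void)
  {
    pthread_t tid;
    struct resolve_job job = { "example.com", NULL, 0 };
    pthread_create(&tid, NULL, resolve_thread, &job);
    /* ... the main thread keeps serving other transfers here ... */
    pthread_join(tid, NULL);
    if(job.status == 0)
      freeaddrinfo(job.result);
    return 0;
  }

The price, as mentioned, is one thread per ongoing resolve, but in return all the odd native resolver features keep working exactly as they do for every other application on the system.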

Future!

I don’t think the future holds any radical changes or drastically new stuff in the pipe for c-ares. I think we should keep polishing off bugs and adding the small functions and features that we’re missing. I believe we’re not yet parsing all the record types we could, into a convenient format.

As usual, a project is not about how much we can add but about how much we can avoid adding, and how true we can remain to our core objectives. I hope the growing popularity will make more people join the project, and then not only to throw a single patch at us, but to also hang around a while and help us some more.

Hopefully we will one day be able to use c-ares instead of a typical libc-based name resolver and yet resolve the same names.

Join us and help us give c-ares a better future!


poll vs select

I’m a person who works a lot with networking and development around it. I mostly do this on Linux, often involving drivers or otherwise very close to the operating system, in C and the core libraries.

The other day I once again stumbled over some random inaccuracy about poll compared to select, and instead of whining on some IRC channel or complaining on their mailing list, I decided I would strike back by writing up and presenting a web page of my own. It details as much as possible about poll vs select and related event-based functions. I want it to become the place for everything relevant to say about poll and select in a comparative aspect, and when comparing them to event-based alternatives like libevent and libev.
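
To give a taste of what the page compares, here is the same “wait for a socket to become readable” expressed both ways (a trivial sketch; the page goes into the real differences):

  #include <poll.h>
  #include <sys/select.h>

  /* wait up to 5 seconds for 'sockfd' to become readable, select-style */
  int wait_select(int sockfd)
  {
    fd_set readfds;
    struct timeval tv = { 5, 0 };
    FD_ZERO(&readfds);
    FD_SET(sockfd, &readfds); /* undefined if sockfd >= FD_SETSIZE! */
    return select(sockfd + 1, &readfds, NULL, NULL, &tv);
  }

  /* the same wait, poll-style: no FD_SETSIZE ceiling on descriptors */
  int wait_poll(int sockfd)
  {
    struct pollfd pfd;
    pfd.fd = sockfd;
    pfd.events = POLLIN;
    return poll(&pfd, 1, 5000 /* milliseconds */);
  }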

So the next time I face someone not quite understanding this whole situation or perhaps when someone reiterates something that isn’t quite true, I have a resource to point to.

Not to mention that I think this new poll vs select page fits in nicely with my other “X vs Y” articles and docs pages I’ve written the last few years.

If you find flaws, miss details or have questions about this page, please do not hesitate to comment here, to mail me about it, to tweet me on twitter, or whatever method you prefer. I appreciate your feedback!


Webscrapers unite!

The term web scraping has come to stay. It refers to getting stuff off web sites in an automated fashion. Other terms occasionally used for it include spidering, web harvesting and more. I’ve used the term HTTP scripting for it.

The art is in many cases to act more or less like a browser, with the single purpose of making the server not detect that there is a program at the other end.
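
As a trivial libcurl-flavored example (a sketch only; a real scraper does much more), the first step is often as simple as sending a browser-style User-Agent, following redirects and keeping cookies, just like a browser would:

  #include <curl/curl.h>

  int main(void)
  {
    CURL *curl = curl_easy_init();
    if(curl) {
      curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/");
      /* pretend to be a browser; any real browser's User-Agent works */
      curl_easy_setopt(curl, CURLOPT_USERAGENT,
                       "Mozilla/5.0 (X11; Linux x86_64) Firefox/3.6");
      curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L); /* redirects */
      curl_easy_setopt(curl, CURLOPT_COOKIEFILE, ""); /* cookie engine */
      curl_easy_perform(curl);
      curl_easy_cleanup(curl);
    }
    return 0;
  }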

This is something I’ve done a lot over the years while developing curl, and I’ve answered countless questions on the matter. There are several good books on the subject (all of them covering curl at least in part).

Today, we’ve set up a little web site over at webscrapers.haxx.se and an associated mailing list, in an attempt to gather people who work in this field. Whether you do it for fun or profit doesn’t matter, and while we can certainly discuss which tools you should or shouldn’t use for scraping, we don’t limit which ones may be discussed on the list, as long as the subject is somewhat relevant.

So, if you’re into scraping or you want to learn more on how to automate your web bots, come on over and join the list and let’s see where this might take us!

Daniel’s currency exchange is no more

For quite a number of years I maintained a little web service that provided currency exchange rates in a handy format, in a way that was friendly for machines and automated users. My personal favorite feature was the “easy conversion” helper that would provide an “easy to calculate in your head” formula for going back and forth between two currencies based on their current rates. Like “multiply by 5 and divide by 2” etc.
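
To show what I mean: a rate of 2.5 becomes “multiply by 5 and divide by 2”. Such a helper could be implemented roughly like this (a reconstruction for illustration, not the original code): search small integer fractions for the one closest to the rate.

  #include <math.h>
  #include <stdio.h>

  /* find small integers a and b so that a/b approximates 'rate',
     giving a "multiply by a and divide by b" rule of thumb */
  static void easy_formula(double rate)
  {
    int a, b, best_a = 1, best_b = 1;
    double best_err = 1e9;
    for(b = 1; b <= 10; b++) {
      for(a = 1; a <= 10; a++) {
        double err = fabs((double)a / b - rate);
        if(err < best_err) {
          best_err = err;
          best_a = a;
          best_b = b;
        }
      }
    }
    printf("multiply by %d and divide by %d\n", best_a, best_b);
  }

  int main(void)
  {
    easy_formula(2.5); /* prints: multiply by 5 and divide by 2 */
    return 0;
  }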

This service goes all the way back to 1997, when I started to work on getting exchange rates downloaded as a service to the IRC bot I ran in #amiga on efnet (even before the split when ircnet was created). Back then I was primarily working on the IRC bot named Dancer. In 1997 I started the work on a tool to fetch rates; the tool would become curl, and the web site to access the rates was initially hosted by Frontec, the company I worked for back then.

The URL changed a few more times, but it had been available at http://daniel.haxx.se/currency for the last few years, until a few weeks ago. Well, technically the URL still works, but the service does not.

So, a few weeks ago, the primary site I scraped for this info changed their format and I decided not to play cat and mouse anymore. I was already bending the rules by not reading their terms of service, as I feared I wouldn’t be allowed to use their data like this. Also, I really don’t have any use for this service myself, so I decided to do myself a service and stop wasting spare time on one of those projects that don’t give me enough personal satisfaction. I’m sure that if there is demand for the service I just closed down, someone else out there will be ready to fire one up and serve users.

So long, and thanks for all the currency exchange fun.

Unexpected C64 reference

Yesterday at the OPTIMERA STHLM conference, Isac Lagerblad really surprised me when he brought up an image of me and my fellow hackers in the C64 demo group Horizon, from back in 1990. I’m the guy in the middle of the lower row.

You can see it happen at 02:35 into the part 4 clip, where you’ll go if you click on the image (Isac’s talk is in Swedish).

The original picture is a very old scan and isn’t a lot better.

I was honored and flattered by this unexpected “tribute”. Thanks Isac, it was really fun to see!

And before anyone asks: my brother Björn and I got our first Commodore 64 in 1985, and that is what got me into computers. I haven’t stopped enjoying them since. We did a lot of demos and a few games, we had a great time, and we made like-minded friends all over the world.

My talk at OPTIMERA STHLM

30 minutes is a tricky period to fill with content when you do a talk, and yesterday I did my best at confusing/informing the audience at the OPTIMERA STHLM conference about transport layer performance: where time is spent or lost today in TCP, what to think about to make things behave faster, the fact that RTT is not getting better even though bandwidth is growing really fast these days, and a little about some future technologies like WebSockets, SPDY, SCTP and MPTCP.

Note: this talk is entirely in Swedish.

My slides for this talk are also viewable on slideshare.net.

Bye bye Crystone

or, why we should give up on service providers that don’t treat us well enough.

We co-locate

We (Haxx) have a server (technically speaking we have more than one but this is about our main one that hosts most of our public stuff). This server is ours. We bought it, installed it, configured it and then we handed it over to a company that “co-locates” it for us. It means they put our hardware in their big server room and we pay them for it and for the bandwidth our server consumes.

It also means that we have less control over it and we need to call the company to get access to our machine and so on. Ok, so we’ve used Crystone for this for a long time. They’ve been cheap enough and they haven’t complained when we’ve greatly overrun our bandwidth “allowance” for many months in a row.

A bad track record

We did have concerns a while ago (August 2009 and then again in March 2010) when they had power problems in their facility and we suffered from outages and server down-times. Crystone was really bad at communicating what had happened and what they were doing about it, and we started to look around for alternative providers, since it was getting annoying and they didn’t seem to care for us properly. But we never really got around to actually moving, and time passed.

Maybe they had fixed their flaws and things were now fine?

A Saturday in May

Suddenly, in the early morning of Saturday May 22nd 2010, our machine stopped responding to network traffic. We didn’t find out until we woke up and tried to use our services, and after having tried a few things we contacted Crystone to hear if the problem was theirs or ours – we’d had some troubles lately with the network interface card, and we feared that the network might have stopped working due to this flaky hardware.

The customer service at Crystone immediately said that they were experiencing problems due to their move of the server park to new facilities (from Liljeholmen to Hammarby, both within the general Stockholm area). They said they had network problems and that they were working on it. They did not give any estimate of when our machine would be back online.

They also said that they had mailed their customers about this move, and yeah, we felt a bit bad about not having noticed such a mail so that we could have been prepared.

The entire day passed. No network. Their web site mentioned problems due to this particular server move. We waited, we got no further info. We were unhappy.

Saturday became Sunday

How big problems can you have when the down-time for your customers exceeds 24 hours and you still haven’t fixed them, nor told anyone what the problems actually are? The Sunday passed and they updated their web site a few times. The last update mentioned the time 16:03 and said that “most customers” were now back online, and that if there were any remaining problems we should contact their customer service. I spotted that message a couple of hours later, when our machine still wasn’t available. And what did customer service have to say about it? Nothing, they were closed. Our server remained dead and inaccessible.

Monday, now beyond 50 hours

In the wee hours of Monday we passed 50 hours of offline time, and when the customer service “desk” opened in the morning and answered our phone call, they could get our machine back online. By rebooting it. No explanation on their part of why our machine was seemingly the only one that suffered this long.

A search in the mail logs also proved that Crystone never mailed us to tell that our server would move. Isn’t that odd? (not really, as we would find out later)

We won’t stand it

Already during the weekend we had decided that we were fed up with this complete ignorance and crappy treatment. Down-times and problems happen, but the complete lack of information about, and care for, us – their customers – is what made it clear we were not meant to be their customers. We had to go elsewhere.

Crystone offered us a month’s fee worth of deduction on the hosting charges as compensation for the troubles we had. That was nice of them, but really, this service isn’t expensive, so it’s not the cost that is burdensome. We just can’t stand having a service this unreliable, and working with a company this uncommunicative.

This big server move was Crystone moving a lot of equipment over to a facility owned and run by Phonera, another ISP, and the one we happened to have an offer from since before, when we were looking for alternatives. Handy, we thought: perhaps we could just go there, carry our server over from one shelf to another, and be fine. Phonera is slightly more expensive, but hey, perhaps we’d get peace of mind!

“We don’t steal customers”

Phonera was at first glad to accept us as customers, but surprised us greatly when they turned around and declined to take us on as new customers, since they claimed they don’t want to “steal” customers from Crystone (who are now themselves customers of Phonera). Baffled, we simply sent off another request, to Portlane instead, and within minutes we had a decision made and a contract signed.

Later that afternoon, a Phonera guy got back to us, having changed position again, and said that perhaps we could become customers anyway. They had figured out that neither of them would gain by us going to a third company, but in any case it was now too late: we had already made up our minds about going with Portlane.

“Sir, your server is not here”

On Tuesday at 13:00, Björn (as co-admin of the server) had an appointment with Crystone to extract our server from their care and take it over to its new home. When he appeared at the new facility in Hammarby to get the server, he was in for (another) surprise. It wasn’t there. Now Crystone could inform us that our server was still in the old facility in Liljeholmen. It had never been moved!

Glad that our business with these guys would soon be over, Björn handed over our 1U of server to Portlane, and within a short while it had found a new home, with a new IP address and a new caretaker.

We could once again take a deep breath of relief and carry on with whatever we were doing before again.

curl and spec’ed cookie order

I just posted this on the curl-library list, but I feel it suits to be mentioned here separately.

As I’ve mentioned before, I’m involved in the IETF http-state working group which is working to document how cookies are used in the wild today. The idea is to create a spec that new implementations can follow and that existing implementations can use to become more interoperable.

(If you’re interested in these matters, I can only urge you to join the http-state mailing list and participate in the discussions.)

The subject of how to order cookies in outgoing HTTP Cookie: headers has been much debated over recent months, and I’ve also blogged about it. Now the issue has been closed, and the decision went quite opposite to my standpoint: the spec will say that while servers SHOULD NOT rely on the order (yeah right, some obviously already do, and with this specified even more will soon do the same), it will recommend clients to sort the cookies in a given way that is close to the way current Firefox does it[*].

This has the unfortunate side-effect that to become fully compatible with how the browsers do cookies, we will need to sort our cookies a bit more than what we just recently introduced. That in itself really isn’t very hard, since once we introduced qsort() it became easy to sort on more/other keys.
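
Assuming the spec ends up saying what the discussion suggests (longer paths first, earlier-created cookies first among equals), a comparator could look something like this sketch (my guess at the keys, not the final spec text):

  #include <string.h>
  #include <time.h>

  struct cookie {
    char *name;
    char *path;
    time_t created; /* when the cookie was created */
  };

  /* qsort() comparator: longer paths sort first, creation time breaks
     ties. NOTE: my reading of the debate, not the final spec wording */
  static int cookie_cmp(const void *p1, const void *p2)
  {
    const struct cookie *c1 = p1, *c2 = p2;
    size_t l1 = strlen(c1->path), l2 = strlen(c2->path);
    if(l1 != l2)
      return (l2 > l1) - (l2 < l1); /* longer path first */
    return (c1->created > c2->created) - (c1->created < c2->created);
  }

It would then be used as qsort(cookies, num, sizeof(struct cookie), cookie_cmp).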

The biggest problem we get with this is that the sorting uses the creation time of the cookies. libcurl and curl and others mostly use the Netscape cookie file format to store cookies and keep state between invokes, and that file format doesn’t include creation time info! It is a simple text-based file format with TAB-separated columns, and the last (7th) column is the cookie’s content.

In order to support the correct sorting between sessions, we need to invent a way to pass on the creation time. My thinking is that we do this in a way that allows older libcurls to still understand the file but just not see/understand the creation time, while newer versions will be able to get it. This is possible by extending the expires field (the 6th), as it is a numerical value that the existing code parses as a number, stopping at the first non-digit character. We could easily add a separating character and store the creation time after it. Like:

Old expire time:

2345678

New expire+creation time:

2345678/1234567
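
A quick sketch of the parsing side (hypothetical, just to show the backward compatibility trick): strtol() stops at the slash, so an old parser reads the expire time exactly as before, while a new one continues past the separator.

  #include <stdlib.h>
  #include <time.h>

  /* parse "2345678" as well as "2345678/1234567"; old-style files
     simply end up with a creation time of 0 */
  static void parse_expire_field(const char *field,
                                 time_t *expires, time_t *created)
  {
    char *end;
    *expires = (time_t)strtol(field, &end, 10);
    *created = 0;
    if(*end == '/')
      *created = (time_t)strtol(end + 1, NULL, 10);
  }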

This format might even work with other readers of this file format, if they make similar assumptions about the data. But the truth is that while we picked the format in the first place to be able to exchange cookies with a well known and well used browser, no current browser uses that format anymore. I assume there are still a bunch of other tools that do, like wget and friends.

Update: as my friend Micah Cowan explains, we can in fact use the order of the cookie file as a “creation time” hint and use that as the basis for sorting. Then we don’t need to modify the file format. We just need to make sure to store the cookies in time-order… Internally we will need to keep a line number or something per cookie, so that we can use that for sorting.

[*] – I believe it sorts on path length, domain length and time of creation, but as soon as the -06 draft goes online it will be easy to read the exact phrasing. The existing -05 draft is at: http://tools.ietf.org/html/draft-ietf-httpstate-cookie-05