Category Archives: Web

web stuff

OPTIMERA STHLM

Our friends at .SE are once again putting together an interesting conference-style day with talks. This time it is titled “OPTIMERA STHLM” (yes, they use all caps) and it is all about optimizing web-related things.

I’ve been invited and I will do a 30-minute talk during that day about the transport layer and the things that sit on top of it. In other words, it’ll cover things to consider for TCP, DNS, HTTP, socket handling and libcurl, plus a quick look at things such as WebSockets, SPDY, MPTCP and SCTP.

The full day’s program is now available on the linked page. Enjoy!

cookie order

I’ve previously mentioned the IETF httpstate working group several times, and here’s some insights into one topic we’re currently discussing:

HTTP Cookie: sort order

The current httpstate draft for the updated cookie spec says that the cookies sent to a server in the Cookie: header of an HTTP request need to be sorted: primarily on path length and secondarily on creation time.
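
To illustrate what that ordering means in practice, here’s a minimal sketch in Python. The Cookie type and field names are made up for illustration; the point is simply the sort key: longest path first, and among cookies with equally long paths, the earliest creation time first.

    from collections import namedtuple

    # made-up minimal cookie representation, for illustration only
    Cookie = namedtuple("Cookie", "name value path creation_time")

    def cookie_header(cookies):
        # longest path first; among equal path lengths, oldest creation time first
        ordered = sorted(cookies, key=lambda c: (-len(c.path), c.creation_time))
        return "; ".join("%s=%s" % (c.name, c.value) for c in ordered)

    # two cookies with the same name but different paths are both sent,
    # with the more specific (longer) path first
    jar = [Cookie("id", "low", "/", 1), Cookie("id", "high", "/account", 2)]
    print("Cookie: " + cookie_header(jar))  # Cookie: id=high; id=low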

According to others on the http-state mailing list, that’s exactly what the major browsers already do, and they therefore think we should document it since that’s how cookies work.

During these discussions I’ve learned that cookies with the same name but set with different paths will all be sent in the request, so they need to be sent in a particular order; in fact, even the original Netscape “spec” said so. I agree with this and I’ve modified libcurl to act accordingly.

I hold the position that only cookies with identical names need this treatment, and that’s what we should specify. Implementers, however, will most likely find it easier to simply sort all cookies at once.

The secondary sort key, the creation time, is much more questionable. Why would the server care about that? How can it even rely on it?

Previously in this discussion (back in August 2009) I checked other open source cookie implementations to see how they deal with cookie sorting. I learned that perl’s LWP sorts them on path length but on nothing else. The following tools did no sorting at all: wget, curl, libsoup, pavuk, lftp and aria2. I have no doubt that we could find more if we searched for implementations in more languages and environments.

Specifying that sorting is a MUST on path length will still keep LWP and the next curl version compliant. Specifying that they MUST sort on creation-date as well will then make all of these projects non-conforming.

One problem I have with the cookie discussion in general is that the (5?) major browsers pretty much all behave the same, and while those browsers have almost 100% of the web browser market, we cannot automatically assume that they have 100% of the HTTP client or cookie-using client market. We simply don’t know how many applications, tools and frameworks exist out there that aren’t browsers but still use cookies.

Of course, I want to say that creation-time sorting is pointless, but I have nothing to back that up with. No numbers. The other side of the discussion has a bunch of browsers that already sort like this, but no numbers or evidence showing whether servers rely on it or how many do.

Can any reader of this find a site out there that depends on the cookie order being sorted on creation-date?

(Reminder: the charter for the HTTPSTATE working group is to document existing widely used practices. It is not to solve problems or to fix problems in the existing cookie protocol. We all know and acknowledge that cookies as they currently work are quite flawed and painful.)

httpstate cookie domain pains

Back in 2008 I wrote about Mozilla starting to publish their “public suffix list” effort, and I was a bit skeptical.

Well, the problem with domains in cookies of course didn’t suddenly go away, and today, as we in the IETF httpstate working group document how cookies work, we need to somehow describe how user agents tend to handle these domains and how they should handle them. It’s really painful.

The problem, in short, is that a site called ‘www.example.com’ is allowed to set a cookie for ‘example.com’ but not for ‘com’ alone. That’s fairly easy to understand. It gets more complicated as soon as we consider the UK, where ‘www.example.co.uk’ is fine but must not be allowed to set a cookie for ‘co.uk’.

So how does a user agent know that co.uk is magic?

Firefox (and Chrome, I believe) uses the suffix list mentioned above, IE is claimed to have its own (unpublished) version, and Opera uses a trick where it checks whether the domain name resolves to an IP address or not. (Although Yngve says Opera will soon do online lookups against the suffix list as well.) But if you want to avoid those tricks, Adam Barth’s description on the http-state mailing list really is creepy.
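
For illustration, here’s a rough Python sketch of the kind of check the suffix list enables when a response tries to set a cookie for a given domain. The tiny hard-coded suffix set and the function are made up for this example; the real list is far larger and the real matching rules have more details.

    # stand-in for the real (much larger) public suffix list
    PUBLIC_SUFFIXES = {"com", "uk", "co.uk"}

    def domain_allowed(request_host, cookie_domain):
        """May a response from request_host set a cookie for cookie_domain?"""
        cookie_domain = cookie_domain.lstrip(".")
        if cookie_domain in PUBLIC_SUFFIXES:
            return False  # never allow a cookie for a bare public suffix
        # the cookie domain must be the host itself or a parent domain of it
        return (request_host == cookie_domain or
                request_host.endswith("." + cookie_domain))

    print(domain_allowed("www.example.com", "example.com"))      # True
    print(domain_allowed("www.example.co.uk", "example.co.uk"))  # True
    print(domain_allowed("www.example.co.uk", "co.uk"))          # False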

For me, being a libcurl hacker, I of course can’t help but think about Adam’s words in his last sentence: “If a user agent doesn’t care about security, then it can skip the public suffix check”.

Well. Ehm. We do care about security in the curl project, but we still (currently) skip the public suffix check. I know, it is a bit of a contradiction, but I guess I’m just too stuck in my opinion that the public suffix list is a terrible solution, and yet I can’t come up with anything that would work better and offer the same level of “safety”.

I’m thinking perhaps we should give in. Or what should we do? And how should we in the httpstate group document that existing and new user agents should behave to be optimally compliant and secure?

(in case you do take notice of details: yes the mailing list is named http-state while the working group is called httpstate without a dash!)

50 hours offline

Several sites in the haxx.se domain and other stuff related to me and my fellows were completely offline for almost 50 hours between August 24th 19:00 UTC and August 26th 20:30 UTC.

The sites affected included the main web sites for the following projects: curl, c-ares, trio, libssh2 and Rockbox. It also affected mailing lists and CVS repositories etc for some of those.

The ISP (Black Internet) has explained the outage as the result of some kind of sabotage. Their explanation given so far (first in Swedish):

Strax efter kl 20 i måndags drabbades Black Internet och Black Internets kunder av ett mycket allvarligt sabotage. Sabotaget gjordes mot flera av våra core-switchar, våra knutpunkter. Detta resulterade i ett mer eller mindre totalt avbrott för oss och våra kunder. Vi har polisanmält händelsen och har ett bra samarbete med dem.

Translated to English (by me) it becomes:

Soon after 8pm on Monday, Black Internet and its customers were struck by a very serious act of sabotage. The sabotage was made against several of our core switches. This resulted in a more or less total disruption of service for us and our customers. We have reported the incident to the police and we have a good cooperation with them.

Do note that you could keep track of this situation by following me on twitter.

It’s good to be back. Let’s hope it’ll take ages until we go away like that again!

Update: according to my sources, someone erased/cleared Black Internet’s core routers and then they learned that they had no working backups so they had to restore everything by hand.

I host www.libssh2.org

Sara Golemon, the founder and former maintainer of libssh2, pointed the main site www.libssh2.org over to my server the other day, and my previously unofficial libssh2 web site suddenly became the one and only official one.

The plan is now to get the web contents pushed into a separate git repo to allow all libssh2’ers to modify it.

I’m also open and interested in feedback and ideas on how to improve the web site in whatever kind of way you think. Consider the current site mostly a placeholder for the info we have. How can we make it better?

HTTP cookies IETF working group

So finally (remember I mentioned this list when it was created back in January 2009) an IETF http-state working group was created, with the following description:

The HTTP State Management Mechanism (Cookies) was originally created by Netscape Communications in their Netscape cookie specification, from which a formal specification followed (RFC 2109, RFC 2965). Due to years of implementation and extension, several ambiguities have become evident, impairing interoperability and the ability to easily implement and use HTTP State Management Mechanism.

I’m on the list from the start and I hope to be able to contribute some of my cookie experience and knowledge, to help the document actually end up as something useful. The ambition, while it has been “toned down” somewhat since the initial posts on the mailing list, is still fairly high, I would claim:

The working group will refine RFC2965 to:

  • Incorporate errata and updates
  • Clarify conformance requirements
  • Remove known ambiguities where they affect interoperability
  • Clarify existing methods of extensibility
  • Remove or deprecate those features that are not widely implemented and also unduly affect interoperability
  • Add features that are already widely implemented or have a critical mass of support
  • Where necessary, add implementation advice
  • Document the security properties of HTTP State Management Mechanism and its associated mechanisms for common applications

In doing so, it should consider:

  • Implementer experience
  • Demonstrated use of HTTP State Management Mechanism
  • Impact on existing implementations and deployments
  • Ability to achieve broad implementation.
  • Ability to address broader use cases than may be contemplated by the original authors.

The Working Group’s specification deliverables are:

  • A document that is suitable to supersede RFC 2965
  • A document cataloging the security properties of HTTP State Management Mechanism

I think this scope is manageable enough to actually have a chance to succeed, and the plan is quite similar to that of the IETF httpbis group. Still, RFC2965 lists a huge pile of stuff that has never been implemented by anyone, and even though it was a while since I read that spec I also expect it to lack several things that existing cookie parsers and senders already use. The notorious IE httpOnly attribute is one example I can think of right now.

My HTC Magic Review

This is the first “smartphone” I’ve owned myself, so of course I have nothing else this fancy to compare against. I’ve played around with others’ a few times but that doesn’t really count. I’ve owned perhaps 8 mobile phones since I got my first one in 1996, and they have all been Nokias and Sony Ericssons.

I was never really interested in the iPhone, for many reasons. It is not open. It has a (very) restricted app distribution mechanism. It forbids apps from running simultaneously, etc. And it is pretty tightly tied to iTunes, with no proper mass-storage syncing supported. But I admit that it has a slick UI and many cool apps.

My plan is to get some Android hacking going eventually, and this is basically the first Android phone that has reached Swedish soil. I mean without requiring me to bend over backwards to get it, as I’m sure I could’ve bought previous Android phones from abroad if I really wanted to.

Random good things:

  • it’s fast; most things run faster than on my previous Sony Ericsson thing, and yet this is way more advanced with much bigger screen estate and a fancier UI
  • it has a nice GUI that you can mostly guess how to work with
  • I love being able to use a QWERTY-style keyboard when messaging instead of relying on T9 etc
  • wifi is fun, but with a decent data plan it basically only brings me slightly improved speed and I often can’t even tell the difference!
  • the integration with the Google services is nice, gmail and maps most noticeably
  • there really are a bunch of existing cool apps (I know iphone has lots more, but there are still thousands)
  • it has a much better approach to messaging, similar to what I’ve seen on the iPhone, than I’ve ever experienced in a Nokia or Sony Ericsson. It focuses on conversations and keeps the “thread”.
  • I really really like the feeling of it being a networked thing that also can make phone calls. I can browse, use maps, use gmail just as easily as I can message or call people. With my previous phones all the internet-related services always felt tacked on like a very late afterthought.
  • The notification system is nice, and the three-screen wide “home” with its widget-system is really neat.

Bad stuff:

  • I’ve had some apps crash on me on occasion. But it’s rarely a problem as they’re restarted automatically for me.
  • Toggling wifi on/off a lot can sometimes lead to me not getting any data network at all, and I’ve had to reboot the phone to get back to phone-based (Edge/3G) data.

On-screen keyboard

Of course, any and all geek friends I have ask me how I deal with the on-screen keyboard. I must admit I’m still quite fond of it, mostly because a physical keyboard makes the phone clunky and adds physical constraints and wear points that I don’t like. Sure, the keyboard is a bit small, especially when the phone is in portrait mode, but the suggested completions are fine and I believe I’m already typing pretty quickly on the thing. When I ssh’ed from the phone to one of my servers I did notice the obvious lack of cursor keys (to, for example, navigate an ordinary ncurses-based app or the command-line history of a bash prompt), but other than that I really can’t complain.

Background Applications

One obvious advantage compared to the iPhone is of course the ability to run applications exactly the way I’d like. I can actually run the IRC client, leave it in the background while I go browse the web or answer a call or whatever, and then, whenever I choose, go back to the still-connected IRC client. In fact, playing with this makes the iPhone’s restriction feel really ridiculous.

Comparing to my SE w550i

My previous phone weighs 94 grams compared to the Magic’s 116. The Magic has a much bigger screen and is roughly 11 mm wider and 14 mm taller. That makes it use about 30% more volume (85 cm³), but it still fits fine in the front pocket of any pants I wear. The Magic claims a lot longer battery life, but given all the functionality I can’t help playing with all the time, I doubt I’ll notice. It’ll more likely run down fast simply because I’ll use it more.

I’m also pleased that there’s no problem just plugging the Magic into my Linux desktop and copying/syncing the photos and videos etc.

Google Integration

I realize some people will feel that the very tight integration with Google and Google’s services is a downside, as it adds just another item that Google “owns” in your life. Still, it makes the experience very slick, and as a user I get a lot of stuff “for free” since it just connects to lots of things I already used and had accounts on. So gmail, sharing photos on picasaweb etc just work.

Concepts of a new distributed build

It was time to make an overhaul of our distributed build system for Rockbox. The one currently in place is quite fancy and does 106 builds in around 7-8 minutes, but during the years it has served us we have found a few areas where we want to improve.

The goals for the new system were primarily:

  • do all the builds faster
  • reverse the connection so that people can contribute clients easier
  • make a system that better allows slower machines to contribute

The biggest weaknesses of the existing system:

  • The master uses ssh to the distributed clients, which forces them to have an accessible ssh server and port etc. It also makes it awkward for people behind NATs who want to run more clients.
  • It only hands out a particular build to one client, so if a large build happens to get handed to a slow client towards the end of a build round, all the other clients will sit idle waiting for the last client to finish.
  • The build and the subsequent upload of results to the master are synchronous, so a client with a very slow uplink may spend significant time on the upload before it can start the next build.

The new system is currently in development. It consists of a server that runs on one of our main servers, and there’s a client script that each volunteer contributor runs on their systems.

The clients connect to the master on a dedicated TCP port, specifying user name, password, the name of the particular client instance, which architectures the client can build and how many bogomips the client boasts. While bogomips is a bogus way to measure anything, we’ve started out using it as a rough way to sort the build clients by speed.

The clients stay connected to the server all the time. There’s a ping message from the master every N seconds of idleness to keep the connection alive. As soon as the master wants the client to do a build, it sends a message detailing exactly how to build it and which SVN revision to use. The client then does the build at once, uploads the results over HTTP to a dedicated place and tells the server the build is complete.
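
To give a feel for that flow, here’s a rough sketch of the client end in Python. The message formats, command names and the build/upload helper are all made up for illustration; only the overall flow follows the description above.

    import socket

    def do_build_and_upload(revision, build_id):
        # placeholder: run the actual build here, then upload the results over HTTP
        pass

    def run_client(host, port, user, password, name, archs, bogomips):
        conn = socket.create_connection((host, port))
        f = conn.makefile("rw")
        # handshake: say who we are and what we are capable of building
        f.write("HELLO %s %s %s %s %d\n" %
                (user, password, name, ",".join(archs), bogomips))
        f.flush()
        for line in f:                    # react to whatever the master sends
            words = line.split()
            if not words:
                continue
            if words[0] == "PING":        # idle keep-alive from the master
                f.write("PONG\n")
            elif words[0] == "BUILD":     # e.g. "BUILD <svn-revision> <build-id>"
                revision, build_id = words[1], words[2]
                do_build_and_upload(revision, build_id)
                f.write("COMPLETED %s\n" % build_id)
            f.flush()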

The server knows about all the builds to do at a commit, what we call a build round. It has a rough “score” or “weight” for each build that grades them from slow to fast. When a build round starts, the server first sorts all builds on the number of times they’ve already been handed out, with the build’s “weight” as the secondary sort key. It then loops over the currently connected build clients and hands out builds from the sorted build table, and it keeps doing that until all clients have three builds each to build. As soon as a client reports a build as completed, that client gets the next build from the sorted build list.
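
Here’s a small Python sketch of that hand-out logic. The field names are made up, and I’m assuming for the example that a lower “weight” means a slower (heavier) build so those get handed out first; the real server of course tracks a lot more state than this.

    def assign_builds(builds, clients, per_client=3):
        # primary key: how many times a build has already been handed out,
        # secondary key: its "weight" (assumed: lower weight = slower build)
        queue = sorted(builds, key=lambda b: (b["handed_out"], b["weight"]))
        # loop over the connected clients until everyone has a full plate
        while queue and any(len(c["assigned"]) < per_client for c in clients):
            for client in clients:
                if queue and len(client["assigned"]) < per_client:
                    build = queue.pop(0)
                    build["handed_out"] += 1
                    client["assigned"].append(build)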

If a client connects and the server deems it to be too old (it specifies its version in the handshake message), it will be told to update to a specific version and then come back. This way the server can update all build clients when important things are fixed.

The clients will soon start to get assigned builds that have already been assigned to another client. This is not a problem; in fact it is our intention. The client that completes the build first simply tells the server, and the server then tells all the other clients building that same build to cancel it.

A client that joins the server in the middle of a build round simply gets a bunch of builds immediately and joins in. A client that disconnects during a build round simply won’t complete its builds, and other clients will do them instead. The system is also tolerant of the fact that bogomips is a lame way to compare computers, that the build “score” may not be very accurate, and that some clients will have very slow or very fast upload speeds at unpredictable times.

The build master itself does not know when to start a new build round. It simply knows about the concept and how to tell clients to complete a round. To make the master start a new round, you connect to the server’s listening port, issue a special command and provide a password, and then you can tell the server to start a build of a specific SVN revision, or to queue up a build to be performed after the current one if one happens to be in progress already.

When a full build round is complete (a hundred or so builds done, with full packages and log files in a directory on the build server), the server simply triggers an external script that takes care of updating our build table etc. In fact, every single completed build will optionally trigger an external script to allow web pages or stats pages to get updated as we go.

This build system is currently pretty Rockbox-specific, as this is the project and development setup we’re writing it for, but there’s really nothing in it that must be this way. If someone (you?) wants to adapt it for another project, I’d be more than happy to assist and to help ensure that this becomes a more generic distributed build system. Just raise your hand and step forward!

At the time of this writing, (primarily) Björn and I are still ironing out quirks in this new system, hoping to get it going live real soon…

encrypted file transfer protocols compared

I like putting up some explanatory “this versus that” documents on stuff I know a little about. I’ve done things like curl vs wget, ftp vs http and http vs bittorrent in the past.

This time, I decided it was about time to do a technical comparison of the four major encrypted file transfer protocols, SCP, SFTP, FTPS and HTTPS, and explain how they differ from as many aspects and viewpoints as possible. I quite often get questions about how some of these compare against the others and why you’d use one instead of another, etc. I hope this document will help people find such answers themselves.

Of course I make mistakes and sometimes express myself in muddy ways, so your feedback and help is important. You can help me make this comparison better!

http://daniel.haxx.se/docs/encrypted-transfer-protocols-compared.html

It’s still rough and all, but what questions and comparisons between them do you miss? What mistakes have I made? What parts aren’t spelled out clearly enough?