Tag Archives: HTTP

curl and speced cookie order

I just posted this on the curl-library list, but I feel it deserves to be mentioned here separately as well.

As I’ve mentioned before, I’m involved in the IETF http-state working group which is working to document how cookies are used in the wild today. The idea is to create a spec that new implementations can follow and that existing implementations can use to become more interoperable.

(If you’re interested in these matters, I can only urge you to join the http-state mailing list and participate in the discussions.)

The subject of how to order cookies in outgoing HTTP Cookie: headers has been much debated over the recent months, and I’ve also blogged about it. Now the issue has been closed, and the decision went quite opposite to my standpoint: the spec will say that while servers SHOULD NOT rely on the order (yeah right, some obviously already do, and with it specified like this even more will soon do the same), it will recommend that clients sort the cookies in a given way that is close to the way current Firefox does it[*].

This has the unfortunate side-effect that, to become fully compatible with how the browsers do cookies, we will need to sort our cookies on a few more keys than the ones we just recently introduced. That in itself really isn’t very hard, since once we introduced qsort() it became easy to sort on more/other keys.
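
To give an idea of what this means in practice, here is a rough sketch of the kind of qsort() comparator we could end up with, assuming we sort on descending path length first and ascending creation time second (the exact keys are whatever the -06 draft ends up saying). The struct and its fields are made up for the example and don’t match libcurl’s internal names.

#include <stdlib.h>
#include <string.h>
#include <time.h>

struct cookie {
  char *name;
  char *value;
  char *path;
  time_t creation; /* when the cookie was created/received */
};

static int cookie_cmp(const void *p1, const void *p2)
{
  const struct cookie *a = p1;
  const struct cookie *b = p2;
  size_t alen = strlen(a->path);
  size_t blen = strlen(b->path);

  if(alen != blen)
    return (alen > blen) ? -1 : 1;  /* longer path sorts first */
  if(a->creation != b->creation)
    return (a->creation < b->creation) ? -1 : 1;  /* older cookie sorts first */
  return 0;
}

/* usage: qsort(cookies, num, sizeof(struct cookie), cookie_cmp); */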

The biggest problem we get with this is that the sorting uses the creation time of the cookies. libcurl and curl and others mostly use the Netscape cookie file format to store cookies and keep state between invocations, and that file format doesn’t include creation time info! It is a simple text-based file format with TAB-separated columns, where the last (7th) column is the cookie’s content.

In order to support the correct sorting between sessions, we need to invent a way to pass on the creation time. My thinking is that we do this in a way that allows older libcurls to still understand the file, just without seeing/understanding the creation time, while newer versions will be able to get it. This is possible by extending the expires field (the 6th), as it is a numerical value that the existing code parses as a number, stopping at the first non-digit character. We could easily add a separator character and store the creation time after it. Like:

Old expire time:

2345678

New expire+creation time:

2345678/1234567

This format might even keep working with other readers of this file format, if they make similar assumptions about the data. The truth is, though, that while we picked the format in the first place to be able to exchange cookies with a well-known and widely used browser, no current browser uses that format anymore. I assume there are still a bunch of other tools that do, like wget and friends.
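
For illustration, this is roughly how a newer parser could pick up the creation time while staying harmless to older ones. The function name is made up and this is just a sketch of the idea, not actual libcurl code.

#include <stdlib.h>
#include <time.h>

/* Parse the 6th column. Old parsers stop at the first non-digit and
   only ever see the expiry time; a new parser looks for a '/' and
   reads the creation time stored after it. */
static void parse_expire_column(const char *field,
                                time_t *expires, time_t *creation)
{
  char *end;
  *expires = (time_t)strtol(field, &end, 10);
  *creation = 0;
  if(*end == '/')
    *creation = (time_t)strtol(end + 1, NULL, 10);
}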

Update: as my friend Micah Cowan explains, we can in fact use the order of the cookie file as a “creation time” hint and use that as the basis for sorting. Then we don’t need to modify the file format; we just need to make sure to store the cookies in time order… Internally we will need to keep a line number or something per cookie so that we can use that for sorting.

[*] – I believe it sorts on path length, domain length and time of creation, but as soon as the -06 draft goes online it will be easy to read the exact phrasing. The existing -05 draft exists at: http://tools.ietf.org/html/draft-ietf-httpstate-cookie-05

httpstate cookie domain pains

Back in 2008, I wrote about Mozilla starting to publish their “public suffixes list” effort, and I was a bit skeptical.

Well, the problem with domains in cookies of course didn’t suddenly go away, and today, as we work in the httpstate working group in the IETF on documenting how cookies work, we need to somehow describe both how user agents tend to handle these domains and how they should handle them. It’s really painful.

The problem, in short, is that a site called ‘www.example.com’ is allowed to set a cookie for ‘example.com’ but not for ‘com’ alone. That’s somewhat easy to understand. It gets more complicated as soon as we consider the UK, where ‘www.example.co.uk’ is fine but must not be allowed to set a cookie for ‘co.uk’.

So how does a user agent know that co.uk is magic?

Firefox (and Chrome, I believe) uses the suffixes list mentioned above, IE is claimed to have its own (unpublished) version, and Opera uses a trick where it checks whether the domain name resolves to an IP address or not. (Although Yngve says Opera will soon do online lookups against the suffix list as well.) But if you want to avoid those tricks, Adam Barth’s description on the http-state mailing list really is creepy.
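
Just to show what the check itself amounts to, here is a minimal sketch of a suffix lookup. The tiny hardcoded table is purely illustrative; a real client would load the full, regularly updated list, and would of course also verify that the requesting host actually belongs to the domain it tries to set the cookie for.

#include <strings.h>

/* purely illustrative stand-in for the real public suffixes list */
static const char *suffixes[] = { "com", "net", "org", "uk", "co.uk", NULL };

/* return non-zero if the given cookie domain is a public suffix and
   therefore must not be allowed to carry a cookie of its own */
static int is_public_suffix(const char *domain)
{
  int i;
  for(i = 0; suffixes[i]; i++)
    if(!strcasecmp(domain, suffixes[i]))
      return 1;
  return 0;
}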

For me, being a libcurl hacker, I of course can’t help but think about Adam’s words in his last sentence: “If a user agent doesn’t care about security, then it can skip the public suffix check”.

Well. Ehm. We do care about security in the curl project, but we still (currently) skip the public suffix check. I know, it is a bit of a contradiction, but I guess I’m just too stuck in my opinion that the public suffixes list is a terrible solution, and yet I can’t figure out anything that would work better and offer the same level of “safety”.

I’m thinking perhaps we should give in. Or what should we do? And how should we in the httpstate group document that existing and new user agents should behave to be optimally compliant and secure?

(in case you do take notice of details: yes the mailing list is named http-state while the working group is called httpstate without a dash!)

My Nordic Free Software Awards 2009 nominees

Hey, it’s really about time to nominate your favourite Free Software persons and projects from the nordic region for the 2009 awards before the time runs out.

This year, I decided to nominate the following “nordic” heroes:

Simon Josefsson

For his excellent work in GnuTLS, libssh2 and a bunch of other projects.

Henrik Nordström

For his work in the Squid project, and his efforts within IETF and its HTTP related struggles and more.

Björn Stenberg

As the primary founder of the Rockbox project. He started something special back in 2001 that now is a huge, thriving and successful Free Software project.

As you might spot, I favor “doers”. I don’t believe in the concept of “nordic projects” when it comes to free or open software – the entire concept of open and free should mean that projects cross borders and regions.

In fact, it feels so out of the ordinary to think about open source people in a geographical context that I find it hard to come up with a lot of names. It would be cool if ohloh had some way to list people and projects based on where people live.

Then again, if a person from a nordic country moves somewhere else, is he or she still a nordic person? Does it depend on where the person lived during the actual act? Is Linus Torvalds a nordic person since he was born, lived many years and started his big project in Finland?

(yeah I already blogged about this subject but hey, it can’t hurt can it?)

My IETF 75 Story

A while ago, like a couple of years ago, I joined the mailing list for the HTTPbis effort. That’s the IETF working group which is producing an update to the HTTP 1.1 spec RFC2616 that clarifies it and removes things that nobody does or that don’t work. That spec is huge, and the number of conflicting statements or just generally muddy expressions is large. This work is not near completion yet.

So when I learned that the IETF75 meeting was to be held here in Stockholm, I of course cheered at the opportunity to join in the talks and meet the people there.

IETF

IETF is basically just an informal bunch of people who like internet protocols, “above the wire and below the application”. The goal of the IETF is to make the Internet work better. This is the organization behind the RFCs. They wrote the specs for TCP, FTP, SMTP, POP3, HTTP and a wide range of other protocols we all use non-stop these days. I’m actually quite surprised the IETF is so little known. When I mention the organization name, most people just give me a blank face back that reveals it just isn’t known to very many “civilians”.

IETF has meetings around the world two or three times per year, and every time some 1K-2K people join up for a week filled with working group meetings, talks and a lot of socializing and so on. The attitude is generally relaxed all over and very welcoming to newcomers (like me). Everyone can join any talk or discussion. There’s simply no dress code; everyone just wears whatever they think is comfortable.

The whole week was packed with scheduled talks and sessions, but I only spent roughly 1.5 days at the conference center as I had to get some “real work” done as well and quite honestly I didn’t find sessions that matched my interest every single day anyway.

After the scheduled days, and between sessions and sometimes instead of the scheduled stuff, a lot of social events took place. People met in informal gatherings, talks, planning sessions and dinners. The organizers of this particular event also arranged two separate off-topic social events. As one of the old-timers I talked to said something like: “this year was unusually productive, but just about all of that was done outside of the schedule”…

This time, the 75th IETF meeting was hosted by .se in Stockholm, Sweden, at the end of July 2009. 1230 persons had signed up to come. The entrance fee was 650 USD unless you wanted to pay very late or at the door, in which case it was 130 USD or so extra. Oh, and we got a t-shirt!

One amusing little detail: on the t-shirt we got there was a “free invite code” to the Spotify streaming music service. But when people at the IETF meeting tried to use it at the conference center, Spotify refused to accept the users claiming it doesn’t work in the US! Clearly they’re not using the most updated ip-geography database in the world! 😉

So let me quickly mention a few of the topics I caught and found interesting, some of which I’ve blogged about separately.

HTTP over SCTP

The guys in the SCTP team work on this draft on how to do HTTP the best possible way over SCTP. They cite tests that claim “web browsing” becomes a better experience when done over SCTP. I’m personally quite interested in this work and in SCTP in general, and I hope to be able to play with making libcurl support this in a not too distant future.

How to select TCP or SCTP

Assuming that SCTP gets widespread adoption and even browsers and browser-like apps start supporting it, how should the clients figure out which transport mechanism to use? The problem is similar to the selection between IPv4 and IPv6, but for the IP choice we can at least use A and AAAA lookups. The suggestions that were presented basically argued for trying all combinations in parallel and going with the fastest to respond, but a few bright minds questioned the smartness of that, as it scales very badly if more transport options are added and it potentially introduces a lot more (start-up) traffic.
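
To make the “try them all and pick the fastest” idea a bit more concrete, here is a stripped-down sketch in plain BSD sockets: start non-blocking connects over both TCP and SCTP and go with whichever handshake completes first. Error handling is omitted, SCTP availability obviously depends on the local stack, and a real implementation would also check SO_ERROR on the winning socket.

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <netinet/in.h>
#include <fcntl.h>
#include <unistd.h>

static int start_connect(int protocol, const struct sockaddr_in *addr)
{
  int s = socket(AF_INET, SOCK_STREAM, protocol);
  fcntl(s, F_SETFL, O_NONBLOCK);
  /* returns -1 with errno == EINPROGRESS for a non-blocking socket */
  connect(s, (const struct sockaddr *)addr, sizeof(*addr));
  return s;
}

int pick_transport(const struct sockaddr_in *addr)
{
  int tcp = start_connect(IPPROTO_TCP, addr);
  int sctp = start_connect(132 /* IPPROTO_SCTP */, addr);
  int maxfd = (tcp > sctp) ? tcp : sctp;
  fd_set wfds;

  FD_ZERO(&wfds);
  FD_SET(tcp, &wfds);
  FD_SET(sctp, &wfds);

  /* the first socket to become writable has finished its handshake */
  select(maxfd + 1, NULL, &wfds, NULL, NULL);

  if(FD_ISSET(sctp, &wfds)) {
    close(tcp);
    return sctp;  /* prefer SCTP if it answered */
  }
  close(sctp);
  return tcp;
}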

Getting HTTP from multiple mirrors

What the authors call Multiserver HTTP is a pretty small suggestion of a few additional HTTP headers that a server would be able to return to a client, hinting at other URIs where the exact same file/resource can be downloaded from. A client would then get that resource in parallel from multiple HTTP servers using range requests. Quite inspired by other technologies such as bittorrent and metalink. This first draft faced some criticism for not reusing existing HTTP as much as it could, but the general spirit has been welcoming.
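
The client side of this maps pretty naturally onto libcurl’s existing multi interface and range support. Just as a sketch of the concept (made-up URLs and sizes; a real application would write each range to the right file offset and wait on the sockets with curl_multi_fdset() instead of spinning):

#include <curl/curl.h>

int main(void)
{
  CURLM *multi;
  CURL *h1, *h2;
  int running;

  curl_global_init(CURL_GLOBAL_ALL);
  multi = curl_multi_init();
  h1 = curl_easy_init();
  h2 = curl_easy_init();

  /* first half from mirror A, second half from mirror B */
  curl_easy_setopt(h1, CURLOPT_URL, "http://mirror-a.example.com/file.iso");
  curl_easy_setopt(h1, CURLOPT_RANGE, "0-4999999");
  curl_easy_setopt(h2, CURLOPT_URL, "http://mirror-b.example.com/file.iso");
  curl_easy_setopt(h2, CURLOPT_RANGE, "5000000-9999999");

  curl_multi_add_handle(multi, h1);
  curl_multi_add_handle(multi, h2);

  /* drive both transfers until they are done */
  do {
    curl_multi_perform(multi, &running);
  } while(running);

  curl_multi_remove_handle(multi, h1);
  curl_multi_remove_handle(multi, h2);
  curl_easy_cleanup(h1);
  curl_easy_cleanup(h2);
  curl_multi_cleanup(multi);
  curl_global_cleanup();
  return 0;
}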

The presenter of the idea, Mike Hanley, also mentioned the possibility of letting the server response include a wildcard, so that a server could for example hint that “other images from this dir path can also be found under this dir path on this other server”. A client would thus be able to download such images from multiple servers. This idea is not in the draft, and I’m not personally sure it fits as nicely.

HTTP-state wg

During the IETF week it was announced that the HTTP-state working group is being formed. It didn’t actually happen at the meeting itself, but still… See my separate blog post on http-state.

Multipath TCP

The idea and concept behind MPTCP was new to me, but I quickly came to like the thought of getting this into the network stacks around me. I hope this will grow up to become something fine! See my separate blog post on multipath TCP for all the details.

IRI

I visited the IRI BOF and got some fine insights on the troubles of creating the IRI spec. Without revealing too much, it’s quite clear sometimes that politics can be hard even in these surroundings…

tng – Transport Next Generation

What felt pretty “researchy” and still not really ready for adoption (or am I wrong?) is this effort they call Transport Next Generation. Their ideas include inserting a whole bunch of additional layers into the typical network stack, for example to move congestion handling into its own layer so that it can be done on a per-network-segment basis instead of only end-to-end like today. Apparently they have tests and studies that suggest the per-segment approach can improve traffic a lot. These days a lot of the first and last parts of network accesses are done over wireless networks, while the core tends to still be physical cables.

DCCP

I found it interesting and amusing when they presented DCCP with all its bells and whistles, and then toward the end of the presentation it surfaced that they don’t really know what DCCP would be used for; at the moment the working group is pretty much done, but there’s just nobody using the protocol…

HTTPbis

HTTPbis is more or less my “home” in the IETF. We had a meeting in which things were discussed, some decisions were made and some new topics were raised. RFC2616 is a monster of a spec: it contains so many details and so many potentially conflicting statements, and quite clearly very many implementers have interpreted sections differently, so doing these clarifications is next to endless work. I figure the work will simply be deemed “done” one day, and the remaining confusions will then just be left. The good part is that the new document should at least be heaps better than the former one. It will certainly benefit future and existing HTTP implementers nonetheless.

And a bunch of the HTTPbis guys got together a bit outside of the meeting as well, so we got to talk quite a bit and top off the evening with a dinner…

OpenDNSSec

There was quite a lot of DNSSEC talk during the week, and it annoys me that I double-booked the evening they had their OpenDNSSec release (or was it a tech preview?) party, so I couldn’t go there and take advantage of my two free beers!

Observations

Apple laptops. A crushing majority of the people seemed to have Apple branded laptops, and nearly all presentations I saw were done with Apples.

Not too surprisingly, the male-to-female ratio was very, very high. I would guess 20:1 to 30:1, at least in the surroundings where I spent my time.

Upcoming Meetings

This week was lots of fun. More fun than I have had at a conference in a very long time. I’ll definitely consider going to some upcoming meetings, although the next one in Japan in November doesn’t fit my schedule. Possibly Anaheim in March 2010, and even more likely the 78th meeting in Maastricht in July 2010.

Thanks everyone who was there. Thanks to the hosts for a great event. It was a blast!

HTTP cookies IETF working group

So finally (remember I mentioned this list when it was created back in January 2009) an IETF http-state working group has been created, with the following description:

The HTTP State Management Mechanism (Cookies) was originally created by Netscape Communications in their Netscape cookie specification, from which a formal specification followed (RFC 2109, RFC 2965). Due to years of implementation and extension, several ambiguities have become evident, impairing interoperability and the ability to easily implement and use HTTP State Management Mechanism.

I’m on the list from the start, and I hope to be able to contribute some of my cookie experiences and knowledge to help the document actually end up as something useful. The ambition, while it has been “toned down” somewhat since the initial posts on the mailing list, is still fairly high I would claim:

The working group will refine RFC2965 to:

  • Incorporate errata and updates
  • Clarify conformance requirements
  • Remove known ambiguities where they affect interoperability
  • Clarify existing methods of extensibility
  • Remove or deprecate those features that are not widely implemented and also unduly affect interoperability
  • Add features that are already widely implemented or have a critical mass of support
  • Where necessary, add implementation advice
  • Document the security properties of HTTP State Management Mechanism and its associated mechanisms for common applications

In doing so, it should consider:

  • Implementer experience
  • Demonstrated use of HTTP State Management Mechanism
  • Impact on existing implementations and deployments
  • Ability to achieve broad implementation.
  • Ability to address broader use cases than may be contemplated by the original authors.

The Working Group’s specification deliverables are:

  • A document that is suitable to supersede RFC 2965
  • A document cataloging the security properties of HTTP State Management Mechanism

I think this is a scope that is manageable enough to actually have a chance to succeed, and its planning is quite similar to that of the IETF httpbis group. Still, RFC2965 lists a huge pile of stuff that has never been implemented by anyone, and even though it has been a while since I read that spec, I also expect it to lack several things that existing cookie parsers and senders already use. The notorious IE httpOnly attribute is an example I can think of right now.
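
For reference, httpOnly is just an extra attribute on the Set-Cookie response header, something like the line below (made-up cookie name and value), telling the browser to keep the cookie away from scripts:

Set-Cookie: sessionid=38afes7a8; Path=/; HttpOnly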

HTTPbis at IETF75

Mark, one of the editors of the ongoing HTTPbis effort, first mentioned that there wasn’t going to be any HTTPbis meeting at the upcoming IETF75 meeting in Stockholm, July 26-31, 2009. I felt a bit sorry about that since I live in Stockholm, I’m a bit involved in the HTTPbis work and I’ve never been to an IETF meeting.

It simply must have been due to my almighty powers, but apparently two of the editors are coming here anyway, and there has now been a request for an HTTPbis session during the meeting.

I’m looking forward to this! Hopefully it’ll bring some fun talks on tech we care about, but also the chance to meet cool people in real life that I’ve never met before.

Stockholm

Oh, and am I the only one who can’t find the dates anywhere on ietf75.se?

bittorrent vs HTTP

A while ago I put together my document FTP vs HTTP that compares data transfers done using those two protocols: similarities and differences.

Today I’m taking the next step in this little series and I offer you Bittorrent vs HTTP! This document discusses differences in areas such as:

  • Transfer Speed
  • Streaming
  • Uplink
  • Firewalls
  • Redundancy
  • Server Load
  • Encryption
  • Protocol Standards

As usual, I’m all ears for your valuable input and help on making it more accurate and more detailed than I manage to myself. Point out my mistakes, my weird use of words or whatever. Post a comment here or email me.

bittorrent vs http

HTTP Status Report

Mark Nottingham held a very interesting one-hour talk on the status of HTTP and the work on HTTPbis at a QCon conference recently, and luckily for us HTTP geeks there’s this great video/presentation from it.

curl is mentioned at least twice in the slides. Unfortunately the second mention includes a wrong fact: it says curl uses “Pragma: no-cache”, which isn’t true anymore. It used to do that, but we stopped doing it in curl a while ago.

I’m a subscriber to the httpbis mailing list and a casual contributor, but nonetheless his summary and overview of the state was refreshing, as I’ve not been able to keep up with all the details and I haven’t been tracking that working group since its start either.

Code re-use is fun

Back in 2003 I wrote up support for the HTTP NTLM authentication method for libcurl. Happy with my achievement, I later that year donated a GPL-licensed version of my code to the Wget project (which was also my first contact with the signed-paper stuff with the GNU/FSF to waive my copyright claims and hand them over). What was perhaps not so amusing with this code was when, in 2005, both curl and Wget were discovered to have the same security flaw due to my mistakes in this code shared by both projects!
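
For those who haven’t used it, enabling that NTLM code in a libcurl application is basically a one-option affair, roughly like this (URL and credentials are of course made up):

#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    curl_easy_setopt(curl, CURLOPT_URL, "http://intranet.example.com/");
    curl_easy_setopt(curl, CURLOPT_HTTPAUTH, CURLAUTH_NTLM);
    curl_easy_setopt(curl, CURLOPT_USERPWD, "domain\\user:secret");
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
  }
  return 0;
}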

Just recently, the neon project seems to be interested in taking on the version I adjusted somewhat for them, so possibly a third HTTP implementation will soon be using this code. Yeah, I posted it on their mailing list back then, so it has been sitting there in the archives maturing for some six years by now…

I also happened to stumble over the SSH Tunnel Creator tool, which I’ve never used myself, that apparently snatched my neon donation (quite according to what the license allows, of course) and used it in their tool to do NTLM!

It’s actually not until recent years that I discovered libntlm, and while I don’t know how good it was back in the days when I wrote my first NTLM stuff, I generally think using existing libs is the better idea…

More suggested HTTP fun

I’ve already previously expressed my deepest dislike of where the HTML5 work is going, and just yesterday two new internet-drafts appeared on ietf.org that spurred discussions all around. They’re claimed to be “part of our effort to remove from HTML5 sections that are more appropriate elsewhere”, but I’m thinking they’re rather inappropriate everywhere…

The first one, named Content-Type Processing Model, hits a subject that I’ve been over before, namely the stupidity of having web browsers guess the content type based on what the content looks like. IE introduced the “I really mean it property”, and the HTML5 team wants to standardize the way of the guessing. Personally, I think the world of the web would become a better place if the browsers instead became stricter and more closely followed what the servers actually say the content is; then all users would complain to the site admins if things are wrong, and then things would get fixed.

Guessing content types allows for sloppy behavior, it makes it harder to write browsers for the web, and it still carries a significant risk of guessing wrong.

The second draft advocates for the new HTTP header “Origin”, which according to the authors would help guard servers against CSRF (“Cross-Site Request Forgery”). The main author says 3% of users on the Internet get their Referer header stripped while virtually none get Origin stripped. I claim this is a bogus argument, since they strip Referer because it is a known and established header and Origin is not. I also completely fail to see the goodness of this, and based on several of the other responses on the ietf-http-wg mailing list I am not alone…
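
To illustrate the difference the draft is after: a cross-site form POST from a browser implementing it would carry headers roughly like the ones below (host names made up), where Origin only names the site that triggered the request, no path, and the server is supposed to compare it against its own origin before acting on the request.

POST /transfer HTTP/1.1
Host: bank.example.com
Referer: http://attacker.example.com/evil-form.html
Origin: http://attacker.example.com
Content-Type: application/x-www-form-urlencoded

amount=1000&to=evil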