Category Archives: Web

web stuff

this vs that and ssh through proxy

Taken from the web stats for daniel.haxx.se during September 2012. The top-10 search phrases used to end up on a page on this site:

  1. ssh proxy (198)
  2. curl vs wget (145)
  3. ftp vs http (92)
  4. wget vs curl (91)
  5. ssh through proxy (72)
  6. http vs ftp (67)
  7. curl wget (55)
  8. wget curl (53)
  9. http ftp (46)
  10. difference between ftp and http (45)

The top-3 most visited pages on my site during the same month were:

  1. SSH Through or Over Proxy (viewed 4800 times)
  2. curl vs Wget (viewed 3000 times)
  3. FTP vs HTTP (viewed 2300 times)

I guess this tells me something. I’m not sure what…

Swedish FOSS-magasin

foss-magasinClaes, our friend from foss-sthlm and several Open Source adventures, has just fired off a new initiative: FOSS-Magasin. The site launched for real on the evening November 19th.

Where there’s no real content on the site yet, Claes has set out a mission for himself and future contributors to create a site with technical content in Swedish that we geeks miss. This would be within areas such as FOSS, *nix, networking and more.

Tired of the poor state of technical and IT related media in Sweden that always seem to try to capture the really large audience and therefore always dumb down everything to a silly level, this is meant to be directed on more competent and interested readers.

The site is free and Claes is looking around for contributors to help hem get content to publish. I can only urge my Swedish friends to join up and help it get going, as I think it would be nice to get a proper Swedish tech site. For me, it will be especially interesting for things that actually happen in or otherwise is related to Sweden, as for all the rest I personally have no problems accessing English sites to get the info.

The cookie RFC 6265

http://www.rfc-editor.org/rfc/rfc6265.txt is out!

Back when I was a HTTP rookie in the late 90s, I once expected that there was this fine RFC document somewhere describing how to do HTTP cookies. I was wrong. A lot of others have missed that document too, both before and after my initial search.

I was wrong in the sense that sure there were RFCs for cookies. There were even two of them (RFC2109 and RFC2965)! The only sad thing was however that both of them were totally pointless as in effect nobody (servers nor clients) implemented cookies like that so they documented idealistic protocols that didn’t exist in the real world. This sad state has made people fall into cookie problems all the way into modern days when they’ve implemented services according to those RFCs and then blame their browser for failing.

cookie

It turned out that the only document that existed that were being used, was the original Netscape cookie document. It can’t even be called a specification because it is so short and is so lacking in details that it leaves large holes open and forces implementers to guess about the missing pieces. A sweet irony in itself is the fact that even Netscape removed the document from their site so the only place to find this document is at archive.org or copies like the one I link to above at the curl.haxx.se site. (For some further and more detailed reading about the history of cookies and a bunch of the flaws in the protocol/design, I recommend Michal Zalewski’s excellent blog post HTTP cookies, or how not to design protocols.)

While HTTP was increasing in popularity as a protocol during the 00s and still is, and more and more stuff get done in browsers and everything and everyone are using cookies, the protocol was still not documented anywhere as it was actually used.

Somewhat modeled after the httpbis working group (which is working on updating and bugfixing the HTTP 1.1 spec), IETF setup a mailing list named httpstate in the early 2009 to start discussing what problems there are with cookies and all related matters. After lively discussions throughout the year, the working group with the same name as the mailinglist was founded at December 11th 2009.

One of the initial sparks to get the httpstate group going came from Bill Corry who said this about the start:

In late 2008, Jim Manico and I connected to create a specification for
HTTPOnly — we saw the security issues arising from how the browser vendors
were implementing HTTPOnly in varying ways[1] due to a lack of a specification
and formed an ad-hoc working group to tackle the issue[2].
When I approached the IETF about forming a charter for an official working
group, I was told that I was <quote> “wasting my time” because cookies itself
did not have a proper specification, so it didn’t make sense to work on a spec
for HTTPOnly.  Soon after, we pursued reopening the IETF httpstate Working
Group to tackle the entire cookie spec, not just HTTPOnly.  Eventually Adam
Barth would become editor and Jeff Hodges our chair.

In late 2008, Jim Manico and I connected to create a specification for HTTPOnly — we saw the security issues arising from how the browser vendors were implementing HTTPOnly in varying ways[1] due to a lack of a specification and formed an ad-hoc working group to tackle the issue[2].

When I approached the IETF about forming a charter for an official working group, I was told that I was <quote> “wasting my time” because cookies itself did not have a proper specification, so it didn’t make sense to work on a spec for HTTPOnly.  Soon after, we pursued reopening the IETF httpstate Working Group to tackle the entire cookie spec, not just HTTPOnly. Eventually Adam Barth would become editor and Jeff Hodges our chair.

Since then Adam Barth has worked fiercely as author of the specification and lots of people have joined in and contributed their views, comments and experiences, and we have over time really nailed down how cookies work in the wild today. The current spec now actually describes how to send and receive cookies, the way it is done by existing browsers and clients. Of course, parts of this new spec say things I don’t think it should, like how it deals with the order of cookies in headers, but as everything in life we needed to compromise and I seemed to be rather lonely on my side of that “fence”.
I must stress that the work has only involved to document how things work today and not to invent or create anything new. We don’t fix any of the many known problems with cookies, but we describe how you write your protocol implementation if you want to interact fine with existing infrastructure.

The new spec explicitly obsoletes the older RFC2965, but doesn’t obsolete RFC2109. That was done already by RFC2965. (I updated this paragraph after my initial post.)

Oh, and yours truly is mentioned in the ending “acknowledgements” section. It’s actually the second RFC I get to be mentioned in, the first being RFC5854.

Future

I am convinced that I will get reason to get back to the cookie topic soon and describe what is being worked on for the future. Once the existing cookies have been documented, there’s a desire among people to design something that overcomes the problems with the existing protocol. Adam’s CAKE proposal being one of the attempts and ideas in the pipe.

Another parallel IETF effort is the http-auth mailing list in which lots of discussions around HTTP authentication is being held, and as they often today involve cookies there’s a lot of talk about them there as well. See for example Timothy D. Morgan’s document Weaning the Web off of Session Cookies.

I’ll certainly track the development. And possibly even participate in shaping how this will go. We’ll see.

(cookie image source)

HTTP transfer compression

HTTP is a protocol that looks simple in its simplest form and its readability can easily fool you into believing an implementation is straight forward and quickly done.

That’s not the reality though. HTTP is a very big protocol with lots of corners and twisting mazes that one can get lost in. Even after having been the primary author of curl for 13+ years, there are still lots of HTTP things I don’t master.

To name an example of an area with little known quirks, there’s a funny situation when it comes to how HTTP supports and doesn’t support compression of  data and compression of data in transfer.

No header compression

A little flaw in HTTP in regards to compression is that there’s no way to compress headers, in either direction. No matter what we do, we must send the text as-is and both requests and responses are sometimes very big these days. Especially taken into account how cookies are always inserted in requests if they match. Anyway, this flaw is nothing we can do anything about in HTTP 1.1 so we need to live with it.

On the other side, compression of the response body is supported.

Compressing data

Compression of data can be done in two ways: either the actual transfer is compressed or the body data is compressed. The difference is subtle, but when the body data is compressed there’s really nothing that mandates that the client has to uncompress it for the end user, and if the transfer is compressed the receiver must uncompress it in order to deal with the transfer properly.

For reasons that are unknown to me, HTTP clients and servers started out supporting compression only using the Content-Encoding style. It means that the client tells the server what kind of content encodings it supports (using Accept-Encoding:) and the server then sends the response data using one of the supported encodings. The client then decides on its own that if it gets the content in one of the compressed formats that it said it can handle, it will automatically uncompress that on arrival.

The HTTP protocol designers however intended this kind of automatic compression and subsequent uncompress to be done using Transfer-Encoding, as the end result is the completely transparent and the uncompress action is implied and intended by the protocol design. This is done by the client telling the server what transfer encodings it supports with the TE: header and the server adds a Transfer-Encoding: header in the response telling how the transfer is encoded.

HTTP 1.1 introduced a mandatory encoding that all servers can use whenever they feel like it: chunked encoding, so all HTTP 1.1 clients already have to deal with Transfer-Encoding to some degree.

Surely curl is better than all those other guys, right?

Not really. Not yet anyway.

curl has a long history of copying its behavior from what the browsers do, in order to allow users to basically script anything imaginable that is HTTP-like with curl. In this vein, we implemented compression support the same way as all the browsers did it: the content encoding style. (I have reason to believe that at least Opera actually supports or used to support compressed Transfer-Encoding.)

Starting now (code pushed to git repo just after the 7.21.5 release), we’ve taken steps to improve things. We’re changing gears and we’re introducing support for asking for and using compressed Transfer-Encoding. This will start out as an optional feature/flag (–tr-encoding / CURLOPT_TRANSFER_ENCODING) so that we can start out and see how servers in the wild behave and that we can deal with them properly. Then possibly we can switch the default in the future to always ask for compressed transfers. At least for the command line tool.

We know from the little tests we are aware of, that there are at least one known little problem or shall we call it a little detail to keep on eye at, with introducing compressed Transfer-Encoding. As has been so fine reported several years ago in the opera blog Browser sniffing gone wrong (again): Cars.com, there are cases where this may cause the server to send data that gets compressed twice (using both Content and Transfer Encoding) and that needs to be taken care of properly by the client.

At the time of  this writing, I’ve not yet taken care of the double-compress case in the code, but I intend to get on to it within shortly.

I’m otherwise very interested in hearing what kind of experience people will have from this. What servers and sites will support this as documented and intended?

Today I am Chinese

Google thinks I'm ChineseI’m going about my merry life and I use google every day.

Today Google decided I’m in China and redirects me to google.com.hk and it shows me all text in Chinese. It’s just another proof how silly it is trying to use the IP address to figure out location (or even worse trying to guess language based on IP address).

Click on the image to get it in its full glory.

I haven’t changed anything locally, but it seems Google has updated (broken) their database somehow.

Just to be perfectly sure my browser isn’t playing any tricks behind my back, I snooped up the headers sent in the HTTP request and there’s nothing notable:

GET /complete/search?output=firefox&client=firefox&hl=en-US&q=rockbox HTTP/1.1
Host: suggestqueries.google.com
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.6.13-1.fc13 Firefox/3.6.13
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: PREF=ID=dc410 [truncated]

Luckily, I know about the URL “google.com/ncr” (No Country Redirect) so I can still use it, but not through my browser’s search box…

Cookies and Websockets and HTTP headers

So yesterday we held a little HTTP-related event in Stockholm, arranged by OWASP Sweden. We talked a bit about cookies, websockets and recent HTTP headers.

Below are all the slides from the presentations I, Martin Holst Swende and John Wilanders did. (The entire event was done in Swedish.)

Martin Holst Swende’s talk:

John Wilander’s slides from his talk are here:

WebSockets now: handshake and masking

In August 2010 I blogged about the WebSockets state at the time. In some aspects nothing has changed, and in some other aspects a lot has changed. There’s still no WebSockets specification that approaches consensus (remember the 4 weeks plan from July?).

Handshaking this or that way

We’ve been reading an endless debate through the last couple of months on how the handshake should be made and how to avoid that stupid intermediaries might get tricked by HTTP-looking websocket traffic. In the midst of that storm, a team of people posted the paper Transparent Proxies: Threat or Menace? which argued that HTTP+Upgrade would be insecure and that CONNECT should be used (Abarth’s early draft of the CONNECT handshake).

CONNECT to the server is not kosher HTTP and is not being appreciated by several people – CONNECT is meant to get sent to proxies and proxies are explicitly setup to a client.

The idea to use a separate and dedicated port is of course brought up every now and then but is mostly not considered. Most people seem to want this protocol to go over the “web” ports 80 and 443 and thus to be able to share the proxy environment used for HTTP.

Currently it seems as if we’re back to a HTTP+Upgrade handshake.

Masking the traffic

A lot of people also questioned the very binary outcome of the Transparet Proxies report mentioned above, and later on it seems the consensus that by “masking” WebSocket traffic it should be possible to avoid the risk that stupid intermediaries misinterpret the traffic as HTTP. The masking is currently being discussed to be XOR with a frame-specific key, so that a typical stream will change key multiple times but is still easy for a WebSocket-aware tool (say Wireshark and similar) to “demask” on purpose.

The last few weeks have been spent on discussing how the masking is done, if it is to become optional and if the masking should include the framing or not.

This is an open process

I’m not sure I’ve stressed this properly before: IETF is an open organization. Anyone can join in and share their views and opinions, but of course you need to argue technical merits.

s/Firefox/Chrome/g

Google Chrome BallA few weeks ago I decided to give Chrome a good ride on my main machine, a Debian Linux unstable. I use it a lot, every day, and I of course use my browser during a large portion of my time in front of it. I’m a long time Firefox fan and when I’ve heard and read other people converting I’ve always thought it’d be hard for me due to my heavy use of certain plugins, old habits and so on.

(Of course, in Debian lingo the browsers are actually called Chromium and Iceweasel, but I’ve decided to ignore that fact in this post.)

Here’s the story on how it went, what’s good with Chrome and what’s lacking in comparison to Firefox. As compared on my Linux box here.

Obvious benefints:

  • Less wasted window/screen estate. The tabs up in the window title is brilliant.
  • Faster. It’s generally faster in almost every aspect, but what’s most noticeable is when starting it.
  • Less memory hungy. At times I’ve found my Firefox installation to spend an annoying amount of my precious RAM (I have 4GB installed) and even though I would expect Chrome’s a process-per-tab concept to be more expensive memory wise, I’ve had less such problems with it.
  • The unified address/search bar, back to how Firefox once had it, is only sensible.
  • In my Firefox I’ve had two minor quirks for a while that have annoyed me: 1) when I start to search for something, I get a few seconds “freeze” immediately after I’ve started searching. Like I enter a few letters, waaaaaaait, then I can continue. This is certainly nothing life-threatening or something I can’t live through but it is annoying. 2) I occasionally get problems with flash video playbacks that the video pause or studder, most often a few seconds into it. Chrome has not given me these quirks.
  • Mailman! I administrate more than 20 mailing lists on the same host (cool.haxx.se) using mailman. Each list has iFirefox Ballts own URL and its own password. But Firefox just cannot remember them separately!!! These are pages I visit several times each day to ack or reject posts etc. Chrome remembers the passwords excellently for all the individual lists. This makes me a much happier person.

Problems I didn’t get:

  • The adblock version for Chrome is as good. I’m not sure exactly how well they compare but I haven’t noticed anything that’s given me reason to get annoyed.
  • The resizeable text edit areas in Chrome is excellent and removes the need for some of the fancier edit plugins in Firefox.

Things still nicer in Firefox:

  • I love the plugin to force unknown content-types to still be displayed by the browser. Far too many resources are still done using the wrong one and Chrome’s only option is to save it locally and then force me to run a local tool to display the file. Sure, it works fine but when I want to do that on many files it gets tedious.
  • In general Chrome, is a bit worse at handing content it doesn’t know about. I’ve managed to fiddle with my /etc/mozpluggerrc so that at least PDFs are now saved instead of saying “missing plug-in” but so far I’ve failed to get evince to display them directly. Even if it still is possible to make it happen, it is certainly a bit quirky to have to manually edit a text file to make it happen…

Conclusion

I’ll be running Chrome here now for a while!

Future transports

On Sunday morning during FSCONS 2010, in the room “Torg 4 South” I did a 30 minute talk about a few future, potentially coming network protocols for transport. A quick look at the current state, some problems of today and 4 different technologies that have been and are being developed to solve the problem.

I got a fair amount of questions and several persons approached me afterwards to make sure they got a copy of my slides.

The video recording is hopefully going to be made available later on, but until then you can read the slides below and imagine my Swedish  accent talking about these matters!

Future transports

You can also download the slides directly as a PDF.