Tag Archives: HTTP

HTTP2 Expression of Interest: curl

July 13, 2012 Daniel Stenberg

For the readers of my blog, this is a copy of what I posted to the httpbis mailing list on July 12th 2012.

Hi,

This is a response to the httpis call for expressions of interest

BACKGROUND

I am the project leader and maintainer of the curl project. We are the openÂ source project that makes libcurl, the transfer library and curl the commandÂ line tool. It is among many things a client-side implementation of HTTP andÂ HTTPS (and some dozen other application layer protocols). libcurl is veryÂ portable and there exist around 40 different bindings to libcurl for virtuallyÂ all languages/enviornments imaginable. We estimate we might have upwards 500
million users or so. We’re entirely voluntary driven without any paidÂ developers or particular company backing.

HTTP/1.1 problems I’d like to see adressed

Pipelining – I can see how something that better deals with increasingÂ bandwidths with stagnated RTT can improve the end users’ experience. It is notÂ easy to implement in a nice manner and provide in a library like ours.

Many connections – to avoid problems with pipelining and queueing on theÂ connections, many connections are used and and it seems like a general wasteÂ that can be improved.

HTTP/2.0

We’ve implemented HTTP/1.1 and we intend to continue to implement any and allÂ widely deployed transport layer protocols for data transfers that appear onÂ the Internet. This includes HTTP/2.0 and similar related protocols.

curl has not yet implemented SPDY support, but fully intends to do so. TheÂ current plan is to provide SPDY support with the help of spindly, aÂ separate SPDY library project that I lead.

We’ve selected to support SPDY due to the momentum it has and the multipleÂ existing implementaions that A) have multi-company backing and B) prove it toÂ be a truly working concept. SPDY seems to address HTTP’s pipelining and many-connections problems in a decent way that appears to work in reality too. IÂ believe SPDY keeps enough HTTP paradigms to be easily upgraded to for mostÂ parties, and yet the ones who can’t or won’t can remain with HTTP/1.1 withoutÂ too much pain. Also, while Spindly is not production-ready, it has still givenÂ me the sense that implementing a SPDY protocol engine is not rocket scienceÂ and that the existing protocol specs are good.

By relying on external libs for protocol and implementation details, my hopesÂ is that we should be able to add support for other potentially comingÂ HTTP/2.0-ish protocols that gets deployed and used in the wild. In the curlÂ project we’re unfortunately rarely able to be very pro-active due to theÂ nature of our contributors, which tends to make us follow the rest andÂ implement and go with what others have already decided to go with.

I’m not aware of any competitors to SPDY that is deployed or used to anyÂ particular and notable extent on the public internet so therefore no otherÂ “HTTP/2.0 protocol” has been considered by us. The two biggest protocolÂ details people will keep mention that speak against SPDY is SSL and theÂ compression requirements, yet I like both of them.Â I intend to continue to participate in dicussions and technical arguments onÂ the ietf-http-wg mailing list on HTTP details for as long as I have time andÂ energy.

HTTP AUTH

curl currently supports Basic, Digest, NTLM and Negotiate for both host andÂ proxy.

Similar to the HTTP protocol, we intend to support any widely adoptedÂ authentication protocols. The HOBA, SCRAM and Mutual auth suggestions all seemÂ perfectly doable and fine in my perspective.

However, if there’s no proper logout mechanism provided for HTTP auth I don’tÂ forsee any particular desire from browser vendor or web site creators to useÂ any of these just like they don’t use the older ones either to any significantÂ extent. And for automatic (non-browser) uses only, I’m not sure there’sÂ motivation enough to add new auth protocols to HTTP as at least historicallyÂ we seem to rarely be able to pull anything through that isn’t pushed for by atÂ least one of the major browsers.

The “updated HTTP auth” work should be kept outside of the HTTP/2.0 work asÂ far as possible and similar to how RFC2617 is separate from RFC2616 it shouldÂ be this time around too. The auth mechnism should not be too tightly knit toÂ the HTTP protocol.

The day we can clense connection-oriented authentications like NTLM from theÂ HTTP world will be a happy day, as it’s state awareness is a pain to deal withÂ in a generic HTTP library and code.

cURL and libcurl, Network

shorter HTTP requests for curl

May 12, 2012 Daniel Stenberg

Starting in curl 7.26.0 (due to be released at the end of May 2012), we will shrink the User-agent: header that curl sends by default in HTTP(S) requests to something much shorter! I suspect that this will raise some eyebrows out there so even though I’ve emailed about it to the curl-users list before I thought I’d better write it up and elaborate.

A default ‘curl localhost’ on Debian Linux makes 170 bytes get sent in that single request:

GET / HTTP/1.1
User-Agent: curl/7.24.0 (i486-pc-linux-gnu) libcurl/7.24.0 OpenSSL/1.0.0g zlib/1.2.6 libidn/1.23 libssh2/1.2.8 librtmp/2.3
Host: localhost
Accept: */*

As you can see, the user-agent description takes up a large portion of that request, and this for really no good reason at all. Without sacrificing any functionality I shrunk the same request down to 71 bytes:

GET / HTTP/1.1
User-Agent: curl/7.24.0
Host: localhost
Accept: */*

That means we shrunk it down to 41% of the original size. I’ll admit the example is a bit extreme and most other normal use cases will use longer host names and longer paths, but even for a URL like “https://daniel.haxx.se/docs/curl-vs-wget.html” we’re down to 50% of the original request size (100 vs 199).

Can we shrink it even more? Sure, we could leave out the version number too. I left it in there now only to allow some kind of statistics to get extracted. We can’t remove the entire header, we need to include a user-agent in requests since there are too many servers who won’t function properly otherwise.

And before anyone asks: this change is only for the curl command line tool and not for libcurl, the library. libcurl does in fact not send any user-agent at all by default…

Haxx, Linux

Travel for fun or profit

March 15, 2012 Daniel Stenberg 5 Comments

As a protocol geek I love working in my open source projects curl, libssh2, c-ares and spindly. I also participate in a few related IETF working groups around these protocols, and perhaps primarily I enjoy the HTTPbis crowd.

Meanwhile, I’m a consultant during the day and most of my projects and assignments involve embedded systems and primarily embedded Linux. The protocol part of my life tends to be left to getÂ practicedÂ during my “copious” amount of spare time – you know that time after your work, after you’ve spent time with your family and played with your kids and done the things you need to do at home to keep the household in a decent shape. That time when the rest of the family has gone to bed and you should too but if you did when would you ever get time to do that fun things you really want to do?

IETF has these great gatherings every now and then and they’re awesome places to just drown in protocol mumbo jumbo for several days. They’re being hosted by various cities all over the world so often I deem them too far away or too awkward to go to, also a lot because I rarely have any direct monetary gain or compensation for going but rather I’d have to do it as a vacation and pay for it myself.

IETF 83 is going to be held in Paris during March 25-30 and it is close enough for me to want to go and HTTPbis and a few other interesting work groups are having scheduled meetings. I really considered going, at least to meet up with HTTP friends.

Something very rare instead happened that prevents me from going there! My customer (for whom I work full-time since about six months and shall remain nameless for now) asked me to join their team and go visit the large embedded conference ESC in San Jose, California in the exact same week! It really wasn’ t a hard choice for me, since this is my job and being asked to do something because I’m wanted is a nice feeling and position – and they’re paying me to go there. It will also be my first time in California even though I guess I won’t get time to actually see much of it.

I hope to write a follow-up post later on about what I’m currently working with, once it has gone public.

cURL and libcurl

Top-3 curl bugs in 2011

December 31, 2011 Daniel Stenberg

This is a continuation of my little top-3 things in curl during 2011 which started with the top-3 changes 2011.

The changelog on the curl site lists 150 bugs fixed in the seven released of the year. The most import fixes in my view were…

Bug-fix 1: handle HTTP redirects to //hostname/path

Following redirects is one of the fundamentals of HTTP user agents and one of the primary things people use curl and libcurl for is to mimic browser to do automatic stuff on the web. Therefore it was even more embarrassing to realize that libcurl didn’t properly support the relative redirect when the Location: header doesn’t include the protocol but the host name. It basically means that the protocol shall remain (in reality that means HTTP or HTTPS) but it should move over to the new host and path. All browsers support this since ages ago. Since November 15th 2011, libcurl does too!

Bug-fix 2: inappropriate GSSAPI delegation

We had one security vulnerability announced in 2011 and this was it. I won’t try to blame someone else for this mistake, but there are some corners of curl and libcurl I’m not personally very familiar with and I would say the GSS stuff is one of those. In fact, even the actual GSS and GSSAPI technologies areÂ mercyÂ areas as far as my knowledge reaches so I was not at all aware of this feature or that we even made us of it… Of course it also turns out that there’s a certain amount of existing applications that need it so we now have that ability in the library again if enabled by an option.

Bug-fix 3: multi interface, connect fail continue to next IP

One of those silly bugs nobody would expect us to have at this point. It turned out the code for the multi interface didn’t properly move on to try the next IP in case a connect() failed and the host name had resolved to a number of addresses to try. A long term goal of mine is to remodel the internals of libcurl to always use the multi interface code and I would just wrap that interfact with some glue logic to offer the easy interface. For that to work (and for lots of other reasons of course), the multi interface simply must work for all of these things.

Additionally, this is another of those things that are hard to test for in the test suite as it would involve trickery on IP or TCP level and that’s not easy to accomplish in a portable manner.

cURL and libcurl, Network

libspdy

October 18, 2011 Daniel Stenberg 8 Comments

SPDY is a neat new protocol and possible contender to replace HTTP – at least in some areas and for some use cases. SPDY has been invented and developed mostly by Google engineers.

SPDY allows better usage of fewer TCP connections (since it sends multiple logical streams over a single physical TCP connection) and it helps clients overcome problems with TCP (like how a new connection starts slowly) while at the same time reducing latency and bandwidth requirements. Very similar to how channels are handled over an SSH connection.

Chrome of course already supports SPDY and Firefox has some early experimental support being worked on.

Of course there are also legitimate criticisms against SPDY as well, including subjects like how it makes caching proxies impossible (because everything goes over SSL), how it makes debugging a lot harder by using compressed headers, how it is impossible to extract just a single header from the stream due to its compression approach and how the compression state buffers make each individual stream use more memory than plain old HTTP (plain TCP) ones.

We can expect SPDY<=>HTTP gateways to appear so that nobody gets locked into either side of these protocols.

SPDY will provide faster transfers. libcurl is currently used for speed reasons in many cases. To me, it makes perfect sense to have libcurl use and try to use SPDY instead of HTTP exactly like how the browsers are starting to do it, so that the libcurl using applications will get their contents transferred faster.

My thinking is that we introduce some new magic option(s) that makes libcurl use SPDY, and for normal easy interface transfers it will remain to use a single connection for each new SPDY transfer, but if you use the multi interface and you enable pipelining you’ll instead make libcurl do multiple transfers over the same single SPDY connection (as long as you speak with the same server and port etc). From an application’s stand-point it shouldn’t make any difference, apart from being faster than otherwise. Just like we want it!

Implementation wise, I would like to use a reliable and efficient third-party library for the actual SPDY implementation. If there doesn’t exist any, we make one and run that one independently. I found libspdy, but I found some concerns about it (noÂ mailing list, looks like one-man project, not C89 compliant, no API docs etc). I mailed the libspdy author, I hoping we’d sort out my doubts and then I’d base my continued work on that library.

After some time Thomas Roth, primary libspdy author, responded and during our subsequent email exchange I’ve gotten a restored faith and belief in this library and its direction. Not only did he fix the C89 compliance pretty quickly, he is also promising rather big changes that are pending to get committed within a week or so.

Comforted by what I’ve learned from Thomas, I’ll wait for his upcoming changes and I’ll join the soon to be created mailing list for the libspdy project and I’ll contribute some ideas and efforts to help shape it into the fine SPDY library we all want. I can only encourage other fellow SPDY library interested persons to do the same!

Updated: Join the SPDY library development

Network, Security

What SOCKS is good for

August 12, 2011 Daniel Stenberg 4 Comments

You ever wondered what SOCKS is good for these days?

To help us use the Internet better without having the surrounding be able to watch us as much as otherwise!

There’s basically two good scenarios and use areas for us ordinary people to use SOCKS:

You’re a consultant or you’re doing some kind of work and you are physically connected to a customer’s or a friend’s network. You access the big bad Internet via their proxy or entirely proxy-less using their equipment and cables. This allows the network admin(s) to capture and snoop on your network traffic, be it on purpose or by mistake, as long as you don’t use HTTPS or other secure mechanisms. When surfing the web, it is very easily made to drop out of HTTPS and into HTTP by mistake. Also, even if you HTTPS to the world, the name resolves and more are still done unencrypted and will leak information.
You’re using an open wifi network that isn’t using a secure encryption. Anyone else on that same area can basically capture anything you send and receive.

What you need to set it up? You run

ssh -D 8080 myname@myserver.example.com

… and once you’ve connected, you make sure that you change the network settings of your favourite programs (browsers, IRC clients, mail reader, etc) to reach the Internet using the SOCKS proxy on localhost port 8080. Now you’re done.

Now all your traffic will reach the Internet via your remote server and all traffic between that and your local machine is sent encrypted and secure. This of course requires that you have a server running OpenSSH somewhere, but don’t we all?

If you are behind another proxy in the first place, it gets a little more complicated but still perfectly doable. See my separate SSH through or over proxy document for details.

cURL and libcurl, Technology

US patent 6,098,180

May 11, 2011 Daniel Stenberg 5 Comments

(I am not a lawyer, this is not legal advice and these are not legal analyses, just my personal observations and ramblings. Please correct me where I’m wrong or add info if you have any!)

At 3:45 pm on March 18th 2011, the company Content Delivery Solutions LLC filed a complaint in a court in Texas, USA. The defendants are several bigwigs and the list includes several big and known names of the Internet:

Akamai
AOL
AT&T
CD Networks
Globalscape
Google
Limelight Networks
Peer 1 Network
Research In Motion
Savvis
Verizon
Yahoo!

The complaint was later amended with an additional patent (filed on April 18th), making it list three patents that these companies are claimed to violate (I can’t find the amended version online though). Two of the patents ( 6,393,471 and 6,058,418) are for marketing data and how to use client info to present ads basically. The third is about file transfer resumes.

I was contacted by a person involved in the case at one of the defendants’. This unspecified company makes one or more products that use “curl“. I don’t actually know if they use the command line tool or the library – but I figure that’s not too important here. curl gets all its superpowers from libcurl anyway.

This Patent Troll thus basically claims that curl violates a patent on resumed file transfers!

The patent in question that would be one that curl would violate is the US patent 6,098,180 which basically claims to protect this idea:

A system is provided for the safe transfer of large data files over an unreliable network link in which the connection can be interrupted for a long period of time.

The patent describes several ways in how it may detect how it should continue the transfer from such a break. As curl only does transfer resumes based on file name and an offset, as told by the user/application, that could be the only method that they can say curl would violate of their patent.

The patent goes into detail in how a client first sends a “signature” and after an interruption when the file transfer is about to continue, the client would ask the server about details of what to send in the continuation. With a very vivid imagination, that could possibly equal the response to a FTP SIZE command or the Content-Length: response in a HTTP GET or HEAD request.

A more normal reader would rather say that no modern file transfer protocol works as described in that patent and we should go with “defendant is not infringing, move on nothing to see here”.

But for the sake of the argument, let’s pretend that the patent actually describes a method of file transfer resuming that curl uses.

The ‘180 (it is referred to with that name within the court documents) patent was filed at February 18th 1997 (and issued on August 1, 2000). Apparently we need to find prior art that was around no later than February 17th 1996, that is to say one year before the filing of the stupid thing. (This I’ve been told, I had no idea it could work like this and it seems shockingly weird to me.)

What existing tools and protocols did resumed transfers in February 1996 based on a file name and a file offset?

Lots!

Thank you all friends for the pointers and references you’ve brought to me.

The FTP spec RFC 959 was published in October 1985. FTP has a REST command that tells at what offset to “restart” the transfer at. This was being used by FTP clients long before 1996, and an example is the known Kermit FTP client that did offset-based file resumed transfer in 1995.
The HTTP header Range: introduces this kind of offset-based resumed transfer, although with a slightly fancier twist. The Range: header was discussed before the magic date, as also can be seen on the internet already in this old mailing list post from December 1995.
One of the protocols from the old days that those of us who used modems and BBSes in the old days remember is zmodem. Zmodem was developed in 1986 and there’s this zmodem spec from 1988 describing how to do file transfer resumes.
A slightly more modern protocol that I’ve unfortunately found no history for before our cut-off date is rsync, as I could only find the release mail for rsync 1.0 from June 1996. Still long before the patent was filed obviously, and also clearly showing that the one year margin is silly as for all we know they could’ve come up with the patent idea after reading the rsync releases notes and still rsync can’t be counted as prior art.
Someone suggested GetRight as a client doing this, but GetRight wasn’t released in 1.0 until Febrary 1997 so unfortunately that didn’t help our case even if it seems to have done it at the time.
curl itself does not pre-date the patent filing. curl was first released in March 1998, and the predecessor was started around summer-time 1997. I don’t have any remaining proofs of that, and it still wasn’t before “the date” so I don’t think it matters much now.

At the time of this writing I don’t know where this will end up or what’s going to happen. Time will tell.

This Software patent obviously is a concern mostly to US-based companies and those selling products in the US. I am neither a US citizen nor do I have or run any companies based in the US. However, since curl and libcurl are widely used products that are being used by several hundred companies already, I want to help bring out as much light as possible onto this problem.

The patent itself is of course utterly stupid and silly and it should never have been accepted as it describes trivially thought out ideas and concepts that have been thought of and implemented already decades before this patent was filed or granted although I claim that the exact way explained in the patent is not frequently used. Possibly the protocol using a method that is closed to the description of the patent is zmodem.

I guess I don’t have to mention what I think about software patents.

I’m convinced that most or all download tools and browsers these days know how to resume a previously interrupted transfer this way. Why wouldn’t these guys also approach one of the big guys (with thick wallets) who also use this procedure? Surely we can think of a few additional major players with file tools that can resume file transfers and who weren’t targeted in this suit!

I don’t know why. Clearly they’ve not backed down from attacking some of the biggest tech and software companies.

patent drawing

(Illustration from the ‘180 patent.)

Electronics, Network, Web

The cookie RFC 6265

April 28, 2011 Daniel Stenberg 4 Comments

http://www.rfc-editor.org/rfc/rfc6265.txt is out!

Back when I was a HTTP rookie in the late 90s, I once expected that there was this fine RFC document somewhere describing how to do HTTP cookies. I was wrong. A lot of others have missed that document too, both before and after my initial search.

I was wrong in the sense that sure there were RFCs for cookies. There were even two of them (RFC2109 and RFC2965)! The only sad thing was however that both of them were totally pointless as in effect nobody (servers nor clients) implemented cookies like that so they documented idealistic protocols that didn’t exist in the real world. This sad state has made people fall into cookie problems all the way into modern days when they’ve implemented services according to those RFCs and then blame their browser for failing.

It turned out that the only document that existed that were being used, was the original Netscape cookie document. It can’t even be called a specification because it is so short and is so lacking in details that it leaves large holes open and forces implementers to guess about the missing pieces. A sweet irony in itself is the fact that even Netscape removed the document from their site so the only place to find this document is at archive.org or copies like the one I link to above at the curl.haxx.se site. (For some further and more detailed reading about the history of cookies and a bunch of the flaws in the protocol/design, I recommend Michal Zalewski’s excellent blog post HTTP cookies, or how not to design protocols.)

While HTTP was increasing in popularity as a protocol during the 00s and still is, and more and more stuff get done in browsers and everything and everyone are using cookies, the protocol was still not documented anywhere as it was actually used.

Somewhat modeled after the httpbis working group (which is working on updating and bugfixing the HTTP 1.1 spec), IETF setup a mailing list named httpstate in the early 2009 to start discussing what problems there are with cookies and all related matters. After lively discussions throughout the year, the working group with the same name as the mailinglist was founded at December 11th 2009.

One of the initial sparks to get the httpstate group going came from Bill Corry who said this about the start:

In late 2008, Jim Manico and I connected to create a specification for

HTTPOnly — we saw the security issues arising from how the browser vendors

were implementing HTTPOnly in varying ways[1] due to a lack of a specification

and formed an ad-hoc working group to tackle the issue[2].

When I approached the IETF about forming a charter for an official working

group, I was told that I was <quote> “wasting my time” because cookies itself

did not have a proper specification, so it didn’t make sense to work on a spec

for HTTPOnly. Â Soon after, we pursued reopening the IETF httpstate Working

Group to tackle the entire cookie spec, not just HTTPOnly. Â Eventually Adam

Barth would become editor and Jeff Hodges our chair.

In late 2008, Jim Manico and I connected to create a specification for HTTPOnly — we saw the security issues arising from how the browser vendors were implementing HTTPOnly in varying ways[1] due to a lack of a specification and formed an ad-hoc working group to tackle the issue[2].

When I approached the IETF about forming a charter for an official working group, I was told that I was <quote> “wasting my time” because cookies itself did not have a proper specification, so it didn’t make sense to work on a spec for HTTPOnly. Soon after, we pursued reopening the IETF httpstate Working Group to tackle the entire cookie spec, not just HTTPOnly. Eventually Adam Barth would become editor and Jeff Hodges our chair.

Since then Adam Barth has worked fiercely as author of the specification and lots of people have joined in and contributed their views, comments and experiences, and we have over time really nailed down how cookies work in the wild today. The current spec now actually describes how to send and receive cookies, the way it is done by existing browsers and clients. Of course, parts of this new spec say things I don’t think it should, like how it deals with the order of cookies in headers, but as everything in life we needed to compromise and I seemed to be rather lonely on my side of that “fence”.

I must stress that the work has only involved to document how things work today and not to invent or create anything new. We don’t fix any of the many known problems with cookies, but we describe how you write your protocol implementation if you want to interact fine with existing infrastructure.

The new spec explicitly obsoletes the older RFC2965, but doesn’t obsolete RFC2109. That was done already by RFC2965. (I updated this paragraph after my initial post.)

Oh, and yours truly is mentioned in the ending “acknowledgements” section. It’s actually the second RFC I get to be mentioned in, the first being RFC5854.

Future

I am convinced that I will get reason to get back to the cookie topic soon and describe what is being worked on for the future. Once the existing cookies have been documented, there’s a desire among people to design something that overcomes the problems with the existing protocol. Adam’s CAKE proposal being one of the attempts and ideas in the pipe.

Another parallel IETF effort is the http-auth mailing list in which lots of discussions around HTTP authentication is being held, and as they often today involve cookies there’s a lot of talk about them there as well. See for example Timothy D. Morgan’s document Weaning the Web off of Session Cookies.

I’ll certainly track the development. And possibly even participate in shaping how this will go. We’ll see.

(cookie image source)

cURL and libcurl, Network, Web

HTTP transfer compression

April 18, 2011 Daniel Stenberg

HTTP is a protocol that looks simple in its simplest form and its readability can easily fool you into believing an implementation is straight forward and quickly done.

That’s not the reality though. HTTP is a very big protocol with lots of corners and twisting mazes that one can get lost in. Even after having been the primary author of curl for 13+ years, there are still lots of HTTP things I don’t master.

To name an example of an area with little known quirks, there’s a funny situation when it comes to how HTTP supports and doesn’t support compression of data and compression of data in transfer.

No header compression

A little flaw in HTTP in regards to compression is that there’s no way to compress headers, in either direction. No matter what we do, we must send the text as-is and both requests and responses are sometimes very big these days. Especially taken into account how cookies are always inserted in requests if they match. Anyway, this flaw is nothing we can do anything about in HTTP 1.1 so we need to live with it.

On the other side, compression of the response body is supported.

Compressing data

Compression of data can be done in two ways: either the actual transfer is compressed or the body data is compressed. The difference is subtle, but when the body data is compressed there’s really nothing that mandates that the client has to uncompress it for the end user, and if the transfer is compressed the receiver must uncompress it in order to deal with the transfer properly.

For reasons that are unknown to me, HTTP clients and servers started out supporting compression only using the Content-Encoding style. It means that the client tells the server what kind of content encodings it supports (using Accept-Encoding:) and the server then sends the response data using one of the supported encodings. The client then decides on its own that if it gets the content in one of the compressed formats that it said it can handle, it will automatically uncompress that on arrival.

The HTTP protocol designers however intended this kind of automatic compression and subsequent uncompress to be done using Transfer-Encoding, as the end result is the completely transparent and the uncompress action is implied and intended by the protocol design. This is done by the client telling the server what transfer encodings it supports with the TE: header and the server adds a Transfer-Encoding: header in the response telling how the transfer is encoded.

HTTP 1.1 introduced a mandatory encoding that all servers can use whenever they feel like it: chunked encoding, so all HTTP 1.1 clients already have to deal with Transfer-Encoding to some degree.

Surely curl is better than all those other guys, right?

Not really. Not yet anyway.

curl has a long history of copying its behavior from what the browsers do, in order to allow users to basically script anything imaginable that is HTTP-like with curl. In this vein, we implemented compression support the same way as all the browsers did it: the content encoding style. (I have reason to believe that at least Opera actually supports or used to support compressed Transfer-Encoding.)

Starting now (code pushed to git repo just after the 7.21.5 release), we’ve taken steps to improve things. We’re changing gears and we’re introducing support for asking for and using compressed Transfer-Encoding. This will start out as an optional feature/flag (–tr-encoding / CURLOPT_TRANSFER_ENCODING) so that we can start out and see how servers in the wild behave and that we can deal with them properly. Then possibly we can switch the default in the future to always ask for compressed transfers. At least for the command line tool.

We know from the little tests we are aware of, that there are at least one known little problem or shall we call it a little detail to keep on eye at, with introducing compressed Transfer-Encoding. As has been so fine reported several years ago in the opera blog Browser sniffing gone wrong (again): Cars.com, there are cases where this may cause the server to send data that gets compressed twice (using both Content and Transfer Encoding) and that needs to be taken care of properly by the client.

At the time of this writing, I’ve not yet taken care of the double-compress case in the code, but I intend to get on to it within shortly.

I’m otherwise very interested in hearing what kind of experience people will have from this. What servers and sites will support this as documented and intended?

Web

Today I am Chinese

February 18, 2011 Daniel Stenberg 5 Comments

I’m going about my merry life and I use google every day.

Today Google decided I’m in China and redirects me to google.com.hk and it shows me all text in Chinese. It’s just another proof how silly it is trying to use the IP address to figure out location (or even worse trying to guess language based on IP address).

Click on the image to get it in its full glory.

I haven’t changed anything locally, but it seems Google has updated (broken) their database somehow.

Just to be perfectly sure my browser isn’t playing any tricks behind my back, I snooped up the headers sent in the HTTP request and there’s nothing notable:

GET /complete/search?output=firefox&client=firefox&hl=en-US&q=rockbox HTTP/1.1
Host: suggestqueries.google.com
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.6.13-1.fc13 Firefox/3.6.13
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: PREF=ID=dc410 [truncated]

Luckily, I know about the URL “google.com/ncr” (No Country Redirect) so I can still use it, but not through my browser’s search box…