Category Archives: Network

Internet. Networking.

curl 7.49.0 goodies coming

Here’s a closer look at three new features that we’re shipping in curl and libcurl 7.49.0, to be released on May 18th 2016.

connect to this instead

If you’re one of the users who thought --resolve and doing Host: header tricks with --header weren’t good enough, you’ll appreciate that we’re adding yet another option for you to fiddle with the connection procedure. Another “Swiss army knife style” option for you who know what you’re doing.

With --connect-to you basically provide an internal alias for a certain name + port to instead internally use another name + port to connect to.

Instead of connecting to HOST1:PORT1, connect to HOST2:PORT2

It is very similar to --resolve which is a way to say: when connecting to HOST1:PORT1 use this ADDR2:PORT2. --resolve effectively prepopulates the internal DNS cache and makes curl completely avoid the DNS lookup and instead feeds it with the IP address you’d like it to use.

--connect-to doesn’t avoid the DNS lookup, but it will make sure that a different host name and destination port pair is used than what was found in the URL. A typical use case for this would be to make sure that your curl request asks a specific server out of several in a pool of many, where each has a unique name but you normally reach them with a single URL who’s host name is otherwise load balanced.

--connect-to can be specified multiple times to add mappings for multiple names, so that even following HTTP redirects to other host names etc can be handled. You don’t even necessarily have to redirect the first used host name.

The libcurl option name for for this feature is CURLOPT_CONNECT_TO.

Michael Kaufmann brought this feature.

http2 prior knowledge

In our ongoing quest to provide more and better HTTP/2 support in a world that is slowly but steadily doing more and more transfers over the new version of the protocol, curl now offers --http2-prior-knowledge.

As the name might hint, this is a way to tell curl that you have “prior knowledge” that the URL you specifies goes to a host that you know supports HTTP/2. The term prior knowledge is in fact used in the HTTP/2 spec (RFC 7540) for this scenario.

Normally when given a HTTP:// or a HTTPS:// URL, there will be no assumption that it supports HTTP/2 but curl when then try to upgrade that from version HTTP/1. The command line tool tries to upgrade all HTTPS:// URLs by default even, and libcurl can be told to do so.

libcurl wise, you ask for a prior knowledge use by setting CURLOPT_HTTP_VERSION to CURL_HTTP_VERSION_2_PRIOR_KNOWLEDGE.

Asking for http2 prior knowledge when the server does in fact not support HTTP/2 will give you an error back.

Diego Bes brought this feature.

TCP Fast Open

TCP Fast Open is documented in RFC 7413 and is basically a way to pass on data to the remote machine earlier in the TCP handshake – already in the SYN and SYN-ACK packets. This of course as a means to get data over faster and reduce latency.

The --tcp-fastopen option is supported on Linux and OS X only for now.

This is an idea and technique that has been around for a while and it is slowly getting implemented and supported by servers. There have been some reports of problems in the wild when “middle boxes” that fiddle with TCP traffic see these packets, that sometimes result in breakage. So this option is opt-in to avoid the risk that it causes problems to users.

A typical real-world case where you would use this option is when  sending an HTTP POST to a site you don’t have a connection already established to. Just note that TFO relies on the client having had contact established with the server before and having a special TFO “cookie” stored and non-expired.

TCP Fast Open is so far only used for clear-text TCP protocols in curl. These days more and more protocols switch over to their TLS counterparts (and there’s room for future improvements to add the initial TLS handshake parts with TFO). A related option to speed up TLS handshakes is --false-start (supported with the NSS or the secure transport backends).

With libcurl, you enable TCP Fast Open with CURLOPT_TCP_FASTOPEN.

Alessandro Ghedini brought this feature.

fcurl is fread and friends for URLs

This whole family of functions, fopen, fread, fwrite, fgets, fclose and more are defined in the C standard since C89. You can’t really call yourself a C programmer without knowing them and probably even using them in at least a few places.

The charm with these is that they’re standard, they’re easy to use and they’re available everywhere where there’s a C compiler.

A basic example that just reads a file from disk and writes it to stdout could look like this:

FILE *file;

file = fopen("hello.txt", "r");
if(file) {
  char buffer [256];
  while(1) {
    size_t rc = fread(buffer, sizeof(buffer),
                1, file);
    if(rc > 0)
      fwrite(buffer, rc, 1, stdout);
    else
      break;
  }
  fclose(file);
}

Imagine you’d like to switch this example, or one of your actual real world programs that use the fopen() family of functions to read or write files, and instead read and write files from and to the Internet instead using your favorite Internet protocols. How would you do that without having to change your code a lot and do a major refactoring job?

Enter fcurl

I’ve started to work on a library that provides a look-alike API with matching functions and behaviors, but that allows fopen() to instead specify a URL instead of a file name. I call it fcurl. (Much inspired by the libcurl example fopen.c, which I wrote the first version of already back in 2002!)

It is of course open source and is powered by libcurl.

The project is in its early infancy. I think it would be interesting to try it out and I’ve mentioned the idea to a few people that have shown interest. I really can’t make this happen all on my own anyway so while I’ve created a first embryo, it will take some time before it gets truly useful. Help from others would be greatly appreciated of course.

Using this API, a version of the above example that instead reads data from a HTTPS site instead of a local file could look like:

FCURL *file;

file = fcurl_open("https://daniel.haxx.se/",
                  "r");
if(file) {
  char buffer [256];
  while(1) {
    size_t rc = fcurl_read(buffer,         
                           sizeof(buffer), 1, 
                           file);
    if(rc > 0)
      fwrite(buffer, rc, 1, stdout);
    else
      break;
  }
  fcurl_close(file);
}

And it could even actually also read a local file using the file:// sheme.

Drop-in replacement

The idea here is to make the alternative functions have new names but as far as possible accept the same input arguments, return the same return codes and so on.

If we do it right, you could possibly even convert an existing program with just a set of #defines at the top without even having to change the code!

Something like this:

#define FILE FCURL
#define fopen(x,y) fcurl_open(x, y)
#define fclose(x) fcurl_close(x)

I think it is worth considering a way to provide an official macro set like that for those who’d like to switch easily (?) and quickly.

Fun things to consider

1. for non-scheme input, use normal fopen?

An interesting take is probably to make fcurl_open() treat input specified without a “scheme://” to be a local file, and then passed to fopen() instead under the hood. That would then enable even more code to switch to fcurl since all the existing use cases with local file names would just continue to work.

2. LD_PRELOAD

An interesting area of deeper research around this could be to provide a way to LD_PRELOAD replacements for the functions so that not even any source code would need be changed and already built existing binaries could be given this functionality.

3. fopencookie

There’s also the GNU libc’s fopencookie concept to figure out if that is something for fcurl to support/use. BSD and OS X have something similar called funopen.

4. merge in official libcurl

If this turns out useful, appreciated and good. We could consider moving the API in under the curl project’s umbrella and possibly eventually even making it part of the actual libcurl. But hey, we’re far away from that and I’m not saying that is even the best idea…

Your input is valuable

Please file issues or pull-requests. Let’s see where we can take this!

HTTP/2 in April 2016

On April 12 I had the pleasure of doing another talk in the Google Tech Talk series arranged in the Google Stockholm offices. I had given it the title “HTTP/2 is upon us, and here’s what you need to know about it.” in the invitation.

The room seated 70 persons but we had the amazing amount of over 300 people in the waiting line who unfortunately didn’t manage to get a seat. To those, and to anyone else who cares, here’s the video recording of the event.

If you’ve seen me talk about HTTP/2 before, you might notice that I’ve refreshed the material somewhat since before.

Summers are for HTTP

stockholm castle and ship
Stockholm City, as photographed by Michael Caven

In July 2015, 40-something HTTP implementers and experts of the world gathered in the city of Münster, Germany, to discuss nitty gritty details about the HTTP protocol during four intense days. Representatives for major browsers, other well used HTTP tools and the most popular HTTP servers were present. We discussed topics like how HTTP/2 had done so far, what we thought we should fix going forward and even some early blue sky talk about what people could potentially see being subjects to address in a future HTTP/3 protocol.

You can relive the 2015 version somewhat from my daily blog entries from then that include a bunch of details of what we discussed: day one, two, three and four.

http workshopThe HTTP Workshop was much appreciated by the attendees and it is now about to be repeated. In the summer of 2016, the HTTP Workshop is again taking place in Europe, but this time as a three-day event slightly further up north: in the capital of Sweden and my home town: Stockholm. During 25-27 July 2016, we intend to again dig in deep.

If you feel this is something for you, then please head over to the workshop site and submit your proposal and show your willingness to attend. This year, I’m also joining the Program Committee and I’ve signed up for arranging some of the local stuff required for this to work out logistically.

The HTTP Workshop 2015 was one of my favorite events of last year. I’m now eagerly looking forward to this year’s version. It’ll be great to meet you here!

Stockholm
The city of Stockholm in summer sunshine

HTTP redirects

I find that many web minded people working client-side or even server-side have neglected to learn the subtle details of the redirects of today. Here’s my attempt at writing another text about it that the ones who should read it still won’t.

Nothing here, go there!

The “redirect” is a fundamental part of the HTTP protocol. The concept was present and is documented already in the first spec (RFC 1945), published in 1996, and it has remained well used ever since.

A redirect is exactly what it sounds like. It is the sredirect-signerver sending back an instruction to the client – instead of giving back the contents the client wanted. The server basically says “go look over [here] instead for that thing you asked for“.

But not all redirects are alike. How permanent is the redirect? What request method should the client use in the next request?

All redirects also need to send back a Location: header with the new URI to ask for, which can be absolute or relative.

Permanent or Temporary

Is the redirect meant to last or just remain for now? If you want a GET to resource A permanently redirect users to resource B with another GET, send back a 301. It also means that the user-agent (browser) is meant to cache this and keep going to the new URI from now on when the original URI is requested.

The temporary alternative is 302. Right now the server wants the client to send a GET request to B, but it shouldn’t cache this but keep trying the original URI when directed to it.

Note that both 301 and 302 will make browsers do a GET in the next request, which possibly means changing method if it started with a POST (and only if POST). This changing of the HTTP method to GET for 301 and 302 responses is said to be “for historical reasons”, but that’s still what browsers do so most of the public web will behave this way.

In practice, the 303 code is very similar to 302. It will not be cached and it will make the client issue a GET in the next request. The differences between a 302 and 303 are subtle, but 303 seems to be more designed for an “indirect response” to the original request rather than just a redirect.

These three codes were the only redirect codes in the HTTP/1.0 spec.

GET or POST?

All three of these response codes, 301 and 302/303, will assume that the client sends a GET to get the new URI, even if the client might’ve sent a POST in the first request. This is very important, at least if you do something that doesn’t use GET.

If the server instead wants to redirect the client to a new URI and wants it to send the same method in the second request as it did in the first, like if it first sent POST it’d like it to send POST again in the next request, the server would use different response codes.

To tell the client “the URI you sent a POST to, is permanently redirected to B where you should instead send your POST now and in the future”, the server responds with a 308. And to complicate matters, the 308 code is only recently defined (the spec was published in June 2014) so older clients may not treat it correctly! If so, then the only response code left for you is…

The (older) response code to tell a client to send a POST also in the next request but temporarily is 307. This redirect will not be cached by the client though so it’ll again post to A if requested to. The 307 code was introduced in HTTP/1.1.

Oh, and redirects work the exact same way in HTTP/2 as they do in HTTP/1.1.

The helpful table version

Permanent Temporary
Switch to GET 301 302 and 303
Keep original method 308 307

It’s a gap!

Yes. The 304, 305, and 306 codes are not used for redirects at all.

What about other HTTP methods?

I decided to simplify the explanation above. In all places where it says POST above, you can replace it with any non-GET method. They’re just slightly less common on the browser centric web.

curl and redirects

I couldn’t write a text like this without spicing it up with some curl details!

First, curl and libcurl don’t follow redirects by default. You need to ask curl to do it with -L (or –location) or libcurl with CURLOPT_FOLLOWLOCATION.

It turns out that there are web services out there in the world that want a POST sent, are responding with HTTP redirects that use a 301, 302 or 303 response code and still want the HTTP client to send the next request as a POST. As explained above, browsers won’t do that and neither will curl – by default.

Since these setups exist, and they’re actually not terribly rare, curl offers options to alter its behavior.

You can tell curl to not change the non-GET request method to GET after a 30x response by using the dedicated options for that:
–post301, –post302 and –post303. If you are instead writing a libcurl based application, you control that behavior with the CURLOPT_POSTREDIR option.

Here’s how a simple HTTP/1.1 redirect can look like. Note the 301, this is “permanent”:
curl-shows-redirect

HTTP/2 adoption, end of 2015

http2 front imageWhen I asked my surrounding in March 2015 to guess the expected HTTP/2 adoption by now, we as a group ended up with about 10%. OK, the question was vaguely phrased and what does it really mean? Let’s take a look at some aspects of where we are now.

Perhaps the biggest flaw in the question was that it didn’t specify HTTPS. All the browsers of today only implement HTTP/2 over HTTPS so of course if every HTTPS site in the world would support HTTP/2 that would still be far away from all the HTTP requests. Admittedly, browsers aren’t the only HTTP clients…

During the fall of 2015, both nginx and Apache shipped release versions with HTTP/2 support. nginx made it slightly harder for people by forcing users to select either SPDY or HTTP/2 (which was a technical choice done by them, not really enforced by the protocols) and also still telling users that SPDY is the safer choice.

Let’s Encrypt‘s finally launching their public beta in the early December also helps HTTP/2 by removing one of the most annoying HTTPS obstacles: the cost and manual administration of server certs.

Amount of Firefox responses

This is the easiest metric since Mozilla offers public access to the metric data. It is skewed since it is opt-in data and we know that certain kinds of users are less likely to enable this (if you’re more privacy aware or if you’re using it in enterprise environments for example). This also then measures the share by volume of requests; making the popular sites get more weight.

Firefox 43 counts no less than 22% of all HTTP responses as HTTP/2 (based on data from Dec 8 to Dec 16, 2015).

Out of all HTTP traffic Firefox 43 generates, about 63% is HTTPS which then makes almost 35% of all Firefox HTTPS requests are HTTP/2!

Firefox 43 is also negotiating HTTP/2 four times as often as it ends up with SPDY.

Amount of browser traffic

One estimate of how large share of browsers that supports HTTP/2 is the caniuse.com number. Roughly 70% on a global level. Another metric is the one published by KeyCDN at the end of October 2015. When they enabled HTTP/2 by default for their HTTPS customers world wide, the average number of users negotiating HTTP/2 turned out to be 51%. More than half!

Cloudflare however, claims the share of supported browsers are at a mere 26%. That’s a really big difference and I personally don’t buy their numbers as they’re way too negative and give some popular browsers very small market share. For example: Chrome 41 – 49 at a mere 15% of the world market, really?

I think the key is rather that it all boils down to what you measure – as always.

Amount of the top-sites in the world

Netcraft bundles SPDY with HTTP/2 in their October report, but it says that “29% of SSL sites within the thousand most popular sites currently support SPDY or HTTP/2, while 8% of those within the top million sites do.” (note the “of SSL sites” in there)

That’s now slightly old data that came out almost exactly when Apache first release its HTTP/2 support in a public release and Nginx hadn’t even had it for a full month yet.

Facebook eventually enabled HTTP/2 in November 2015.

Amount of “regular” sites

There’s still no ideal service that scans a larger portion of the Internet to measure adoption level. The httparchive.org site is about to change to a chrome-based spider (from IE) and once that goes live I hope that we will get better data.

W3Tech’s report says 2.5% of web sites in early December – less than SPDY!

I like how isthewebhttp2yet.com looks so far and I’ve provided them with my personal opinions and feedback on what I think they should do to make that the preferred site for this sort of data.

Using the shodan search engine, we could see that mid December 2015 there were about 115,000 servers on the Internet using HTTP/2.  That’s 20,000 (~24%) more than isthewebhttp2yet site says. It doesn’t really show percentages there, but it could be interpreted to say that slightly over 6% of HTTP/1.1 sites also support HTTP/2.

On Dec 3rd 2015, Cloudflare enabled HTTP/2 for all its customers and they claimed they doubled the number of HTTP/2 servers on the net in that single move. (The shodan numbers seem to disagree with that statement.)

Amount of system lib support

iOS 9 supports HTTP/2 in its native HTTP library. That’s so far the leader of HTTP/2 in system libraries department. Does Mac OS X have something similar?

I had expected Window’s wininet or other HTTP libs to be up there as well but I can’t find any details online about it. I hear the Android HTTP libs are not up to snuff either but since okhttp is now part of Android to some extent, I guess proper HTTP/2 in Android is not too far away?

Amount of HTTP API support

I hear very little about HTTP API providers accepting HTTP/2 in addition or even instead of HTTP/1.1. My perception is that this is basically not happening at all yet.

Next-gen experiments

If you’re using a modern Chrome browser today against a Google service you’re already (mostly) using QUIC instead of HTTP/2, thus you aren’t really adding to the HTTP/2 client side numbers but you’re also not adding to the HTTP/1.1 numbers.

QUIC and other QUIC-like (UDP-based with the entire stack in user space) protocols are destined to grow and get used even more as we go forward. I’m convinced of this.

Conclusion

Everyone was right! It is mostly a matter of what you meant and how to measure it.

Future

Recall the words on the Chromium blog: “We plan to remove support for SPDY in early 2016“. For Firefox we haven’t said anything that absolute, but I doubt that Firefox will support SPDY for very long after Chrome drops it.

A 2015 retrospective

Wow, another year has passed. Summing up some things I did this year.

Commits

I don’t really have good global commit count for the year, but github counts 1300 commits and I believe the vast majority of my commits are hosted there. Most of them in curl and curl-oriented projects.

We did 8 curl releases during the year featuring a total of 575 bug fixes. The almost 1,200 commits were authored by 107 different individuals.

Books

I continued working on http2 explained during the year, and after having changed to markdown format it is now available in more languages than ever thanks to our awesome translators!

I started my second book project in the fall of 2015, using the working title everything curl, which is a much larger book effort than the HTTP/2 book and after having just passed 23,500 words that create over 110 pages in the PDF version, almost half of the planned sections are still left to write…

Twitter

I almost doubled my number of twitter followers during this year, now at 2,850 something. While this is a pointless number, reaching out slightly further does have the advantage that I get better responses and that makes me appreciate and get more out of twitter.

Stackoverflow

I’ve continued to respond to questions there, and my total count is now at 550 answers, out of which I wrote about 80 this year. The top scored answer I wrote during 2015 is for a question that isn’t phrased like one: Apache and HTTP2.

Keyboard use

I’ve pressed a bit over 6.4 million keys on my primary keyboard during the year, and 10.7% of the keys were pressed on weekends.

During the 2900+ hours when at least one key press were registered, I averaged on 2206 key presses per hour.

The most excessive key banging hour of the year started  September 21 at 14:00 and ended with me reaching 10,875 key presses.

The most excessive day was June 9, during which I pushed 63,757 keys.

Talks

This is all the 16 opportunities where I’ve talked in front of an audience during 2015. As you will see, the list of topics were fairly limited…

Daniel talking at Apachecon 2015

curl and HTTP/2 by default

cURLFollowers of this blog know that I’ve dabbled with HTTP/2 stuff for quite some time, and curl got its initial support for the new protocol version already in September 2013.

curl shipped “proper” HTTP/2 support as it looks in the final specification both for the command line tool and the libcurl library before any browsers did in their release versions. (Firefox was the first browser to ship HTTP/2 in a release version, followed by Chrome. Both did this in the beginning of 2015.)

libcurl features an option that lets the application to select HTTP version to use, and that includes HTTP/2 since back then. The command line tool got a corresponding command line option (aptly named –http2) to switch on this protocol version.

This goes hand in hand with curl’s general philosophy that it just does the basics and you have to specifically switch on more features and tell it to enable things you want to use. This conservative approach makes it very reliable protocol-wise and provides applications a very large level of control. The downside is of course that fewer people switch on certain features since they’re just not aware of them. Or as in this case with HTTP/2, it also complicates matters that only a subset of users still have a HTTP/2 tool and library since they might still run outdated versions or they may run recent versions that were built without the necessary prerequisites present (basically the nghttp2 library).

By default?

libcurl is even more conservative that the curl tool so switching default for the library isn’t really on the agenda yet. We are very careful of modifying behavior so we’re saving that for later but what about upping the curl tool a notch?

We could switch the default to use HTTP/2 as soon as the tool has the powers built-in. But for regular clear text HTTP, using the Upgrade: header has a few drawbacks. It makes the requests larger, it complicates matter somewhat that most servers don’t do upgrades on HTTP POST requests (and a few others) so there might indeed be several requests before an upgrade is actually made even on a server that supports HTTP/2 and perhaps the strongest reason: we already found servers that (wrongly, I would claim) reject requests with Upgrade: headers they don’t understand. All this taken together, Upgrade over HTTP will break too many requests that work with HTTP 1.1. And then we haven’t even considered how the breakage situation will be when using explicit or transparent proxies…

By default!

To help users with this problem of HTTP upgrades not being feasible by default, we’ve just landed a new alternative to the “set HTTP version” that only sets HTTP/2 for HTTPS requests and leaves it to HTTP/1.1 for clear text HTTP requests. This option will ship in the next release, to be called 7.47.0, and can of course be tested out before that with git or daily snapshot builds.

Setting this option is next to risk-free, as the HTTP/2 negotiation in TLS is based on one or two TLS extensions (NPN and ALPN) that both have proper fallbacks to 1.1.

Said and done. The curl tool now sets this option. Using the curl tool on a HTTPS:// URL will from now on use HTTP/2 by default as soon as both the libcurl it uses and the server it connects to support HTTP/2!

We will of course keep our eyes and ears open to see if this causes any problems. Let us know what you see!

The most popular curl download – by a malware

During October 2015 the curl web site sent out 1127 gigabytes of data. This was the first time we crossed the terabyte limit within a single month.

Looking at the stats a little closer, I noticed that in July 2015 a particular single package started to get very popular. The exact URL was

http://curl.haxx.se/gknw.net/7.40.0/dist-w32/curl-7.40.0-devel-mingw32.zip

Curious. In October it alone was downloaded more than 300,000 times, accounting for over 70% of the site’s bandwidth. Why?

The downloads came from what appears to be different locations. They don’t use any HTTP referer headers and they used different User-agent headers. I couldn’t really see a search bot gone haywire or a malicious robot stuck in a crazy mode.

After I shared some of this data over in our IRC channel (#curl on freenode), Björn Stenberg stumbled over this AVG slide set, describing how a particular malware works when it infects a computer. Downloading that particular file is thus a step in its procedures to create a trojan that will run on the host system – see slide 11 for the curl details. The slide also mentions that an updated version of the malware comes bundled with the curl library already, which then I guess makes the hits we see on the curl site being done by the older versions still being run.

Of course, we can’t be completely sure this is the source for the increased download of this particular file but it seems highly likely.

I renamed the file just now to see what happens.

Evil use of good code

We can of course not prevent evil uses of our code. We provide source code and we even host some binaries of curl and libcurl and both good and bad actors are able to take advantage of our offers.

This rename won’t prevent a dedicated hacker, but hopefully it can prevent a few new victims from getting this malware running on their machines.

Update: the hacker news discussion about this post.