Category Archives: Technology

Really everything related to technology

curl 7.57.0 happiness

The never-ending series of curl releases continued today when we released version 7.57.0. The 171th release since the beginning, and the release that follows 37 days after 7.56.1. Remember that 7.56.1 was an extra release that fixed a few most annoying regressions.

We bump the minor number to 57 and clear the patch number in this release due to the changes introduced. None of them very ground breaking, but fun and useful and detailed below.

41 contributors helped fix 69 bugs in these 37 days since the previous release, using 115 separate commits. 23 of those contributors were new, making the total list of contributors now contain 1649 individuals! 25 individuals authored commits since the previous release, making the total number of authors 540 persons.

The curl web site currently sends out 8GB data per hour to over 2 million HTTP requests per day.

Support RFC7616 - HTTP Digest

This allows HTTP Digest authentication to use the must better SHA256 algorithm instead of the old, and deemed unsuitable, MD5. This should be a transparent improvement so curl should just be able to use this without any particular new option has to be set, but the server-side support for this version seems to still be a bit lacking.

(Side-note: I'm credited in RFC 7616 for having contributed my thoughts!)

Sharing the connection cache

In this modern age with multi core processors and applications using multi-threaded designs, we of course want libcurl to enable applications to be able to get the best performance out of libcurl.

libcurl is already thread-safe so you can run parallel transfers multi-threaded perfectly fine if you want to, but it doesn't allow the application to share handles between threads. Before this specific change, this limitation has forced multi-threaded applications to be satisfied with letting libcurl has a separate "connection cache" in each thread.

The connection cache, sometimes also referred to as the connection pool, is where libcurl keeps live connections that were previously used for a transfer and still haven't been closed, so that a subsequent request might be able to re-use one of them. Getting a re-used connection for a request is much faster than having to create a new one. Having one connection cache per thread, is ineffective.

Starting now, libcurl's "share concept" allows an application to specify a single connection cache to be used cross-thread and cross-handles, so that connection re-use will be much improved when libcurl is used multi-threaded. This will significantly benefit the most demanding libcurl applications, but it will also allow more flexible designs as now the connection pool can be designed to survive individual handles in a way that wasn't previously possible.

Brotli compression

The popular browsers have supported brotli compression method for a while and it has already become widely supported by servers.

Now, curl supports it too and the command line tool's --compressed option will ask for brotli as well as gzip, if your build supports it. Similarly, libcurl supports it with its CURLOPT_ACCEPT_ENCODING option. The server can then opt to respond using either compression format, depending on what it knows.

According to CertSimple, who ran tests on the top-1000 sites of the Internet, brotli gets contents 14-21% smaller than gzip.

As with other compression algorithms, libcurl uses a 3rd party library for brotli compression and you may find that Linux distributions and others are a bit behind in shipping packages for a brotli decompression library. Please join in and help this happen. At the moment of this writing, the Debian package is only available in experimental.

(Readers may remember my libbrotli project, but that effort isn't really needed anymore since the brotli project itself builds a library these days.)

Three security issues

In spite of our hard work and best efforts, security issues keep getting reported and we fix them accordingly. This release has three new ones and I'll describe them below. None of them are alarmingly serious and they will probably not hurt anyone badly.

Two things can be said about the security issues this time:

1. You'll note that we've changed naming convention for the advisory URLs, so that they now have a random component. This is to reduce potential information leaks based on the name when we pass these around before releases.

2. Two of the flaws happen only on 32 bit systems, which reveals a weakness in our testing. Most of our CI tests, torture tests and fuzzing are made on 64 bit architectures. We have no immediate and good fix for this, but this is something we must work harder on.

1. NTLM buffer overflow via integer overflow

(CVE-2017-8816) Limited to 32 bit systems, this is a flaw where curl takes the combined length of the user name and password, doubles it, and allocates a memory area that big. If that doubling ends up larger than 4GB, an integer overflow makes a very small buffer be allocated instead and then curl will overwrite that.

Yes, having user name plus password be longer than two gigabytes is rather excessive and I hope very few applications would allow this.

2. FTP wildcard out of bounds read

(CVE-2017-8817) curl's wildcard functionality for FTP transfers is not a not very widely used feature, but it was discovered that the default pattern matching function could erroneously read beyond the URL buffer if the match pattern ends with an open bracket '[' !

This problem was detected by the OSS-Fuzz project! This flaw  has existed in the code since this feature was added, over seven years ago.

3. SSL out of buffer access

(CVE-2017-8818) In July this year we introduced multissl support in libcurl. This allows an application to select which TLS backend libcurl should use, if it was built to support more than one. It was a fairly large overhaul to the TLS code in curl and unfortunately it also brought this bug.

Also, only happening on 32 bit systems, libcurl would allocate a buffer that was 4 bytes too small for the TLS backend's data which would lead to the TLS library accessing and using data outside of the heap allocated buffer.

Next?

The next release will ship no later than January 24th 2018. I think that one will as well add changes and warrant the minor number to bump. We have fun pending stuff such as: a new SSH backend, modifiable happy eyeballs timeout and more. Get involved and help us do even more good!

HTTPS-only curl mirrors

We've had volunteers donating bandwidth to the curl project basically since its inception. They mirror our download archives so that you can download them directly from their server farms instead of hitting the main curl site.

On the main site we check the mirrors daily and offers convenient download links from the download page. It has historically been especially useful for the rare occasions when our site has been down for administrative purpose or others.

Since May 2017 the curl site is fronted by Fastly which then has reduced the bandwidth issue as well as the downtime problem. The mirrors are still there though.

Starting now, we will only link to download mirrors that offer the curl downloads over HTTPS in our continued efforts to help our users to stay secure and avoid malicious manipulation of data. I've contacted the mirror admins and asked if they can offer HTTPS instead.

The curl download page still contains links to HTTP-only packages and pages, and we would really like to fix them as well. But at the same time we've reasoned that it is better to still help users to find packages than not, so for the packages where there are no HTTPS linkable alternatives we still link to HTTP-only pages. For now.

If you host curl packages anywhere, for anyone, please consider hosting them over HTTPS for all the users' sake.

The life of a curl security bug

The report

Usually, security problems in the curl project come to us out of the blue. Someone has found a bug they suspect may have a security impact and they tell us about it on the curl-security@haxx.se email address. Mails sent to this address reach a private mailing list with the curl security team members as the only subscribers.

An important first step is that we respond to the sender, acknowledging the report. Often we also include a few follow-up questions at once. It is important to us to keep the original reporter in the loop and included in all subsequent discussions about this issue - unless they prefer to opt out.

If we find the issue ourselves, we act pretty much the same way.

In the most obvious and well-reported cases there are no room for doubts or hesitation about what the bugs and the impact of them are, but very often the reports lead to discussions.

The assessment

Is it a bug in the first place, is it perhaps even documented or just plain bad use?

If it is a bug, is this a security problem that can be abused or somehow put users in some sort of risk?

Most issues we get reported as security issues are also in the end treated as such, as we tend to err on the safe side.

The time plan

Unless the issue is critical, we prefer to schedule a fix and announcement of the issue in association with the pending next release, and as we do releases every 8 weeks like clockwork, that's never very far away.

We communicate the suggested schedule with the reporter to make sure we agree. If a sooner release is preferred, we work out a schedule for an extra release. In the past we've did occasional faster security releases also when the issue already had been made public, so we wanted to shorten the time window during which users could be harmed by the problem.

We really really do not want a problem to persist longer than until the next release.

The fix

The curl security team and the reporter work on fixing the issue. Ideally in part by the reporter making sure that they can't reproduce it anymore and we add a test case or two.

We keep the fix undisclosed for the time being. It is not committed to the public git repository but kept in a private branch. We usually put it on a private URL so that we can link to it when we ask for a CVE, see below.

All security issues should make us ask ourselves - what did we do wrong that made us not discover this sooner? And ideally we should introduce processes, tests and checks to make sure we detect other similar mistakes now and in the future.

Typically we only generate a single patch from the git master master and offer that as the final solution. In the curl project we don't maintain multiple branches. Distros and vendors who ship older or even multiple curl versions backport the patch to their systems by themselves. Sometimes we get backported patches back to offer users as well, but those are exceptions to the rule.

The advisory

In parallel to working on the fix, we write up a "security advisory" about the problem. It is a detailed description about the problem, what impact it may have if triggered or abused and if we know of any exploits of it.

What conditions need to be met for the bug to trigger. What's the version range that is affected, what's the remedies that can be done as a work-around if the patch is not applied etc.

We work out the advisory in cooperation with the reporter so that we get the description and the credits right.

The advisory also always contains a time line that clearly describes when we got to know about the problem etc.

The CVE

Once we have an advisory and a patch, none of which needs to be their final versions, we can proceed and ask for a CVE. A CVE is a unique "ID" that is issued for security problems to make them easy to reference. CVE stands for Common Vulnerabilities and Exposures.

Depending on where in the release cycle we are, we might have to hold off at this point. For all bugs that aren't proprietary-operating-system specific, we pre-notify and ask for a CVE on the distros@openwall mailing list. This mailing list prohibits an embargo longer than 14 days, so we cannot ask for a CVE from them longer than 2 weeks in advance before our release.

The idea here is that the embargo time gives the distributions time and opportunity to prepare updates of their packages so they can be pretty much in sync with our release and reduce the time window their users are at risk. Of course, not all operating system vendors manage to actually ship a curl update on two weeks notice, and at least one major commercial vendor regularly informs me that this is a too short time frame for them.

For flaws that don't affect the free operating systems at all, we ask MITRE directly for CVEs.

The last 48 hours

When there is roughly 48 hours left until the coming release and security announcement, we merge the private security fix branch into master and push it. That immediately makes the fix public and those who are alert can then take advantage of this knowledge - potentially for malicious purposes. The security advisory itself is however not made public until release day.

We use these 48 hours to get the fix tested on more systems to verify that it is not doing any major breakage. The weakest part of our security procedure is that the fix has been worked out in secret so it has not had the chance to get widely built and tested, so that is performed now.

The release

We upload the new release. We send out the release announcement email, update the web site and make the advisory for the issue public. We send out the security advisory alert on the proper email lists.

Bug Bounty?

Unfortunately we don't have any bug bounties on our own in the curl project. We simply have no money for that. We actually don't have money at all for anything.

Hackerone offers bounties for curl related issues. If you have reported a critical issue you can request one from them after it has been fixed in curl.

 

Say hi to curl 7.56.0

Another curl version has been released into the world. curl 7.56.0 is available for download from the usual place. Here are some news I think are worthy to mention this time...

An FTP security issue

A mistake in the code that parses responses to the PWD command could make curl read beyond the end of a buffer, Max Dymond figured it out, and we've released a security advisory about it. Our 69th security vulnerability counted from the beginning and the 8th reported in 2017.

Multiple SSL backends

Since basically forever you've been able to build curl with a selected SSL backend to make it get a different feature set or behave slightly different - or use a different license or get a different footprint. curl supports eleven different TLS libraries!

Starting now, libcurl can be built to support more than one SSL backend! You specify all the SSL backends at build-time and then you can tell libcurl at run-time exactly which of the backends it should use.

The selection can only happen once per invocation so there's no switching back and forth among them, but still. It also of course requires that you actually build curl with more than one TLS library, which you do by telling configure all the libs to use.

The first user of this feature that I'm aware of is git for windows that can select between using the schannel and OpenSSL backends.

curl_global_sslset() is the new libcurl call to do this with.

This feature was brought by Johannes Schindelin.

New MIME API

The currently provided API for creating multipart formposts, curl_formadd, has always been considered a bit quirky and complicated to work with. Its extensive use of varargs is to blame for a significant part of that.

Now, we finally introduce a replacement API to accomplish basically the same features but also with a few additional ones, using a new API that is supposed to be easier to use and easier to wrap for bindings etc.

Introducing the mime API: curl_mime_init, curl_mime_addpart, curl_mime_name and more. See the postit2.c and multi-post.c examples for some easy to grasp examples.

This work was done by Patrick Monnerat.

SSH compression

The SSH protocol allows clients and servers to negotiate to use of compression when communicating, and now curl can too. curl has the new --compressed-ssh option and libcurl has a new setopt called CURLOPT_SSH_COMPRESSION using the familiar style.

Feature worked on by Viktor Szakats.

SSLKEYLOGFILE

Peter Wu and Jay Satiro have worked on this feature that allows curl to store SSL session secrets in a file if this environment variable is set. This is normally the way you tell Chrome and Firefox to do this, and is extremely helpful when you want to wireshark and analyze a TLS stream.

This is still disabled by default due to its early days. Enable it by defining ENABLE_SSLKEYLOGFILE when building libcurl and set environment variable SSLKEYLOGFILE to a pathname that will receive the keys.

Numbers

This, the 169th curl release, contains 89 bug fixes done during the 51 days since the previous release.

47 contributors helped making this release, out of whom 18 are new.

254 commits were done since the previous release, by 26 authors.

The top-5 commit authors this release are:

  1. Daniel Stenberg (116)
  2. Johannes Schindelin (37)
  3. Patrick Monnerat (28)
  4. Jay Satiro (12)
  5. Dan Fandrich (10)

Thanks a lot everyone!

(picture from pixabay)

The backdoor threat

-- "Have you ever detected anyone trying to add a backdoor to curl?"

-- "Have you ever been pressured by an organization or a person to add suspicious code to curl that you wouldn't otherwise accept?"

-- "If a crime syndicate would kidnap your family to force you to comply, what backdoor would you be be able to insert into curl that is the least likely to get detected?" (The less grim version of this question would instead offer huge amounts of money.)

I've been asked these questions and variations of them when I've stood up in front of audiences around the world and talked about curl and how it is one of the most widely used software components in the world, counting way over three billion instances.

Back door (noun)
-- a feature or defect of a computer system that allows surreptitious unauthorized access to data.

So how is it?

No. I've never seen a deliberate attempt to add a flaw, a vulnerability or a backdoor into curl. I've seen bad patches and I've seen patches that brought bugs that years later were reported as security problems, but I did not spot any deliberate attempt to do bad in any of them. But if done with skills, certainly I wouldn't have noticed them being deliberate?

If I had cooperated in adding a backdoor or been threatened to, then I wouldn't tell you anyway and I'd thus say no to questions about it.

How to be sure

There is only one way to be sure: review the code you download and intend to use. Or get it from a trusted source that did the review for you.

If you have a version you trust, you really only have to review the changes done since then.

Possibly there's some degree of safety in numbers, and as thousands of applications and systems use curl and libcurl and at least some of them do reviews and extensive testing, one of those could discover mischievous activities if there are any and report them publicly.

Infected machines or owned users

The servers that host the curl releases could be targeted by attackers and the tarballs for download could be replaced by something that carries evil code. There's no such thing as a fail-safe machine, especially not if someone really wants to and tries to target us. The safeguard there is the GPG signature with which I sign all official releases. No malicious user can (re-)produce them. They have to be made by me (since I package the curl releases). That comes back to trusting me again. There's of course no safe-guard against me being forced to signed evil code with a knife to my throat...

If one of the curl project members with git push rights would get her account hacked and her SSH key password brute-forced, a very skilled hacker could possibly sneak in something, short-term. Although my hopes are that as we review and comment each others' code to a very high degree, that would be really hard. And the hacked person herself would most likely react.

Downloading from somewhere

I think the highest risk scenario is when users download pre-built curl or libcurl binaries from various places on the internet that isn't the official curl web site. How can you know for sure what you're getting then, as you couldn't review the code or changes done. You just put your trust in a remote person or organization to do what's right for you.

Trusting other organizations can be totally fine, as when you download using Linux distro package management systems etc as then you can expect a certain level of checks and vouching have happened and there will be digital signatures and more involved to minimize the risk of external malicious interference.

Pledging there's no backdoor

Some people argue that projects could or should pledge for every release that there's no deliberate backdoor planted so that if the day comes in the future when a three-letter secret organization forces us to insert a backdoor, the lack of such a pledge for the subsequent release would function as an alarm signal to people that something is wrong.

That takes us back to trusting a single person again. A truly evil adversary can of course force such a pledge to be uttered no matter what, even if that then probably is more mafia level evilness and not mere three-letter organization shadiness anymore.

I would be a bit stressed out to have to do that pledge every single release as if I ever forgot or messed it up, it should lead to a lot of people getting up in arms and how would such a mistake be fixed? It's little too irrevocable for me. And we do quite frequent releases so the risk for mistakes is not insignificant.

Also, if I would pledge that, is that then a promise regarding all my code only, or is that meant to be a pledge for the entire code base as done by all committers? It doesn't scale very well...

Additionally, I'm a Swede living in Sweden. The American organizations cannot legally force me to backdoor anything, and the Swedish versions of those secret organizations don't have the legal rights to do so either (caveat: I'm not a lawyer). So, the real threat is not by legal means.

What backdoor would be likely?

It would be very hard to add code, unnoticed, that sends off data to somewhere else. Too much code that would be too obvious.

A backdoor similarly couldn't really be made to split off data from the transfer pipe and store it locally for other systems to read, as that too is probably too much code that is too different than the current code and would be detected instantly.

No, I'm convinced the most likely backdoor code in curl is a deliberate but hard-to-detect security vulnerability that let's the attacker exploit the program using libcurl/curl by some sort of specific usage pattern. So when triggered it can trick the program to send off memory contents or perhaps overwrite the local stack or the heap. Quite possibly only one step out of several steps necessary for a successful attack, much like how a single-byte-overwrite can lead to root access.

Any past security problems on purpose?

We've had almost 70 security vulnerabilities reported through the project's almost twenty years of existence. Since most of them were triggered by mistakes in code I wrote myself, I can be certain that none of those problems were introduced on purpose. I can't completely rule out that someone else's patch modified curl along the way and then by extension maybe made a vulnerability worse or easier to trigger, could have been made on purpose. None of the security problems that were introduced by others have shown any sign of "deliberateness". (Or were written cleverly enough to not make me see that!)

Maybe backdoors have been planted that we just haven't discovered yet?

Discussion

Follow-up discussion/comments on hacker news.

keep finding old security problems

I decided to look closer at security problems and the age of the reported issues in the curl project.

One theory I had when I started to collect this data, was that we actually get security problems reported earlier and earlier over time. That bugs would be around in public release for shorter periods of time nowadays than what they did in the past.

My thinking would go like this: Logically, bugs that have been around for a long time have had a long time to get caught. The more eyes we've had on the code, the fewer old bugs should be left and going forward we should more often catch more recently added bugs.

The time from a bug's introduction into the code until the day we get a security report about it, should logically decrease over time.

What if it doesn't?

First, let's take a look at the data at hand. In the curl project we have so far reported in total 68 security problems over the project's life time. The first 4 were not recorded correctly so I'll discard them from my data here, leaving 64 issues to check out.

The graph below shows the time distribution. The all time leader so far is the issue reported to us on March 10 this year (2017), which was present in the code since the version 6.5 release done on March 13 2000. 6,206 days, just three days away from 17 whole years.

There are no less than twelve additional issues that lingered from more than 5,000 days until reported. Only 20 (31%) of the reported issues had been public for less than 1,000 days. The fastest report was reported on the release day: 0 days.

The median time from release to report is a whopping 2541 days.

When we receive a report about a security problem, we want the issue fixed, responsibly announced to the world and ship a new release where the problem is gone. The median time to go through this procedure is 26.5 days, and the distribution looks like this:

What stands out here is the TLS session resumption bypass, which happened because we struggled with understanding it and how to address it properly. Otherwise the numbers look all reasonable to me as we typically do releases at least once every 8 weeks. We rarely ship a release with a known security issue outstanding.

Why are very old issues still found?

I think partly because the tools are gradually improving that aid people these days to find things much better, things that simply wasn't found very often before. With new tools we can find problems that have been around for a long time.

Every year, the age of the oldest parts of the code get one year older. So the older the project gets, the older bugs can be found, while in the early days there was a smaller share of the code that was really old (if any at all).

What if we instead count age as a percentage of the project's life time? Using this formula, a bug found at day 100 that was added at day 50 would be 50% but if it was added at day 80 it would be 20%. Maybe this would show a graph where the bars are shrinking over time?

But no. In fact it shows 17 (27%) of them having been present during 80% or more of the project's life time! The median issue had been in there during 49% of the project's life time!

It does however make another issue the worst offender, as one of the issues had been around during 91% of the project's life time.

This counts on March 20 1998 being the birth day. Of course we got no reports the first few years since we basically had no users then!

Specific or generic?

Is this pattern something that is specific for the curl project or can we find it in other projects too? I don't know. I have not seen this kind of data being presented by others and I don't have the same insight on such details of projects with an enough amount of issues to be interesting.

What can we do to make the bars shrink?

Well, if there are old bugs left to find they won't shrink, because for every such old security issue that's still left there will be a tall bar. Hopefully though, by doing more tests, using more tools regularly (fuzzers, analyzers etc) and with more eyeballs on the code, we should iron out our security issues over time. Logically that should lead to a project where newly added security problems are detected sooner rather than later. We just don't seem to be at that point yet...

Caveat

One fact that skews the numbers is that we are much more likely to record issues as security related these days. A decade ago when we got a report about a segfault or something we would often just consider it bad code and fix it, and neither us maintainers nor the reporter would think much about the potential security impact.

These days we're at the other end of the spectrum where we people are much faster to jumping to a security issue suspicion or conclusion. Today people report bugs as security issues to a much higher degree than they did in the past. This is basically a good thing though, even if it makes it harder to draw conclusions over time.

Data sources

When you want to repeat the above graphs and verify my numbers:

  • vuln.pm - from the curl web site repository holds security issue meta data
  • releaselog - on the curl web site offers release meta data, even as a CSV download on the bottom of the page
  • report2release.pl - the perl script I used to calculate the report until release periods.

“OPTIONS *” with curl

(Note: this blog post as been updated as the command line option changed after first publication, based on comments to this very post!)

curl is arguably a "Swiss army knife" of HTTP fiddling. It is one of the available tools in the toolbox with a large set of available switches and options to allow us to tweak and modify our HTTP requests to really test, debug and torture our HTTP servers and services.

That's the way we like it.

In curl 7.55.0 it will take yet another step into this territory when we finally introduce a way for users to send "OPTION *" and similar requests to servers. It has been requested occasionally by users over the years but now the waiting is over. (brought by this commit)

"OPTIONS *" is special and peculiar just because it is one of the few specified requests you can do to a HTTP server where the path part doesn't start with a slash. Thus you cannot really end up with this based on a URL and as you know curl is pretty much all about URLs.

The OPTIONS method was introduced in HTTP 1.1 already back in RFC 2068, published in January 1997 (even before curl was born) and with curl you've always been able to send an OPTIONS request with the -X option, you just were never able to send that single asterisk instead of a path.

In curl 7.55.0 and later versions, you can remove the initial slash from the path part that ends up in the request by using --request-target. So to send an OPTION * to example.com for http and https URLs, you could do it like:

$ curl --request-target "*" -X OPTIONS http://example.com
$ curl --request-target "*" -X OPTIONS https://example.com/

In classical curl-style this also opens up the opportunity for you to issue completely illegal or otherwise nonsensical paths to your server to see what it does on them, to send totally weird options to OPTIONS and similar games:

$ curl --request-target "*never*" -X OPTIONS http://example.com

$ curl --request-target "allpasswords" http://example.com

Enjoy!

curling over HTTP proxy

Starting in curl 7.55.0 (this commit), curl will no longer try to ask HTTP proxies to perform non-HTTP transfers with GET, except for FTP. For all other protocols, curl now assumes you want to tunnel through the HTTP proxy when you use such a proxy and protocol combination.

Protocols and proxies

curl supports 23 different protocols right now, if we count the S-versions (the TLS based alternatives) as separate protocols.

curl also currently supports seven different proxy types that can be set independently of the protocol.

One type of proxy that curl supports is a so called "HTTP proxy". The official HTTP standard includes a defined way how to speak to such a proxy and ask it to perform the request on the behalf of the client. curl supports using that over either HTTP/1.1 or HTTP/1.0, where you'd typically only use the latter version if you the first really doesn't work with your ancient proxy.

HTTP proxy

All that is fine and good. But HTTP proxies were really only defined to handle HTTP, and to some extent HTTPS. When doing plain HTTP transfers over a proxy, the client will send its request to the proxy like this:

GET http://curl.haxx.se/ HTTP/1.1
Host: curl.haxx.se
Accept: */*
User-Agent: curl/7.55.0

... but for HTTPS, which should provide end to end encryption, a client needs to ask the proxy to instead tunnel through the proxy so that it can do TLS all the way, without any middle man, to the server:

CONNECT curl.haxx.se:443 HTTP/1.1
Host: curl.haxx.se:443
User-Agent: curl/7.55.0

When successful, the proxy responds with a "200" which means that the proxy has established a TCP connection to the remote server the client asked it to connect to, and the client can then proceed and do the TLS handshake with that server. When the TLS handshake is completed, a regular GET request is then sent over that established and secure TLS "tunnel" to the server. A GET request that then looks like one that is sent without proxy:

GET / HTTP/1.1
Host: curl.haxx.se
User-Agent: curl/7.55.0
Accept: */*

FTP over HTTP proxy

Things get more complicated when trying to perform transfers over the HTTP proxy using schemes that aren't HTTP. As already described above, HTTP proxies are basically designed only for doing HTTP over them, but as they have this concept of tunneling through to the remote server it doesn't have to be limited to just HTTP.

Also, historically, for decades people have deployed HTTP proxies that recognize FTP URLs, and transparently handle them for the client so the client can almost believe it is HTTP while the proxy has to speak FTP to the remote server in the other end and convert it back to HTTP to the client. On such proxies (Squid and Apache both support this mode for example), this sort of request is possible:

GET ftp://ftp.funet.fi/ HTTP/1.1
Host: ftp.funet.fi
User-Agent: curl/7.55.0
Accept: */*

curl knows this and if you ask curl for FTP over an HTTP proxy, it will assume you have one of these proxies. It should be noted that this method of course limits what you can do FTP-wise and for example FTP upload is usually not working and if you ask curl to do FTP upload over and HTTP proxy it will do that with a HTTP PUT.

HTTP proxy tunnel

curl features an option (--proxytunnel) that lets the user forcible tell the client to not assume that the proxy speaks this protocol and instead use the CONNECT method with establishing a tunnel through the proxy to the remote server.

It should of course be noted that very few deployed HTTP proxies in the wild allow clients to CONNECT to whatever port they like. HTTP proxies tend to only allow connecting to port 443 as that is the official HTTPS port, and if you ask for another port it will respond back with a 4xx response code refusing to comply.

Not HTTP not FTP over HTTP proxy

So HTTP, HTTPS and FTP are sent over the HTTP proxy fine. That leaves us with nineteen more protocols. What happens with them when you ask curl to perform them over a HTTP proxy?

Now we have finally reached the change that has just been merged in curl and changes what curl does.

Before 7.55.0

curl would send all protocols as a regular GET to the proxy if asked to use a HTTP proxy without seeing the explicit proxy-tunnel option. This came from how FTP was done and grew from there without many people questioning it. Of course it wouldn't ever work, but also very few people would actually attempt it because of that.

From 7.55.0

All protocols that aren't HTTP, HTTPS or FTP will enable the tunnel-through mode automatically when a HTTP proxy is used. No more sending funny GET requests to proxies when they won't work anyway. Also, it will prevent users from accidentally leak credentials to proxies that were intended for the server, which previously could happen if you omitted the tunnel option with a few authentication setups.

HTTP/2 proxy

Sorry, curl doesn't support that yet. Patches welcome!

HTTP Workshop s03e02

(Season three, episode two)

Previously, on the HTTP Workshop. Yesterday ended with a much appreciated group dinner and now we're back energized and eager to continue blabbing about HTTP frames, headers and similar things.

Martin from Mozilla talked on "connection management is hard". Parts of the discussion was around the HTTP/2 connection coalescing that I've blogged about before. The ORIGIN frame is a draft for a suggested way for servers to more clearly announce which origins it can answer for on that connection which should reduce the frequency of 421 needs. The ORIGIN frame overrides DNS and will allow coalescing even for origins that don't otherwise resolve to the same IP addresses. The Alt-Svc header, a suggested CERTIFICATE frame and how does a HTTP/2 server know for which origins it can do PUSH for?

A lot of positive words were expressed about the ORIGIN frame. Wildcard support?

Willy from HA-proxy talked about his Memory and CPU efficient HPACK decoding algorithm. Personally, I think the award for the best slides of the day goes to Willy's hand-drawn notes.

Lucas from BBC talked about usage data for iplayer and how much data and number of requests they serve and how their largest share of users are "non-browsers". Lucas mentioned their work on writing a libcurl adaption to make gstreamer use it instead of libsoup. Lucas talk triggered a lengthy discussion on what needs there are and how (if at all) you can divide clients into browsers and non-browser.

Wenbo from Google spoke about Websockets and showed usage data from Chrome. The median websockets connection time is 20 seconds and 10% something are shorter than 0.5 seconds. At the 97% percentile they live over an hour. The connection success rates for Websockets are depressingly low when done in the clear while the situation is better when done over HTTPS. For some reason the success rate on Mac seems to be extra low, and Firefox telemetry seems to agree. Websockets over HTTP/2 (or not) is an old hot topic that brought us back to reiterate issues we've debated a lot before. This time we also got a lovely and long side track into web push and how that works.

Roy talked about Waka, a HTTP replacement protocol idea and concept that Roy's been carrying around for a long time (he started this in 2001) and to which he is now coming back to do actual work on. A big part of the discussion was focused around the wakli compression ideas, what the idea is and how it could be done and evaluated. Also, Roy is not a fan of content negotiation and wants it done differently so he's addressing that in Waka.

Vlad talked about his suggestion for how to do cross-stream compression in HTTP/2 to significantly enhance compression ratio when, for example, switching to many small resources over h2 compared to a single huge resource over h1. The security aspect of this feature is what catches most of people's attention and the following discussion. How can we make sure this doesn't leak sensitive information? What protocol mechanisms exist or can we invent to help out making this work in a way that is safer (by default)?

Trailers. This is again a favorite topic that we've discussed before that is resurfaced. There are people around the table who'd like to see support trailers and we discussed the same topic in the HTTP Workshop in 2016 as well. The corresponding issue on trailers filed in the fetch github repo shows a lot of the concerns.

Julian brought up the subject of "7230bis" - when and how do we start the work. What do we want from such a revision? Fixing the bugs seems like the primary focus. "10 years is too long until update".

Kazuho talked about "HTTP/2 attack mitigation" and how to handle clients doing many parallel slow POST requests to a CDN and them having an origin server behind that runs a new separate process for each upload.

And with this, the day and the workshop 2017 was over. Thanks to Facebook for hosting us. Thanks to the members of the program committee for driving this event nicely! I had a great time. The topics, the discussions and the people - awesome!

HTTP Workshop – London edition. First day.

The HTTP workshop series is back for a third time this northern hemisphere summer. The selected location for the 2017 version is London and this time we're down to a two-day event (we seem to remove a day every year)...

Nothing in this blog entry is a quote to be attributed to a specific individual but they are my interpretations and paraphrasing of things said or presented. Any mistakes or errors are all mine.

At 9:30 this clear Monday morning, 35 persons sat down around a huge table in a room in the Facebook offices. Most of us are the same familiar faces that have already participated in one or two HTTP workshops, but we also have a set of people this year who haven't attended before. Getting fresh blood into these discussions is certainly valuable. Most major players are represented, including Mozilla, Google, Facebook, Apple, Cloudflare, Fastly, Akamai, HA-proxy, Squid, Varnish, BBC, Adobe and curl!

Mark (independent, co-chair of the HTTP working group as well as the QUIC working group) kicked it all off with a presentation on quic and where it is right now in terms of standardization and progress. The upcoming draft-04 is becoming the first implementation draft even though the goal for interop is set basically at handshake and some very basic data interaction. The quic transport protocol is still in a huge flux and things have not settled enough for it to be interoperable right now to a very high level.

Jana from Google presented on quic deployment over time and how it right now uses about 7% of internet traffic. The Android Youtube app's switch to QUIC last year showed a huge bump in usage numbers. Quic is a lot about reducing latency and numbers show that users really do get a reduction. By that nature, it improves the situation best for those who currently have the worst connections.

It doesn't solve first world problems, this solves third world connection issues.

The currently observed 2x CPU usage increase for QUIC connections as compared to h2+TLS is mostly blamed on the Linux kernel which apparently is not at all up for this job as good is should be. Things have clearly been more optimized for TCP over the years, leaving room for improvement in the UDP areas going forward. "Making kernel bypassing an interesting choice".

Alan from Facebook talked header compression for quic and presented data, graphs and numbers on how HPACK(-for-quic), QPACK and QCRAM compare when used for quic in different networking conditions and scenarios. Those are the three current header compression alternatives that are open for quic and Alan first explained the basics behind them and then how they compare when run in his simulator. The current HPACK version (adopted to quic) seems to be out of the question for head-of-line-blocking reasons, the QCRAM suggestion seems to run well but have two main flaws as it requires an awkward layering violation and an annoying possible reframing requirement on resends. Clearly some more experiments can be done, possible with a hybrid where some QCRAM ideas are brought into QPACK. Alan hopes to get his simulator open sourced in the coming months which then will allow more people to experiment and reproduce his numbers.

Hooman from Fastly on problems and challenges with HTTP/2 server push, the 103 early hints HTTP response and cache digests. This took the discussions on push into the weeds and into the dark protocol corners we've been in before and all sorts of ideas and suggestions were brought up. Some of them have been discussed before without having been resolved yet and some ideas were new, at least to me. The general consensus seems to be that push is fairly complicated and there are a lot of corner cases and murky areas that haven't been clearly documented, but it is a feature that is now being used and for the CDN use case it can help with a lot more than "just an RTT". But is perhaps the 103 response good enough for most of the cases?

The discussion on server push and how well it fares is something the QUIC working group is interested in, since the question was asked already this morning if a first version of quic could be considered to be made without push support. The jury is still out on that I think.

ekr from Mozilla spoke about TLS 1.3, 0-RTT, how the TLS 1.3 handshake looks like and how applications and servers can take advantage of the new 0-RTT and "0.5-RTT" features. TLS 1.3 is already passed the WGLC and there are now "only" a few issues pending to get solved. Taking advantage of 0RTT in an HTTP world opens up interesting questions and issues as HTTP request resends and retries are becoming increasingly prevalent.

Next: day two.