Tag Archives: proxy

curl speaks HTTP/2 with proxy

In September 2013 we merged the first code into curl that made it capable of using HTTP/2: HTTP version 2.

This version of HTTP changed a lot of previous presumptions when it comes to transfers, which introduced quite a few challenges to HTTP stack authors all of the world. One of them being that with version 2 there can be more than one transfer using the same connection where as up to that point we had always just had one transfer per connection.

In May 2015 the spec was published.

2023

Now almost eight years since the RFC was published, HTTP/2 is the version seen most frequently in browser responses if we ask the Firefox telemetry data. 44.4% of the responses are HTTP/2.

curl

This year, the curl project has been sponsored by the Sovereign Tech Fund, and one of the projects this funding has covered is what I am here to talk about:

Speaking HTTP/2 with a proxy. More specifically with what is commonly referred to as a “forward proxy.”

Many organizations and companies have setups like the one illustrated in this image below. The user on the left is inside the organization network A and the website they want to reach is on the outside on network B.

HTTP/2 to the proxy

When this is an HTTPS proxy, meaning that the communication to and with the proxy is itself protected with TLS, curl and libcurl are now capable of negotiating HTTP/2 with it.

It might not seem like a big deal to most people, and maybe it is not, but the introduction of this feature comes after some rather heavy lifting and internal refactors over the recent months that have enabled the rearrangement of networking components for this purpose.

Enable

To enable this feature in your libcurl-using application, you first need to make sure you use libcurl 8.1.0 when it ships in mid May and then you need to set the proxy type to CURLPROXY_HTTPS2.

In plain C code it could look like this:

curl_easy_setopt(handle,
                 CURLOPT_PROXYTYPE,
                 CURLPROXY_HTTPS2);
curl_easy_setopt(handle,
                 CURLOPT_PROXY,
                 "https://hostname");

This allows HTTP/2 but will proceed with plain old HTTP/1 if it can’t negotiate the higher protocol version using ALPN.

The old proxy type called just CURLPROXY_HTTPS remains for asking libcurl to stick to HTTP/1 when talking to the proxy. We decided to introduce a new option for this simply because we anticipate that there will be proxies out there that will not work correctly so we cannot throw this feature at users without them asking for it.

command line tool

Using the command line tool, you use a HTTPS proxy exactly like before and then you add this flag to tell the tool that it may try HTTP/2 with the proxy: --proxy-http2.

This also happens to be curl’s 251st command line option.

Shipping and credits

This implementation has been done by Stefan Eissing.

These features have already landed in the master branch and will be part of the pending curl 8.1.0 release, scheduled for release on May 17, 2023.

The dream of auto-detecting proxies

curl, along with every other Internet tool with aspirations, supports proxies. A proxy is a (usually known) middle-man in a network operation; instead of going directly to the remote end server, the client goes via a proxy.

curl has supported proxies since the day it was born.

Which proxy?

Applications that do Internet transfers often would like to automatically be able to do their transfers even when users are trapped in an environment where they use a proxy. To figure out the proxy situation automatically.

Many proxy users have to use their proxy to do Internet transfers.

A library to detect which proxy?

The challenge to figure out the proxy situation is of course even bigger if your applications can run on multiple platforms. macOS, Windows and Linux have completely different ways of storing and accessing the necessary information.

libproxy

libproxy is a well-known library used for exactly this purpose. The first feature request to add support for this into libcurl that I can find, was filed in December 2007 (I also blogged about it) and it has been popping up occasionally over the years since then.

In August 2016, David Woodhouse submitted a patch for curl that implemented support for libproxy (the PR version is here). I was skeptical then primarily because of the lack of tests and docs in libproxy (and that the project seemed totally unresponsive to bug reports). Subsequently we did not merge that pull request.

Almost six years later, in June 2022, Jan Brummer revived David’s previous work and submitted a fresh pull request to add libproxy support in curl. Another try.

The proxy library dream is clearly still very much alive. There are also a fair amount of applications and systems today that are built to use libproxy to figure out the proxy and then tell curl about it.

What is unfortunately also still present, is the unsatisfying state of libproxy. It seems to have changed and improved somewhat since the last time I looked at it (6 years ago), but there several warning signs remaining that make me hesitate.

This is not a dependency I want to encourage curl users to lean and depend upon.

I greatly appreciate the idea of a libproxy but I do not like the (state of the) implementation.

My responsibility

Or rather, the responsibility I think we have as curl maintainers. We ship a product that is used and depended upon by an almost unfathomable amount of users, tools, products and devices.

Our job is to help guide our users so that the entire product, including third party dependencies, become an as safe and secure solution as possible. I am not saying that we can take full responsibility for the security of code outside of our own domain, but I think we should recognize that what we condone and recommend will be used. People read our support for library X as some level of approval.

We should only add support for third party libraries that meets a certain quality threshold. This threshold or bar maybe is (unfortunately) not written down anywhere but is still mostly a soft “gut feeling” based on human reviews of the situation.

What to do

I think the sensible thing to do first, before trying to get curl to use libproxy, is to make sure that there is a library for proxies that we can and want to lean on. That library can be the libproxy of today but it could also be something else.

Some areas in need of attention in libproxy that I recently also highlighted in the curl PR, that would take it closer to meeting the requirements we can ask of a dependency:

  1. Improve the project with docs. We cannot safely rely on a library and its APIs if we don’t know exactly what to expect.
  2. There needs to be at least basic tests that verify its functionality. We cannot fix and improve the library safely if we cannot check that it still works and behaves as expected. Tests is the only reliable way to make sure of this.
  3. Add CI jobs that build the project and run those tests. To help the project better verify that things don’t break along the way.
  4. Consider improving the API. To be fair, it is extremely simple now, but it’s also so simple that that it becomes ineffective and quirky in some use cases.
  5. Allow for external URL parser and URL retrieving etc to avoid blocking, double-parsing and to reuse caches properly. It would be ridiculous for curl to use a library that has its own separate (semi-broken) HTTP transfer and its own (synchronous) name resolving process.

A solid proxy library?

The current libproxy is not even a lot of code, it could perhaps make more sense to just plainly write a new library based on how you would want a library like this to work – and use all the knowledge, magic and experience from libproxy to get the technical parts done correctly. A libproxy-next-generation.

But

This would require that someone really wanted to see this development take place and happen. It would take someone to take lead in the project and push for changes (like perhaps the 5 bullets I listed above). The best would be if someone who would like to use this kind of setup could sponsor a developer half/full time for a while to get a good head start on this.

Based on history, seeing we have had this known use case and this library around for well over a decade and this is the best we have accomplished so far, I am not optimistic that we can turn this ship around. Until then, we simply cannot allow curl to use this dependency.

curl ootw: –proxy-basic

Previous command line options of the week.

--proxy-basic has no short option. This option is closely related to the option --proxy-user, which has as separate blog post.

This option has been provided and supported since curl 7.12.0, released in June 2004.

Proxy

In curl terms, a proxy is an explicit middle man that is used to go through when doing a transfer to or from a server:

curl <=> proxy <=> server

curl supports several different kinds of proxies. This option is for HTTP(S) proxies.

HTTP proxy authentication

Authentication: the process or action of proving or showing something to be true, genuine, or valid.

When it comes to proxies and curl, you typically provide name and password to be allowed to use the service. If the client provides the wrong user or password, the proxy will simply deny the client access with a 407 HTTP response code.

curl supports several different HTTP proxy authentication methods, and the proxy can itself reply and inform the client which methods it supports. With the option of this week, --proxy-basic, you ask curl to do the authentication using the Basic method. “Basic” is indeed very basic but is the actual name of the method. Defined in RFC 7616.

Security

The Basic method sends the user and password in the clear in the HTTP headers – they’re just base64 encoded. This is notoriously insecure.

If the proxy is a HTTP proxy (as compared to a HTTPS proxy), users on your network or on the path between you and your HTTP proxy can see your credentials fly by!

If the proxy is a HTTPS proxy however, the connection to it is protected by TLS and everything is encrypted over the wire and then the credentials that is sent in HTTP are protected from snoopers.

Also note that if you pass in credentials to curl on the command line, they might be readable in the script where you do this from. Or if you do it interactively in a shell prompt, they might be viewable in process listings on the machine – even if curl tries to hide them it isn’t supported everywhere.

Example

Use a proxy with your name and password and ask for the Basic method specifically. Basic is also the default unless anything else is asked for.

curl --proxy-user daniel:password123 --proxy-basic --proxy http://myproxy.example https://example.com

Related

With --proxy you specify the proxy to use, and with --proxy-user you provide the credentials.

Also note that you can of course set and use entirely different credentials and HTTP authentication methods with the remote server even while using Basic with the HTTP(S) proxy.

There are also other authentication methods to selected, with --proxy-anyauth being a very practical one to know about.

Test servers for curl

curl supports some twenty-three protocols (depending on exactly how you count).

In order to properly test and verify curl’s implementations of each of these protocols, we have a test suite. In the test suite we have a set of handcrafted servers that speak the server-side of these protocols. The more used a protocol is, the more important it is to have it thoroughly tested.

We believe in having test servers that are “stupid” and that offer buttons, levers and thresholds for us to control and manipulate how they act and how they respond for testing purposes. The control of what to send should be dictated as much as possible by the test case description file. If we want a server to send back a slightly broken protocol sequence to check how curl supports that, the server must be open for this.

In order to do this with a large degree of freedom and without restrictions, we’ve found that using “real” server software for this purpose is usually not good enough. Testing the broken and bad cases are typically not easily done then. Actual server software tries hard to do the right thing and obey standards and protocols, while we rather don’t want the server to make any decisions by itself at all but just send exactly the bytes we ask it to. Simply put.

Of course we don’t always get what we want and some of these protocols are fairly complicated which offer challenges in sticking to this policy all the way. Then we need to be pragmatic and go with what’s available and what we can make work. Having test cases run against a real server is still better than no test cases at all.

Now SOCKS

“SOCKS is an Internet protocol that exchanges network packets between a client and server through a proxy server. Practically, a SOCKS server proxies TCP connections to an arbitrary IP address, and provides a means for UDP packets to be forwarded.

(according to Wikipedia)

Recently we fixed a bug in how curl sends credentials to a SOCKS5 proxy as it turned out the protocol itself only supports user name and password length of 255 bytes each, while curl normally has no such limits and could pass on credentials with virtually infinite lengths. OK, that was silly and we fixed the bug. Now curl will properly return an error if you try such long credentials with your SOCKS5 proxy.

As a general rule, fixing a bug should mean adding at least one new test case, right? Up to this time we had been testing the curl SOCKS support by firing up an ssh client and having that setup a SOCKS proxy that connects to the other test servers.

curl -> ssh with SOCKS proxy -> test server

Since this setup doesn’t support SOCKS5 authentication, it turned out complicated to add a test case to verify that this bug was actually fixed.

This test problem was fixed by the introduction of a newly written SOCKS proxy server dedicated for the curl test suite (which I simply named socksd). It does the basic SOCKS4 and SOCKS5 protocol logic and also supports a range of commands to control how it behaves and what it allows so that we can now write test cases against this server and ask the server to misbehave or otherwise require fun things so that we can make really sure curl supports those cases as well.

It also has the additional bonus that it works without ssh being present so it will be able to run on more systems and thus the SOCKS code in curl will now be tested more widely than before.

curl -> socksd -> test server

Going forward, we should also be able to create even more SOCKS tests with this and make sure to get even better SOCKS test coverage.

curling over HTTP proxy

Starting in curl 7.55.0 (this commit), curl will no longer try to ask HTTP proxies to perform non-HTTP transfers with GET, except for FTP. For all other protocols, curl now assumes you want to tunnel through the HTTP proxy when you use such a proxy and protocol combination.

Protocols and proxies

curl supports 23 different protocols right now, if we count the S-versions (the TLS based alternatives) as separate protocols.

curl also currently supports seven different proxy types that can be set independently of the protocol.

One type of proxy that curl supports is a so called “HTTP proxy”. The official HTTP standard includes a defined way how to speak to such a proxy and ask it to perform the request on the behalf of the client. curl supports using that over either HTTP/1.1 or HTTP/1.0, where you’d typically only use the latter version if you the first really doesn’t work with your ancient proxy.

HTTP proxy

All that is fine and good. But HTTP proxies were really only defined to handle HTTP, and to some extent HTTPS. When doing plain HTTP transfers over a proxy, the client will send its request to the proxy like this:

GET http://curl.haxx.se/ HTTP/1.1
Host: curl.haxx.se
Accept: */*
User-Agent: curl/7.55.0

… but for HTTPS, which should provide end to end encryption, a client needs to ask the proxy to instead tunnel through the proxy so that it can do TLS all the way, without any middle man, to the server:

CONNECT curl.haxx.se:443 HTTP/1.1
Host: curl.haxx.se:443
User-Agent: curl/7.55.0

When successful, the proxy responds with a “200” which means that the proxy has established a TCP connection to the remote server the client asked it to connect to, and the client can then proceed and do the TLS handshake with that server. When the TLS handshake is completed, a regular GET request is then sent over that established and secure TLS “tunnel” to the server. A GET request that then looks like one that is sent without proxy:

GET / HTTP/1.1
Host: curl.haxx.se
User-Agent: curl/7.55.0
Accept: */*

FTP over HTTP proxy

Things get more complicated when trying to perform transfers over the HTTP proxy using schemes that aren’t HTTP. As already described above, HTTP proxies are basically designed only for doing HTTP over them, but as they have this concept of tunneling through to the remote server it doesn’t have to be limited to just HTTP.

Also, historically, for decades people have deployed HTTP proxies that recognize FTP URLs, and transparently handle them for the client so the client can almost believe it is HTTP while the proxy has to speak FTP to the remote server in the other end and convert it back to HTTP to the client. On such proxies (Squid and Apache both support this mode for example), this sort of request is possible:

GET ftp://ftp.funet.fi/ HTTP/1.1
Host: ftp.funet.fi
User-Agent: curl/7.55.0
Accept: */*

curl knows this and if you ask curl for FTP over an HTTP proxy, it will assume you have one of these proxies. It should be noted that this method of course limits what you can do FTP-wise and for example FTP upload is usually not working and if you ask curl to do FTP upload over and HTTP proxy it will do that with a HTTP PUT.

HTTP proxy tunnel

curl features an option (–proxytunnel) that lets the user forcible tell the client to not assume that the proxy speaks this protocol and instead use the CONNECT method with establishing a tunnel through the proxy to the remote server.

It should of course be noted that very few deployed HTTP proxies in the wild allow clients to CONNECT to whatever port they like. HTTP proxies tend to only allow connecting to port 443 as that is the official HTTPS port, and if you ask for another port it will respond back with a 4xx response code refusing to comply.

Not HTTP not FTP over HTTP proxy

So HTTP, HTTPS and FTP are sent over the HTTP proxy fine. That leaves us with nineteen more protocols. What happens with them when you ask curl to perform them over a HTTP proxy?

Now we have finally reached the change that has just been merged in curl and changes what curl does.

Before 7.55.0

curl would send all protocols as a regular GET to the proxy if asked to use a HTTP proxy without seeing the explicit proxy-tunnel option. This came from how FTP was done and grew from there without many people questioning it. Of course it wouldn’t ever work, but also very few people would actually attempt it because of that.

From 7.55.0

All protocols that aren’t HTTP, HTTPS or FTP will enable the tunnel-through mode automatically when a HTTP proxy is used. No more sending funny GET requests to proxies when they won’t work anyway. Also, it will prevent users from accidentally leak credentials to proxies that were intended for the server, which previously could happen if you omitted the tunnel option with a few authentication setups.

HTTP/2 proxy

Sorry, curl doesn’t support that yet. Patches welcome!

HTTPS proxy with curl

Starting in version 7.52.0 (due to ship December 21, 2016), curl will support HTTPS proxies when doing network transfers, and by doing this it joins the small exclusive club of HTTP user-agents consisting of Firefox, Chrome and not too many others.

Yes you read this correctly. This is different than the good old HTTP proxy.

HTTPS proxy means that the client establishes a TLS connection to the proxy and then communicates over that, which is different to the normal and traditional HTTP proxy approach where the clients speak plain HTTP to the proxy.

Talking HTTPS to your proxy is a privacy improvement as it prevents people from snooping on your proxy communication. Even when using HTTPS over a standard HTTP proxy, there’s typically a setting up phase first that leaks information about where the connection is being made, user credentials and more. Not to mention that an HTTPS proxy makes HTTP traffic “safe” to and from the proxy. HTTPS to the proxy also enables clients to speak HTTP/2 more easily with proxies. (Even though HTTP/2 to the proxy is not yet supported in curl.)

In the case where a client wants to talk HTTPS to a remote server, when using a HTTPS proxy, it sends HTTPS through HTTPS.

Illustrating this concept with images. When using a traditional HTTP proxy, we connect initially to the proxy with HTTP in the clear, and then from then on the HTTPS makes it safe:

HTTP proxyto compare with the HTTPS proxy case where the connection is safe already in the first step:

HTTPS proxyThe access to the proxy is made over network A. That network has traditionally been a corporate network or within a LAN or something but we’re seeing more and more use cases where the proxy is somewhere on the Internet and then “Network A” is really huge. That includes use cases where the proxy for example compresses images or otherwise reduces bandwidth requirements.

Actual HTTPS connections from clients to servers are still done end to end encrypted even in the HTTP proxy case. HTTP traffic to and from the user to the web site however, will still be HTTPS protected to the proxy when a HTTPS proxy is used.

A complicated pull request

This awesome work was provided by Dmitry Kurochkin, Vasy Okhin, and Alex Rousskov. It was merged into master on November 24 in this commit.

Doing this sort of major change in the TLS area in curl code is a massive undertaking, much so because of the fact that curl supports getting built with one out of 11 or 12 different TLS libraries. Several of those are also system-specific so hardly any single developer can even build all these backends on his or hers own machines.

In addition to the TLS backend maze, curl and library also offers a huge amount of different options to control the TLS connection and handling. You can switch on and off features, provide certificates, CA bundles and more. Adding another layer of TLS pretty much doubles the amount of options since now you can tweak everything both in the TLS connection to the proxy as well as the one to the remote peer.

This new feature is supported with the OpenSSL, GnuTLS and NSS backends to start with.

Consider it experimental for now

By all means, go ahead and use it and torture the code and file issues for everything bad you see, but I think we make ourselves a service by considering this new feature set to be a bit experimental in this release.

New options

There’s a whole forest of new command line and libcurl options to control all the various aspects of the new TLS connection this introduces. Since it is a totally separate connection it gets a whole set of options that are basically identical to the server connection but with a –proxy prefix instead. Here’s a list:

  --proxy-cacert 
  --proxy-capath
  --proxy-cert
  --proxy-cert-type
  --proxy-ciphers
  --proxy-crlfile
  --proxy-insecure
  --proxy-key
  --proxy-key-type
  --proxy-pass
  --proxy-ssl-allow-beast
  --proxy-sslv2
  --proxy-sslv3
  --proxy-tlsv1
  --proxy-tlsuser
  --proxy-tlspassword
  --proxy-tlsauthtype

HTTP/2 – 115 days with the RFC

http2Back in March 2015, I asked friends for a forecast on how much HTTP traffic that will be HTTP/2 by the end of the year and we arrived at about 10% as a group. Are we getting there? Remember that RFC 7540 was published on May 15th, so it is still less than 4 months old!

The HTTP/2 implementations page now lists almost 40 reasonably up-to-date implementations.

Browsers

Since then, all browsers used by the vast majority of people have stated that they have or will have HTTP/2 support soon (Firefox, Chrome, Edge, Safari and Opera – including Firefox and Chrome on Android and Safari on iPhone). Even OS support is coming: on iOS 9 the support is coming as we speak and the windows HTTP library is getting HTTP/2 support. The adoption rate so far is not limited by the clients.

Unfortunately, the WGet summer of code project to add HTTP/2 support failed.

(I have high hopes for getting a HTTP/2 enabled curl into Debian soon as they’ve just packaged a new enough nghttp2 library. If things go well, this leads the way for other distros too.)

Servers

Server-side we see Apache’s mod_h2 module ship in a public release soon (possibly in a httpd version 2.4 series release), nginx has this alpha patch I’ve already mentioned and Apache Traffic Server (ATS) has already shipped h2 support for a while and my friends tell me that 6.0 has fixed numerous of their initial bugs. IIS 10 for Windows 10 was released on July 29th 2015 and supports HTTP/2. H2O and nghttp2 have shipped HTTP/2 for a long time by now. I would say that the infrastructure offering is starting to look really good! Around the end of the year it’ll look even better than today.

Of course we’re still seeing HTTP/2 only deployed over HTTPS so HTTP/2 cannot currently get more popular than HTTPS is but there’s also no real reason for a site using HTTPS today to not provide HTTP/2 within the near future. I think there’s a real possibility that we go above 10% use already in 2015 and at least for browser traffic to HTTPS sites we should be able to that almost every single HTTPS site will go HTTP/2 during 2016.

The delayed start of letsencrypt has also delayed more and easier HTTPS adoption.

Still catching up

I’m waiting to see the intermediaries really catch up. Varnish, Squid and HAProxy I believe all are planning to support it to at least some extent, but I’ve not yet seen them release a version with HTTP/2 enabled.

I hear there’s still not a good HTTP/2 story on Android and its stock HTTP library, although you can in fact run libcurl HTTP/2 enabled even there, and I believe there are other stand-alone libs for Android that support HTTP/2 too, like OkHttp for example.

Firefox numbers

Firefox Nightly screenshotThe latest stable Firefox release right now is version 40. It counts 13% HTTP/2 responses among all HTTP responses. Counted as a share of the transactions going over HTTPS, the share is roughly 27%! (Since Firefox 40 counts 47% of the transactions as HTTPS.)

This is certainly showing a share of the high volume sites of course, but there are also several very high volume sites that have not yet gone HTTP/2, like Facebook, Yahoo, Amazon, Wikipedia and more…

The IPv6 comparison

Right, it is not a fair comparison, but… The first IPv6 RFC has been out for almost twenty years and the adoption is right now at about 8.4% globally.

curl and proxy headers

Starting in the next curl release, 7.37.0, the curl tool supports the new command line option –proxy-header. (Completely merged at this commit.)

It works exactly like –header does, but will only include the headers in requests sent to a proxy, while the opposite is true for –header: that will only be sent in requests that will go to the end server. But of course, if you use a HTTP proxy and do a normal GET for example, curl will include headers for both the proxy and the server in the request. The bigger difference is when using CONNECT to a proxy, which then only will use proxy headers.

libcurl

For libcurl, the story is slightly different and more complicated since we’re having things backwards compatible there. The new libcurl still works exactly like the former one by default.

CURLOPT_PROXYHEADER is the new option that is the new proxy header option that should be set up exactly like CURLOPT_HTTPHEADER is

CURLOPT_HEADEROPT is then what an application uses to set how libcurl should use the two header options. Again, by default libcurl will keep working like before and use the CURLOPT_HTTPHEADER list in all HTTP requests. To change that behavior and use the new functionality instead, set CURLOPT_HEADEROPT to CURLHEADER_SEPARATE.

Then, the header lists will be handled as separate. An application can then switch back to the old behavior with a unified header list by using CURLOPT_HEADEROPT set to CURLHEADER_UNIFIED.

What SOCKS is good for

You ever wondered what SOCKS is good for these days?

To help us use the Internet better without having the surrounding be able to watch us as much as otherwise!

There’s basically two good scenarios and use areas for us ordinary people to use SOCKS:

  1. You’re a consultant or you’re doing some kind of work and you are physically connected to a customer’s or a friend’s network. You access the big bad Internet via their proxy or entirely proxy-less using their equipment and cables. This allows the network admin(s) to capture and snoop on your network traffic, be it on purpose or by mistake, as long as you don’t use HTTPS or other secure mechanisms. When surfing the web, it is very easily made to drop out of HTTPS and into HTTP by mistake. Also, even if you HTTPS to the world, the name resolves and more are still done unencrypted and will leak information.
  2. You’re using an open wifi network that isn’t using a secure encryption. Anyone else on that same area can basically capture anything you send and receive.

What you need to set it up? You run

ssh -D 8080 myname@myserver.example.com

… and once you’ve connected, you make sure that you change the network settings of your favourite programs (browsers, IRC clients, mail reader, etc) to reach the Internet using the SOCKS proxy on localhost port 8080. Now you’re done.

Now all your traffic will reach the Internet via your remote server and all traffic between that and your local machine is sent encrypted and secure. This of course requires that you have a server running OpenSSH somewhere, but don’t we all?

If you are behind another proxy in the first place, it gets a little more complicated but still perfectly doable. See my separate SSH through or over proxy document for details.