curl 8 is faster

First: performance is tricky and bechmarking even more so. I will talk some numbers in this post but of course they are what I have measured for my specific tests on my machine. Your numbers for your test cases will be different.

Over the last six months or so, curl has undergone a number of refactors and architectural cleanups. The primary motivations for this have been to improve the HTTP/3 support and to offer HTTP/2 over proxy, but also to generally improve the code, its maintainability and its readability.

A main change is the connection filters I already blogged about, but while working on this a lot of other optimizations and “quirk removals” have been performed. Most of this work done by Stefan Eissing.

So how do all these changes reflect on raw transfer metrics?

Parallelism with TLS

This test case uses a single TCP connection and makes 50 parallel transfers, each being 100 megabytes. The transfer uses HTTP/2 and TLS to a server running on the same host. All done in a single thread in the client.

As a baseline version, I selected curl 7.86.0, which was released in October 2022. The last curl release we shipped before Stefan’s refactor work started. Should work as a suitable before/after comparison.

For this test I built curl and made it use OpenSSL 3.0.8 for TLS and nghttp2 1.52.0 for HTTP/2. The server side is apache2 2.4.57-2, a plain standard installation in my Debian unstable.

python3 tests/http/scorecard.py --httpd h2

On my fairly fast machine, curl on current master completes this test at 2450 MB/sec.

Running the exact same parallel test, built with the same OpenSSL version (and cipher config) and the same nghttp2 version, 7.86.0 transfers those 50 streams at 1040 MB/sec. A 2.36 times speedup!

We still have further ideas on how we can streamline the receiving of data on multiplexed transfers. Future versions might be able to squeeze out even more.

Raw unencrypted HTTP/1

This test simply uses the libcurl multi API to do 5 parallel HTTP/1 transfers – in the same thread. They will then use one connection each and again download from the local apache2 installation. Each file is 100GB so it transfers 500GB and measures how fast it can complete the entire operation.

Running this test program linking with curl 7.86.0 reaches 11320 MB/sec on the same host as before.

Running the exact same program, just pointing out to my 8.1.0-DEV library, the program reports a rather amazing 18104 MB/sec. An 1.6 times improvement.

This difference actually surprised us, because we knew we had some leeway in the HTTP/2 department to “clean up” but I was not aware that we had this much margin to further enhance plain HTTP/1 transfers. We are also not entirely sure what change that made this significant bump possible.

It should probably also be noted that this big gain is in particular when doing them in parallel. If I do a single file transfer with the same program, current libcurl does 3900 MB/sec vs the old at 3700 MB/sec. Clearly the bigger enhancements lie in doing multiple transfers and internal transfer-switching.

Does it matter?

I believe it does. By doing transfers faster, we are more effective and therefor libcurl uses less energy for the same thing than previously. In a world with 20+ billion libcurl installations, every little performance tweak has the opportunity to make a huge dent at scale.

If there are 100 million internet transfers done with curl every day, and we make each transfer take 0.1 second less we save 10 million CPU seconds. That equals 115 days of CPU time saved.

The competition

I have not tried to find out how competitors and alternative Internet transfers libraries perform for the same kind of work loads. Primarily because I don’t think it matters too much, but also because doing fair performance comparisons is really hard and no matter how hard I would try I would be severely biased anyway. I leave that exercise to someone else.

deleting system32\curl.exe

Let me tell you a story about how Windows users are deleting files from their installation and as a consequence end up in tears.

Background

The real and actual curl tool has been shipped as part of Windows 10 and Windows 11 for many years already. It is called curl.exe and is located in the System32 directory.

Microsoft ships this bundled with its Operating system. They get the code from the curl project but Microsoft builds, tests, ships and are in all ways responsible for their operating system.

NVD inflation

As I have blogged about separately earlier, the next brick in the creation of this story is the fact that National Vulnerability Database deliberately inflates the severity levels of security flaws in its vast database. They believe scaremongering serves their audience.

In one particular case, CVE-2022-43552 was reported by the curl project in December 2022. It is a use-after-free flaw that we determined to be severity low and not higher mostly because of the very limited time window you need to make something happen for it to be exploited or abused. NVD set it to medium which admittedly was just one notch higher (this time).

This is not helpful.

“Security scanners”

Lots of Windows users everywhere runs security scanners on their systems with regular intervals in order to verify that their systems are fine. At some point after December 21, 2022, some of these scanners started to detect installations of curl that included the above mentioned CVE. Nessus apparently started this on February 23.

This is not helpful.

Panic

Lots of Windows users everywhere then started to panic when these security applications warned them about their vulnerable curl.exe. Many Windows users are even contractually “forced” to fix (all) such security warnings within a certain time period or risk bad consequences and penalties.

How do you fix this?

I have been asked numerous times about how to fix this problem. I have stressed at every opportunity that it is a horrible idea to remove the system curl or to replace it with another executable. It is very easy to download a fresh curl install for Windows from the curl site – but we still strongly discourage everyone from replacing system files.

But of course, far from everyone asked us. A seemingly large enough crowd has proceeded and done exactly what we would stress they should not: they deleted or replaced their C:\Windows\System32\curl.exe.

The real fix is of course to let Microsoft ship an update and make sure to update then. The exact update that upgrades curl to version 8.0.1 is called KB5025221 and shipped on April 11. (And yes, this is the first time you get the very latest curl release shipped in a Windows update)

The people who deleted or replaced the curl executable noticed that they cannot upgrade because the Windows update procedure detects that the Windows install has been tampered with and it refuses to continue.

I do not know how to restore this to a state that Windows update is happy with. Presumably if you bring back curl.exe to the exact state from before it could work, but I do not know exactly what tricks people have tested and ruled out.

Bad advice

I have been pointed to responses on the Microsoft site answers.microsoft.com done by “helpful volunteers” that specifically recommend removing the curl.exe executable as a fix.

This is not helpful.

I don’t want to help spreading that idea so I will not link to any such post. I have reported this to Microsoft contacts and I hope they can maybe edit or comment those posts soon.

We are not responsible

I just want to emphasize that if you install and run Windows, your friendly provider is Microsoft. You need to contact Microsoft for support and help with Windows related issues. The curl.exe you have in System32 is only provided indirectly by the curl project and we cannot fix this problem for you. We in fact fixed the problem in the source code already back in December 2022.

If you have removed curl.exe or otherwise tampered with your Windows installation, the curl project cannot help you.

Credits

Image by Alexa from Pixabay

Discussions

Hacker news

Google Open Source Peer Bonus award 2023

I am honored to yet again receive a peer bonus award from Google. This is a Google program for which persons like me can be nominated by Googlers and as a result receive grants.

I previously received such an award in 2020.

Update

A few people noticed and have commented on the fact that this letter is signed by Chris DiBona and dated April 19th 2023, while sources say he was let go from Google back in January. Which means one or two of those things are wrong.

curl speaks HTTP/2 with proxy

In September 2013 we merged the first code into curl that made it capable of using HTTP/2: HTTP version 2.

This version of HTTP changed a lot of previous presumptions when it comes to transfers, which introduced quite a few challenges to HTTP stack authors all of the world. One of them being that with version 2 there can be more than one transfer using the same connection where as up to that point we had always just had one transfer per connection.

In May 2015 the spec was published.

2023

Now almost eight years since the RFC was published, HTTP/2 is the version seen most frequently in browser responses if we ask the Firefox telemetry data. 44.4% of the responses are HTTP/2.

curl

This year, the curl project has been sponsored by the Sovereign Tech Fund, and one of the projects this funding has covered is what I am here to talk about:

Speaking HTTP/2 with a proxy. More specifically with what is commonly referred to as a “forward proxy.”

Many organizations and companies have setups like the one illustrated in this image below. The user on the left is inside the organization network A and the website they want to reach is on the outside on network B.

HTTP/2 to the proxy

When this is an HTTPS proxy, meaning that the communication to and with the proxy is itself protected with TLS, curl and libcurl are now capable of negotiating HTTP/2 with it.

It might not seem like a big deal to most people, and maybe it is not, but the introduction of this feature comes after some rather heavy lifting and internal refactors over the recent months that have enabled the rearrangement of networking components for this purpose.

Enable

To enable this feature in your libcurl-using application, you first need to make sure you use libcurl 8.1.0 when it ships in mid May and then you need to set the proxy type to CURLPROXY_HTTPS2.

In plain C code it could look like this:

curl_easy_setopt(handle,
                 CURLOPT_PROXYTYPE,
                 CURLPROXY_HTTPS2);
curl_easy_setopt(handle,
                 CURLOPT_PROXY,
                 "https://hostname");

This allows HTTP/2 but will proceed with plain old HTTP/1 if it can’t negotiate the higher protocol version using ALPN.

The old proxy type called just CURLPROXY_HTTPS remains for asking libcurl to stick to HTTP/1 when talking to the proxy. We decided to introduce a new option for this simply because we anticipate that there will be proxies out there that will not work correctly so we cannot throw this feature at users without them asking for it.

command line tool

Using the command line tool, you use a HTTPS proxy exactly like before and then you add this flag to tell the tool that it may try HTTP/2 with the proxy: --proxy-http2.

This also happens to be curl’s 251st command line option.

Shipping and credits

This implementation has been done by Stefan Eissing.

These features have already landed in the master branch and will be part of the pending curl 8.1.0 release, scheduled for release on May 17, 2023.

trurl manipulates URLs

trurl is a tool in a similar spirit of tr but for URLs. Here, tr stands for translate or transpose.

trurl is a small command line tool that parses and manipulates URLs, designed to help shell script authors everywhere.

URLs are tricky to parse and there are numerous security problems in software because of this. trurl wants to help soften this problem by taking away the need for script and command line authors everywhere to re-invent the wheel over and over.

trurl uses libcurl’s URL parser and will thus parse and understand URLs exactly the same as curl the command line tool does – making it the perfect companion tool.

I created trurl on March 31, 2023.

Some command line examples

Given just a URL (even without scheme), it will parse it and output a normalized version:

$ trurl ex%61mple.com/
http://example.com/

The above command will guess on a http:// scheme when none was provided. The guess has basic heuristics, like for example FTP server host names often starts with ftp:

$ trurl ftp.ex%61mple.com/
ftp://ftp.example.com/

A user can output selected components of a provided URL. Like if you only want to extract the path or the query components from it.:

$ trurl https://curl.se/?search=foobar --get '{path}'
/

Or both (with extra text intermixed):

$ trurl https://curl.se/?search=foobar --get 'p: {path} q: {query}'
p: / q: search=foobar

A user can create a URL by providing the different components one by one and trurl outputs the URL:

$ trurl --set scheme=https --set host=fool.wrong
https://fool.wrong/

Reset a specific previously populated component by setting it to nothing. Like if you want to clear the user component:

$ trurl https://daniel@curl.se/--set user=
https://curl.se/

trurl tells you the full new URL when the first URL is redirected to a second relative URL:

$ trurl https://curl.se/we/are/here.html --redirect "../next.html"
https://curl.se/we/next.html

trurl provides easy-to-use options for adding new segments to a URL’s path and query components. Not always easily done in shell scripts:

$ trurl https://curl.se/we/are --append path=index.html
https://curl.se/we/are/index.html
$ trurl https://curl.se?info=yes --append query=user=loggedin
https://curl.se/?info=yes&user=loggedin

trurl can work on a single URL or any amount of URLs passed on to it. The modifications and extractions are then performed on them all, one by one.

$ trurl https://curl.se localhost example.com 
https://curl.se/
http://localhost/
http://example.com/

trurl can read URLs to work on off a file or from stdin, and works on them in a streaming fashion suitable for filters etc.

$ cat many-urls.yxy | trurl --url-file -
...

More or different

trurl was born just a few days ago, this is what we have made it do so far. There is a high probability that it will change further going forward before it settles on exactly how things ideally should work.

It also means that we are extra open for and welcoming to feedback, ideas and pull-requests. With some luck, this could become a new everyday tool for all of us.

Tell us on GitHub!