Category Archives: Open Source

Open Source, Free Software, and similar

20,000 github stars

In September 2018 I celebrated 10,000 stars, up from 5,000 back in May 2017. We made 1,000 stars on August 12, 2014.

Today I’m cheering for the 20,000 stars curl has received on GitHub.

It is worth repeating that this is just a number without any particular meaning or importance. It just means 20,000 GitHub users clicked the star symbol for the curl project over at curl/curl.

At exactly 08:15:23 UTC today we reached this milestone. Checked with a curl command line like this:

$ curl -s https://api.github.com/repos/curl/curl | jq '.stargazers_count'
20000

(By the time I get around to finalize this post, the count has already gone up to 20087…)

To celebrate this occasion, I decided I was worth a beer and this time I went with a hand-written note. The beer was a Swedish hazy IPA called Amazing Haze from the brewery Stigbergets. One of my current favorites.

Photos from previous GitHub-star celebrations :

Where is HTTP/3 right now?

tldr: the level of HTTP/3 support in servers is surprisingly high.

The specs

The specifications are all done. They’re now waiting in queues to get their final edits and approvals before they will get assigned RFC numbers and get published as such – they will not change any further. That’s a set of RFCs (six I believe) for various aspects of this new stack. The HTTP/3 spec is just one of those. Remember: HTTP/3 is the application protocol done over the new transport QUIC. (See http3 explained for a high-level description.)

The HTTP/3 spec was written to refer to, and thus depend on, two other HTTP specs that are in the works: httpbis-cache and https-semantics. Those two are mostly clarifications and cleanups of older HTTP specs, but this forces the HTTP/3 spec to have to get published after the other two, which might introduce a small delay compared to the other QUIC documents.

The working group has started to take on work on new specifications for extensions and improvements beyond QUIC version 1.

HTTP/3 Usage

In early April 2021, the usage of QUIC and HTTP/3 in the world is measured by a few different companies.

QUIC support

netray.io scans the IPv4 address space weekly and checks how many hosts that speak QUIC. Their latest scan found 2.1 million such hosts.

Arguably, the netray number doesn’t say much. Those two million hosts could be very well used or barely used machines.

HTTP/3 by w3techs

w3techs.com has been in the game of scanning web sites for stats purposes for a long time. They scan the top ten million sites and count how large share that runs/supports what technologies and they also check for HTTP/3. In their data they call the old Google QUIC for just “QUIC” which is confusing but that should be seen as the precursor to HTTP/3.

What stands out to me in this data except that the HTTP/3 usage seems very high: the top one-million sites are claimed to have a higher share of HTTP/3 support (16.4%) than the top one-thousand (11.9%)! That’s the reversed for HTTP/2 and not how stats like this tend to look.

It has been suggested that the growth starting at Feb 2021 might be explained by Cloudflare’s enabling of HTTP/3 for users also in their free plan.

HTTP/3 by Cloudflare

On radar.cloudflare.com we can see Cloudflare’s view of a lot of Internet and protocol trends over the world.

The last 30 days according to radar.cloudflare.com

This HTTP/3 number is significantly lower than w3techs’. Presumably because of the differences in how they measure.

Clients

The browsers

All the major browsers have HTTP/3 implementations and most of them allow you to manually enable it if it isn’t already done so. Chrome and Edge have it enabled by default and Firefox will so very soon. The caniuse.com site shows it like this (updated on April 4):

(Earlier versions of this blog post showed the previous and inaccurate data from caniuse.com. Not anymore.)

curl

curl supports HTTP/3 since a while back, but you need to explicitly enable it at build-time. It needs to use third party libraries for the HTTP/3 layer and it needs a QUIC capable TLS library. The QUIC/h3 libraries are still beta versions. See below for the TLS library situation.

curl’s HTTP/3 support is not even complete. There are still unsupported areas and it’s not considered stable yet.

Other clients

Facebook has previously talked about how they use HTTP/3 in their app, and presumably others do as well. There are of course also other implementations available.

TLS libraries

curl supports 14 different TLS libraries at this time. Two of them have QUIC support landed: BoringSSL and GnuTLS. And a third would be the quictls OpenSSL fork. (There are also a few other smaller TLS libraries that support QUIC.)

OpenSSL

The by far most popular TLS library to use with curl, OpenSSL, has postponed their QUIC work:

“It is our expectation that once the 3.0 release is done, QUIC will become a significant focus of our effort.”

At the same time they have delayed the OpenSSL 3.0 release significantly. Their release schedule page still today speaks of a planned release of 3.0.0 in “early Q4 2020”. That plan expects a few months from the beta to final release and we have not yet seen a beta release, only alphas.

Realistically, this makes QUIC in OpenSSL many months off until it can appear even in a first alpha. Maybe even 2022 material?

BoringSSL

The Google powered OpenSSL fork BoringSSL has supported QUIC for a long time and provides the OpenSSL API, but they don’t do releases and mostly focus on getting a library done for Google. People outside the company are generally reluctant to use and depend on this library for those reasons.

The quiche QUIC/h3 library from Cloudflare uses BoringSSL and curl can be built to use quiche (as well as BoringSSL).

quictls

Microsoft and Akamai have made a fork of OpenSSL available that is based on OpenSSL 1.1.1 and has the QUIC pull-request applied in order to offer a QUIC capable OpenSSL flavor to the world before the official OpenSSL gets their act together. This fork is called quictls. This should be compatible with OpenSSL in all other regards and provide QUIC with an API that is similar to BoringSSL’s.

The ngtcp2 QUIC library uses quictls. curl can be built to use ngtcp2 as well as with quictls,

Is HTTP/3 faster?

I realize I can’t blog about this topic without at least touching this question. The main reason for adding support for HTTP/3 on your site is probably that it makes it faster for users, so does it?

According to cloudflare’s tests, it does, but the difference is not huge.

We’ve seen other numbers say h3 is faster shown before but it’s hard to find up-to-date performance measurements published for the current version of HTTP/3 vs HTTP/2 in real world scenarios. Partly of course because people have hesitated to compare before there are proper implementations to compare with, and not just development versions not really made and tweaked to perform optimally.

I think there are reasons to expect h3 to be faster in several situations, but for people with high bandwidth low latency connections in the western world, maybe the difference won’t be noticeable?

Future

I’ve previously shown the slide below to illustrate what needs to be done for curl to ship with HTTP/3 support enabled in distros and “widely” and I think the same works for a lot of other projects and clients who don’t control their TLS implementation and don’t write their own QUIC/h3 layer code.

This house of cards of h3 is slowly getting some stable components, but there are still too many moving parts for most of us to ship.

I assume that the rest of the browsers will also enable HTTP/3 by default soon, and the specs will be released not too long into the future. That will make HTTP/3 traffic on the web increase significantly.

The QUIC and h3 libraries will ship their first non-beta versions once the specs are out.

The TLS library situation will continue to hamper wider adoption among non-browsers and smaller players.

The big players already deploy HTTP/3.

Updates

I’ve updated this post after the initial publication, and the biggest corrections are in the Chrome/Edge details. Thanks to immediate feedback from Eric Lawrence. Remaining errors are still all mine! Thanks also to Barry Pollard who filed the PR to update the previously flawed caniuse.com data.

curl 7.76.0 adds rustls

I’m happy to announce that we yet again completed a full eight week release cycle and as customary, we end it with a fresh release. Enjoy!

Release presentation

Numbers

the 198th release
6 changes
56 days (total: 8,412)

130 bug fixes (total: 6,812)
226 commits (total: 26,978)
0 new public libcurl function (total: 85)
3 new curl_easy_setopt() option (total: 288)

3 new curl command line option (total: 240)
58 contributors, 34 new (total: 2,356)
24 authors, 11 new (total: 871)
2 security fixes (total: 100)
800 USD paid in Bug Bounties (total: 5,200 USD)

Security

Automatic referer leaks

CVE-2021-22876 is the first curl CVE of 2021.

libcurl did not strip off user credentials from the URL when automatically populating the Referer: HTTP request header field in outgoing HTTP requests, and therefore risks leaking sensitive data to the server that is the target of the second HTTP request.

libcurl automatically sets the Referer: HTTP request header field in outgoing HTTP requests if the CURLOPT_AUTOREFERER option is set. With the curl tool, it is enabled with --referer ";auto".

Rewarded with 800 USD

TLS 1.3 session ticket proxy host mixup

CVE-2021-22890 is a flaw in curl’s OpenSSL backend that allows a malicious HTTPS proxy to trick curl with session tickets and subsequently allow the proxy to MITM the remote server. The problem only exists with OpenSSL and it needs to speak TLS 1.3 with the HTTPS proxy – and the client must accept the proxy’s certificate, which has to be especially crafted for the purpose.

Note that an HTTPS proxy is different than the mode comon HTTP proxy.

The reporter declined offered reward money.

Changes

We list 6 “changes” this time around. They are…

support multiple -b parameters

The command line option for setting cookies can now be used multiple times on the command line to specify multiple cookies. Either by setting cookies by name or by providing a name to a file to read cookie data from.

add –fail-with-body

The command line tool has had the --fail option for a very long time. This new option is very similar, but with a significant difference: this new option saves the response body first even if it returns an error due to HTTP response code that is 400 or larger.

add DoH options to disable TLS verification

When telling curl to use DoH to resolve host names, you can now specify that curl should ignore the TLS certificate verification for the DoH server only. Independently of how it treats other TLS servers that might be involved in the transfer.

read and store the HTTP referer header

This is done with the new CURLINFO_REFERER libcurl option and with the command line tool, --write-out '%{referer}‘.

support SCRAM-SHA-1 and SCRAM-SHA-256 for mail auth

For SASL authentication done with mail-using protocols such as IMAP and SMTP.

A rustls backend

A new optional TLS backend. This is provided via crustls, a C API for the rustls TLS library.

Some Interesting bug-fixes

Again we’ve logged over a hundred fixes in a release, so here goes some of my favorite corrections we did this time:

curl: set CURLOPT_NEW_FILE_PERMS if requested

Due to a silly mistake in the previous release, the new --create-file-mode didn’t actually work because it didn’t set the permissions with libcurl properly – but now it does.

share user’s resolve list with DOH handles

When resolving host names with DoH, the transfers done for that purpose now “inherit” the same --resolve info as used for the normal transfer, which I guess most users already just presumed it did…

bump the max HTTP request size to 1MB

Virtually all internal buffers have length restrictions for security and the maximum size we allowed for a single HTTP request was previously 128 KB. A user with a use-case sending a single 300 KB header turned up and now we allow HTTP requests to be up to 1 MB! I can’t recommend doing it, but now at least curl supports it.

allow SIZE to fail when doing (resumed) FTP upload

In a recent change I made SIZE failures get treated as “file not found” error, but it introduced this regression for resumed uploads because when resuming a file upload and there’s nothing uploaded previously, SIZE is then expected to fail and it is fine.

fix memory leak in ftp_done

The torture tests scored another victory when it proved that when the connection failed at just the correct moment after an FTP transfer is complete, curl could skip a free() and leak memory.

fail if HTTP/2 connection is terminated without END_STREAM

When a HTTP/2 connection is (prematurely) terminated, streams over that connection could return “closed” internally without noticing the premature part. As there was no previous END_STREAM message received for the stream(s), curl should consider that an error and now it does.

don’t set KEEP_SEND when there’s no more HTTP/2 data to be sent

A rare race condition in the HTTP/2 code could make libcurl remain expecting to send data when in reality it had already delivered the last chunk.

With HTTP, use credentials from transfer, not connection

Another cleanup in the code that had the potential to get wrong in the future and mostly worked right now due to lucky circumstances. In HTTP each request done can use its own set of credentials, so it is vital to not use “connection bound” credentials but rather the “transfer oriented” set. That way streams and requests using different credentials will work fine over a single connection even when future changes alter code paths.

lib: remove ‘conn->data’ completely

A rather large internal refactor that shouldn’t be visible on the outside to anyone: transfer objects now link to the corresponding connection object like before, but now connection objects do not link to any transfer object. Many transfers can share the same connection.

adapt to OpenSSL v3’s new const for a few API calls

The seemingly never-ending work to make a version 3 of OpenSSL keeps changing the API and curl is adapting accordingly so that we are prepared and well functioning with this version once it ships “for real” in the future.

Close the connection when downgrading from HTTP/2 to HTTP/1

Otherwise libcurl is likely to reuse the same (wrong) connection again in the next transfer attempt since the connection reuse logic doesn’t take downgrades into account!

Cap initial HTTP body data amount during send speed limiting

The rate limiting logic was previously not correctly applied on the initial body chunk that libcurl sends. Like if you’d tell libcurl to send 50K data with CURLOPT_POSTFIELDS and limit the sending rate to 5K/second.

Celebratory drink

I’ll go for an extra fine cup of coffee today after I posted this. I think I’m worth it. I bet you are too. Go ahead and join me: Hooray for another release!

HOWTO backdoor curl

I’ve previously blogged about the possible backdoor threat to curl. This post might be a little repeat but also a refresh and renewed take on the subject several years later, in the shadow of the recent PHP backdoor commits of March 28, 2021. Nowadays, “supply chain attacks” is a hot topic.

Since you didn’t read that PHP link: an unknown project outsider managed to push a commit into the PHP master source code repository with a change (made to look as if done by two project regulars) that obviously inserted a backdoor that could execute custom code when a client tickled a modified server the right way.

Partial screenshot of a diff of the offending commit in question

The commits were apparently detected very quickly. I haven’t seen any proper analysis on exactly how they were performed, but to me that’s not the ultimate question. I rather talk and think about this threat in a curl perspective.

PHP is extremely widely used and so is curl, but where PHP is (mostly) server-side running code, curl is client-side.

How to get malicious code into curl

I’d like to think about this problem from an attacker’s point of view. There are but two things an attacker need to do to get a backdoor in and a third adjacent step that needs to happen:

  1. Make a backdoor change that is hard to detect and appears innocent to a casual observer, while actually still being able to do its “job”
  2. Get that changed landed in the master source code repository branch
  3. The code needs to be included in a curl release that is used by the victim/target

These are not simple steps. The third step, getting into a release, is not strictly always necessary because there are sometimes people and organizations that run code off the bleeding edge master repository (against our advice I should add).

Writing the backdoor code

As was seen in this PHP attack, it failed rather miserably at step 1, making the attack code look innocuous, although we can suspect that maybe that was done so on purpose. In 2010 there was a lengthy discussion about an alleged backdoor in OpenBSD’s IPSEC stack that presumably had been in place for years and even while that particular backdoor was never proven to be real, the idea that it can be done certainly is.

Every time we fix a security problem in curl there’s that latent nagging question in the back of our collective minds: was this flaw placed here deliberately? Historically, we’ve not seen any such attacks against curl. I can tell this with a high degree of certainty since almost all of the existing security problems detected and reported in curl was done by me…!

The best attack code would probably do something minor that would have a huge impact in a special context for which the attacker has planned to use it. I mean minor as in doing a NULL-pointer dereference or doing a use-after-free or something. This, because doing a full-fledged generic stack based buffer overflow is much harder to land undetected. Maybe going with a single-byte overwrite outside of a malloc could be the way, like it was back in 2016 when such a flaw in c-ares was used as the first step in a multi-flaw exploit sequence to execute remote code as root on ChromeOS…

Ideally, the commit should also include an actual bug-fix that would be the public facing motivation for it.

Get that code landed in the repo

Okay let’s imagine that you have produced code that actually is a useful bug-fix or feature addition but with an added evil twist, and you want that landed in curl. I can imagine several different theoretical ways to do it:

  1. A normal pull-request and land using the normal means
  2. Tricking or forcing a user with push rights to circumvent the review process
  3. Use a weakness somewhere and land the code directly without involving existing curl team members

The Pull Request method

I’ve never seen this attempted. Submit the pull-request to the project the usual means and argue that the commit fixes a bug – which could be true.

This makes the backdoor patch to have to go through all testing and reviews with flying colors to get merged. I’m not saying this is impossible, but I will claim that it is very hard and also a very big gamble by an attacker. Presumably it is a fairly big job just to get the code for this attack to work, so maybe going with a less risky way to land the code is then preferable? But then which way is likely to have the most reliable outcome?

The tricking a user method

Social engineering is very powerful. I can’t claim that our team is immune to that so maybe there’s a way an outsider could sneak in behind our imaginary personal walls and make us take a shortcut for a made up reason that then would circumvent the project’s review process.

We can even include more forced “convincing” such as direct threats against persons or their families: “push this code or else…”. This way of course cannot be protected against using 2fa, better passwords or things like that. Forcing a users to do it is also likely to eventually get known and then immediately make the commit reverted.

Tricking a user doesn’t make the commit avoid testing and scrutinizing after the fact. When the code has landed, it will be scanned and tested in a hundred CI jobs that include a handful of static code analyzers and memory/address sanitizers.

Tricking a user could land the code, but it can’t make it stick unless the code is written as the perfect stealth change. It really needs to be that good attack code to work out. Additionally: circumventing the regular pull-request + review procedure is unusual so I believe it is likely that such commit will be reviewed and commented on after the fact, and there might then be questions about it and even likely follow-up actions.

The exploiting a weakness method

A weakness in this context could be a security problem in the hosting software or even a rogue admin in the company that hosts the main source code git repo. Something that allows code to get pushed into the code repository without it being the result of one of the existing team members. This seems to be the method that the PHP attack was done through.

This is a hard method as well. Not only does it shortcut reviews, it is also done in the name of someone on the team who knows for sure that they didn’t do the commit, and again, the commit will be tested and poked at anyway.

For all of us who sign our git commits, detecting such a forged commit is easy and quickly done. In the curl project we don’t have mandatory signed commits so the lack of a signature won’t actually block it. And who knows, a weakness somewhere could even possibly find a way to bypass such a requirement.

The skip-git-altogether methods

As I’ve described above, it is really hard even for a skilled developer to write a backdoor and have that landed in the curl git repository and stick there for longer than just a very brief period.

If the attacker instead can just sneak the code directly into a release archive then it won’t appear in git, it won’t get tested and it won’t get easily noticed by team members!

curl release tarballs are made by me, locally on my machine. After I’ve built the tarballs I sign them with my GPG key and upload them to the curl.se origin server for the world to download. (Web users don’t actually hit my server when downloading curl. The user visible web site and downloads are hosted by Fastly servers.)

An attacker that would infect my release scripts (which btw are also in the git repository) or do something to my machine could get something into the tarball and then have me sign it and then create the “perfect backdoor” that isn’t detectable in git and requires someone to diff the release with git in order to detect – which usually isn’t done by anyone that I know of.

But such an attacker would not only have to breach my development machine, such an infection of the release scripts would be awfully hard to pull through. Not impossible of course. I of course do my best to maintain proper login sanitation, updated operating systems and use of safe passwords and encrypted communications everywhere. But I’m also a human so I’m bound to do occasional mistakes.

Another way could be for the attacker to breach the origin download server and replace one of the tarballs there with an infected version, and hope that people skip verifying the signature when they download it or otherwise notice that the tarball has been modified. I do my best at maintaining server security to keep that risk to a minimum. Most people download the latest release, and then it’s enough if a subset checks the signature for the attack to get revealed sooner rather than later.

The further-down-the-chain method

As an attacker, get into the supply chain somewhere else: find a weaker link in the chain between the curl release tarball and the target system for your attack . If you can trick or social engineer maybe someone else along the way to get your evil curl tarball to get used there instead of the actual upstream tarball, that might be easier and give you more bang for your buck. Perhaps you target your particular distribution’s or Operating System’s release engineers and pretend to be from the curl project, make up a story and send over a tarball to help them out…

Fake a security advisory and send out a bad patch directly to someone you know build their own curl/libcurl binaries?

Better ways?

If you can think of other/better ways to get malicious code via curl code into a victim’s machine, let me know! If you find a security problem, we will reward you for it!

Similarly, if you can think of ways or practices on how we can improve the project to further increase our security I’ll be very interested. It is an ever-moving process.

Dependencies

Added after the initial post. Lots of people have mentioned that curl can get built with many dependencies and maybe one of those would be an easier or better target. Maybe they are, but they are products of their own individual projects and an attack on those projects/products would not be an attack on curl or backdoor in curl by my way of looking at it.

In the curl project we ship the source code for curl and libcurl and the users, the ones that builds the binaries from that source code will get the dependencies too.

Credits

Image by SeppH from Pixabay

Github steel

I honestly don’t know what particular thing I did to get this, but GitHub gave me a 3D-printed steel version of my 2020 GitHub contribution “matrix”. You know that thing on your GitHub profile that normally looks something like this:

The gift package included this friendly note:

Hi @bagder,

As we welcome 2021, we want to thank and congratulate you on what you brought to 2020. Amidst the year’s challenges, you found time to continue giving back and contributing to the community.

Your hard work, care, and attention haven’t gone unnoticed.

Enclosed is your 2020 GitHub contribution graph, 3D printed in steel. You can also view it by pointing your browser to https://github.co/skyline. It tells a personal story only you can truly interpret.

Please accept this small gift as a token of appreciation on behalf of all of us here at GitHub, and everyone who benefits from your work.

Thank you and all the best for the year ahead!

With <3, from GitHub

I think I’ll put it under one of my screens here on my desk for now. The size is 145 mm x 30 mm x 30 mm. 438 grams.

Thanks GitHub!

Update: the print is done by shapeways.com

curl is 23 years old today

curl’s official birthday was March 20, 1998. That was the day the first ever tarball was made available that could build a tool named curl. I put it together and I called it curl 4.0 since I kept the version numbering from the previous names I had used for the tool. Or rather, I bumped it up from 3.12 which was the last version I used under the previous name: urlget.

Of course curl wasn’t created out of thin air exactly that day. The history can be traced back a little over a year earlier: On November 11, 1996 there was a tool named httpget released. It was developed by Rafael Sagula and this was the project I found and started contributing to. httpget 0.1 was less than 300 lines of a single C file. (The earliest code I still have source to is httpget 1.3, found here.)

I’ve said it many times before but I started poking on this project because I wanted to have a small tool to download currency rates regularly from a web site site so that I could offer them in my IRC bot’s currency exchange.

Small and quick decisions done back then, that would later make a serious impact on and shape my life. curl has been one of my main hobbies ever since – and of course also a full-time job since a few years back now.

On that exact same November day in 1996, the first Wget release shipped (1.4.0). That project also existed under another name prior to its release – and remembering back I don’t think I knew about it and I went with httpget for my task. Possibly I found it and dismissed it because of its size. The Wget 1.4.0 tarball was 171 KB.

After a short while, I took over as maintainer of httpget and expanded its functionality further. It subsequently was renamed to urlget when I added support for Gopher and FTP (driven by the fact that I found currency rates hosted on such servers as well). In the spring of 1998 I added support for FTP upload as well and the name of the tool was again misleading and I needed to rename it once more.

Naming things is really hard. I wanted a short word in classic Unix style. I didn’t spend an awful lot of time, as I thought of a fun word pretty soon. The tool works on URLs and it is an Internet client-side tool. ‘c’ for client and URL made ‘cURL’ seem pretty apt and fun. And short. Very “unixy”.

I already then wanted curl to be a citizen in the Unix tradition of using pipes and stdout etc. I wanted curl to work mostly like the cat command but for URLs so it would by default send the URL to stdout in the terminal. Just like cat does. It would then let us “see” the contents of that URL. The letter C is pronounced as see, so “see URL” also worked. In my pun-liking mind I didn’t need more. (but I still pronounce it “kurl”!)

This is the original logo, created in 1998 by Henrik Hellerstedt

I packaged curl 4.0 and made it available to the world on that Friday. Then at 2,200 lines of code. In the curl 4.8 release that I did a few months later, the THANKS file mentions 7 contributors who had helped out. It took us almost seven years to reach a hundred contributors. Today, that file lists over 2,300 names and we add a few hundred new entries every year. This is not a solo project!

Nothing particular happened

curl was not a massive success or hit. A few people found it and 14 days after that first release I uploaded 4.1 with a few bug-fixes and a multi-decade tradition had started: keep on shipping updates with bug-fixes. “ship early and often” is a mantra we’ve stuck with.

Later in 1998 when we had done more than 15 releases, the web page featured this excellent statement:

Screenshot from the curl web site in December 1998

300 downloads!

I never had any world-conquering ideas or blue sky visions for the project and tool. I just wanted it to do Internet transfers good, fast and reliably and that’s what I worked on making reality.

To better provide good Internet transfers to the world, we introduced the library libcurl, shipped for the first time in the summer of 2000 and that then enabled the project to take off at another level. libcurl has over time developed into a de-facto internet transfer API.

Today, at its 23rd birthday that is still mostly how I view the main focus of my work on curl and what I’m here to do. I believe that if I’ve managed to reach some level of success with curl over time, it is primarily because of one particular quality. A single word:

Persistence

We hold out. We endure and keep polishing. We’re here for the long run. It took me two years (counting from the precursors) to reach 300 downloads. It took another ten or so until it was really widely available and used.

In 2008, the curl website served about 100 GB data every month. This months it serves 15,600 GB – which interestingly is 156 times more data over 156 months! But most users of course never download anything from our site but they get curl from their distro or operating system provider.

curl was adopted in Red Hat Linux in late 1998, became a Debian package in May 1999, shipped in Mac OS X 10.1 in August 2001. Today, it is also shipped by default in Windows 10 and in iOS and Android devices. Not to mention the game consoles, Nintendo Switch, Xbox and Sony PS5.

Amusingly, libcurl is used by the two major mobile OSes but not provided as an API by them, so lots of apps, including many extremely large volume apps bundle their own libcurl build: YouTube, Skype, Instagram, Spotify, Google Photos, Netflix etc. Meaning that most smartphone users today have many separate curl installations in their phones.

Further, libcurl is used by some of the most played computer games of all times: GTA V, Fortnite, PUBG mobile, Red Dead Redemption 2 etc.

libcurl powers media players and set-top boxes such as Roku, Apple TV by maybe half a billion TVs.

curl and libcurl ships in virtually every Internet server and is the default transfer engine in PHP, which is found in almost 80% of the world’s almost two billion websites.

Cars are Internet-connected now. libcurl is used in virtually every modern car these days to transfer data to and from the vehicles.

Then add media players, kitchen and medical devices, printers, smart watches and lots of “smart” IoT things. Practically speaking, just about every Internet-connected device in existence runs curl.

I’m convinced I’m not exaggerating when I claim that curl exists in over ten billion installations world-wide

Alone and strong

A few times over the years I’ve tried to see if curl could join an umbrella organization, but none has accepted us and I think it has all been for the best in the end. We are completely alone and independent, from organizations and companies. We do exactly as we please and we’re not following anyone else’s rules. Over the last few years, sponsorships and donations have really accelerated and we’re in a good position to pay large rewards for bug-bounties and more.

The fact that I and wolfSSL offer commercial curl support has only made curl stronger I believe: it lets me spend even more time working on curl and it makes more companies feel safer with going with curl, which in the end makes it better for all of us.

Those 300 lines of code in late 1996 have grown to 172,000 lines in March 2021.

Future

Our most important job is to “not rock the boat”. To provide the best and most solid Internet transfer library you can find, on as many platforms as possible.

But to remain attractive we also need to follow with the times and adapt to new protocols and new habits as they emerge. Support new protocol versions, enable better ways to do things and over time deprecate the bad things in responsible ways to not hurt users.

In the short term I think we want to work on making sure HTTP/3 works, make the Hyper backend really good and see where the rustls backend goes.

After 23 years we still don’t have any grand blue sky vision or road map items to guide us much. We go where Internet and our users lead us. Onward and upward!

The curl roadmap

23 curl numbers

Over the last few days ahead of this birthday, I’ve tweeted 23 “curl numbers” from the project using the #curl23 hashtag. Those twenty-three numbers and facts are included below.

2,200 lines of code by March 1998 have grown to 170,000 lines in 2021 as curl is about to turn 23 years old

14 different TLS libraries are supported by curl as it turns 23 years old

2,348 contributors have helped out making curl to what it is as it turns 23 years old

197 releases done so far as curl turns 23 years

6,787 bug-fixes have been logged as curl turns 23 years old

10,000,000,000 installations world-wide make curl one of the world’s most widely distributed 23 year-olds

871 committers have provided code to make curl a 23 year old project

935,000,000 is the official curl docker image pull-counter at (83 pulls/second rate) as curl turns 23 years old

22 car brands – at least – run curl in their vehicles when curl turns 23 years old

100 CI jobs run for every commit and pull-request in curl project as it turns 23 years old

15,000 spare time hours have been spent by Daniel on the curl project as it turns 23 years old

2 of the top-2 mobile operating systems bundle and use curl in their device operating systems as curl turns 23

86 different operating systems are known to have run curl as it turns 23 years old

250,000,000 TVs run curl as it turns 23 years old

26 transport protocols are supported as curl turns 23 years old

36 different third party libraries can optionally be built to get used by curl as it turns 23 years old

22 different CPU architectures have run curl as it turns 23 years old

4,400 USD have been paid out in total for bug-bounties as curl turns 23 years old

240 command line options when curl turns 23 years

15,600 GB data is downloaded monthly from the curl web site as curl turns 23 years old

60 libcurl bindings exist to let programmers transfer data easily using any language as curl turns 23 years old

1,327,449 is the total word count for all the relevant RFCs to read for curl’s operations as curl turns 23 years old

1 founder and lead developer has stuck around in the project as curl turns 23 years old

Credits

Image by AnnaER from Pixabay

half of curl’s vulnerabilities are C mistakes

I spent a lot of time and effort digging up the numbers and facts for this post!

Lots of people keep referring to the awesome summary put together by a friendly pseudonymous “Tim” which says that “53 out of 95” (55.7%) security flaws in curl could’ve been prevented if curl had been written in Rust. This is usually in regards to discussions around how insecure C is and what to do about it. I’ve blogged about this topic before, but things change, the world changes and my own view on these matters keep getting refined.

I did my own count: how many of the current 98 published security problems in curl are related to it being written in C?

Possibly due to the slightly different question, possibly because I’ve categorized one or two vulnerabilities differently, possibly because I’m biased as heck, but my count end up at:

51 out of 98 security vulnerabilities are due to C mistakes

That’s still 52%. (you can inspect my analysis and submit issues/pull-requests against the vuln.pm file) and yes, 51 flaws that could’ve been avoided if curl had been written in a memory safe language. This contradicts what I’ve said in the past, but I will also show you below that the numbers have changed and I still was right back then!

Let me also already now say that if you check out the curl security section, you will find very detailed descriptions of all vulnerabilities. Using those, you can draw your own conclusions and also easily write your own blog posts on this topic!

This post is not meant as a discussion around how we can rewrite C code into other languages to avoid these problems. This is an introspection of the C related vulnerabilities in curl. curl will not be rewritten but will continue to support backends written in other languages.

It seems hard to draw hard or definite conclusions based on the CVEs and C mistakes in curl’s history due to the relatively small amounts to analyze. I’m not convinced this is data enough to actually spot real trends, but might be mostly random coincidences.

98 flaws out of 6,682

The curl changelog counts a total of 6,682 bug-fixes at the time of this writing. It makes the share of all vulnerabilities to be 1.46% of all known curl bugs fixed through curl’s entire life-time, starting in March 1998.

Looking at recent curl development: the last three years. Since January 1st 2018, we’ve fixed 2,311 bugs and reported 26 vulnerabilities. Out of those 26 vulnerabilities, 18 (69%) were due to C mistakes. 18 out of 2,311 is 0.78% of the bug-fixes.

We’ve not reported a single C-based vulnerability in curl since September 2019, but six others. And fixed over a thousand other bugs. (There’s another vulnerability pending announcement, a 99th one, to become public on March 31, but that is also not a C mistake.)

This is not due to lack of trying. We’re one of the few small open source projects that pays several hundred dollars for any reported and confirmed security flaw since a few years back.

The share of C based security issues in curl is an extremely small fraction of the grand total of bugs. The security flaws are however of course the most fatal and serious ones – as all bugs are certainly not equal.

But also: not all vulnerabilities are equal. Very few curl vulnerabilities have had a severity level over medium and none has been marked critical.

Unfortunately we don’t have “severity” noted for very many many of the past vulnerabilities, as we only started that practice in 2019 and I’ve spent time and effort to backtrack and fill them in for the 2018 ones, but it’s a tedious job and I probably will not update the remainder soon, if at all.

51 flaws due to C

Let’s dive in to see how they look.

Here’s a little pie chart with the five different C mistake categories that have caused the 51 vulnerabilities. The categories here are entirely my own. No surprises here really. The two by far most common C mistakes that caused vulnerabilities are reading or writing outside a buffer.

Buffer overread – reading outside the buffer size/boundary. Very often due to a previous integer overflow.

Buffer overflow – code wrote more data into a buffer than it was allocated to hold.

Use after free – code used a memory area that had already been freed.

Double free – freeing a memory pointer that had already been freed.

NULL mistakes – NULL pointer dereference and NUL byte mistake.

Addressing the causes

I’ve previously described a bunch of the counter-measures we’ve done in the project to combat some of the most common mistakes we’ve done. We continue to enforce those rules in the project.

Two of the main methods we’ve introduced that are mentioned in that post, are that we have A) created a generic dynamic buffer system in curl that we try to use everywhere now, to avoid new code that handles buffers, and B) we enforce length restrictions on virtually all input strings – to avoid risking integer overflows.

Areas

When I did the tedious job of re-analyzing every single security vulnerability anyway, I also assigned an “area” to each existing curl CVE. Which area of curl in which the problem originated or belonged. If we look at where the C related issues were found, can we spot a pattern? I think not.

“internal” being the number one area, which means that was in generic code that affected multiple protocols or in several cases even entirely protocol independent.

HTTP was the second largest area, but that might just also reflect the fact that it is the by far most commonly used protocol in curl – and there is probably the most amount of protocol-specific code for this protocol. And there were a total of 21 vulnerabilities reported in that area, and 8 out of 21 is 38% C mistakes – way below the total average.

Otherwise I think we can conclude that the mistakes were distributed all over, rather nondiscriminatory…

C mistake history

As curl is an old project now and we have a long history to look back at, we can see how we have done in this regard throughout history. I think it shows quite clearly that age hasn’t prevented C related mistakes to slip in. Even if we are experienced C programmers and aged developers, we still let such flaws slip in. Or at least we don’t find old such mistakes that went in a long time ago – as the reported vulnerabilities in the project have usually been present in the source code for many years at the time of the finding.

The fact is that we only started to take proper and serious counter-measures against such mistakes in the last few years and while the graph below shows that we’ve improved recently, I don’t think we yet have enough data to show that this is a true trend and not just a happenstance or a temporary fluke.

The blue line in the graph shows how big the accumulated share of all security vulnerabilities has been due to C mistakes over time. It shows we went below 50% totally in 2012, only to go above 50% again in 2018 and we haven’t come down below that again…

The red line shows the percentage share the last twelve months at that point. It illustrates that we have had several series of vulnerabilities reported over the years that were all C mistakes, and it has happened rather recently too. During the period one year back from the very last reported vulnerability, we did not have a single C mistake among them.

Finding the flaws takes a long time

C mistakes might be easier to find and detect in source code. valgrind, fuzzing, static code analyzers and sanitizers can find them. Logical problems cannot as easily be detected using tools.

I decided to check if this seems to be the case in curl and if it is true, then C mistakes should’ve lingered in the code for a shorter time until found than other mistakes.

I had a script go through the 98 existing vulnerabilities and calculating the average time the flaws were present in the code until reported, splitting out the C mistake ones from the ones not caused by C mistakes. It revealed a (small) difference:

C mistake vulnerabilities are found on average at 80% of the time other mistakes need to get found. Or put the other way around: mistakes that were not C mistakes took 25% longer to get reported – on average. I’m not convinced the difference is very significant. C mistakes are still shipped in code for 2,421 days – on average – until reported. Looking over the last 10 C mistake vulnerabilities, the average is slightly lower at 2,108 days (76% of the time the 10 most recent non C mistakes were found). Non C mistakes take 3,030 days to get reported on average.

Reproducibility

All facts I claim and provide in this blog post can be double-checked and verified using available public data and freely available scripts.

Discuss

Hacker news

Lobste.rs

Reddit

“I will slaughter you”

You might know that I’ve posted funny emails I’ve received on my blog several times in the past. The kind of emails people send me when they experience problems with some device they own (like a car) and they contact me because my email address happens to be visible somewhere.

People sometimes say I should get a different email address or use another one in the curl license file, but I’ve truly never had a problem with these emails, as they mostly remind me about the tough challenges the modern technical life bring to people and it gives me insights about what things that run curl.

But not all of these emails are “funny”.

Category: not funny

Today I received the following email

From: Al Nocai <[redacted]@icloud.com>
Date: Fri, 19 Feb 2021 03:02:24 -0600
Subject: I will slaughter you

That subject.

As an open source maintainer since over twenty years, I know flame wars and personal attacks and I have a fairly thick skin and I don’t let words get to me easily. It took me a minute to absorb and realize it was actually meant as a direct physical threat. It found its ways through and got to me. This level of aggressiveness is not what I’m prepared for.

Attached in this email, there were seven images and no text at all. The images all look like screenshots from a phone and the first one is clearly showing source code I wrote and my copyright line:

The other images showed other source code and related build/software info of other components, but I couldn’t spot how they were associated with me in any way.

No explanation, just that subject and the seven images and I was left to draw my own conclusions.

I presume the name in the email is made up and the email account is probably a throw-away one. The time zone used in the Date: string might imply US central standard time but could of course easily be phony as well.

How I responded

Normally I don’t respond to these confused emails because the distance between me and the person writing them is usually almost interplanetary. This time though, it was so far beyond what’s acceptable to me and in any decent society I couldn’t just let it slide. After I took a little pause and walked around my house for a few minutes to cool off, I wrote a really angry reply and sent it off.

This was a totally and completely utterly unacceptable email and it hurt me deep in my soul. You should be ashamed and seriously reconsider your manners.

I have no idea what your screenshots are supposed to show, but clearly something somewhere is using code I wrote. Code I have written runs in virtually every Internet connected device on the planet and in most cases the users download and use it without even telling me, for free.

Clearly you don’t deserve my code.

I don’t expect that it will be read or make any difference.

Update below, added after my initial post.

Al Nocai’s response

Contrary to my expectations above, he responded. It’s not even worth commenting but for transparency I’ll include it here.

I do not care. Your bullshit software was an attack vector that cost me a multimillion dollar defense project.

Your bullshit software has been used to root me and multiple others. I lost over $15k in prototyping alone from bullshit rooting to the charge arbitrators.

I have now since October been sandboxed because of your bullshit software so dipshit google kids could grift me trying to get out of the sandbox because they are too piss poor to know shat they are doing.

You know what I did to deserve that? I tried to develop a trade route in tech and establish project based learning methodologies to make sure kids aren’t left behind. You know who is all over those god damn files? You are. Its sickening. I got breached in Oct 2020 through federal server hijacking, and I owe a great amount of that to you.

Ive had to sit and watch as i reported:

  1. fireeye Oct/2020
  2. Solarwinds Oct/2020
  3. Zyxel Modem Breach Oct/2020
  4. Multiple Sigover attack vectors utilizing favicon XML injection
  5. JS Stochastic templating utilizing comparison expressions to write to data registers
  6. Get strong armed by $50billion companies because i exposed bullshit malware

And i was rooted and had my important correspondence all rerouted as some sick fuck dismantled my life with the code you have your name plastered all over. I cant even leave the country because of the situation; qas you have so effectively built a code base to shit all over people, I dont give a shit how you feel about this.

You built a formula 1 race car and tossed the keys to kids with ego problems. Now i have to deal with Win10 0-days because this garbage.

I lost my family, my country my friends, my home and 6 years of work trying to build a better place for posterity. And it has beginnings in that code. That code is used to root and exploit people. That code is used to blackmail people.

So no, I don’t feel bad one bit. You knew exactly the utility of what you were building. And you thought it was all a big joke. Im not laughing. I am so far past that point now.

/- Al

Al continues

Nine hours after I first published this blog post , Al replied again with two additional emails. His third and forth emails to me.

Email 3:

https://davidkrider.com/i-will-slaughter-you-daniel-haxx-se/
Step up. You arent scaring me. What led me here? The 5th violent attempt on my life. Apple terms of service? gtfo, thanks for the platform.

Amusingly he has found a blog post about my blog post.

Email 4:

There is the project: MOUT Ops Risk Analysis through Wide Band Em Spectrum analysis through different fourier transforms.
You and whoever the fuck david dick rider is, you are a part of this.
Federal server breaches-
Accomplice to attempted murder-
Fraud-
just a few.

I have talked to now: FBI FBI Regional, VA, VA OIG, FCC, SEC, NSA, DOH, GSA, DOI, CIA, CFPB, HUD, MS, Convercent, as of today 22 separate local law enforcement agencies calling my ass up and wasting my time.

You and dick ridin’ dave are respinsible. I dont give a shit, call the cops. I cuss them out wheb they call and they all go silent.

I’ve kept his peculiar formatting and typos. In email 4 there was also a PDF file attached named BustyBabes 4.pdf. It is apparently a 13 page document about the “NERVEBUS NERVOUS SYSTEM” described in the first paragraph as “NerveBus Nervous System aims to be a general utility platform that provides comprehensive and complex analysis to provide the end user with cohesive, coherent and “real-time” information about the environment it monitors.”. There’s no mention of curl or my name in the document.

Since I don’t know the status of this document I will not share it publicly, but here’s a screenshot of the front page:

Related

This topic on hacker news and reddit.

I have reported the threat to the Swedish police (where I live).

This person would later apologize.

Transfers vs connections spring cleanup

Warning: this post is full of libcurl internal architectural details and not much else.

Within libcurl there are two primary objects being handled; transfers and connections. The transfers objects are struct Curl_easy and the connection counterparts are struct connectdata.

This is a separation and architecture as old as libcurl, even if the internal struct names have changed a little through the years. A transfer is associated with none or one connection object and there’s a pool with potentially several previously used, live, connections stored for possible future reuse.

A simplified schematic picture could look something like this:

One transfer, one connection and three idle connections in the pool.

Transfers to connections

These objects are protocol agnostic so they work like this no matter which scheme was used for the URL you’re transferring with curl.

Before the introduction of HTTP/2 into curl, which landed for the first time in September 2013 there was also a fixed relationship that one transfer always used (none or) one connection and that connection then also was used by a single transfer. libcurl then stored the association in the objects both ways. The transfer object got a pointer to the current connection and the connection object got a pointer to the current transfer.

Multiplexing shook things up

Lots of code in libcurl passed around the connection pointer (conn) because well, it was convenient. We could find the transfer object from that (conn->data) just fine.

When multiplexing arrived with HTTP/2, we then could start doing multiple transfers that share a single connection. Since we passed around the conn pointer as input to so many functions internally, we had to make sure we updated the conn->data pointer in lots of places to make sure it pointed to the current driving transfer.

This was always awkward and the source for agony and bugs over the years. At least twice I started to work on cleaning this up from my end but it quickly become a really large work that was hard to do in a single big blow and I abandoned the work. Both times.

Third time’s the charm

This architectural “wart” kept bugging me and on January 8, 2021 I emailed the curl-library list to start a more organized effort to clean this up:

Conclusion: we should stop using ‘conn->data’ in libcurl

Status: there are 939 current uses of this pointer

Mission: reduce the use of this pointer, aiming to reach a point in the future when we can remove it from the connection struct.

Little by little

With the help of fellow curl maintainer Patrick Monnerat I started to submit pull requests that would remove the use of this pointer.

Little by little we changed functions and logic to rather be anchored on the transfer rather than the connection (as data->conn is still fine as that can only ever be NULL or a single connection). I made a wiki page to keep an updated count of the number of references. After the first ten pull requests we were down to just over a hundred from the initial 919 – yeah the mail quote says 939 but it turned out the grep pattern was slightly wrong!

We decided to hold off a bit when we got closer to the 7.75.0 release so that we wouldn’t risk doing something close to the ship date that would jeopardize it. Once the release had been pushed out the door we could continue the journey.

Gone!

As of today, February 16 2021, the internal pointer formerly known as conn->data doesn’t exist anymore in libcurl and therefore it can’t be used and this refactor is completed. It took at least 20 separate commits to get the job done.

I hope this new order will help us do less mistakes as we don’t have to update this pointer anymore.

I’m very happy we could do this revamp without it affecting the API or ABI in any way. These are all just internal artifacts that are not visible to the outside.

One of a thousand little things

This is just a tiny detail but the internals of a project like curl consists of a thousand little details and this is one way we make sure the code remains in a good shape. We identify improvements and we perform them. One by one. We never stop and we’re never done. Together we take this project into the future and help the world do Internet transfers properly.

curl –fail-with-body

That’s --fail-with-body, using two dashes in front of the name.

This is a brand new command line option added to curl, to appear in the 7.76.0 release. This function works like --fail but with one little addition and I’m hoping the name should imply it good enough: it also provides the response body. The --fail option has turned out to be a surprisingly popular option but users have often repeated the request to also make it possible to get the body stored. --fail makes curl stop immediately after having received the response headers – if the response code says so.

--fail-with-body will instead first save the body per normal conventions and then return an error if the HTTP response code was 400 or larger.

To be used like this:

curl --fail-with-body -o output https://example.com/404.html

If the page is missing on that HTTPS server, curl will return exit code 22 and save the error message response in the file named ‘output’.

Not complicated at all. But has been requested many times!

This is curl’s 238th command line option.