Category Archives: cURL and libcurl

curl and/or libcurl related

The backdoor threat

— “Have you ever detected anyone trying to add a backdoor to curl?”

— “Have you ever been pressured by an organization or a person to add suspicious code to curl that you wouldn’t otherwise accept?”

— “If a crime syndicate would kidnap your family to force you to comply, what backdoor would you be able to insert into curl that is the least likely to get detected?” (The less grim version of this question would instead offer huge amounts of money.)

I’ve been asked these questions and variations of them when I’ve stood up in front of audiences around the world and talked about curl and how it is one of the most widely used software components in the world, counting way over three billion instances.

Back door (noun)
— a feature or defect of a computer system that allows surreptitious unauthorized access to data.

So how is it?

No. I’ve never seen a deliberate attempt to add a flaw, a vulnerability or a backdoor into curl. I’ve seen bad patches and I’ve seen patches that brought bugs that years later were reported as security problems, but I did not spot any deliberate attempt to do bad in any of them. But if done with enough skill, I surely wouldn’t have noticed that they were deliberate.

If I had cooperated in adding a backdoor or been threatened to, then I wouldn’t tell you anyway and I’d thus say no to questions about it.

How to be sure

There is only one way to be sure: review the code you download and intend to use. Or get it from a trusted source that did the review for you.

If you have a version you trust, you really only have to review the changes done since then.

Possibly there’s some degree of safety in numbers, and as thousands of applications and systems use curl and libcurl and at least some of them do reviews and extensive testing, one of those could discover mischievous activities if there are any and report them publicly.

Infected machines or owned users

The servers that host the curl releases could be targeted by attackers and the tarballs for download could be replaced by something that carries evil code. There’s no such thing as a fail-safe machine, especially not if someone really wants to and tries to target us. The safeguard there is the GPG signature with which I sign all official releases. No malicious user can (re-)produce them. They have to be made by me (since I package the curl releases). That comes back to trusting me again. There’s of course no safeguard against me being forced to sign evil code with a knife to my throat…
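
Checking that signature is a short command-line step. The sketch below wraps it in a small shell function. The function name and the file names in the usage line are hypothetical, and it assumes GnuPG is installed and that the release signer’s public key has already been imported into your keyring.

```shell
#!/bin/sh
# verify_release TARBALL SIGNATURE
# Hypothetical helper: checks a detached GPG signature against a tarball.
# Assumes the signer's public key is already in the local keyring.
verify_release() {
  tarball="$1"
  signature="$2"
  if [ ! -f "$tarball" ] || [ ! -f "$signature" ]; then
    echo "verify_release: missing tarball or signature file" >&2
    return 2
  fi
  # gpg exits non-zero if the signature does not match the file.
  gpg --verify "$signature" "$tarball"
}
```

Used, for example, as (file names assumed):

$ verify_release curl-7.55.0.tar.gz curl-7.55.0.tar.gz.asc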

If one of the curl project members with git push rights would get her account hacked and her SSH key password brute-forced, a very skilled hacker could possibly sneak in something, short-term. Although my hopes are that as we review and comment on each other’s code to a very high degree, that would be really hard. And the hacked person herself would most likely react.

Downloading from somewhere

I think the highest risk scenario is when users download pre-built curl or libcurl binaries from various places on the internet that aren’t the official curl web site. How can you know for sure what you’re getting then, as you couldn’t review the code or the changes done? You just put your trust in a remote person or organization to do what’s right for you.

Trusting other organizations can be totally fine, for example when you download using Linux distro package management systems, as you can then expect that a certain level of checks and vouching has happened, and there will be digital signatures and more involved to minimize the risk of external malicious interference.

Pledging there’s no backdoor

Some people argue that projects could or should pledge for every release that there’s no deliberate backdoor planted so that if the day comes in the future when a three-letter secret organization forces us to insert a backdoor, the lack of such a pledge for the subsequent release would function as an alarm signal to people that something is wrong.

That takes us back to trusting a single person again. A truly evil adversary can of course force such a pledge to be uttered no matter what, even if that then probably is more mafia level evilness and not mere three-letter organization shadiness anymore.

I would be a bit stressed out to have to do that pledge every single release, as if I ever forgot or messed it up, it would lead to a lot of people getting up in arms, and how would such a mistake be fixed? It’s a little too irrevocable for me. And we do quite frequent releases so the risk for mistakes is not insignificant.

Also, if I would pledge that, is that then a promise regarding all my code only, or is that meant to be a pledge for the entire code base as done by all committers? It doesn’t scale very well…

Additionally, I’m a Swede living in Sweden. The American organizations cannot legally force me to backdoor anything, and the Swedish versions of those secret organizations don’t have the legal rights to do so either (caveat: I’m not a lawyer). So, the real threat is not by legal means.

What backdoor would be likely?

It would be very hard to add code, unnoticed, that sends off data to somewhere else. Too much code that would be too obvious.

A backdoor similarly couldn’t really be made to split off data from the transfer pipe and store it locally for other systems to read, as that too is probably too much code that is too different than the current code and would be detected instantly.

No, I’m convinced the most likely backdoor code in curl is a deliberate but hard-to-detect security vulnerability that lets the attacker exploit the program using libcurl/curl by some sort of specific usage pattern. So when triggered it can trick the program into sending off memory contents or perhaps overwriting the local stack or the heap. Quite possibly only one step out of several steps necessary for a successful attack, much like how a single-byte-overwrite can lead to root access.

Any past security problems on purpose?

We’ve had almost 70 security vulnerabilities reported through the project’s almost twenty years of existence. Since most of them were triggered by mistakes in code I wrote myself, I can be certain that none of those problems were introduced on purpose. I can’t completely rule out that someone else’s patch, which modified curl along the way and by extension maybe made a vulnerability worse or easier to trigger, could have been made on purpose. None of the security problems that were introduced by others have shown any sign of “deliberateness”. (Or were written cleverly enough to not make me see that!)

Maybe backdoors have been planted that we just haven’t discovered yet?

Discussion

Follow-up discussion/comments on hacker news.

curl author activity illustrated

At the time of each commit, check how many unique authors had a change committed within the previous 120, 90, 60, 30 and 7 days. Run the script on the curl git repository and then plot a graph of the data, ranging from 2010 until today. This is just under 10,000 commits.


git-authors-active.pl is the little stand-alone script I wrote and used for this – should work fine for any git repository. I then made the graph from that using libreoffice.
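
The counting idea can be sketched in a few lines of shell and awk. This is not git-authors-active.pl itself, only an illustration under stated assumptions: the function name active_authors is made up, input arrives oldest commit first, and each line holds a Unix timestamp and an author name separated by a tab. If timestamps are not strictly increasing, the counts are approximate.

```shell
#!/bin/sh
# active_authors DAYS
# Reads "<unix-timestamp><TAB><author>" lines (oldest first) on stdin and
# prints "<timestamp> <count>" per commit, where count is the number of
# distinct authors with a commit in the preceding DAYS days.
active_authors() {
  days="$1"
  awk -F '\t' -v window="$((days * 86400))" '
    BEGIN { start = 1; active = 0 }
    {
      ts[NR] = $1; who[NR] = $2
      if (count[$2]++ == 0) active++
      # Drop commits that have slid out of the time window.
      while (ts[start] < $1 - window) {
        if (--count[who[start]] == 0) active--
        start++
      }
      print $1, active
    }'
}
```

One possible way to feed it from a repository, once per window size:

$ git log --reverse --format='%ct%x09%an' | active_authors 120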

Easier HTTP requests with h2c

I spend a large portion of my days answering questions and helping people use curl and libcurl. With more than 200 command line options it certainly isn’t always easy to find the correct ones, in combination with the Internet and protocols being pretty complicated things at times… not to mention the constant problem of bad advice. Like code samples on stackoverflow that repeat non-recommended patterns.

The notorious -X abuse is a classic example, or why not the widespread disease called too much use of the --insecure option (at a recent count, there were more than 118,000 instances of “curl --insecure” uses in code hosted by github alone).

Sending HTTP requests with curl

HTTP (and HTTPS) is by far the most used protocol out of the ones curl supports. curl can be used to issue just about any HTTP request you can think of, even if it isn’t always immediately obvious exactly how to do it.

h2c to the rescue!

h2c is a new command line tool and associated web service, that when passed a complete HTTP request dump, converts that into a corresponding curl command line. When that curl command line is then run, it will generate exactly(*) the HTTP request you gave h2c.

h2c stands for “headers to curl”.

Many times you’ll read documentation somewhere online or find a protocol/API description showing off a full HTTP request. “This is what the request should look like. Now send it.” That is one use case h2c can help out with.

Example use

Here we have an HTTP request that does Basic authentication with the POST method and a small request body. Do you know how to tell curl to send it?

The request:

POST /receiver.cgi HTTP/1.1
Host: example.com
Authorization: Basic aGVsbG86eW91Zm9vbA==
Accept: */*
Content-Length: 5
Content-Type: application/x-www-form-urlencoded

hello

I save the request above in a text file called ‘request.txt’ and ask h2c to give the corresponding curl command line:

$ ./h2c < request.txt
curl --http1.1 --header User-Agent: --user "hello:youfool" --data-binary "hello" https://example.com/receiver.cgi

If we add “--trace-ascii dump” to that command line, run it, and then inspect the dump file after curl has completed, we can see that it did indeed issue the HTTP request we asked for!
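
The --user value in the generated command line comes straight from the Authorization header: Basic authentication is just base64 of “user:password”. As a small illustration (the function name is made up, and the -d flag assumes a base64 tool such as the GNU coreutils one), the header value can be decoded like this:

```shell
#!/bin/sh
# decode_basic VALUE
# Hypothetical helper: decodes the base64 part of a "Basic" Authorization
# header into its user:password form. Assumes a base64 tool that accepts -d.
decode_basic() {
  printf '%s' "$1" | base64 -d
}
```

$ decode_basic aGVsbG86eW91Zm9vbA==
hello:youfool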

Web Site

Maybe you don’t want to install another command line tool written by me in your system. The solution is the online version of h2c, which is hosted on a separate portion of the official curl web site:

https://curl.se/h2c/

The web site lets you paste a full HTTP request into a text form and the page then shows the corresponding curl command line for that request.

h2c “as a service”

Inception alert: you can also use the web version of h2c by sending over an HTTP request to it using curl. You’ll then get nothing but the correct curl command line output on stdout.

To send off the same file we used above:

curl --data-urlencode http@request.txt https://curl.se/h2c/

or of course if you rather want to pass your HTTP request to curl on stdin, that’s equally easy:

cat request.txt | curl --data-urlencode http@- https://curl.se/h2c/

Early days, you can help!

h2c was created just a few days ago. I’m sure there are bugs, issues and quirks to iron out. You can help! File issues or submit pull-requests!

(*) = barring bugs, there are still some edge cases where the exact HTTP request won’t be possible to repeat, but where we instead will attempt to do “the right thing”.

Keep finding old security problems

I decided to look closer at security problems and the age of the reported issues in the curl project.

One theory I had when I started to collect this data was that we actually get security problems reported earlier and earlier over time. That bugs would be around in public releases for shorter periods of time nowadays than they were in the past.

My thinking would go like this: Logically, bugs that have been around for a long time have had a long time to get caught. The more eyes we’ve had on the code, the fewer old bugs should be left and going forward we should more often catch more recently added bugs.

The time from a bug’s introduction into the code until the day we get a security report about it, should logically decrease over time.

What if it doesn’t?

First, let’s take a look at the data at hand. In the curl project we have so far reported in total 68 security problems over the project’s life time. The first 4 were not recorded correctly so I’ll discard them from my data here, leaving 64 issues to check out.

The graph below shows the time distribution. The all time leader so far is the issue reported to us on March 10 this year (2017), which was present in the code since the version 6.5 release done on March 13 2000. 6,206 days, just three days away from 17 whole years.

There are no fewer than twelve additional issues that lingered for more than 5,000 days before being reported. Only 20 (31%) of the reported issues had been public for less than 1,000 days. The fastest report came in on the release day: 0 days.

The median time from release to report is a whopping 2541 days.
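
For anyone recomputing that figure, a median is easy to get from a plain list of day counts. A minimal sketch, assuming one number per line on stdin (the function name median is hypothetical, and the numbers in the usage example are made up rather than taken from the curl data):

```shell
#!/bin/sh
# median
# Reads one number per line on stdin and prints the median.
# With an even count, the two middle values are averaged.
median() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) print v[(NR + 1) / 2]
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}
```

$ printf '33\n10\n20\n' | median
20

The averaging for even-sized inputs is how a median can end up with a .5 in it.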

When we receive a report about a security problem, we want the issue fixed and responsibly announced to the world, and we want to ship a new release where the problem is gone. The median time to go through this procedure is 26.5 days, and the distribution looks like this:

What stands out here is the TLS session resumption bypass, which happened because we struggled with understanding it and how to address it properly. Otherwise the numbers look all reasonable to me as we typically do releases at least once every 8 weeks. We rarely ship a release with a known security issue outstanding.

Why are very old issues still found?

I think it is partly because the tools that help people find problems are gradually improving, so they now find things that simply weren’t found very often before. With new tools we can find problems that have been around for a long time.

Every year, the oldest parts of the code get one year older. So the older the project gets, the older bugs can be found, while in the early days there was a smaller share of the code that was really old (if any at all).

What if we instead count age as a percentage of the project’s life time? Using this formula, a bug found at day 100 that was added at day 50 would be 50% but if it was added at day 80 it would be 20%. Maybe this would show a graph where the bars are shrinking over time?

But no. In fact it shows 17 (27%) of them having been present during 80% or more of the project’s life time! The median issue had been in there during 49% of the project’s life time!

It does however make another issue the worst offender, as one of the issues had been around during 91% of the project’s life time.

This counts on March 20 1998 being the birth day. Of course we got no reports the first few years since we basically had no users then!
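
The percentage formula above is the bug’s age divided by the project’s age on the day of the report. A tiny sketch in shell arithmetic, with a made-up function name and integer days counted from the project’s birthday:

```shell
#!/bin/sh
# age_percent FOUND_DAY ADDED_DAY
# Hypothetical helper: how large a share of the project lifetime (up to the
# day of the report) the bug had existed. Integer percent, rounded down.
age_percent() {
  found="$1"
  added="$2"
  echo $(( (found - added) * 100 / found ))
}
```

$ age_percent 100 50
50
$ age_percent 100 80
20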

Specific or generic?

Is this pattern something that is specific for the curl project or can we find it in other projects too? I don’t know. I have not seen this kind of data being presented by others and I don’t have the same insight into such details of projects with enough issues to be interesting.

What can we do to make the bars shrink?

Well, if there are old bugs left to find they won’t shrink, because for every such old security issue that’s still left there will be a tall bar. Hopefully though, by doing more tests, using more tools regularly (fuzzers, analyzers etc) and with more eyeballs on the code, we should iron out our security issues over time. Logically that should lead to a project where newly added security problems are detected sooner rather than later. We just don’t seem to be at that point yet…

Caveat

One fact that skews the numbers is that we are much more likely to record issues as security related these days. A decade ago when we got a report about a segfault or something we would often just consider it bad code and fix it, and neither us maintainers nor the reporter would think much about the potential security impact.

These days we’re at the other end of the spectrum, where people are much quicker to jump to a security issue suspicion or conclusion. Today people report bugs as security issues to a much higher degree than they did in the past. This is basically a good thing though, even if it makes it harder to draw conclusions over time.

Data sources

If you want to reproduce the above graphs and verify my numbers:

  • vuln.pm – from the curl web site repository holds security issue meta data
  • releaselog – on the curl web site offers release meta data, even as a CSV download on the bottom of the page
  • report2release.pl – the perl script I used to calculate the report until release periods.

Some things to enjoy in curl 7.55.0

In this endless stream of frequent releases, the next release isn’t terribly different from the previous.

curl’s 167th release is called 7.55.0 and while the name or number isn’t standing out in any particular way, I believe this release has a few extra bells and whistles that make it stand out a little from the regular curl releases, feature wise. Hopefully this will turn out to be a release that becomes the new “you should at least upgrade to this version” in the coming months and years.

Here are six things in this release I consider worthy of some special attention. (The full changelog.)

1. Headers from file

The command line option that allows users to pass on custom headers can now read a set of headers from a given file.

2. Binary output prevention

Invoke curl on the command line, give it a URL to a binary file and see it destroy your terminal by sending all that gunk to the terminal? No more.

3. Target independent headers

You want to build applications that use libcurl and build for different architectures, such as 32 bit and 64 bit builds, using the same installed set of libcurl headers? Didn’t use to be possible. Now it is.

4. OPTIONS * support!

Among HTTP requests, this is a rare beast. Starting now, you can tell curl to send such requests.

5. HTTP proxy use cleanup

Asking curl to use an HTTP proxy while doing a non-HTTP protocol would often behave in unpredictable ways since it wouldn’t do CONNECT requests unless you added an extra instruction. Now libcurl will assume CONNECT operations for all protocols over an HTTP proxy unless you use HTTP or FTP.

6. Coverage counter

The configure script now supports the option --enable-code-coverage. We now build all commits done on github with it enabled, run a bunch of tests and measure the test coverage data it produces: how large a share of our source code is exercised by our tests. We push all coverage data to coveralls.io.

That’s a blunt tool, but it could help us identify parts of the project that we don’t test well enough. Right now it says we have a 75% coverage. While not totally bad, it’s not very impressive either.

Stats

This release ships 56 days after the previous one. Exactly 8 weeks, right on schedule. 207 commits.

This release contains 114 listed bug-fixes, including three security advisories. We list 7 “changes” done (new features basically).

We got help from 41 individual contributors in making this single release. Out of this bunch, 20 persons were new contributors and 24 authored patches.

283 files in the git repository were modified for this release. 51 files in the documentation tree were updated, and in the library 78 files were changed: 1032 lines inserted and 1007 lines deleted. 24 test cases were added or modified.

The top 5 commit authors in this release are:

  1. Daniel Stenberg
  2. Marcel Raad
  3. Jay Satiro
  4. Max Dymond
  5. Kamil Dudka

The curl bus factor

bus factor: the minimum number of team members that have to suddenly disappear from a project before the project stalls due to lack of knowledgeable or competent personnel.

Projects should strive to survive

If a project is worth using and deploying today and if it is a project worth sending patches to right now, it is also a project that should position itself to survive a loss of key individuals. However unlikely or unfortunate such an event would be.

Tools to calculate bus factor

All the available tools that determine the bus factor for a given project only run on code and check for commits, code churn or check how many files each person has done a significant share of changes in etc.

This number is really impossible to figure out without tools and tools really cannot take “general knowledge” into account, or “this person answers a lot of email on the list”, or “this person has 48k in reputation on stack overflow already for responding to questions about the project”.

The bus factor as evaluated by a tool pretty much has to be about amount of code, size of code or number of code changes, which may or may not be a good indicator of who knows what about the code. Those who author and commit changes probably have a good idea but a real problem is that you can’t reverse that view and say that just because you didn’t commit or change something, you don’t know. Do you know more about the code if you did many commits? Do you know more about the code if you changed more lines of code?

We can’t prove or assume lack of knowledge or interest by an absence of commits, edits or changes. And yet we can’t calculate bus factor if there’s no tool or way to calculate it.

A look at curl

curl is soon 20 years old and boasts 22k-something commits. I’m the author of about 57% of them, and the second-most active committer (who’s not involved anymore) has about 12%. That makes two committers having done 15.3k commits out of the 22k. If we for simplicity calculate bus factor based on commit numbers, we’d need 8580 commits from others while I stop completely to reach bus factor >2 (when the 2 top committers have less than 50% of the commits), which at the current commit rate takes about 5 years. And it would take about 3 years just to push the factor above 1. So even when new people join the project, they have a really hard time significantly changing the bus factor…

The image above shows the relative share of commits done in the curl project’s git source code repository (as a share of the total amount) by the top 4 committers from January 1 2010 to July 5 2017. The top dotted line shows the combined share of all four (at 82% right now) and the dark blue line is my share. You can see how my commit share has shrunk from 72% down to 57% over these last 7.5 years. If this trend holds, I’ll have less than 50% of the total commits done in curl in 3-4 years.

At the same time, the thicker light blue line that climbs up to the right is the total number of authors in the git repository, which recently surpassed 500 as you can see. (The line uses the right Y-axis.)
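
The commit arithmetic earlier in this section can be sketched in shell. If the top committers hold T of N commits and then stop completely, others need to add 2T minus N commits before the top share is down to 50%. The function name is made up, and the inputs in the usage lines are the rounded figures from the text (15.3k of 22k for the top two, and 57% of 22k for the top one), so the first result lands near, not exactly on, the 8580 quoted earlier.

```shell
#!/bin/sh
# commits_needed TOP TOTAL
# Hypothetical helper: number of commits others must add, with the top
# committers adding none, for the top share to fall to 50% of the total.
commits_needed() {
  top="$1"
  total="$2"
  echo $(( 2 * top - total ))
}
```

$ commits_needed 15300 22000
8600
$ commits_needed 12540 22000
3080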

We’re approaching 1600 individually named contributors thanked in the project and every release we do (we ship one every 8 weeks) has around 40 contributors, out of which typically around half are newcomers. The long tail is very long and the amount of drive-by just-once contributors is high. Also note how the number 1600 is way higher than the 500-something who have authored commits. Lots of people contribute in other ways.

When we ask our users “why don’t you contribute (more) to the project?” (which we do annually) what do they answer? They say it’s because 1) everything works, 2) I don’t have time, 3) things get fixed fast enough, 4) I don’t know the programming language, 5) I don’t have the energy.

Only as the 6th answer (at 5% in 2017) comes “other”, where some people actually say they wouldn’t know where to start and so on.

All of this taken together: there are no visible signs of us suffering from having a low bus factor. Lots of signs that people can do things when they want to if I don’t do it. Lots of signs that the code and concepts are understood.

Lots of signs that a low bus factor is not a big problem here. Or perhaps rather that the bus factor isn’t really as low as any tool would calculate it.

What if I…

Do I know who would pick up the project and move on if I die today? No. We’re a 100% volunteer-driven project. We create one of the world’s most widely used software components (easily more than three billion instances and counting) but we don’t know who’ll be around tomorrow to work on it. I can’t know because that’s not how the project works.

Given the extremely wide use of our stuff, given the huge contributor base, given the vast amounts of documentation and tests I think it’ll work out.

Having a large bus factor doesn’t necessarily make the project a better place to ask questions. We’ve seen projects in the past where N persons involved are all from the same company and when that company removes its support for that project those people all go away. High bus factor, no people to ask.

Finally, let me just add that I would of course love to have many more committers and contributors in the curl project, and I think we would be an even better project if we did. But that’s a separate issue.

“OPTIONS *” with curl

(Note: this blog post has been updated as the command line option changed after first publication, based on comments to this very post!)

curl is arguably a “Swiss army knife” of HTTP fiddling. It is one of the available tools in the toolbox with a large set of available switches and options to allow us to tweak and modify our HTTP requests to really test, debug and torture our HTTP servers and services.

That’s the way we like it.

In curl 7.55.0, curl will take yet another step into this territory when we finally introduce a way for users to send “OPTIONS *” and similar requests to servers. It has been requested occasionally by users over the years but now the waiting is over. (brought by this commit)

“OPTIONS *” is special and peculiar just because it is one of the few specified requests you can do to an HTTP server where the path part doesn’t start with a slash. Thus you cannot really end up with this based on a URL and as you know curl is pretty much all about URLs.

The OPTIONS method was introduced in HTTP 1.1 already back in RFC 2068, published in January 1997 (even before curl was born) and with curl you’ve always been able to send an OPTIONS request with the -X option, you just were never able to send that single asterisk instead of a path.

In curl 7.55.0 and later versions, you can remove the initial slash from the path part that ends up in the request by using --request-target. So to send an OPTIONS * to example.com for http and https URLs, you could do it like:

$ curl --request-target "*" -X OPTIONS http://example.com
$ curl --request-target "*" -X OPTIONS https://example.com/

In classical curl-style this also opens up the opportunity for you to issue completely illegal or otherwise nonsensical paths to your server to see what it does on them, to send totally weird options to OPTIONS and similar games:

$ curl --request-target "*never*" -X OPTIONS http://example.com

$ curl --request-target "allpasswords" http://example.com

Enjoy!

curl doesn’t spew binary anymore

One of the least favorite habits of curl during all these years, I’ve been told, is when users forget to instruct the command line tool where to store the downloaded file and as a direct consequence, curl instead sends a lot of binary “gunk” to the terminal. The end result of that is at best just a busload of weird-looking characters on the screen, but with just a little bit of bad luck it can also lock up the terminal completely or change it in other ways.

Starting in curl 7.55.0 (from this commit), curl will inspect the beginning of each download that has been told to get sent to the terminal (tty!) and attempt to detect and prevent raw binary output from being sent there. The code simply looks for a binary zero in the data.

$ curl https://example.com/image.jpg
Warning: Binary output can mess up your terminal. Use "--output -" to tell curl to output it to your terminal anyway, or consider "--output <FILE>" to save to a file.

As the warning message says, there’s an option to use to switch off this emergency check for when you truly know what you’re doing and you don’t need curl to prevent you from doing this. Then you just tell curl explicitly that you want the output to stdout, with “--output -” (or “-o -” for a shorter version):

$ curl -o - https://example.com/binblob.img

We’re eager to get your input and feedback on how this works. We are aware of the risk of false positives for UTF-16 and UTF-32 outputs, but we think they are rare enough to not make this a huge problem.
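
The zero-byte check described above is simple enough to imitate in shell. The sketch below is not curl’s implementation, only the same idea: look for a zero byte near the start of the data. The function name and the 2000-byte inspection window are assumptions made for the example.

```shell
#!/bin/sh
# looks_binary FILE
# Hypothetical helper: succeeds (exit 0) if a NUL byte appears within the
# first 2000 bytes of FILE. The 2000-byte window is an assumption.
looks_binary() {
  total=$(head -c 2000 "$1" | wc -c)
  without_nul=$(head -c 2000 "$1" | tr -d '\000' | wc -c)
  # If removing NUL bytes changed the byte count, there was at least one.
  [ "$((total))" -ne "$((without_nul))" ]
}
```

A check like this also shows why UTF-16 text gives a false positive: plain ASCII characters encoded as UTF-16 carry a zero byte each, so such a file is flagged even though it is text.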

This feature should be able to drastically reduce the risk for this:

Pipes

(Update, added after the initial posting.)

So many have remarked or otherwise asked how this affects when stdout is piped into something else. It doesn’t affect that! The whole point of this check is to only show the warning message if the binary output is sent to the terminal. If you instead pipe the output to another program or if you redirect the output with >, that will not trigger this warning but will instead continue just like before. Just like you’d expect it to.
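
The terminal-or-not distinction is the same one a shell script can make with the standard test -t. A small sketch (the function name is made up; this shows the general mechanism, not curl’s own code):

```shell
#!/bin/sh
# stdout_kind
# Hypothetical helper: reports whether standard output (fd 1) is a terminal.
stdout_kind() {
  if [ -t 1 ]; then
    echo "terminal"
  else
    echo "not a terminal"
  fi
}
```

Run directly in an interactive shell this prints “terminal”. Run as “stdout_kind | cat” or with the output redirected using >, it prints “not a terminal”.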

curl: read headers from file

Starting in curl 7.55.0 (since this commit), you can tell curl to read custom headers from a file. A feature that has been asked for numerous times in the past, and the answer has always been to write a shell script to do it. Like this:

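#!/bin/sh
# Build one -H argument per header line read from stdin.
# $URL is expected to be set in the environment.
set --
while IFS= read -r line; do
  set -- "$@" -H "$line"
done
curl "$@" "$URL"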

That’s now a response of the past (or for users stuck on old curl versions). We can now instead tell curl to read headers itself from a file using the curl standard @filename way:

$ curl -H @headers https://example.com

… and this also works if you want to just send custom headers to the proxy you do CONNECT to:

$ curl --proxy-header @headers --proxy proxy:8080 https://example.com/

(this is a pure curl tool change that doesn’t affect libcurl, the library)