Six billion docker pulls

We provide an official curl container.

Why would you use curl in a container? We actually don’t ask, we just provide the image, but I can think of a few reasons…

  • it is an easy way to use a modern curl version in a system that otherwise ships an ancient version. So many people are stuck on legacy distros with ancient curl versions.
  • it is an easy way to make use of a consistent fixed version from many places independently of what particular curl versions those systems otherwise can offer
  • CI jobs
  • other elaborate explanations

Six billion as of now

The official curl docker repository now (as of 06:43 UTC April 24, 2024) reports that the curl container has been pulled more than six billion times. Currently, people seem to be pulling the curl image from docker.com at a rate of 2-3 million pulls per day (about 25 per second).

It shall be noted that a pull does not necessary imply a download. The pull is a a check and the client may already have the latest version downloaded. It is therefore not equal to six billion downloads.

We started offering docker images to the world with curl 7.65.3, July 19 2019. Six billion pulls in 1832 days makes an average of 38 pulls/second through all this time. Less than five years.

How do I know the pull counter reached six billion? I asked their API:

curl https://hub.docker.com/v2/repositories/curlimages/ -s | jq .results[0].pull_count

Sponsored OSS

We do not pay Docker anything for this service of theirs. They also do not pay anything to us for our service. The Docker Sponsored OSS program lists conditions that might make us disqualified for being part of it, but as long as you don’t tell them I won’t. And hey, at least the first six billion pulls have been served.

Other repositories

You can also opt to pull the container from other repositories like quay and GitHub. I have not included their pull counters in this.

curl is just the hobby

Jan Gampe took things to the next level by actually making this cross-stitch out of the pattern I previously posted online. The flowers really gave it an extra level of charm I think.

This quote is from a comment by an upset user on my blog, replying to one of my previous articles about curl.

Fact check: while curl is my hobby, I also work on curl as a full-time job. It is a business and I serve and communicate with many customers on a daily basis. curl provides service to way more than a billion people. I claim that every human being on the planet that is Internet-connected uses devices or services every day that run curl.

The pattern

curl in San Francisco

Meanwhile, another “curl craft” seen in the wild recently is this ad in San Francisco (photo by diego).

The full command line looks like:

curl --request PUT \
--url https://api.stytch.com/v1/b2b/organizations/{ID} \
-d '{
"mfa_policy": "REQUIRED_FOR_ALL",
"mfa_methods": "RESTRICTED",
"allowed_mfa_methods": ["totp", "sms_otp"]
}'

I would personally perhaps protest against the use of PUT for POSTing JSON, but nobody asked me.

Update

Quote from the hacker news thread about the ad shown in the photo from San Francisco:

hi, Stytch team member here who worked on the PUT request ad from Daniel’s post — feel free to AMA

TL;DR on ‘why pay to put a curl request on an ad’ is what you all have already said — (1) unique concentration of tech in SF; and (2) to specifically engage software engineers. More on each…

(1) We wouldn’t have run this ad in Sydney, or New York, or LA. SF definitely has an uniquely tech-oriented culture, and in particular has lots of startups in our ideal customer persona (ICP) at Stytch – in this case, engineers building B2B SaaS apps.

But in addition to the people who live in SF, even more software engineers visit the city for conferences, or events, or to fundraise. For example, while our ads are up over the next month, SF will host Stripe Sessions and POSTCon (two entire conferences focused around APIs), plus RSA (security focused).

(2) And although even with that, only a small segment of the pop will understand the ads, those people might be intrigued enough to actually look at them – and our ultimate goal is to get more engineers to look at our code & our docs. Another perk is that engineers can’t use ad blockers if the ad is on a bus shelter đŸ™‚

So that was a bit of the thinking for us – on why SF & why a PUT request on a billboard. We’re also making the physical ads into an anchor for a ‘marketing moment’ for Stytch — so pair offline ads with digital marketing, as well. So if we’re successful, maybe you’ll see more on that, soon.

Verified curl

Don’t trust. Verify.

I could not resist making a fake book cover

Here follows a brief description on how you can detect if the curl package would ever make an xz.

xz (and its library liblzma) was presumably selected as a target because it is an often used component and by extension via systemd it often used by openssh in several Linux distros. libcurl is probably an even more widely used software component and if infected, could potentially serve as an effective vessel to distribute evil into the world.

Conceivably, the xz attackers have infiltrated more than one other Open Source project to cover their bases. Which ones?

No inexplicable binary blobs

First, you can verify that there are no binary blobs stored in git that could host an encrypted attack payload, planted there for the future.

Every file in the curl git repository has a benign meaning and purpose. As part of the products, the documentation, tooling or the test suites etc.

Without any secret “hide-out” in the git repository, you know that any backdoor needs to be provided either in plain code or using some crazy weird steganography. Or get inserted into the tarballs with content not present in git – read on for how to verify that this is not happening.

No disabled fuzzers

The xz attack could have been detected by proper fuzzing (and valgrind use) which is why the attacker made sure to sneakily disable such automated checks of code.

While somewhat hard to verify, you can make sure that no such activities have been done in curl’s fuzzing or curl’s automated and CI testing,

No hidden payloads in tarballs

In the curl project we generate several files as part of the release process and those files end up in the release tarball. This means that not all files in the tarball are found in the git repository. (Because we don’t commit generated files.)

The generated files are produced with a small set of tools, and these tools use the source code available in git at the release tag. Anyone can check out the same code from that same release tag, make sure to have the corresponding tools (and versions) installed and then generate a version of the tarball themselves – to verify that this tarball indeed becomes an identical copy of the public release.

That process verifies that the tarballs we ship are generated only with legitimate tools and that the release contents originate only from the files present in git. No secret sauce added in the process. No malicious code can get inserted.

Reproducible tarballs

We have recently improved reproducibility as a direct result of the post xz-attack debate. To make sure that a repeated tarball creation actually produces the exact same results, but also to make it easier for others to verify our release tarballs. With more documentation (releases now contain documentation of exactly which tools and versions that generated the tarball) and by making it easier to run the exact same virtual machine and tool setup like the one that created the release. We aim to soon provide a Dockerfile to make this process even smoother.

We also verify tarball reproducibility in a CI job: generating a release tarball with a given timestamp produces the identical binary output when done again for the same timestamp.

Signed tarballs

As an extra detail, everyone can also verify that the released tarballs are in fact shipped by me Daniel personally, as they are always signed with my GPG key as part of the release process. This should at least prove that new releases are made with the same keys as previous ones were, which should with a reasonable probability be me.

The signatures also help verify that the tarballs have not been tampered with in transition, from the point I generated them to the moment they land in your download directory. curl downloads are normally distributed via a third-party CDN which we normally trust of course, but if it would ever be breached or similar, a modified tarball would be detected when the digital signature is verified.

We do not provide checksums for the tarballs simply because providing checksums next to the downloads adds almost no extra verification. If someone can tamper with the tarballs, they can probably update the webpage with a fake checksum as well.

Signed commits

All commits done to curl done by me are signed, You can verify that I did them. Not all committers in the project do them signed, unfortunately. We hope to increase the share going forward. Over the last 365 days, 73% of the curl commits were signed.

These signatures only verify that the commits where done by a maintainer of the curl project (or someone who controls that account). A maintainer you may not trust and who might not be known under their real name and you do not even know in which country they live. And of course, even a trusted maintainer can suddenly go rogue.

Is the content in git benign?

The process above only verifies that tarballs are indeed generated (only) from contents present in git and that they are unaltered from the moment I made them.

How do you know that the contents in git does not contain any backdoors or other vulnerabilities?

Without trusting anyone else’s opinions and without just relying on the fact that you can run the test suite, fuzzers and static code analyzers without finding anything, you can review it. Or pay someone else to review it.

We have had curl audited several times by external organizations, but can you trust claimed random audits?

Anonymous contributors

We regularly accept contributions from anonymous and pseudonymous contributors in curl – and we always have. Our policy says that if a contribution is good: if it passes review and all tests run green, we have no reason to deny it – in the name of progress and improvement. That is also why we accept even single-letter typo fixes: even a very small fix is a step in the right direction.

A (to me) surprisingly large amount of contributions are done by people who do not state a full real name. They may chose to be anonymous for various reasons – we do not ask. Maybe they fear retaliation if they would propose something that ends up buggy? Sometimes people want to hide their affiliation/origin so that their contribution is not associated with the organization they work at. Another reason sometimes mentioned is that women do it to avoid revealing themselves as female. etc. As I said: we do not ask so I cannot tell for sure.

Anonymous maintainers

We do not have anonymous maintainers, but we don’t actually have rules against it.

Right now, we have 18 members in the GitHub curl organization with the rights to push commits. I have not met all of them. I have not even seen the faces of all of them. They have all proven themselves worthy of their administrative rights based on their track record. I cannot know if anyone of them is using a false identity and I do not ask nor keep track in which country they reside. A former top maintainer in the curl projected even landed a large amount of changes under a presumed/false name during several years.

If a curl maintainer suddenly goes rogue and attempts to land malicious content, our only effective remedy is review. To rely on the curl community to have eyes on the changes and to test things.

Can curl be targeted?

I think it would be very hard but I can of course not rule it out. Possibly I am also blind for our weaknesses because I have lived with them for so long.

Everyone can help the greater ecosystem by verifying a package or two. If we all tighten all screws just a little bit more, things will get better.

Vulnerabilities

I maintain that planting a backdoor in curl code is so infuriatingly hard to achieve that efforts and energy are probably much rather spent on finding security vulnerabilities and ways to exploit them. After all, we have 155 published past vulnerabilities in curl so far, out of which 42 have been at severity high or critical.

I can be fairly sure that none of those 42 somewhat serious issues were deliberately planted, because just about every one of them were found in code that I personally authored…

People often ask. But I have never seen a backdoor attempt in curl. Maybe that is just me being naive.

Credits

Top Image by Gerd Altmann from Pixabay. Fake book cover by Daniel Stenberg.