I’ve previously said that curl is one of the most widely used software components in the world, with more than an estimated ten billion installations, and I get questions about it every now and then.
— Is curl the most widely used software component in the world? If not, which one is?
We can’t know for sure which products are on the top list of the most widely deployed software components. There’s no method for us to count or estimate these numbers with a decent degree of certainty. We can only guess and make rough estimates – and the result also depends on exactly what we count, and quite probably on who’s doing the counting.
First, let’s acknowledge that SQLite already hosts a page about the most widely deployed software module, where they speculate on this topic (and which doesn’t even mention curl). Also, does this count the number of devices running the code or the number of installs? If we count devices, do virtual machines count? Is it the number of currently used installations or the total number of installations done over the years?
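To illustrate how much the answer depends on the counting rule, here is a minimal Python sketch. The records, field names and numbers are all made up for the example; they are not measurements of any real component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Install:
    device_id: str  # hypothetical identifier for the machine
    is_vm: bool     # True if the install lives inside a virtual machine
    active: bool    # True if the install is still in use today


# Made-up sample data, purely for illustration.
INSTALLS = [
    Install("phone-1", False, True),
    Install("phone-1", False, True),   # a second copy bundled by an app on the same phone
    Install("laptop-1", False, True),
    Install("vm-1", True, True),
    Install("old-router", False, False),
]


def count_installs(installs, *, per_device=False, include_vms=True, active_only=False):
    """Count installs under one particular definition of 'an install'."""
    selected = [
        i for i in installs
        if (include_vms or not i.is_vm) and (i.active or not active_only)
    ]
    if per_device:
        # Several copies on one device collapse into a single count.
        return len({i.device_id for i in selected})
    return len(selected)


if __name__ == "__main__":
    print("every install ever made:", count_installs(INSTALLS))
    print("distinct devices:", count_installs(INSTALLS, per_device=True))
    print("distinct physical devices in use:",
          count_installs(INSTALLS, per_device=True, include_vms=False, active_only=True))
```

The same five records give three different totals depending on the rule chosen, which is the whole problem in miniature.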
Choices
The SQLite page suggests four contenders for the top-5 list and I think it is pretty good:
- zlib (the original implementation)
- libpng
- libjpeg
- sqlite
I will go out on a limb and say that the two image libraries in the list, while of course very widely used, are not typically used on devices without screens, and in the IoT world of today such devices are fairly common: light bulbs, power switches, networking gear etc. I think that might imply that they are slightly less used than the others in the list. Secondly, the original libjpeg no longer seems to be what is actually deployed; a few successors appear to be used instead, i.e. it is not a single implementation.
All top components are Open Source (sqlite’s situation is special but they still call it open source), and I don’t think it is a coincidence.
Are there other contenders not mentioned here? I figure maybe some of the operating systems for the tiniest devices that ship in the billions could be there. But I’m not sure there’s any such obvious market dominant player. There are other compression libraries too, but I doubt they reach the levels of zlib at this moment.
Someone brings up the Linux kernel, which certainly is very widely used, but all Android devices, servers, Windows 10 etc probably don’t make the unit count go over 7 billion, and I believe that in virtually all of these Linux kernel installs, curl, zlib and sqlite also run…
Similarly to how SQLite forgot to mention curl, I might of course also have a blind spot for some other really well-used piece of code.
The finalists
We end up with three finalists:
- zlib
- sqlite
- libcurl
I think it is impossible for us to rank these three in an order with any good certainty. If we look at that sqlite list of where it is used, we quickly recognize that zlib and libcurl are deployed in pretty much all of them as well. The three modules have a huge overlap and will all be installed in billions of devices, while of course there are also plenty that only install one or two of them.
I just can’t figure out the numbers that would rank these modules in the top-list.
The SQLite page says: “our best guess is that SQLite is the second mostly widely deployed software library, after libz.” They might of course be right. Or wrong. They also don’t specify or explain how they arrived at that guess.
libc
Whenever I’ve mentioned widely used components in the past, someone has brought up “libc” as a contender. But since there are many different libc implementations and they are typically made for specific platforms/operating systems, I don’t think any single libc implementation actually reaches the top-5 list.
zlib in curl/sqlite
Many people say zlib, partly because curl uses it, but then I have to add that zlib is an optional dependency for curl, and I know many users, including large-volume ones, that ship products with a libcurl that doesn’t use zlib at all. One very obvious and public example is the curl.exe shipped in Windows 10 – that’s maybe one billion installs of curl that don’t bundle zlib.
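As a rough way to check this for a given build: the output of curl --version includes a Features line, and builds with zlib support list libz there. The Python sketch below parses that line. The two sample outputs are hand-written and abbreviated for the example, not copied from any real build, and the helper names are my own.

```python
def curl_features(version_output):
    """Return the set of names on the 'Features:' line of `curl --version` output."""
    for line in version_output.splitlines():
        if line.startswith("Features:"):
            return set(line.split(":", 1)[1].split())
    return set()


def built_with_zlib(version_output):
    """True if the build reports zlib support (listed as 'libz')."""
    return "libz" in curl_features(version_output)


# Hand-written, abbreviated samples (assumptions for illustration only).
SAMPLE_WITH_ZLIB = (
    "curl 8.0.0 (example) libcurl/8.0.0 zlib/1.2.13\n"
    "Protocols: http https\n"
    "Features: IPv6 Largefile libz SSL\n"
)
SAMPLE_WITHOUT_ZLIB = (
    "curl 8.0.0 (example) libcurl/8.0.0\n"
    "Protocols: http https\n"
    "Features: IPv6 Largefile SSL\n"
)

if __name__ == "__main__":
    print("with zlib:", built_with_zlib(SAMPLE_WITH_ZLIB))
    print("without zlib:", built_with_zlib(SAMPLE_WITHOUT_ZLIB))
```

On a real system you would feed it the actual text printed by curl --version instead of the samples.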
If I understand things correctly, the situation is similar in sqlite: it doesn’t always ship with a zlib dependency.
The poll
I asked my Twitter followers which one of these three components they guess is the most widely used one. Very unscientifically, and of course skewed towards libcurl (since I asked and I have a curl bias), the over 2,000 respondents voted libcurl by a fairly wide margin.
What did I miss?
Did I miss a contender?
Have I overlooked some stats that make one of these win?
Updates: Since this was originally posted, I have had OpenSSL, expat and the Linux kernel proposed to me as additional finalists and possibly most-used components.
My guess is zlib, compression is everywhere.
I was one of your Twitter followers that voted for zlib. I think SQLite overestimates their own importance… Sure it’s used in every browser and in many apps, but I’d be surprised if there aren’t more apps that need to compress or communicate over the Internet than there are apps that need to have a small database.
So of those 3 libraries I would rank them:
1. zlib
2. libcurl
3. sqlite
Could be very much ex aequo curl and sqlite, they are shipped oh so often in IoT, Android, iPhones, cars
The image libraries are used in very many more places than just devices that have a display. For starters, anything that might generate images to serve them over the net… meaning a whole swath of the IoT. Anything that wants to examine images for any reason as well; thumbnail generation, whatever. It’s very likely that if you are installing any kind of derivative of some kind of stock OS (such as when you’re setting up a VM or container) then something in there will have a dependency on an image library.
I wouldn’t disagree that curl is probably much more ubiquitous than the image libraries, but I do think you are underestimating them.
Are we suggesting Windows 10 doesn’t install by default with zlib in there somewhere? I suspect it does, so curl.exe not requiring zlib on Windows still doesn’t count against zlib.
zlib is *everywhere*. Especially on space-constrained IoT devices.
Not sure, but zlib came out in the 90s when MS was very much Not Invented Here with FOSS. I wouldn’t be surprised if they implemented the same algorithms independently.
People laugh uncertainly when I tell them that “counting things” is the hardest problem in data science.
I’d wager that OpenSSL is huge too.