The curl project’s source code has been hosted on GitHub since March 2010. I wrote a blog post in 2013 about what I missed from the service back then and while it has improved significantly since then, there are still features I miss and flaws I think can be fixed.
For this reason, I’ve created and now maintain a dedicated git repository with feedback “items” that I think could enhance GitHub when used by a project like curl:
The purpose of this repository is to allow each entry to be on-point and good feedback to GitHub. I do not expect GitHub to implement any of them, but the better we can present the case for every issue, the more likely I think it is that we can gain supporters for them.
What makes curl “special”
I don’t think curl is a unique project in the way we run and host it. But there are a few characteristics that maybe make it stand out a little from many other projects hosted on GitHub:
curl is written in C. It means it cannot be part of the “dependency” and related security checks etc that GitHub provides
we are git users but we are not GitHub exclusive: we allow participation in and contributions to the project without a GitHub presence
we are an old-style project where discussions, planning and arguments about project details are held on mailing lists (outside of GitHub)
we have strict ideas about how git commits should be done and what the messages should look like etc, so we cannot accept merges done with the buttons on the GitHub site
You can help
Feel free to help me polish the proposals, or even submit new ones, but I think they need to be focused and I will only accept issues there that I agree with.
In a world that is now gradually adopting HTTP/3 (which, as you know, is implemented over QUIC), the problem with the missing API for QUIC is still a key problem.
A number of QUIC library implementations have existed for a few years now, and they are slowly maturing. The QUIC protocol became RFC 9000 and friends, but the most popular TLS libraries still don’t provide the APIs that QUIC libraries need in order to use them.
Example that makes people want HTTP/3
For a long time, many people and projects (including yours truly) in the QUIC community were eagerly following the OpenSSL Pull Request 8797, which introduced the necessary QUIC APIs into OpenSSL. This change brought the same API to OpenSSL that BoringSSL already provides and as such the API has already been used and tested out by several independent implementations.
Implementations based on BoringSSL have a problem shipping to the world, since that’s a TLS library without versions and proper releases, so it is not a good choice for the big wide world. OpenSSL is already the most widely used TLS library out there and lots of applications are already made to use it.
Delays made quictls happen
The OpenSSL PR8797 was put on hold back in February 2020, when the OpenSSL management committee (OMC) decreed that they would not deal with that PR until after their pending 3.0.0 release had shipped.
“It is our expectation that once the 3.0 release is done, QUIC will become a significant focus of our effort.”
OpenSSL then proceeded and their 3.0.0 release was delayed significantly compared to their initial time schedule.
In March 2021, Microsoft and Akamai announced quictls, an OpenSSL fork with the express idea to ship OpenSSL + the QUIC API. They didn’t want to wait for OpenSSL to do it.
Several QUIC libraries can now use quictls. quictls has kept their fork up to date and now offers the equivalent of OpenSSL 3.0.0 + the QUIC API.
While we’ve been waiting for OpenSSL to adopt the API.
OpenSSL makes a turn instead
Then came the next blow to everyone’s expectations. An autumn surprise. On October 13, the OpenSSL OMC announces:
The focus for the next releases is QUIC, with the objective of providing a fully functional QUIC implementation over a series of releases (2-3).
OpenSSL has decided to implement a complete QUIC stack on their own and with the given timeline it sounds like it will take them a few years (?) to ship. And instead of providing the API lots of implementers have been waiting for so long, they explicitly say that it is a non-goal at the start:
The MVP will not contain a library API for an HTTP/3 implementation (it is a non-goal of the initial release).
I didn’t write my own QUIC implementation but I’ve followed the work of several of the implementations fairly closely and it is a fairly complicated journey they set out for themselves – for very unclear reasons. There already exist several high quality QUIC libraries, so why does OpenSSL think they need to make yet another one? They already seemed to be overloaded with work, as the long delays of the 3.0.0 release showed, so how are they going to be able to add a complete new stack implementation on top of this? The future will tell.
On October 20 2021, the pull request that was created in April 2019 is finally closed for real as a “won’t fix”.
Where are we now?
The lack of a QUIC API in OpenSSL has held us back and with this move from OpenSSL, it will continue to hold us back for an uncertain amount of time going forward.
QUIC stacks will have to stick to using or switching to other libraries.
James Snell, one of the key contributors on the QUIC and HTTP/3 work in nodejs tweeted:
I’ve previously said that curl is one of the most widely used software components in the world with its estimated over ten billion installations, and I’m getting questions about it every now and then.
— Is curl the most widely used software component in the world? If not, which one is?
We can’t know for sure which products are on the top list of the most widely deployed software components. There’s no method for us to count or estimate these numbers with a decent degree of certainty. We can only guess and make rough estimates – and it also depends on exactly what we count. And quite probably also on who’s doing the counting.
First, let’s acknowledge that SQLite already hosts a page about the most widely deployed software modules, where they speculate on this topic (and which doesn’t even mention curl). Also, does this count the number of devices running the code or the number of installs? If we count devices, do virtual machines count? Is it the number of currently used installations or the total number of installations done over the years?
The SQLite page suggests four contenders for the top-5 list and I think it is pretty good:
zlib (the original implementation)
I will go out on a limb and say that the two image libraries in the list, while of course very widely used, are not typically used on devices without screens and in the IoT world of today, such devices are fairly common. Light bulbs, power switches, networking gear etc. I think it might imply that they are slightly less used than the others in the list. Secondarily, the original libjpeg does not seem to be what is actually in use; instead there are a few successors. I.e. not a single implementation.
Are there other contenders not mentioned here? I figure maybe some of the operating systems for the tiniest devices that ship in the billions could be there. But I’m not sure there’s any such obvious market dominant player. There are other compression libraries too, but I doubt they reach the levels of zlib at this moment.
Someone brings up the Linux kernel, which certainly is very widely used, but all Android devices, servers, Windows 10 etc probably don’t make the unit count go over 7 billion and I believe that in virtually all of these Linux kernel installs, curl, zlib and sqlite also run…
Similarly to how SQLite forgot to mention curl, I might of course also have a blind eye for some other really well-used code block.
We end up with three finalists:
I think it is impossible for us to rank these three in an order with any good certainty. If we look at that sqlite list of where it is used, we quickly recognize that zlib and libcurl are deployed in pretty much all of them as well. The three modules have a huge overlap and will all be installed in billions of devices, while of course there are also plenty that only install one or two of them.
I just can’t figure out the numbers that would rank these modules in the top-list.
The SQLite page says: our best guess is that SQLite is the second mostly widely deployed software library, after libz. They might of course be right. Or wrong. They also don’t specify or explain how they do that guess.
Whenever I’ve mentioned widely used components in the past, someone has brought up “libc” as a contender. But since there are many different libc implementations and they are typically done for specific platforms/operating systems, I don’t think any single libc implementation actually reaches the top-5 list.
zlib in curl/sqlite
Many people say zlib, partly because curl uses it, but then I have to add that zlib is an optional dependency for curl and I know many users, including large volume ones, that ship products with a libcurl that doesn’t use zlib at all. One very obvious and public example is the curl.exe shipped in Windows 10 – that’s maybe one billion installs of curl that don’t bundle zlib.
If I understand things correctly, the situation is similar in sqlite: it doesn’t always ship with a zlib dependency.
I asked my Twitter followers which one of these three components they guess is the most widely used one. The poll was very unscientific and of course skewed towards libcurl (since I asked and I have a curl bias).
The over 2,000 respondents voted libcurl by a fairly wide margin.
What did I miss?
Did I miss a contender?
Have I overlooked some stats that make one of these win?
Updates: Since this was originally posted, I have had OpenSSL, expat and the Linux kernel proposed to me as additional finalists and possibly most-used components.
There’s this new TV-show on Swedish Television (SVT) called Hackad (“hacked” in English), which is about a team of white hat hackers showing the audience exactly how vulnerable lots of things, people and companies are and how they can be hacked using various means. In the show the hackers show how they hack into people’s accounts, their homes and their devices.
Generally this is done in a rather non-techy way as they mostly describe what they do in generic terms and not very specifically or with technical details. But in some short sequences the camera glances over a screen where source code or command lines are shown.
I’ve joked with friends and said that we should have a competition to see who among us has the largest number of curl installations in their home. This is of course somewhat based on my claim that there are more than ten billion curl installations in the world. That’s more installations than humans. How many curl installations does an average person have?
Amusingly, someone also asked me this question at a curl presentation I did recently.
I decided I would count my own installations to see what number I could possibly come up with, ignoring the discussion of whether I could actually be considered “average” in this regard or not. This counting includes a few assumptions and estimates, since this isn’t a game we can play with complete knowledge. But no crazy estimates, just reasonable ones!
I decided to count my entire household’s amount just to avoid having to decide exactly which devices to include or not. I’m counting everything that is “used regularly” in my house (things that haven’t been used within the last 12 months don’t count). We’re four persons in my household. Me, my wife and my two teenage kids.
Okay. Let the game begin. This is the Stenberg household count of October, 2021.
Computer Operating Systems
4: I have two kids who have one computer each at home. One Windows 10 and one macOS. They also have one ChromeOS laptop each for school.
3: My wife has no less than three laptops with Windows 10 for work and for home.
3: I have three computers I use regularly. One Windows 10 laptop and two Debian Linuxes (laptop + desktop).
1: We have a Windows 10 NUC connected to the living room TV.
Subtotal: 11 full fledged computers.
Tricky. In the Linux machines, the curl installation is often shared by all users so just because I use multiple tools (like git) that use curl doesn’t increase the installation count. Presumably this is also the same for most macOS and ChromeOS apps.
On Windows however, applications that use libcurl use their own private build (as Windows itself doesn’t provide libcurl, only the curl tool) so they would count as additional installations. But I’m not sure how much curl is used in the applications my family use on Windows. I don’t think my son for example plays any of those games in which I know they use curl.
I do however have (I counted!) 8 different VMs installed in my two primary development machines, running Windows, Linux (various distros for curl testing) and FreeBSD and they all have curl installed in them. I think they should count.
Subtotal: 8 (at least)
Phone and Tablet Operating Systems
2: Android phones. curl is part of AOSP and seems to be shipped bundled by most vendor Androids as well.
1: Android tablet
2: iPhones. curl has been part of iOS since the beginning.
1: iOS tablet
Phone and tablet apps
6 * 5: YouTube, Instagram, Spotify, Netflix and Google Photos are installed in all of the mobile devices. Lots of other apps and games also use libcurl of course. I’ve decided to count low.
Subtotal: 30 – 40 yeah, the mobile apps really boost the amount.
TV, router, NAS, printer
1: an LG TV. This is tricky since I believe the TV operating system itself uses curl and I know individual apps do, and I strongly suspect they run their own builds so more or less every additional app on the TV runs its own curl installation…
1: An ASUS wifi router I’m “fairly sure” includes curl
1: A Synology NAS I’m also fairly sure has curl
1: My printer/scanner is an HP model. I know from “sources” that pretty much every HP printer made has curl in them. I’m assuming mine does too.
Subtotal: 4 – 9
I have half a dozen wifi-enabled powerplugs in my house but to my disappointment I’ve not found any evidence that they use curl.
I have a Peugeot e2008 (electric) car, but there are no signs of curl installed in it and my casual Google searches also failed me. This could be one of the rarer car brands/models that don’t embed curl? Oh the irony.
I have a Fitbit Versa 3 watch, but I don’t think it runs curl. Again, my googling doesn’t show any signs of that, and I’ve found no traces of my Ember coffee cup using curl.
My fridge, washing machine, dish washer, stove and oven are all “dumb”, not network connected and not running curl. Gee, my whole kitchen is basically curl naked.
We don’t have game consoles in the household so we’re missing out on those possible curl installations. I also don’t have any bluray players or dedicated set-top/streaming boxes. We don’t have any smart speakers, smart lightbulbs or fancy networked audio-players. We have a single TV, a single car and have stayed away from lots of other “smart home” and IoT devices that could be running lots of curl.
Subtotal: lots of future potential!
11 + 8 + 6 + (30 to 40) + (4 to 9) = 59 to 74 CIPH (curl installations per household). If we go with the middle estimate, it means about 66.
16.5 CIPC (curl installations per capita)
If the over 16 curl installations per person in just this household is an indication, I think it may suggest that my existing “ten billion installations” estimate is rather on the low side… If we say 10 is a fair average count and there are 5 billion Internet connected users, yeah then we’re at 50 billion installations…
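The arithmetic behind these totals can be double-checked with a few lines of C. This is only a sketch: the struct and function names are made up for illustration, and the numbers are the subtotals from the count above (including the 6 phones and tablets).

```c
#include <stdio.h>

/* Inclusive low/high estimate for one category of installations. */
struct range {
  int low;
  int high;
};

/* Sum the low ends and the high ends of all categories separately. */
struct range sum_ranges(const struct range *r, int n)
{
  struct range total = { 0, 0 };
  int i;
  for(i = 0; i < n; i++) {
    total.low += r[i].low;
    total.high += r[i].high;
  }
  return total;
}

/* Print the household totals using the subtotals from the post. */
void print_household_count(void)
{
  const struct range subtotals[] = {
    { 11, 11 },  /* computer operating systems */
    { 8, 8 },    /* virtual machines */
    { 6, 6 },    /* phone and tablet operating systems */
    { 30, 40 },  /* phone and tablet apps */
    { 4, 9 }     /* TV, router, NAS, printer */
  };
  struct range ciph = sum_ranges(subtotals, 5);
  /* integer division drops the .5, like the rounded number in the post */
  int middle = (ciph.low + ciph.high) / 2;

  printf("CIPH: %d to %d, middle %d\n", ciph.low, ciph.high, middle);
  printf("CIPC: %.1f\n", middle / 4.0);  /* four persons in the household */
}
```

Calling print_household_count() from a main() prints "CIPH: 59 to 74, middle 66" followed by "CIPC: 16.5".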
The half-hour presentation will include details such as:
Basic fundamentals in the libcurl API and a look at the common data types and concepts.
Setting up and understanding a first libcurl transfer.
Differences between the two primary libcurl transfer interfaces: easy and multi.
A look at the most commonly used libcurl options
Suggestions on how and where to take the next steps
The plan is to make this presentation work independently of platform, compiler and IDE choice and it will focus on C/C++ code. Still, since most libcurl bindings are very “thin” and often mimic the C API fairly closely, it should be valuable and provide good information even for you who plan to write your libcurl-using applications in other languages.
We’ll also end the session with a Q&A-part of course so queue up your questions!
The presentation will be recorded and made available after the fact.
In the curl project we keep track of and say thanks to every single contributor. That includes persons who report bugs or security problems, who run infrastructure for us, who assist in debugging or fixing problems as well as those who author code or edit the website. Those who have contributed to making curl what it is.
Exactly today October 4th 2021, we reached 2,500 names in this list of contributors for the first time. 2,500 persons since the day curl was created back in March 1998. 2,500 contributors in 8599 days. This means that on average we’ve seen one new contributor helping out in the project every 3.44 days for almost twenty-four years. Not bad at all.
As can be seen on the graph below plotting the number of people in the THANKS file, the rate of newcomers has increased slowly over the years and we’ve added new names at the rate of about two hundred per year recently. There’s a chance that we will add the next 2,500 names to the list faster than twenty-four years. The latest 1,000 contributors have been added since the beginning of 2017, so in less than five years.
The thanks page on the website is usually synced at release time so it is always a little bit behind compared to what’s recorded in the curl git repository.
The graph bump back in 2005: it was a one-time sweep-up where I went through our entire history and made sure that all names of people who were previously mentioned and who had helped were added correctly to the document. Since then, we’ve kept better track and make sure to add new names as we go along.
We of course collect the names of the contributors primarily by the use of scripts, which is also the best way to avoid some slipping through.
We always mention contributors and helpers in git commits, and they should be “marked” correctly for scripts to be able to extract them
We keep a list of contributors per-release in the RELEASE-NOTES document. When we commit updates to RELEASE-NOTES, we use the fixed commit message ‘synced’ to have our tools use that as a marker.
To get the updated list of contributors since the previous update of RELEASE-NOTES, we use the scripts/contributors.sh script.
For some TLS connections you want the secrets you exchange over them to remain private for decades to come.
So what if someone in the future produces a computer system that can crack all the common current encryption algorithms in no time and they already have past secret communications stored?
Such a possible future computer system that might do this is believed to be the quantum computer. There are early and tiny versions of such machines already in existence, but they are far from strong enough to be cracking any strong ciphers today. The question is then how long it takes until they will be able to do that, and thus for how long recorded secret communications can expect to remain secret. 10 years? 20? 30?
If there’s a capable quantum computer made available in let’s say twenty years time, our currently most common TLS ciphers are then rendered next to worthless in twenty years. If you want your communication to remain private even after the introduction of quantum computers, you need post-quantum safe algorithms for your TLS data, and you need a post-quantum curl to use those ciphers for your transfers!
My colleagues at wolfSSL have recently been working on making sure that the library with the same name has support for a set of ciphers that are post-quantum safe. That work has been merged into wolfSSL’s git repository and will be part of a future pending release. That “future release” is hopefully just a few weeks off now.
In association with that, we’ve also made sure that curl built with wolfSSL can take advantage of these powers. The necessary curl changes for this have landed in git and will be part of the pending curl 7.80.0 release.
Use it with curl
To make your curl transfers post-quantum safe today, all you need to do is:
make sure you have a wolfSSL build and install with the proper algorithms enabled
build curl from git (or wait for the 7.80.0 release) and tell it to use wolfSSL for TLS
specify a post-quantum curve when you invoke curl
curl --curves SABER_LEVEL5 https://example.com
The success of such a TLS 1.3 handshake then of course also requires that the server you communicate with supports quantum-safe algorithms as well. This is not terribly common yet.
The primary curl pull-request for this feature was authored by Anthony Hu.
I work a lot on my own. I mean, I plan a lot of what to do on a daily basis myself, I execute a lot of it myself and I push my code and changes to various git repositories, often solo. I work quite a lot.
In a lot of cases I do work together with one or more persons, but very often it is a different person or a different small group each time.
Yet I work at a company with colleagues, friends, managers and sales people who occasionally wonder what I’ve been up to recently and what I’m working on right now.
To share information, to combat my feeling of working in complete solitude and to better sync work with colleagues, I’ve been sending out a weekly report every Friday. It briefly explains what I did this week, what I blogged about and what I’m up to the next week.
I’ve done this on and off since I joined wolfSSL, and a while ago it dawned on me that since I do most of my work on open source code and in general in the open, I could just as well just make my “reports” available to the entire world. Or rather: those who care and are interested can find them and read them!
Minor details are still hush hush
Since I do commercial curl work with and for other companies, I need to not spill the beans on things like actual secrets and most company names will be anonymized. I hope that won’t interfere too much.
I decided to make it available on GitHub like this:
I’ve been traveling this road for a while. Here’s my collection of 15 of the most common mistakes and issues people will run into when writing applications and services that use libcurl. I’ve also done recorded presentations on this topic that you can watch if you prefer that medium.
Most of these issues are shared among application authors independently of what language the program is written in – as libcurl bindings tend to be very thin and more or less expose the API in the same way the C API does. Some mistakes are however C and C++ specific.
1. Skipping the documentation!
Nothing in my list here is magic, hidden or unknown. Everything is documented and well-known. The by far most common mistakes are done by people not reading up, rushing a bit too fast and sometimes making a little too many assumptions. Of course there’s also occasional copy-and-pasting from bad examples going on. The web is full of questionable source snippets to get inspiration from.
We spend a significant amount of time and energy on making sure the documentation is accurate, detailed and thorough. Many mistakes can be avoided by simply reading up a little more first!
2. Failure to check return codes
This sounds like such an obvious thing but we keep seeing this happen over and over again: users write code that uses libcurl functions but they don’t check the return codes.
If libcurl detects an error it will return an error code. Whenever libcurl doesn’t do what you expected it to do, it very often turns out to have returned an error code to the application that explains the behavior. We work hard at making sure libcurl functions return the correct return codes!
The libcurl examples we host on the curl web site (and ship in curl tarballs) are mostly done without error checks – for the sole purpose of making them smaller and easier to read as that removes code that isn’t strictly about libcurl.
3. Forgetting the verbose option
CURLOPT_VERBOSE is the libcurl user’s best friend. Whenever your transfer fails or somehow doesn’t do what you expected it to, switching on verbose mode should be one of the first actions as it often gives you a lot of clues about what’s going on under the hood.
Of course, you can also go further and use CURLOPT_DEBUGFUNCTION to get even more details, but usually you can save that for the more complicated issues.
HTTP/1.1 301 Moved Permanently
Server: M4gic server/3000
Retry-After: 0
Location: https://curl.se/
Content-Length: 0
Accept-Ranges: bytes
Date: Thu, 07 May 2020 08:59:56 GMT
Connection: close
When you let libcurl handle redirects, consider limiting which protocols you allow redirects to (CURLOPT_REDIR_PROTOCOLS), and of course you must remember that crafty users will figure out ways to redirect responses to potentially malicious servers given the chance.
Do not set custom HTTP methods on requests that follow redirects.
6. Let users set (parts of) the URL
Don’t do that. Unless you have considered the consequences and make sure you deal with them appropriately.
If you really insist that you need to let your users set the URL, restrict and carefully filter exactly which parts they can change and what they can change them to.
The reason is of course that libcurl often supports other protocols than the one(s) you had in mind when you write your application. And users can do other crafty things to make host names point to other servers (which of course TLS based protocols will reject), abuse free-form URL input fields to pass on unexpected data (sometimes including newlines and other creative things) to your servers or have your application talk to malicious servers.
You can limit what protocols your application supports with CURLOPT_PROTOCOLS and you can parse URLs with the curl_url_set() function family before you pass them to curl to make sure given URLs make sense!
7. Setting HTTP method
Setting the custom HTTP request method with CURLOPT_CUSTOMREQUEST is most often completely unnecessary, frequently causes problems and is only very rarely actually done correctly.
The primary problems with setting this option are:
if you also ask libcurl to follow redirects, this custom method will be used in follow-up requests as well, even if the server indicates wanting a different one in the HTTP response code
it doesn’t actually change libcurl’s behavior or expectations, it only changes the string libcurl sends in the request.
8. Disabled certificate checks
libcurl allows applications to disable TLS certificate checks with the two options CURLOPT_SSL_VERIFYPEER and CURLOPT_SSL_VERIFYHOST. This is powerful and at times very handy while developing and/or experimenting. It is also a very bad thing to ship in your product or deploy in your live service.
Disabling the certificate check effectively removes the TLS protection from the connections!
Searching for these option names using source code search engines or just on github will show you hundreds or thousands of applications that leave these checks disabled. Don’t be like them!
9. Assume zero terminated data in callbacks
libcurl has a series of different callbacks in its API. Some of these callbacks deliver data to the application and that data is then typically offered with a pointer and a size of that data.
The documentation very clearly stipulates that this data is not zero terminated – you cannot and should not use C functions on the data that work on “C strings” (that assume a terminating, trailing, zero byte). This mistake seems especially common when the data that is delivered is something like HTTP headers, which is text based data and seems to lure people into assuming a zero terminator.
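Here is a sketch of a header callback that respects this. It uses the callback prototype libcurl documents for CURLOPT_HEADERFUNCTION, only ever looks at the bytes it was given and adds its own terminator to the copy it keeps. The struct and the choice of picking out Content-Type are assumptions made for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Where the callback stores a zero terminated copy of the value. */
struct header_info {
  char content_type[128];
};

/* Case-insensitive check that the data starts with the given prefix,
   never reading more than len bytes of the data. */
static int has_prefix(const char *data, size_t len,
                      const char *prefix, size_t plen)
{
  size_t i;
  if(len < plen)
    return 0;
  for(i = 0; i < plen; i++) {
    if(tolower((unsigned char)data[i]) !=
       tolower((unsigned char)prefix[i]))
      return 0;
  }
  return 1;
}

/* Header callback: the buffer holds size * nitems bytes and no
   terminating zero. */
size_t header_cb(char *buffer, size_t size, size_t nitems, void *userdata)
{
  static const char name[] = "content-type:";
  const size_t namelen = sizeof(name) - 1;
  struct header_info *info = userdata;
  size_t len = size * nitems;

  if(has_prefix(buffer, len, name, namelen)) {
    const char *value = buffer + namelen;
    size_t vlen = len - namelen;

    /* skip leading blanks */
    while(vlen && (*value == ' ' || *value == '\t')) {
      value++;
      vlen--;
    }
    /* drop the trailing CRLF */
    while(vlen && (value[vlen - 1] == '\r' || value[vlen - 1] == '\n'))
      vlen--;
    /* truncate to fit, leaving room for our own terminator */
    if(vlen >= sizeof(info->content_type))
      vlen = sizeof(info->content_type) - 1;
    memcpy(info->content_type, value, vlen);
    info->content_type[vlen] = '\0';
  }
  return len;  /* tell libcurl that all bytes were handled */
}
```

The callback would be installed with CURLOPT_HEADERFUNCTION and a pointer to a struct header_info passed in CURLOPT_HEADERDATA.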
10. C++ strings are not C strings
libcurl is a C library with a C API for maximum portability and availability, yet a large portion of libcurl users are actually writing their programs in C++.
This is not a problem. You can use the libcurl API perfectly fine from C++.
Passing “strings” to libcurl must however be done with the C approach: you pass a pointer to a zero terminated buffer. If you pass a reference to a C++ string object, libcurl will not know what it is and it will not get or use the string correctly. It will fail in mysterious ways!
Something like this:
// Keep the URL as a C++ string object
std::string str("https://example.com/");
// Pass it to curl as a C string!
curl_easy_setopt(curl, CURLOPT_URL, str.c_str());
11. Threading mistakes
libcurl is thread-safe, but there are some basic rules and limitations that you need to follow and adhere to, as detailed in the document linked to:
This callback might be called zero, one, two or many times. Never assume you will get a certain number of calls. The number of invocations is independent of the amount of data and varies because of network, server, kernel or other reasons. Don’t assume the same invocation pattern will repeat!
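Assuming the callback in question is the write callback, a sketch that copes with any number of invocations simply appends whatever arrives to a growing buffer. The struct and function names are made up for this example; the prototype is the one libcurl documents for CURLOPT_WRITEFUNCTION.

```c
#include <stdlib.h>
#include <string.h>

/* Growing buffer that collects the whole response body. */
struct memory {
  char *data;
  size_t size;
};

/* Write callback: append this chunk, however small or large, and
   however many chunks there turn out to be. */
size_t write_cb(char *ptr, size_t size, size_t nmemb, void *userdata)
{
  struct memory *mem = userdata;
  size_t len = size * nmemb;
  char *bigger = realloc(mem->data, mem->size + len + 1);
  if(!bigger)
    return 0;  /* returning something other than len aborts the transfer */

  mem->data = bigger;
  memcpy(mem->data + mem->size, ptr, len);
  mem->size += len;
  mem->data[mem->size] = '\0';  /* our own terminator, the data has none */
  return len;
}
```

The application starts with a struct memory set to { NULL, 0 }, passes its address in CURLOPT_WRITEDATA and frees the data pointer when done. Whether the body arrives in one call or in a thousand makes no difference to the result.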