The webinar will be live and we will end with a Q&A session where you can ask me anything, in particular about this release and curl post-quantum, but not necessarily limited to that…
I still remember the RFC number off the top of my head for the first multipart formdata spec that I implemented support for in curl. Added to curl in version 5.0, December 1998. RFC 1867.
Multipart formdata is the name of the syntax HTTP clients use to send data in an HTTP POST when they want to send binary content and multiple fields. Perhaps the most common use case for users is uploading a file or an image with a browser to a website. This is also what you fire off with curl’s -F command line option.
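To illustrate, this is roughly what a command like curl -F name=Daniel -F file=@photo.jpg https://example.com/upload puts on the wire (a sketch: the boundary string is generated and will look different, and the URL is of course made up):

```
POST /upload HTTP/1.1
Content-Type: multipart/form-data; boundary=d74496d66958873e

--d74496d66958873e
Content-Disposition: form-data; name="name"

Daniel
--d74496d66958873e
Content-Disposition: form-data; name="file"; filename="photo.jpg"
Content-Type: image/jpeg

(binary image data)
--d74496d66958873e--
```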
RFC 1867 was published in November 1995 and it has subsequently been updated several times. The most recent incarnation for this spec is now known as RFC 7578, published in July 2015. Twenty years of history, experiences and minor adjustments. How do they affect us?
I admit to having dozed off a little at the wheel: I hadn’t really paid attention to the little tweaks that had slowly been happening in the multipart formdata world until Ryan Sleevi woke me up.
Percent-encoding
While I wasn’t looking, the popular browsers have all switched over to a different encoding style for field and file names within the format. They now all use percent-encoding, whereas we originally all used to do backslash-escaping! I haven’t actually bothered to check exactly when they switched, primarily because I don’t think it matters terribly much. They do this because this is now the defined syntax in WHATWG’s HTML spec. Yes, another case of separate specs diverging and saying different things about what is essentially the same format.
curl 7.80.0 is probably the last curl version to do the escaping the old way and we are likely to switch to the new way by default starting in the next release. The work on this is done by Patrick Monnerat in his PR.
For us in the curl project it is crucial to maintain behavior, but it is also important to be standards compliant and to interoperate with the big world out there. When all the major browsers have switched to percent-encoding, I suspect it is only a matter of time until servers and server-side scripts start to assume and require it. This is a tricky balancing act: there will be users who expect us to keep up with the browsers, but also some who expect us to keep doing it the way we have for almost twenty-three years…
libcurl users, at least, will be offered a way to switch back to the old escaping mechanism, so that applications known to work with older-style server-side decoders can keep working.
Landing
This blog post is made public before the PR mentioned above has been merged in order for people to express opinions and comments before it is done. The plan is to have it merged and ship in curl 7.81.0, to be released in January 2022.
The official gitstats page shows that I’ve committed changes on almost 4,600 separate days since the year 2000.
16,000 commits is 13,413 commits more than the person with the second-most commits: Yang Tse (2,587 commits). He has, however, not committed anything in curl since he vanished from the project in 2013.
I have also done 4,700 commits in the curl-www repository, but that’s another story.
For a long time I have wanted to avoid us ever reaching curl version 7.100.0. I strongly suspect that going three-digit in the minor number would cause misunderstandings and possibly even glitches in people’s version comparison scripts etc. If nothing else, it is just a very high number to use in a version string and I believe we would be better off starting over. Resetting the clock, so to speak.
Given that, a curl version 8.0.0 is inevitably going to happen, and since we do releases every 8 weeks and bump the version number in just about every release, there is a limited amount of time left before the minor number reaches 100. We just shipped curl 7.80.0, so we have fewer than 20 release cycles left in the worst case; a few years.
A while ago it struck me that we have a rather big anniversary coming up, also within a few years, and that is curl’s 25th birthday.
Let’s combine the two!
On March 20, 2023, curl turns 25 years old. On that same day, the plan is to release curl 8.0.0, the first major number bump in (by then) 23 years.
The major number bump will happen independently of what features we add or don’t add, and independently of whether there are any new bells and whistles to celebrate or “just” a set of bug-fixes landed. As always, the release is time based and not feature based, but the unique property of this particular release is that it will also reset the minor version number and increase the major version number!
In the regular curl release cycles we always do releases every eight weeks on Wednesdays and if we ever adjust the cycle, we do it with full weeks and stick to releasing on Wednesdays.
March 20, 2023, however, is a Monday. curl 8.0.0 will therefore also be different in that it will not be released on a Wednesday. We will also have to adjust the cycle periods before and after this release, since they cannot then be eight-week cycles. The idea is to go back to Wednesdays and the regular cycles again after 8.0.0 has shipped.
This new option provides yet another knob for applications to control and limit connection reuse. Using this option, the application sets an upper limit that specifies that connections may not be older than this when being reused. It can for example be used to make sure connections don’t “get stuck” on one single specific load-balancer for extended periods of time.
This new callback gets called immediately before the request is started (hence “pre req”). It gives the application a heads-up and some details about what is just about to start.
libssh2: SHA256 fingerprint support
This is a step up from the previous MD5-based check. Make sure that curl connects to the correct host with a cryptographically strong fingerprint check.
When a URL API function returns an error it does so using a CURLUcode type. Now there’s a function to convert this error code into an error message.
support UNC paths in file: URLs on Windows
The URL parser now understands UNC paths when parsing URLs on Windows.
allow setting of groups/curves with wolfSSL
The wolfSSL backend now allows setting of specific curves for TLS 1.3 connections, which allows users to use post quantum algorithms if wolfSSL is built to support them!
Bug-fixes
This is another release with more than one hundred individual bug-fixes, and here are a selected few I think might be worth highlighting.
more hyper work
I’ve done numerous small improvements in this cycle to take the hyper backend closer to becoming a solid member of the curl backend family.
print curl --help descriptions aligned right
When listing available options with --help or -h, the descriptions are now shown right-aligned, which makes the output easier to read in my opinion.
store remote IP address for QUIC connections too
HTTP/3 uses QUIC and with this bug fixed, the %{remote_ip} variable for --write-out works there as well – as you’d expect. This fixes the underlying CURLINFO_PRIMARY_IP option.
reject HTTP response codes < 100
The HTTP response parser no longer accepts response codes below 100 as legitimate. The HTTP protocol has never specified any such code, so this should not cause any problems. Lots of other HTTP clients already enforce this requirement too.
do not wait for writable socket if there’s no remote HTTP/2 window
If curl runs out of remote HTTP/2 window for a stream while uploading, i.e. the other end says it cannot receive any data right now, curl would still wait for the socket to become writable, which could cause really bad busy-loops.
get libssh2 version at runtime
curl now asks libssh2 for its version string at runtime instead of showing the version that was used when the curl binary was built, as the library might very well have been upgraded dynamically after the build!
require all man pages to use the same section headers in the same order
We tighten the bolts even more to keep the libcurl documentation consistent. All libcurl man pages now have to feature the same set of section headers in the same order, or tests fail. This includes a required example section. We also added an extra check to detect a common backslash formatting mistake that we have made several times before in man page examples.
NTLM: use DES_set_key_unchecked with OpenSSL
It turns out that the code implemented to use OpenSSL for NTLM authentication used a function call that returns an error if a “bad” key is used. NTLM v1 being a very weak algorithm in general, it is very easy to end up calling the function with such a weak key, which then made the NTLM authentication fail…
openssl: if verifypeer is not requested, skip the CA loading
If peer verification is disabled for a transfer when curl is built to use OpenSSL or one of its forks, curl will now completely skip the loading of the CA cert bundle. It was basically only used for being able to show in the verbose output if there was a match or not – that was then ignored anyway – and by skipping the load step the TLS handshake will use less memory and CPU resources.
urlapi: URL decode percent-encoded host names
The URL parser did not accept percent encoded host names. Now it does. Note however that libcurl will not by default percent-encode the host name when extracting a URL for the purpose of keeping IDN names working. It’s a little complicated.
ngtcp2: use QUIC TLS RFC9001
We switch over to using the “real” QUIC identifier when setting up connections, instead of the identifier made for a previous draft version of the protocol. curl still asks for h3-29 for HTTP/3 since that RFC has still not shipped, but it now includes h3 as well – since it turns out some servers assume plain h3 when the final QUIC v1 version is used for transport.
a failed etag save now only fails that transfer
Specifying a file name to save etags in will from now on only fail the transfers using that file. If you specify more transfers that use another file or don’t use etags at all, those transfers can still complete.
Added test case for checksrc!
The custom tool we use for checking code style compliance, checksrc, has become quite advanced and now checks for a lot of source code details. When we once again improved it in this release cycle, we also added a test case for the tool itself, to make sure it maintains its functionality as we keep improving it going forward!
Next
The next release is planned for January 5, 2022. We have several changes queued up as pull requests already, so I’d say it is likely that it will then become version 7.81.0.
Support?
I offer commercial support and curl related contracting for you and your company!
The curl project’s source code has been hosted on GitHub since March 2010. I wrote a blog post in 2013 about what I missed from the service back then and while it has improved significantly since then, there are still features I miss and flaws I think can be fixed.
For this reason, I’ve created and now maintain a dedicated git repository with feedback “items” that I think could enhance GitHub when used by a project like curl:
The purpose of this repository is to keep each entry on-point and make it good feedback for GitHub. I do not expect GitHub to implement any of them, but the better we can present the case for each issue, the more likely I think it is that we can gain supporters for them.
What makes curl “special”
I don’t think curl is a unique project in the way we run and host it. But there are a few characteristics that maybe make it stand out a little from many other projects hosted on GitHub:
curl is written in C, which means it cannot be part of the dependency scanning and related security checks etc. that GitHub provides
we are git users but we are not GitHub exclusive: we allow participation in and contributions to the project without a GitHub presence
we are an old-style project where discussions, planning and arguments about project details are held on mailing lists (outside of GitHub)
we have strict ideas about how git commits should be done and what the messages should look like etc., so we cannot accept merges done with the buttons on the GitHub site
You can help
Feel free to help me polish the proposals, or even submit new ones, but I think they need to be focused and I will only accept issues there that I agree with.
I’ve previously said that curl is one of the most widely used software components in the world, with an estimated ten-billion-plus installations, and I get questions about it every now and then.
— Is curl the most widely used software component in the world? If not, which one is?
We can’t know for sure which products are on the top list of the most widely deployed software components. There’s no method for us to count or estimate these numbers with a decent degree of certainty. We can only guess and make rough estimates – and it also depends on exactly what we count. And quite probably also on who’s doing the counting.
First, let’s acknowledge that SQLite already hosts a page about the most widely deployed software modules, where they speculate on this topic (and which doesn’t even mention curl). Also, do we count the number of devices running the code or the number of installs? If we count devices, do virtual machines count? Is it the number of currently used installations or the total number of installations done over the years?
Choices
The SQLite page suggests four contenders for the top-5 list and I think it is pretty good:
zlib (the original implementation)
libpng
libjpeg
sqlite
I will go out on a limb and say that the two image libraries in the list, while of course very widely used, are not typically used on devices without screens, and in the IoT world of today such devices are fairly common: light bulbs, power switches, networking gear etc. I think that implies they are slightly less used than the others in the list. Secondly, the original libjpeg seems to not actually be around anymore; a few successors of it are used instead, i.e. it is not a single implementation.
All top components are Open Source (sqlite’s situation is special but they still call it open source), and I don’t think it is a coincidence.
Are there other contenders not mentioned here? I figure maybe some of the operating systems for the tiniest devices that ship in the billions could be there. But I’m not sure there’s any such obvious market dominant player. There are other compression libraries too, but I doubt they reach the levels of zlib at this moment.
Someone brings up the Linux kernel, which certainly is very widely used, but all Android devices, servers, Windows 10 installs etc. probably don’t make the unit count go over 7 billion, and I believe that virtually all of these Linux kernel installs also run curl, zlib and sqlite…
Similarly to how SQLite forgot to mention curl, I might of course also have a blind spot for some other really well-used component.
The finalists
We end up with three finalists:
zlib
sqlite
libcurl
I think it is impossible for us to rank these three in an order with any good certainty. If we look at that sqlite list of where it is used, we quickly recognize that zlib and libcurl are deployed in pretty much all of them as well. The three modules have a huge overlap and will all be installed in billions of devices, while of course there are also plenty that only install one or two of them.
I just can’t figure out the numbers that would rank these modules in the top-list.
The SQLite page says: “our best guess is that SQLite is the second most widely deployed software library, after libz”. They might of course be right. Or wrong. They also don’t specify or explain how they arrive at that guess.
libc
Whenever I’ve mentioned widely used components in the past, someone has brought up “libc” as a contender. But since there are many different libc implementations and they are typically done for specific platforms/operating systems, I don’t think any single of the libc implementations actually reach the top-5 list.
zlib in curl/sqlite
Many people say zlib, partly because curl uses it, but then I have to add that zlib is an optional dependency for curl, and I know many users, including some large-volume ones, that ship products with libcurl that don’t use zlib at all. One very obvious and public example is the curl.exe shipped in Windows 10 – that’s maybe one billion installs of curl that don’t bundle zlib.
If I understand things correctly, the situation is similar in sqlite: it doesn’t always ship with a zlib dependency.
The poll
I asked my Twitter followers which one of these three components they guess is the most widely used one. It was very unscientific and of course skewed towards libcurl (since I asked, and I have a curl bias). The over 2,000 respondents voted libcurl with a fairly high margin.
What did I miss?
Did I miss a contender?
Have I overlooked some stats that make one of these win?
Updates: Since this was originally posted, I have had OpenSSL, expat and the Linux kernel proposed to me as additional finalists and possibly most-used components.
There’s this new TV show on Swedish Television (SVT) called Hackad (“hacked” in English), which is about a team of white hat hackers showing the audience exactly how vulnerable lots of things, people and companies are and how they can be hacked by various means. In the show, the hackers demonstrate how they hack into people’s accounts, their homes and their devices.
Generally this is done in a rather non-techy way, as they mostly describe what they do in generic terms rather than with much technical detail. But in some short sequences the camera glances over a screen where source code or command lines are shown.
Similar to the fictional Mr. Robot, a readily available tool to accomplish what you want is of course… curl. In episode 4, curl command lines can easily be spotted in several different shots.
Jesper Larsson, one of the hackers in the show, responded on Twitter to this blog post about their use of curl.
Have you been curious about getting your feet wet with doing Internet transfers with libcurl, but reasons (excuses?) have kept you away? Maybe it has felt like too big a step to take?
Fear not: on October 21 I’m doing a free webinar, Getting started with libcurl, detailing useful first steps for getting your initial application off the ground!
The half-hour presentation will include details such as:
Basic fundamentals of the libcurl API and a look at the common data types and concepts.
Setting up and understanding a first libcurl transfer.
Differences between the two primary libcurl transfer interfaces: easy and multi.
A look at the most commonly used libcurl options.
Suggestions on how and where to take the next steps.
The plan is to make this presentation work independently of platform, compiler and IDE choice, and it will focus on C/C++ code. Still, since most libcurl bindings are very “thin” and often mimic the C API fairly closely, it should be valuable and provide good information even for those who plan to write their libcurl-using applications in other languages.
We’ll also end the session with a Q&A part, of course, so queue up your questions!
The presentation will be recorded and made available after the fact.
Register
To participate on the live event, skip over and sign up for it.
The event will take place on October 21, 2021 at 10:00 PDT (check your time zone)
In the curl project we keep track of and say thanks to every single contributor. That includes persons who report bugs or security problems, who run infrastructure for us, who assist in debugging or fixing problems as well as those who author code or edit the website. Those who have contributed to make curl to what it is.
Exactly today, October 4th 2021, we reached 2,500 names in this list of contributors for the first time: 2,500 persons since the day curl was created back in March 1998. 2,500 contributors in 8,599 days. This means that on average we’ve seen one new contributor help out in the project every 3.44 days for almost twenty-four years. Not bad at all.
The 2,500th recorded contributor was Ryan Mast who brought pull-request 7809.
Thank you, everyone who has helped us so far!
As can be seen in the graph below, plotting the number of people in the THANKS file, the rate of newcomers has increased slowly over the years and we’ve recently been adding new names at a rate of about two hundred per year. There’s a chance that we will add the next 2,500 names to the list in less than twenty-four years: the latest 1,000 contributors have been added since the beginning of 2017, so in less than five years.
The thanks page on the website is usually synced at release time so it is always a little bit behind compared to what’s recorded in the curl git repository.
2005
The graph bump back in 2005 was a one-time sweep-up where I went through our entire history and made sure that the names of everyone previously mentioned as having helped were correctly added to the document. Since then, we’ve kept better track and made sure to add new names as we go along.
Scripting
We of course collect the names of the contributors primarily with scripts, which is also the best way to avoid anyone slipping through.
We always mention contributors and helpers in git commits, and they should be “marked” correctly for scripts to be able to extract them
We keep a list of contributors per-release in the RELEASE-NOTES document. When we commit updates to RELEASE-NOTES, we use the fixed commit message ‘synced’ to have our tools use that as a marker.
To get the updated list of contributors since the previous update of RELEASE-NOTES, we use the scripts/contributors.sh script.