Do you remember August 26 2002? I can’t say I particularly do but the curl git log remembers for us that it was on that day we added this TODO item:
Add an option that prevents cURL from overwriting existing local files. When used, and there already is an existing file with the target file name (either -O or -o), a number should be appended (and increased if already existing).
That idea hadn’t even been listed for twenty years before it was converted into code by HexTheDragon and landed in curl the other day (with this commit). To get included in the pending curl 7.83.0 release.
--no-clobber
This new command line option (curl’s 247th) is called --no-clobber and it works as suggested already back in 2002. If the output file already exists at the time when curl wants to create it, it will instead append a number to the end of the name. If that file also exists, curl retries iteratively with numbers up to a 100 before it gives up and returns error.
To help you write even cooler scripts. Oh, and the -w variable %{filename_effective} will show this actually used file name.
We have just merged curl’s 246th command line option: --remove-on-error (with this commit). To be included in the upcoming curl 7.83.0 release.
This command line option is quite simple and does exactly what the name suggests. If you tell curl to download something into a local file and something goes wrong in that transfer – that makes curl return an error – this option will make curl remove that file rather than leaving the leftovers on disk in a possibly partial file.
The option is in fact slightly more useful in more complicated cases, like when you want to download lots files in parallel and some of them might fail and you rather only keep the files that actually were transferred successfully:
Welcome to the 206th curl release, 59 days since we shipped curl 7.81.0. The extra three days because I was away on the day the release would normally have been done. (I call it Impartial Content as a little play on the HTTP 206 response code message.)
the 206th release 2 change 59 days (total: 8,751) 173 bug-fixes (total: 7,691) 266 commits (total: 28,321) 0 new public libcurl function (total: 86) 0 new curl_easy_setopt() option (total: 295) 1 new curl command line option (total: 245) 67 contributors, 39 new (total: 2,597) 43 authors, 24 new (total: 1,014) 0 security fixes (total: 111) 0 USD paid in Bug Bounties (total: 16,900 USD)
Changes
There are only two changes this time around:
The JSON option
With the new --json command line option, curl suddenly made it more convenient to send JSON from command lines and shell scripts.
MesaLink: removed support
curl supports a crazy amount of different TLS libraries, but the amount was now decreased by one (to 13) as we officially drop support for MesaLink. The library is not developed anymore so we don’t want to encourage future users to go down that road.
Bug-fixes
Here are some of my favorite bug-fixes from this release cycle.
bearssl
We landed three notable fixes for the bearssl backend which should make users of it happy. For cert expiration, incomplete CA certs and session resumption.
strlen call removals
After I posted a library-call count to the mailing list that showed quite a large number of calls to strlen(), a cleanup race started that subsequently reduced the number of calls by over 60% in some use cases! Primarily replaced by compile-time constants.
configure requires –with-nss-deprecated
To build curl with the NSS backend using configure, you now need to confirm this choice by also passing on --with-nss-deprecated to make it clear to users what the future looks like for our NSS support.
erase some more sensitive command line arguments
After a lengthy discussion we now hide even more command line arguments from appearing in ps output (on systems that support it). Since the hiding is done by curl itself, there is still a short moment during which they will be visible, plus that we cannot hide everything so there is still a risk that some argument might leak information unwillingly. That is the nature of command line arguments. Use the config file concept or stdin etc to work around that.
NPN is deprecated
The TLS extension NPN is now marked “deprecated” and is scheduled for removal in six months unless someone yells very loudly and explains why not. This extension was once used to negotiate SPDY and early HTTP/2 but have no purpose these days. The browsers removed support for it several years ago.
allow CURLOPT_HTTPHEADER change “:scheme”
The only pseudo header for HTTP/2 and HTTP/3 that couldn’t be modified by a user can now be changed at will.
remove support for TPF, Netware, vxWorks
Support for these platforms for which the code haven’t been modified for the last decade or so, and therefore are highly unlikely to still work, were dropped. After this, I had it confirmed that you can still build curl for vxWorks using the “regular” build!
remove support for CURL_DOES_CONVERSIONS
After support for TPF was dropped, we took the next step and removed support for the charset conversion functions necessary to run curl on non-ASCII platforms such EBCDIC using ones. As TPF was the only/last platform such platform we supported, this cleanup improved lots of code paths.
allow user callbacks to call curl_multi_assign
A regression in 7.81.0 made curl_multi_assign() return error if used from a callback.
http3: quiche and ngtcp2 fixes
We landed several fixes in both HTTP/3 backends, improving the situation for everyone who plays HTTP/3 with curl.
reduce memory use when FTP is disabled
After several cleanups the total memory footprint for builds with FTP and/or proxy support disabled has been reduced.
check for ~/.config/curlrc too
curl now also checks for its default “config file” in the path mentioned above.
DNS options that need c-ares now fail without it
The command line tool offers a set of functions to control DNS specific details, and since those options only work if libcurl was built to use c-ares and not at all if it was built to use another resolver backend, curl will now correctly return error when one of those options are used when libcurl can’t execute them.
keep trailing dot in host name
If there is a trailing dot after the host name in the URL, that dot is now kept in the name when used everywhere internally – except for the SNI field in TLS.
wolfssl: when SSL_read() returns zero, check the error
Even while obviously very rare, curl could wrongly return an “end of transfer” prematurely before this fix.
Next
The next release is scheduled to ship on April 27, 2022.
In the beginning of every year I jot down a couple of “larger things” I would like to work on during the year. Some of the ideas are more in the maybe category and meant to be tested on the audience and users to see what others think and want, others are features I believe are ripe and ready for addition.
I talk about what I personally plan and consider to work on as I cannot control or decide what volunteers and random contributors will do this year, but of course with the hope that feedback from users and customers will guide me.
If you have use cases, applications or devices with needs or interests in particular topics going forward: let me know! At this webinar or outside of it.
The curl roadmap 2022 webinar will happen on February 17th at 09:00 PST (17:00 UTC, 18:00 CET). Register to attend on this link:
I’m the lead developer in the curl project. We make the command line tool curl and the library libcurl, for doing Internet transfers. The command line tool uses the library for all the internet transfer heavy lifting.
The command line tool is somewhat of a shell binding to access libcurl.
We make these things. We recently surpassed 1,000 authors. I lead the project and I have done the most number of commits per month in curl for the last 79 months, and in fact in 222 of the 267 months we have stats for.
The tool came first
curl was born in 1998 as a command line tool only. Two years in, we (well, mostly me actually) remodeled some internals and shipped the first libcurl to the world in August 2000. The idea was (and still is) to provide the same Internet transfer powers the tool has to any application or device out there that need it.
Code sizes
The library side of things is right now about 85% of the total product code. A little over 120,000 lines right now, including comments.
Pie chart showing code distribution between tool and library
Users at scale
Many users think of the curl project as equivalent to the curl tool and the command line tool certainly has a lot of users. It is available for and on all the popular platforms. It is impossible to count curl command users, but millions should not be an exaggeration.
While it seems likely that more users are using the tool than is writing applications that use libcurl, each product, service or device that uses libcurl can themselves scale up the volumes. A single libcurl user can use libcurl in an application used by billions. A few hundreds or thousands of libcurl users populate the world with things transferring data with our library. (A noticeable share of the current Internet traffic is likely driven by libcurl.)
The net result is therefore that libcurl runs in several thousand times more installations than the tool.
If we visualize the number of curl users as a yellow ball and the libcurl installations as a green ball, putting them next to each other would look something like this:
Planet curl is barely visible next to planet libcurl. Install and user numbers are estimated.
(Image math: 3 million curl users, 10 billion libcurl installations. Yellow sphere radius is 90 vs the green’s 1340)
Complexity
Making a command line tool is much easier than doing a library. The command line tool has just one entry point and the interface is limited. A command line and associated files and pipes to read from. In our case, the tool lets libcurl deal with a lot of platform specifics which makes the command line code generic and mostly identical on all platforms it runs on.
A library has many more entry points, each that needs to be written to care for what the users might pass in to them. libcurl has code to work with millions of build combinations, and (right now) up to 35 different external libraries. (13 different TLS libraries, 3 different SSH libraries, 2 different QUIC/HTTP/3 libraries, 3 different compression libraries etc.)
libcurl is used in many more challenging environments such as niche operating systems, scaled up to thousands of concurrent transfers, systems that never exits and in builds with a creative set of features disabled at build-time.
Compared to the curl tool, libcurl is way more advanced. A bug in the command line tool is often much easier to fix than one in the library. Such bugs are also rarer since it is much simpler and a smaller amount of code.
Where it matters
Because of what I have outlined above, I focus my curl work on library related changes. Both when it comes to bug-fixes and adding features. It scales better to the world, and as one of the designers and architects of most solutions used in libcurl I am in many cases a suitable engineer to work on many of the more complicated problems.
It is smarter for the entire project for me to leave the slightly less complicated problems and more easily understood features to be fixed and added by others. It scales better. After all, I have a finite number of hours per week to spend on curl. I want to make the best use of my time as I can.
Therefore, I tend to leave tool-related things for others to work on.
Where it pays
I work on curl for a living. The companies that pay for support have higher priority than almost any other bug reports, and they tend to be libcurl related. Keeping my paying customers happy is crucial to me. And in a funny way also to others, since that work usually end up benefiting all curl users.
Where its fun
As I work on curl full-time during my workdays and also during a good chunk of my spare time, I need to “lighten up” my work at times and get some variation. Sometimes I can go about and find new little things to work on in the project that maybe isn’t top priority by any means, but are things that could use some polish and are different enough from what I spent the rest of my week on. To give me variation. To keep the fun.
Also, sometimes scratching the surface on a new somewhat forgotten place brings up more important stuff.
Theory meets practice
Years ago I even for a while considered to hand over maintenance of the curl tool to someone else and more distinctly say that I would only work on the library as they are separate entities and could possibly benefit from being worked on independently from each other.
My idea of focusing my work on the more complicated issues, to work on design and architecture and help newcomers find their way into the code doesn’t always work out.
I never gave up maintenance of the tool and a lot of things that someone else could implement or fix in the project aren’t, which makes me eventually get to work on that too anyway. For the good of the project. Also, it makes my work day more varied and fun if I take occasional strolls around the project every now and then and put on some new paint on areas I find could use some.
Summary
Most of my work efforts go into libcurl matters. But I work on the tool too.
The curl “cockpit” is yet again extended with a new command line option: --json. The 245th command line option.
curl is a generic transfer tool for sending and receiving data across networks and it is completely agnostic as to what it transfers or even why.
To allow users to craft all sorts of transfers, or requests if you will, it offers a wide range of command line options. This flexibility has made it possible for a large number of users to keep using curl even as network ecosystems and its (HTTP) use have changed over time.
Craft content to send
curl offers a few convenience options for creating contents to send in HTTP “uploads” . Options such as -F for building multi-part formposts, and --data-urlencode for URL-encoding POST data.
JSON is the new popular kid
When curl was born, JSON didn’t exist. Over the decades since, JSON has grown to become a very commonly used format for structured data, often used in “REST API” calls and more. curl command lines sending and receiving JSON are very common now.
When asking users today what features they would like to see added in curl, and when asking them what good features they see in curl alternatives, a huge chunk of them mention JSON support in one way or another.
Partly because dealing with JSON with curl sometimes make the involved command lines rather long and quirky.
JSON in curl
The discussion has been ignited in the curl community about what, if anything, we should do in curl to make it a smoother and better tool when working with JSON. The offered opinions range from nothing (“curl is content agnostic”) to full-fledged JSON generator, parser and pretty-printer (or a combination in between). I don’t think we are ready to put our collective foot down on where we should go with this.
Introducing --json
While the discussion is ongoing, we still bring a first JSON oriented feature to the tool as a first step. The --json option. This is a new option that basically works as an alias, or shortcut, to sending JSON to an endpoint. You use it like this:
If you want another Content-Type or Accept header, then you can override them as usual with -H. If you use multiple --json options on the same command line, the contents from them will be concatenated before sent off.
Ships!
This new json option was merged in this commit and will be available and present in curl 7.82.0, to be released in early March 2022. Of course you can build it from git or a daily snapshot already now to test it out!
Future JSON stuff
I want to at least experiment a little with some level of JSON creation for curl. To help users who want to create basic JSON on the command line to send off. Lots of users run into problems with this since JSON uses double quotes a lot and then the combination of quoting and unquoting and the shell often make users confused or even frustrated.
Exactly how or to what level is still to be determined. Your opinions and feedback are very valuable and helpful.
jo+curl+jq
This trinity of commands helps you send correctly formatted data and show the returned JSON easily. Like in a command line similar to:
jo name=daniel tool=curl | curl --json @- https://httpbin.org/post | jq
In this presentation I talk about how libcurl’s most important aspect is the stable ABI and API. If we just maintain those, we can change the internals however we like.
libcurl has a system with build-time selected components, called backends. They are usually powered by third party libraries. With the recently added HTTP backend there are now seven “flavors” of backends.
A backend can be provided by a library written in rust, as long as that rust component provides a C interface that libcurl can use. When it does, it being rust or not is completely transparent and a non-issue for libcurl.
curl currently supports components written in rust for three different backends:
The number of commit authors in curl has gradually been increasing over time.
1998-03-20: 1 author recorded in the premiere curl release. It actually took until May 30th 2001 for the second recorded committer (partly of course because we used CVS). 1167 days to bump up the number one notch.
2014-05-06: 250 authors took 5891 days to reach.
2017-08-18: 500 authors required additional 1200 days
2019-12-20: 750 authors was reached after only 854 more days
2022-01:30: reaching 1000 authors took an additional 772 days.
Jan-Piet Mens became the thousandth author by providing pull-request #8354.
These 1,000 authors have done in total 28,163 commits. This equals 28.2 commits on average while the median commit author only did a single one – more than 60% of the authors are one-time committers after all! Only fourteen persons in the project have authored more than one hundred commits.
It took us 8,717 days to reach 1,000 authors, making it one new committer in the project per 8.7 days on average, over the project’s life time so far. The latest 250 authors have “arrived” at a rate of one new author per 3.1 days.
In 2021 we had more commit authors than ever before and also more first-timers than ever. It will certainly be a challenge to reach that same level or maybe even break the record again in 2022.
It took almost 24 years to reach 1,000 authors. How long until we double this?
Visualized on video
To celebrate this occasion, I did a fresh gource video of curl development so far; for the duration of the first 1,000 committers. I spiced it up by adding 249 annotations of source related facts laid out in time.
If you look at the curl/curl repository on GitHub, it appears to be short of a few hundred committers. It says 792 right now.
This is because GitHub can’t count commit authors. The theory is that they somehow ignore committers that do not have GitHub accounts. My author count is instead based on plain git access and standard git commands run against a plain git clone:
I have had my share of adventures with URL parsers and their differences in the past. The current state of my research on the topic of (failed) URL interoperability remains available in this GitHub document.
Use one and only one
There is still no common or standard URL syntax format in sight. A string that you think looks like a URL passed to one URL parser might be considered fine, but passed to a second parser it might be rejected or get interpreted differently. I believe the state of URLs in the wild has never before been this poor.
The problem
If you parse a URL with parser A and make conclusions about the URL based on that, and then pass the exact same URL to parser B and it draws different conclusions and properties from that, it opens up not only for strange behaviors but in some cases for downright security vulnerabilities.
This is easily done when you for example use two different libraries, frameworks or libraries that need to work on that URL, but the repercussions are not always easy to see at once.
The report EXPLOITING URL PARSERS: THE GOOD, BAD, AND INCONSISTENT (by Noam Moshe, Sharon Brizinov, Raul Onitza-Klugman and Kirill Efimov) was published today and I have had the privilege to have read and worked with the authors a little on this prior to its release.
As you see in the report, it shows that problems very similar to those mr Tsai reported and exploited back in 2017 are still present today, although perhaps in slightly different ways.
As the report shows, the problem is not only that there are different URL standards and that every implementation provides a parser that is somewhere in between both specs, but on top of that, several implementations often do not even follow the existing conflicting specifications!
The report authors also found and reported a bug in curl’s URL parser (involving percent encoded octets in host names) which I’ve subsequently fixed so if you use the latest curl that one isn’t present anymore.
curl’s URL API
In the curl project we attempt to help applications and authors to reduce the number of needed URL parsers in any given situation – to a large part as a reaction to the Tsai presentation from 2017 – with the URL API we introduced for libcurl in 2018.
Thanks to this URL parser API, if you are already using libcurl for transfers, it is easy to also parse and treat URLs elsewhere exactly the same way libcurl does. By sticking to the same parser, there is a significantly smaller risk that repeated parsing bring surprises.
Other work-arounds
If your application uses different languages or frameworks, another work-around to lower the risk that URL parsing differences will hurt you, is to use a single parser to extract the URL components you need in one place and then work on the individual components from that point on. Instead of passing around the full URL to get parsed multiple times, you can pass around the already separated URL parts.
Future
I am not aware of any present ongoing work on consolidating the URL specifications. I am not even aware of anyone particularly interested in working on it. It is an infected area, and I will get my share of blow-back again now by writing my own view of the state.
The WHATWG probably say they would like to be the steward of this and they are generally keen on working with URLs from a browser standpoint. It limits them to a small number of protocol schemes and from my experience, getting them to interested in changing something for the the sake of aligning with RFC 3986 parsers is hard. This is however the team that more than any other have moved furthest away from the standard we once had established. There are also strong anti-IETF sentiments oozing there. The WHATWG spec is a “living specification” which means it continues to change and drift away over time.
The IETF published RFC 3986 back in 2005, they saw the RFC 3987 pretty much fail and then more or less gave up on URLs. I know there are people and working groups there who would like to see URLs get brought back to the agenda (as I’ve talked to a few of them over the years) and many IETFers think that the IETF is the only group that can do it proper, but due to the unavoidable politics and the almost certain collision course against (and cooperation problems with) WHATWG, it is considered a very hot potato that barely anyone wants to hold. There are also strong anti-WHATWG feelings in some areas of the IETF. There is just a too small of a chance of a successful outcome from something that mostly likely will take a lot of effort, will, thick skin and backing from several very big companies.
We are stuck here. I foresee yet another report to be written a few years down the line that shows more and new URL problems.