For C programmers like me, who want to write portable programs that can get built and run on the widest possible array of machines and platforms, we often need to write conditional code. Code that use #ifdefs for particular conditions.
Such ifdefs expressions often need to check for a particular compiler, an operating system or perhaps even a specific version of of one those things.
Back in April 2002, Bjørn Reese created the predef project and started collecting this information on a page on Sourceforge. I found it a super useful idea and I have tried to contribute what I have learned through the years to the effort.
The collection of data has been maintained and slowly expanded over the years thanks to contributions from many friends.
The predef documents have turned out to be a genuine goldmine and a resource I regularly come back to time and time again, when I work on projects such as curl, c-ares and libssh2 and need to make sure they remain functional on a plethora of systems. I can only imagine how many others are doing the same for their projects. Or maybe should do the same…
Now, in July 2022, the team is moving the predef documents over to a brand new GitHub organization. The point being to making it more accessible and allow edits and future improvements using standard pull-requests to make ease the maintenance of them.
On July 1st 2017, exactly five years ago today, the OSS-Fuzz project “adopted” curl into their program and started running fuzz tests against it.
OSS-Fuzz is a project run by Google and they do fuzzing on a large amount of open source projects: OSS-Fuzz aims to make common open source software more secure and stable by combining modern fuzzing techniques with scalable, distributed execution.
That initial adoption of curl into OSS-Fuzz was done entirely by Google themselves and its fuzzing integration was rough and not ideal but it certainly got the ball rolling.
Later in in the fall of 2017, Max Dymond stepped up and seriously improved the curl-fuzzer so that it would better test protocols and libcurl options deeper and to a higher degree. (Max subsequently got a grant from Google for his work.)
Fuzzing curl
Fuzzing is really the next-level thing all projects should have a go at when there is no compiler warning left to fix and the static code analyzers all report zero defects, because fuzzing has a way to find silly mistakes, off-by-ones etc in a way no analyzers can and that is very hard for human reviewers to match.
curl is a network tool and library written in C, it is meant to handle non-trusted data properly and doing it wrongly can have serious repercussions.
For the curl project, OSS-Fuzz has found several silly things and bad code paths especially after that rework Max put in. We have no less than 6 curl CVEs registered in 2017 and 2018 giving OSS-Fuzz (at least partial) credit for their findings.
In the first year or two, OSS-Fuzz truly kept us on our toes and we fixed numerous other flaws as well that weren’t CVE worthy but still bugs. At least 36 separate ones to date.
The beauty of OSS-Fuzz
For a little Open Source project such as curl, this service is great and in case you never worked with it let me elaborate what makes me enjoy it so much.
Google runs all the servers and keeps the service running. We don’t have to admin it, spend energy on maintaining etc.
They automatically pull the latest curl code from our master branch and keep fuzzing it. All day, every day, non-stop.
When they find a problem, we get a bug submitted and an email sent. The bug report they create contains stack traces and a minimized and reproducible test case. In most cases, that means I can download a small file that is often less than a hundred bytes, and run my locally built fuzzer code with this input and see the same issue happen in my end.
The bug they submit is initially closed for the public and only accessible to select team of curl developers.
90 days after the initial bug was filed, it is made public automatically.
The number of false positives are remarkably low, meaning that almost every time the system emails me, it spotted a genuine problem.
Fuzzing per commit
In 2020, the OSS-Fuzz team introduced CIFuzz, which allows us to run a little bit of fuzzing – for a limited time – on a PR as part of the CI pipeline. Using this, we catch the most obvious mistakes even before we merge the code into our master branch!
After the first few years of us fixing bad code paths, and with the introduction of CIFuzz, we nowadays rarely get reports from OSS-Fuzz anymore. And when we do, we have it complain on code that hasn’t been released yet. The number of CVEs detected in curl by OSS-Fuzz has therefore seemingly stopped at 6.
Taking fuzzing further
After five years of fuzzing, we would benefit from extending and improving the “hooks” where and how the fuzzing is made so that we can help it find and reach code paths with junk that it so far hasn’t reached. There are of course still potentially many undetected flaws in curl remaining because of the limited set of entry point we have for the fuzzer.
I hope OSS-Fuzz continues and lives on for a long time. It certainly helps curl a lot!
Saturday June 18: I had some curl time in the afternoon and I was just about to go edit the four security advisories I had pending for the next release, to brush up the language and check that they read fine, when it dawned on me.
These particular security advisories were still in draft versions but maybe 90% done. There were details, like dates and links to current in-progress patches, left to update. I also like to reread them a few times, especially in a webpage rendered format, to make sure they are clear and accurate in describing the problem, the solution and all other details, before I consider them ready for publication.
I checked out my local git branch where I expected the advisories to reside. I always work on pending security details in a local branch named security-next-release or something like that. The branch and its commits remain private and undisclosed until everything is ready for publication.
(I primarily use git command lines in terminal windows.)
The latest commits in my git log output did not show the advisories so I did a rebase but git promptly told me there was nothing to rebase! Hm, did I use another branch this time?
It took me a few seconds to realize my mistake. I saw four commits in the git master branch containing my draft advisories and then it hit me: I had accidentally pushed them to origin master and they were publicly accessible!
The secrets I was meant to guard until the release, I had already mostly revealed to the world – for everyone who was looking.
How
In retrospect I can’t remember exactly how the mistake was done, but I clearly committed the CVE documents in the wrong branch when I last worked on them, a little over a week ago. The commit date say June 9.
On June 14, I got a bug report about a problem with curl’s .well-known/security.txt file (RFC 9116) where it was mentioned that our file didn’t have an Expires: keyword in spite of it being required in the spec. So I fixed that oversight and pushed the update to the website.
When doing that push, I did not properly verify exactly what other changes that would be pushed in the same operation, so when I pressed enter there, my security advisories that had accidentally been committed in the wrong branch five days earlier and still were present there were also pushed to the remote origin. Swooosh.
Impact
The advisories are created in markdown format, and anyone who would update their curl-www repository after June 14 would then get them into their local repository. Admittedly, there probably are not terribly many people who do that regularly. Anyone could also browse them through the web interface on github. Also probably not something a lot of people do.
These pending advisories would however not appear on the curl website since the build files were not updated to generate the HTML versions. If you could guess the right URL, you could still get the markdown version to show on the site.
Nobody reported this mistake in the four days they were visible before I realized my own mistake (and nobody has reported it since either). I then tried googling the CVE numbers but no search seemed to find and link to the commits. The CVE numbers were registered already so you would mostly get MITRE and other vulnerability database listings that were still entirely without details.
Decision
After some quick deliberations with my curl security team friends, we decided expediting the release was the most sensible thing to do. To reduce the risk that someone takes advantage of this and if they do, we limit the time window before the problems and their fixes become known. For curl users security’s sake.
Previously, the planned release date was set to July 1st – thirteen days away. It had already been adjusted somewhat to not occur on its originally intended release Wednesday to cater for my personal summer plans.
To do a proper release with several security advisories I want at least a few days margin for the distros mailing list to prepare before we go public with everything. There was also the Swedish national midsummer holiday coming up next weekend and I did not feel like ruining my family’s plans and setup for that, so I picked the first weekday after midsummer: June 27th.
While that is just four days earlier than what we had previously planned, I figure those four days might be important and if we imagine that someone finds a way to exploit one of these problems before then, then at least we shorten the attack time window by four days.
When working with my security advisories, I must pay more attention and be more careful with which branch I commit to.
When pushing commits to the website and I know I have pending security sensitive details locally that have not been revealed yet, I should make it a habit to double-check that what I am about to push is only and nothing but what I expect to be there.
Simultaneously, I have worked using this process for many years now and this is the first time I did this mistake. I do not think we need to be alarmist about it.
Welcome to take the next step with us in this never-ending stroll.
Release presentation
Numbers
the 209th release 8 changes 47 days (total: 8,865) 123 bug-fixes (total: 7,980) 214 commits (total: 28,787) 0 new public libcurl function (total: 88) 2 new curl_easy_setopt() option (total: 297) 1 new curl command line option (total: 248) 51 contributors, 20 new (total: 2,652) 35 authors, 13 new (total: 1,043) 4 security fixes (total: 125) Bug Bounties total: 34,660 USD
Security
This is another release in which scrutinizing eyes have been poking around and found questionable code paths that could be lead to insecurities. We announce four new security advisories this time – all found and reported by Harry Sintonen. This bumps mr Sintonen’s curl CVE counter up to 17; the number of security problems in curl found and reported by him alone.
A malicious server can serve excessive amounts of Set-Cookie: headers in a HTTP response to curl and curl stores all of them. A sufficiently large amount of (big) cookies make subsequent HTTP requests to this, or other servers to which the cookies match, create requests that become larger than the threshold that curl uses internally to avoid sending crazy large requests (1048576 bytes) and instead returns an error.
curl supports “chained” HTTP compression algorithms, meaning that a server response can be compressed multiple times and potentially with different algorithms. The number of acceptable “links” in this “decompression chain” was unbounded, allowing a malicious server to insert a virtually unlimited number of compression steps.
When curl saves cookies, alt-svc and hsts data to local files, it makes the operation atomic by finalizing the operation with a rename from a temporary name to the final target file name.
In that rename operation, it might accidentally widen the permissions for the target file, leaving the updated file accessible to more users than intended.
When curl does FTP transfers secured by krb5, it handles message verification failures wrongly. This flaw makes it possible for a Man-In-The-Middle attack to go unnoticed and even allows it to inject data to the client.
Changes
We have no less than eight different changes logged this time. Two are command line changes and the rest are library side.
These are two options that have not been used by anyone for an extended period of time, and starting now they have no functionality left. Using them has no effect.
The point here is that you can check if global init is thread-safe in your particular libcurl build.
CURLINFO_CAPATH/CAINFO: get default CA paths
As the default values for these values are typically figured out and set at build time, applications might appreciate being able to figure out what they are set to by default.
CURLOPT_SSH_HOSTKEYFUNCTION
For libssh2 enabled builds, you can now set a callback for hostkey verification.
deprecate RANDOM_FILE and EGDSOCKET
The libcurl version of the change mentioned above for the command line. The CURLOPT_RANDOM_FILE and CURLOPT_EGDSOCKET options no longer do anything. They most probably have not been used by any application for a long time.
unix sockets to socks proxy
You can now tell (lib)curl to connect to a SOCKS proxy using unix domain sockets instead of traditional TCP.
Bugfixes
We merged way over a hundred bugfixes in this release. Below are descriptions of some of the fixes I think are particularly interesting to highlight and know about.
improved cmake support for libpsl and libidn2
more powers to the cmake build
address cookie secure domain overlay
Addressed issues when identically named cookies marked secure are loaded over HTTPS and then again over HTTP and vice versa. Cookies are complicated.
make repository REUSE compliant
Being REUSE compliant makes we now have even better order and control of the copyright and licenses used in the project.
headers API no longer EXPERIMENTAL
The header API is now officially a full member of the family.
reject overly many HTTP/2 push-promise headers
curl would accept an unlimited number of headers in a HTTP/2 push promise request, which would eventually lead to out of memory – starting now it will instead reject and cancel such ridiculous streams earlier.
restore HTTP header folding behavior
curl broke the previous HTTP header behavior in the 7.83.1 release, and it has now been restored again. As a bonus, the headers API supports folded headers as well. Folding headers being the ones that are the rare (and deprecated) continuation headers that start with a whitespace.
skip fake-close when libssh does the right thing
Previously, libssh would, a little over-ambitiously, close our socket for us but that has been fixed and curl is adjusted accordingly.
check %USERPROFILE% for .netrc on Windows
A few other tools apparently look for and use .netrc if found in the %USERPROFILE% directory, so by making curl also check there, we get better cross tool .netrc behavior.
There were lots of big and small changes in the HTTP/3 backend powered by ngtcp2.
provide a fixed fake host name in NTLM
curl no longer tries to provide the actual local host name when doing NTLM authentication to reduce information leakage. Instead, curl now uses the same fixed fake host name that Firefox uses when speaking NTLM: WORKSTATION.
return error from “lethal” poll/select errors
A persistent error in select() or poll() could previously be ignored by libcurl and not result in an error code returned to the user, making it loop more than necessary.
After a redirect or if doing multi-stage authentication, the --path-as-is status would be dropped.
support CURLU_URLENCODE for curl_url_get
This is useful when for example you ask the API to accept spaces in URLs and you want to later extract a valid URL with such an embedded space URL encoded
The REUSE project is an effort to make Open Source projects provide copyright and license information (for all files) in a machine readable way.
When a project is fully REUSE compliant, you can easily figure out the copyright and license situation for every single file it holds.
The easiest way to accomplish this is to make sure that all files have the correct header with the appropriate copyright info and SPDX-License-Identifier specified, but it also has ways to provide that meta data in adjacent files – for files where prepending that info isn’t sensible.
What we needed to do
We were already in a fairly good place before this push. We have a script that verifies the presence of copyright header in files (including checking the end year vs the latest git commit), with a list of files that were deliberately skipped.
The biggest things we needed to do were
Add the SPDX identifier all over
Make sure that the skipped files also have copyright and licensing info provided
Add a CI job that verifies that we remain compliant
I also ended up adjusting our own copyright scan script to use the REUSE metadata files instead of its own ignore filters which also made it even easier for us to make sure we are and remain compatible — that every single files in the curl git repository has a known and documented license and copyright situation.
As a bonus, the cleanup work helped us detect an example file that stood out which we got relicensed and we removed two older files that had their own unique licenses (without any good reason).
There are 3518 files in the curl git repository this exact moment.
Compliant!
Starting mid-June 2022, curl is 100% REUSE compliant. curl 7.84.0 will be the first release done in this status.
Motivation
I think it is a good idea to have perfect control over the copyright and license situation for every single file, and to make sure that the situation is documented enough and to a level that allows anyone and everyone to check it out and learn how things lie. No surprises.
Companies have obviously figured out this info before to a degree that they have been satisfied with since curl is widely used even commercial since a long time. But I believe that by providing the information in an even easier and more descriptive way makes things even better. For existing and future users.
I also think that the low threshold for us to reach this compliance was a factor. We were almost there already. We just need to polish up some small details and I think it made it worth it.
This cleanup also makes sure we have perfect control and knowledge of the license situation, now and going forward. I think this can be expected from a project aiming for gold standard.
The curl SPDX license identifier
Keen readers will notice that curl has its own license identifier. It is called the curl license. Not MIT, X or a BSD variation. curl.
The reason for this is good old stupidity. In January 2001 we adopted the MIT license for use in the project because we believed it better matches what we want compared to the previous license situation. We started out with a dual license situation together with the MPL license we used previously, but the MPL part was removed completely in October 2002.
For reasons that have since been forgotten, we thought it was a good idea to edit the license text. To trim it a little. Since August 2002, the license text that started out as an MIT/X license is no longer a perfect copy. It is a derivative . Very similar and almost identical. But it’s not the same.
When the SPDX project created their set of identifiers for well-used licenses out in the FOSS world they decided that the curl license is different enough from the MIT/X license to treat it separately and give it its own identifier. I know of no other project than curl that uses this particular edited version of the MIT license.
In hindsight, I believe the editing of the license text back in 2002 was dumb. I regret it, but I will not change it again. I think we can live with this situation pretty good.
Credits
Most of the heavy lifting necessary to make curl compliant was done by Max Mehl.
Once again I’ve collected the numbers, generated graphs, scratched my head and tried to understand what users mean and how to best use this treasure trove of user feedback.
The curl user survey 2022 ran for two full weeks in the end of May. Here is the document with all the numbers, graphs and analysis from this year’s data.
You will learn what protocols curl users use (HTTPS and HTTP), which TLS backend is the most popular (OpenSSL) and which the top platform is (Linux). And a lot more.
Spoiler: the results are not terribly different than last year and the year before that!
If you have specific feedback on the analysis itself, then I’m all ears. I’m not statistics scholar or anything, but I believe all the numbers, graphs and data I present in there are accurate barring my mistakes of course.
Twenty-one years ago, in May 2001 we introduced the global initialization function to libcurl version 7.8 called curl_global_init().
The main reason we needed this separate function to get called before anything else was used in libcurl, was that several of libcurl’s dependencies at the time (including OpenSSL and GnuTLS) had themselves thread-unsafe initialization procedures.
This rather lame characteristic found in several third party dependencies made the libcurl function inherit that property: not thread-safe. A nasty “feature” in a library that otherwise prides itself for being thread-safe and in many ways working at “it should”. A function that is specifically marked as thread unsafe was not good. Is not good.
Still, we were victims of circumstances and if these were the dependencies we were going to use, this is what we needed to do.
Occasionally, this limitation has poked people in the eye and really hurt them since it makes some use cases really difficult to realize.
Dependencies improved
Over the following decades, the dependencies libcurl use have almost all shaped up and removed the thread-unsafe property of their initialization procedures.
We also slowly cleaned away other code that happened to also fall into the init function out of laziness and convenience because it was there and could be used (or perhaps abused).
Eventually, we were basically masters of our own faith again. The closet was all cleared out and the scrubby leftovers we had sloppily left in there had been cleaned up and gotten converted to proper thread-safe code.
The task of finally making curl_global_init() thread-safe was brought up and attempted a little half-assed a few times but was never pulled through all the way.
The challenges always included that we want to avoid relying on thread library and that we are supporting building libcurl with C89 compilers etc.
Finally, the spring cleaning of 2022
Thanks to work spear-headed by Thomas Guillem who came bursting in with a clear use-case in mind where he felt he really need this to work, and voila now the next libcurl release (7.84.0) features a thread-safe init.
If configure finds support for _Atomic (a C11 feature) or it runs on a new enough Windows version (this should cover a vast amount of platforms), libcurl can now do its own spinlock implementation that makes the init function thread-safe and independent of threading libraries.
Before this, the latest refreshed specification of HTTP/1.1 was done in the RFC 7230 series, published in June 2014. After that, HTTP/2 was done in the spring of 2015 and recently the HTTP/3 spec has been a work in progress.
To better reflect this new world of multiple HTTP versions and an HTTP protocol ecosystem that has some parts that are common for all versions and some other parts that are specific for each particular version, the team behind this refresh has been working on this updated series.
My favorite documents in this “cluster” are:
HTTP Semantics
RFC 9110 basically describes how HTTP works independently of and across versions.
The .netrc file is used to hold user names and passwords for specific host names and allows tools to login to those systems automatically without having to prompt the user for the credentials while avoiding having to use them in command lines. The .netrc file is typically set without group or world read permissions (0600) to reduce the risk of leaking those secrets.
History
Allegedly, the .netrc file format was invented and first used for Berknet in 1978 and it has been used continuously since by various tools and libraries. Incidentally, this was the same year Intel introduced the 8086 and DNS didn’t exist yet.
.netrc has been supported by curl (since the summer of 1998), wget, fetchmail, and a busload of other tools and networking libraries for decades. In many cases it is the only cross-tool way to provide credentials to remote systems.
The .netrc file use is perhaps most widely known from the “standard” ftp command line client. I remember learning to use this file when I wanted to do automatic transfers without any user interaction using the ftp command line tool on unix systems in the early 1990s.
Example
A .netrc file where we tell the tool to use the user name daniel and password 123456 for the host user.example.com is as simple as this:
machine user.example.com
login daniel
password 123456
Those different instructions can also be written on the same single line, they don’t need to be separated by newlines like above.
Specification
There is no and has never been any standard or specification for the file format. If you google .netrc now, the best you get is a few different takes on man pages describing the format in a high level. In general this covers our needs and for most simple use cases this is good enough, but as always the devil is in the details.
The lack of detailed descriptions on how long lines or fields to accept, how to handle special character or white space for example have left the implementers of the different code basis to decide by themselves how to handle those things.
The horse left the barn
Since numerous different implementations have been done and have been running in systems for several decades already, it might be too late to do a spec now.
This is also why you will find man pages out there with conflicting information about the support for space in passwords for example. Some of them explicitly say that the file format does not support space in passwords.
Passwords
Most fields in the .netrc work fine even when not supporting special characters or white space, but in this age we have hopefully learned that we need long and complicated passwords and thus having “special characters” in there is now probably more common than back in the 1970s.
Writing a .netrc file with for example a double-quote or a white space in the password unfortunately breaks tools and is not portable.
I have found at least three different ways existing tools do the parsing, and they are all incompatible with each other.
curl parser (before 7.84.0)
curl did not support spaces in passwords, period. The parser split all fields at the following space or newline and accepted whatever is in between. curl thus supported any characters you want, except spaceand newlines . It also did not “unquote” anything so if you wanted to provide a password like ""llo (with two leading double-quotes), you would use those five bytes verbatim in the file.
wget parser
This parser allows a space in the password if you provide it quoted within double-quotes and use a backslash in front of the space. To specify the same ""llo password mentioned above, you would have to write it as "\"\"llo".
fetchmail parser
Also supports spaces in passwords. Here the double-quote is a quote character itself so in order to provide a verbatim double-quote, it needs to be doubled. To specify the same ""llo password mentioned above, you would have to write it as """"llo – that is with four double-quotes.
What is the best way?
Changing any of these parsers in an effort to unify risk breaking existing use cases and scripts out in the wild with outraged users as a result. But a change could also generate a few happy users too who then could better share the same .netrc file between tools.
In my personal view, the wget parser approach seems to be the most user friendly one that works perhaps most closely to what I as a user would expect. So that’s how I went ahead and made curl work.
What to do
Users will of course be stuck with ancient versions for a long time and this incompatibility situation will remain for the foreseeable future. I can think of a few work-arounds users can do to cope:
Avoid space, tabs, newline and various quotes in passwords
Use separate .netrc files for separate tools
Provide passwords using other means than .netrc – with curl you can for example explore using –config instead
Future curl supports quoting
We are changing the curl parser somewhat in the name of compatibility with other tools (read wget) and curl will allow quoted strings in the way wget does it, starting in curl 7.84.0. While this change risks breaking a few command lines out there (for users who have leading double-quotes in their existing passwords), I think the change is worth doing in the name of compatibility and the new ability to use spaces in passwords.
A little polish after twenty-four years of not supporting spaces in user names or passwords.
This option is for when you use curl to do many requests in a single command line, but you want curl to not do them as quickly as possible. You want curl to do them no more often than at a certain interval. This is a way to slow down the request frequency curl would otherwise possibly use. Tell curl to do the transfers no faster than…
This is a completely different and separate option from the transfer speed rate limit option --limit-rate that has existed for a long time.
A primary reason for using this option is when the server end has a certain capped acceptance rate or other cases where you know it makes no sense to do the requests faster than at a certain interval.
With this new option, you specify the maximum transfer frequency you allow curl to use – in number of transfer starts per time unit (sometimes called request rate) with the new --rate option.
Set the fastest allowed rate with --rate "N/U" where N is an integer number and U is a time unit. Supported units are ‘s’ (for second), ‘m’ (for minute), ‘h’ (for hour) and ‘d’ (for day, as in a 24 hour unit). U is the time unit. The default time unit, if no “/U” is provided, is number of transfers per hour.
For example, to make curl not do its requests faster than twice per minute, use --rate 2/m but if you rather want 25 per hour, then use --rate 25/h.
If curl is provided several URLs and a single transfer completes faster than the allowed rate, curl will wait before it kicks off the next transfer in order to maintain the requested rate and not go faster. If curl is told to allow 10 requests per minute, it will not start the next request until 6 seconds have elapsed since the previous transfer started.
This option has no effect when --parallel is used. Primarily because you then ask for the transfers to happen simultaneously and we have not figured out how this option should affect such transfers!
This functionality uses a millisecond resolution timer internally. If the allowed frequency is set to more than 1000 transfer starts per second, it will instead run unrestricted.
When retrying transfers, enabled with --retry, the separate retry delay logic is used and not this setting.
Rate-limiting response headers
There is ongoing work to standardize HTTP response headers for the purpose or rate-limiting. (See RateLimit Header Fields for HTTP.) Using these headers, a server can tell the client what the maximum allowed transfer rate is and it can adapt.
This new command line option however, does nothing with any such new headers, but I think it would make sense to make a future version able to check for the rate-limit headers and, if opted-in, adapt to those instead of the frequency set by the user.
A challenge with these ratelimit headers vs the --rate command line option is of course that the response headers for this will return the rules for a given site/API, and curl might have been told to talk to many different sites which might all have different (or no) rates in their headers. Also, the command line option works for all protocols curl supports, not just HTTP(S).
Ship it
This feature is due in the pending curl 7.84.0 release.