Tag Archives: Development

more views on curl vulnerabilities

July 10, 2025 Daniel Stenberg 2 Comments

This is an intersection of two of my obsessions: graphs and vulnerability data for the curl project.

In order to track and follow every imaginable angle of development, progression and (possible) improvements in the curl project we track and log lots of metadata.

In order to educate and inform users about past vulnerabilities, but also as a means for the project team to find patterns and learn from past mistakes, we extract and document every detail.

Do we improve?

The grand question. Let’s get back to this a little later. Let’s first walk through some of the latest additions to the collection of graphs on the curl dashboard.

The here data is mostly based on the 167 published curl vulnerabilities to date.

vulnerability severity distribution

Twenty years ago, we got very few vulnerability reports. The ones we got were only for the most serious problems and lots of the smaller problems were just silently fixed without being considered anything else than bugs.

Over time, security awareness has become more widespread and nowadays many more problems are reported. Because people are more vigilant, more people are looking and problems are now more often considered security problems. In recent years also because we offer monetary rewards.

This development is clearly visible in this new graph showing the severity distribution among all confirmed curl vulnerabilities through time. It starts out with the first report being a critical one, adding only high severity ones for a few years until the first low appears in 2006. Today, we can see that almost half of all reports so far has been graded medium severity. The dates in the X-axis are when the reports were submitted to us.

Severity distribution in code

One of the tricky details with security reports is that they tend to identify a problem that has existed in code already for quite some time. For a really long time even in many cases. How long you may ask? I know I did.

I created a graph to illustrate this data already years ago, but it was a little quirky and hard to figure out. What you learn after a while trying to illustrate data over time as a graph, is sometimes you need to try a few different ways and layouts before it eventually “speaks” to you. This is one of those cases.

For every confirmed vulnerability report we receive, we backtrack and figure out exactly which the first release was that shipped the vulnerability. For the last decades we also identify the exact commit that brought it and of course the exact commit that fixed it. This way, we know the exact age of every vulnerability we ever had.

Hold on to something now, because here comes an information dense graph if there ever was one.

There is a dot in the graph for every known vulnerability
The X-axis is the date the vulnerability was fixed
The Y-axis is the number of years the flaw existed in code before we fixed it
The color of each dot indicates the severity level of the vulnerability (see the legend)

To guide the viewer, there is also a few diagonal lines. They show the release dates of a number of curl versions. I’ll explain below how they help.

Now, look at the graph here and I’ll continue below.

Yes, you are reading it right. If you count the dots above the twenty year line, you realize that no less than twelve of the flaws existed in code that long before found and fixed. Above the fifteen year line is almost too many to even count.

If you check how many dots that are close to the the “4.0” diagonal line, it shows how many bugs that have been found throughout the decades that were introduced in code not long after the initial curl release. The other diagonal lines help us see around which particular versions other bugs were introduced.

The green dotted median line we see bouncing around is drawn where there are exactly as many older reports as there are newer. It has hovered around seven years for several recent years but has fallen down to about six recently. Probably too early to tell if this is indeed a long-term evolution or just a temporary blip.

The average age is even higher, about eight years.

You can spot a cluster of fixed issues in 2016. It remains the year with most number of vulnerabilities reported and fixed in curl: 24. Partly because of a security audit.

A key take-away here is that vulnerabilities linger a long time before found. It means that whatever we change in code today, we cannot see the exact effect on vulnerability frequency until many years into the future. We can’t even know exactly how long time we need to tell for sure.

Current knowledge, applied to old data

The older the projects gets, the more we learn about mistakes we did in the past. The more we realize that some of the past releases were quite riddled with vulnerabilities. Something nobody knew back then.

For every release ever made from the first curl release in 1998 we increase a counter for every vulnerability we now know was present. Make it a different color depending on vulnerability severity.

If we lay all this out in a graph, it becomes an interesting “mountain range” style look. In the end of 2013, we shipped a release that contained no less than (what we now know were) 87 security problems.

In this image we can spot that around 2017, the amount of high severity flaws present in the code decreased and they have been almost extinct since 2019. We also see how the two critical flaws thankfully only existed for brief periods.

However. Recalling that the median time for a vulnerability to exist before getting reported is six years, we know that there is a high probability that at least the rightmost 6-10 years of the graph is going to look differently when we redraw this same graph 6-10 years into the future. We simply don’t know how different it will be.

Did we do anything different in the project starting 2017? I have not been able to find any major distinct thing that stands out. We still only had a dozen CI builds but we started fuzzing curl that year. Maybe that is the change that is now visible?

C mistakes

curl is written in C and C is not a memory-safe language. People keep suggesting that we should rewrite it in other languages. In jest and for real. (Spoiler: we won’t rewrite it in any language.)

To get a feel for how much the language itself impacts our set of vulnerabilities, we analyze every flaw and assess if it is likely to have been avoided had we not used C. By manual review. This helps us satisfy our curiosity. Let me be clear that the mistakes are still ours and not because of the language. They are our mistakes that the language did not stop or prevent.

To also get a feel for how or if this mistake rate changes over time, I decided to use the same mountain layout as the previous graph: iterate over all releases and this time count the vulnerabilities they had and instead separate them only between C mistakes and not C mistakes. In the graph the amount of C mistakes is shown in a red-brown nuance.

C mistakes among the vulnerabilities present in code

The dotted line shows the share of the total that is C mistakes, and the Y axis for that is on the right side.

Again, since it takes six years to get half of the reports, we must take at least the rightmost side of the graph as temporary as it will be changed going forward.

The trend looks like we are reducing the share of C bugs though. I don’t think there is anything that suggests that such bugs would be harder to detect than others (quite the opposite actually) so even if we know the graph will change, we can probably say with some certainty that the C mistake rate has indeed been reduced the last six seven years? (See also writing C for curl on how we work consciously on this.)

Do we improve?

I think (hope?) we are, even if the graphs are still not reliably showing this. We can come back here in 2030 or so and verify. It would be annoying if we weren’t.

We do much more testing than ever: more test cases, more CI jobs with more build combinations, using more and better analyzer tools. Combined with concerned efforts to make us write better code that helps us reduce mistakes.

cURL and libcurl, Development

Writing C for curl

April 7, 2025 Daniel Stenberg 14 Comments

It is a somewhat common question to me: how do we write C in curl to make it safe and secure for billions of installations? Some precautions we take and decisions we make. There is no silver bullet, just guidelines. As I think you can see for yourself below they are also neither strange nor surprising.

The ‘c’ in curl does not and never did stand for the C programming language, it stands for client.

Disclaimer

This text does in no way mean that we don’t occasionally merge security related bugs. We do. We are human. We do mistakes. Then we fix them.

Testing

We write as many tests as we can. We run all the static code analyzer tools we can on the code – frequently. We run fuzzers on the code non-stop.

C is not memory-safe

We are certainly not immune to memory related bugs, mistakes or vulnerabilities. We count about 40% of our security vulnerabilities to date to have been the direct result of us using C instead of a memory-safe language alternative. This is however a much lower number than the 60-70% that are commonly repeated, originating from a few big companies and projects. If this is because of a difference in counting or us actually having a lower amount of C problems, I cannot tell.

Over the last five years, we have received no reports identifying a critical vulnerability and only two of them were rated at severity high. The rest ( 60 something) have been at severity low or medium.

We currently have close to 180,000 lines of C89 production code (excluding blank lines). We stick to C89 for widest possible portability and because we believe in continuous non-stop iterating and polishing and never rewriting.

Readability

Code should be easy to read. It should be clear. No hiding code under clever constructs, fancy macros or overloading. Easy-to-read code is easy to review, easy to debug and easy to extend.

Smaller functions are easier to read and understand than longer ones, thus preferable.

Code should read as if it was written by a single human. There should be a consistent and uniform code style all over, as that helps us read code better. Wrong or inconsistent code style is a bug. We fix all bugs we find.

We have tooling that verify basic code style compliance.

Narrow code and short names

Code should be written narrow. It is hard on the eyes to read long lines, so we enforce a strict 80 column maximum line length. We use two-spaces indents to still allow us to do some amount of indent levels before the column limit becomes a problem. If the indent level becomes a problem, maybe it should be split up in several sub-functions instead?

Also related: (in particular local) identifiers and names should be short. Having long names make them hard to read, especially if there are multiple ones that are similar. Not to mention that they can get hard to fit within 80 columns after some amount of indents.

So many people will now joke and say something about wide screens being available and what not but the key here is readability. Wider code is harder to read. Period. The question could possibly be exactly where to draw the limit, and that’s a debate for every project to have.

Warning-free

While it should be natural to everyone already, we of course build all curl code entirely without any compiler warning in any of the 220+ CI jobs we perform. We build curl with all the most picky compiler options that exist with the set of compilers we use, and we silence every warning that appear. We treat every compiler warning as an error.

Avoid “bad” functions

There are some C functions that are just plain bad because of their lack of boundary controls or local state and we avoid them (gets, sprintf, strcat, strtok, localtime, etc).

There are some C functions that are complicated in other ways. They have too open ended functionality or do things that often end up problematic or just plain wrong; they easily lead us into doing mistakes. We avoid sscanf and strncpy for those reasons.

We have tooling that bans the use of these functions in our code. Trying to introduce use of one of them in a pull request causes CI jobs to turn red and alert the author about their mistake.

Buffer functions

Years ago we found ourselves having done several mistakes in code that were dealing with different dynamic buffers. We had too many separate implementations working on dynamically growing memory areas. We unified this handling with a new set of internal help functions for growing buffers and now made sure we only use these. This drastically reduces the need for realloc(), which helps us avoid mistakes related to that function.

Each dynamic buffer also has its own maximum size set, which in its simplicity also helps catching mistakes. In the current libcurl code, we have 80 something different dynamic buffers.

Parsing functions

I mentioned how we don’t like sscanf. It is a powerful function for parsing, but it often ends up parsing more than what the user wants (for example more than one space even if only one should be accepted) and it has weak (non-existing) handling of integer overflows. Lastly, it steers users into copying parsed results around unnecessarily, leading to superfluous uses of local stack buffers or short-lived heap allocations.

Instead we introduced another set of helper functions for string parsing, and over time we switch all parser code in curl over to using this set. It makes it easier to write strict parsers that only match exactly what we want them to match, avoid extra copies/mallocs and it does strict integer overflow and boundary checks better.

Monitor memory function use

Memory problems often involve a dynamic memory allocation followed by a copy of data into the allocated memory area. Or perhaps, if the allocation and the copy are both done correctly there is no problem but if either of them are wrong things can go bad. Therefor we aim toward minimizing that pattern. We rather favor strdup and memory duplication that allocates and copies data in the same call – or uses of the helper functions that may do these things behind their APIs. We run a daily updated graph in the curl dashboard that shows memory function call density in curl. Ideally, this plot will keep falling over time.

It can perhaps also be added that we avoid superfluous memory allocations, in particular in hot paths. A large download does not need any more allocations than a small one.

Double-check multiplications

Integer overflows is another area for concern. Every arithmetic operation done needs to be done with a certainty that it does not overflow. This is unfortunately still mostly a manual labor, left for human reviews to detect.

64-bit support guaranteed

In early 2023 we dropped support for building curl on systems without a functional 64-bit integer type. This simplifies a lot of code and logic. Integer overflows are less likely to trigger and there is no risk that authors accidentally believe they do 64-bit arithmetic while it could end up being 32-bit in some rare builds like could happen in the past. Overflows and mistakes can still happen if using the wrong type of course.

Maximum string length

To help us avoid mistakes on strings, in particular with integer overflows, but also with other logic, we have a general check of all string inputs to the library: they do not accept strings longer than a set limit. We deem that any string set that is longer is either just a blatant mistake or some kind of attempt (attack?) to trigger something weird inside the library. We return error on such calls. This maximum limit is right now eight megabytes, but we might adjust this in the future as the world and curl develop.

keep master golden

At no point in time is it allowed to break master. We only merge code into master that we believe is clean, fine and runs perfect. This still fails at times, but then we do our best at addressing the situation as quickly as possible.

Always check for and act on errors

In curl we always check for errors and we bail out without leaking any memory if (when!) they happen. This includes all memory operations, I/O, file operations and more. All calls.

Some developers are used to modern operating systems basically not being able to return error for some of those, but curl runs in many environments with varying behaviors. Also, a system library cannot exit or abort on errors, it needs to let the application take that decision.

APIs and ABIs

Every function and interface that is publicly accessible must never be changed in a way that risks breaking the API or ABI. For this reason and to make it easy to spot the functions that need this extra precautions, we have a strict rule: public functions are prefixed with “curl_” and no other functions use that prefix.

Everyone can do it

Thanks to the process of human reviewers, plenty of automatic tools and an elaborate and extensive test suite, everyone can (attempt to) write curl code. Assuming you know C of course. The risk that something bad would go in undetected, is roughly equal no matter who the author of the code is. The responsibility is shared.

Go ahead. You can do it!

Comments

Hacker news. Low Level on YouTube.

cURL and libcurl, Development

Adding curl release candidates

February 28, 2025 Daniel Stenberg

Heading towards curl release number 266 we have decided to spice up our release cycle with release candidates in an attempt to help us catch regressions better earlier.

It has become painfully obvious to us in the curl development team that over the past few years we have done several dot-zero releases in which we shipped quite terrible regressions. Several times those regressions have been so bad or annoying that we felt obligated to do quick follow-up releases a week later to reduce friction and pain among users.

Every such patch release have caused pain in our souls and have worked as proof that we to some degree failed in our mission.

We have thousands of tests. We run several hundred CI jobs for every change that verify them. We simply have too many knobs, features, build configs, users and combinations of them all to be able to catch all possible mistakes ourselves.

Release candidates

Decades ago we sometimes did release candidates, but we stopped. We have instead shipped daily snapshots, which is basically what a release would look like, packaged every day and made available. In theory this should remove the need and use of release candidates as people can always just get the latest snapshots, try those out and report problems back to us.

We are also acutely aware of the fact that only releases get tested properly.

Are release candidates really going to make a difference? I don’t know. I figure it is worth a shot. Maybe it is a matter of messaging and gathering the troops around these specific snapshots and by calling out the need for the testing to get done, maybe it will happen at least to some extent?

Let’s attempt this for a while and then come back in a few years and evaluate if it has seemed to help or otherwise improve the regression rate or not.

Release cycle

We have a standard release cycle in the curl project that is exactly eight weeks. When things run smoothly, we ship a new release on a Wednesday every 56 days.

The release cycle is divided into three periods, or phases, that control what kind of commits us maintainers are permitted to merge. Rules to help us ship solid software.

Immediately after a release, we have a ten day cool down period during which we absorb reactions and reports from the release. We only merge bugfixes and we are prepared to do a patch release if we need to.

Ten days after the release, we open the feature window in which we allow new features and changes to the project. The larger things. Innovations, features etc. Typically these are the most risky things that may cause regressions. This is a three-week period and those changes that do not get merged within this window get another chance again next cycle.

The longest phase is the feature freeze that kicks in twenty-five days before the pending release. During this period we only merge bugfixes and is intended to calm things down again, smooth all the frictions and rough corners we can find to make the pending release as good as possible.

Adding three release candidates

The first release candidate (rc1) is planned to ship on the same day we enter feature freeze. From that day on, there will be no more new features before the release so all the new stuff can be checked out and tested. It does not really make any sense to do a release candidate before that date.

We will highlight this release candidate and ask that everyone who can (and want) tests this one out and report every possible issue they find with it. This should be the first good opportunity to catch any possible regressions caused by the new features.

Nine days later we ship rc2. This will be done no matter what bugreports we had on rc1 or what possible bugs are still pending etc. This candidate will have additional bugfixes merged.

The final and third release candidate (rc3) is then released exactly one week before the pending release. A final chance to find nits and perfect the pending release.

I hope I don’t have to say this, but you should not use the release candidates in production, and they may contain more bugs than what a regular curl release normally does.

Technically

The release candidates will be created exactly like a real release, except that there will not be any tags set in the git repository and they will not be archived. The release candidates are automatically removed after a few weeks.

They will be named curl-X.Y.Z-rcN, where x.y.z is the version of the pending release and N is the release candidate number. Running “curl -V” on this build will show “x.y.x-rcN” as the version. The libcurl includes will say it is version x.y.z, so that applications can test out preprocessor conditionals etc exactly as they will work in the real x.y.z release.

You can help!

You can most certainly help us here by getting one of the release candidates when they ship and try it out in your use cases, your applications, your pipelines or whatever. And let us know how it runs.

I will do something on the website to help highlight the release candidates once there is one to show, to help willing contributors find them.

cURL and libcurl

Eighteen years of ABI stability

October 30, 2024 Daniel Stenberg

Exactly eighteen years ago today, on October 30 2006, we shipped curl 7.16.0 that among a whole slew of new features and set of bugfixes bumped the libcurl SONAME number from 3 to 4.

ABI breakage

This bump meant that libcurl 7.16.0 was not binary compatible with the previous releases. Users could not just easily and transparently bump up to this version from the previous, but they had to check their use of libcurl and in some cases adjust source code.

This was not the first ABI breakage in the curl project, but at this time our use base was larger than at any of the previous bumps and this time people complained about the pains and agonies such a break brought them.

We took away FTP features

In the 7.16.0 release we removed a few FTP related features and their associated options. Before this release, you could use curl to do “third party” transfers over FTP, and in this release you could no longer do that. That is a feature when the client (curl) connects to server A and instructs that server to communicate with server B and do file transfers among themselves, without sending data to and from the client.

This is an FTP feature that was not implemented well in curl and it was poorly tested. It was also a feature that barely no FTP server allowed and subsequently this was not used by many users. We ripped it out.

A near pitchfork situation

Because so few people used the removed features, barely anyone actually noticed the ABI breakage. It remained theoretical to most users and I believe that detail only made people more upset over the SONAME bump because they did not even see the necessity: we just made their lives more complicated for no benefit (to them).

The Debian project even decided to override our decision “no, that is not an ABI breakage” and added a local patch in their build that lowered the SONAME number back to 3 again in their builds. A patch they would stick to for many years to come.

The obvious friction this bump caused, even when in reality it actually did not affect many users and the loud feedback we received, made a huge impact on me. It had not previously dawned on me exactly how important this was.

I decided there and then to do the utmost to never go through this again. To put ABI compatibility at the top of the priority list. Make it one of the most fundamental key properties of libcurl.

Do. Not. Break. The. ABI

(we don’t break the API either)

A never-breaking ABI

The decision was initially made to avoid the negativity the bump brought, but I have since over time much more come to appreciate the upsides.

Application authors everywhere can always and without risk keep upgrading to the latest libcurl.

It sounds easy and simple, but the impact is huge. The examples, the documentation, the applications, everything can just always upgrade and continue. As libcurl over time has become even more popular and compared to 2006, used in many magnitudes more installations, it has grown into an even more important aspect of the curl life. Possibly the single most important properly of curl.

There is a small caveat here and that is that we occasionally of course have bugs and regressions, so when I say that users can always upgrade, that is true in the sense that we have not broken the ABI since. We have however had a few regressions that sometimes have triggered some users to downgrade again or wait a little longer for the next release that has the bug fixed.

When we took that decision in 2006 we had less than 50,000 lines of product code. Today we are approaching 180,000 lines.

Effects of never breaking ABI

We know that once we adopt a change, we are stuck with it for decades to come. It makes us double-check every knot before we accept new changes.

Once accepted and shipped, we keep supporting code and features that we otherwise could have reconsidered and perhaps removed. Sometimes we think of a better way to do something after the initial merge, but by then it is too late to change. We can then always introduce new and better ways to do things, but we have to keep supporting the old way as well.

A most fundamental effect is that we can never shrink the list of options we support. We can never actually rename something. Doing new things and features consistently over this long time is hard if not impossible, as we learn new things and paradigms vary through the decades.

How

The primary way we maintain this is by manual code view and code inspection of every change. Followed of course by a large range of tests that make sure that assumptions remain.

Occasionally we have (long) discussions around subtle details when someone proposes a change that potentially might be considered an ABI break. Or not.

What exactly is covered by ABI compatibility is not always straight forward or easy to have carved in stone. In particular since the project can be built and run on such a wide range of systems and architectures.

Deprecating

We can still remove functionality if the conditions are right.

Some features and options are documented and work in a way so that something is requested or asked for and libcurl then tries to satisfy that ask. Like for example libcurl once supported HTTP/1 pipelining like that.

libcurl still provides the option to enable pipelining and applications can still ask for it so it is still ABI and API compatible, but a modern libcurl simply will never do it because that functionality has been removed.

Example two: we dropped support for NPN a few years back. NPN being a TLS extension called Next Protocol Negotiation that was used briefly in the early days of HTTP/2 development before ALPN was introduced and replaced NPN. Virtually nothing requires NPN anymore, and users can still set the option asking for it, but it will never actually happen over the wire.

Furthermore, a typical libcurl build involves multiple third party libraries that provide features it needs. For things like TLS, SSH, compression and binary HTTP protocol management. Over the years, we have removed support for several such libraries and introduced support for new, in ways that was never visible in the API or ABI. Some users just had to switch to building curl with different helper libraries.

In reality, libcurl is typically more stable than most existing servers and URLs. The libcurl examples you wrote in 2006 can still be built with the modern libcurl, but the servers and URLs you used back then most probably cannot be used anymore.

If no one can spot it, it did not happen

As blunt as it may sound, it has came down to this fundamental statement several times to judge if a change is an ABI breakage or not:

If no one can spot an ABI change, it is not an ABI change

Of course what makes it harder than it sounds is that it is extremely difficult to actually know if someone will notice something ahead of time. libcurl is used in so ridiculously many installations and different setups, second-guessing whatever everyone does and wants is darned close to impossible.

Adding to the challenge is the crazy long upgrade cycles some of our users seem to sport. It is not unusual to see questions appear on the mailing lists from users bumping from curl versions from eight or ten years ago. The fact that we have not heard users comment on a particular change might just mean that they are still stuck on ancient versions.

Getting frustrated comments from users today about a change we landed five years ago is hard to handle.

Forwards compatible

I should emphasize that all this means that users can always upgrade to a later release. It does not necessarily mean that they can switch back to an older version without problems. We do add new features over time and if you start using a new feature, the application of course will not work, or even still compile, if you would switch to a libcurl version from before that feature was added.

How long is never

What I have laid out here is our plan and ambition. We have managed to stick to this for eighteen years now and there is no particular known blockers in the known future either.

I cannot rule out that we might at some point in the future run into an obstacle so huge or complicated that we will be forced to do the unthinkable. To break the ABI. But until we see absolutely no other way forward, it is not going to happen.

cURL and libcurl

decomplexifying curl

October 27, 2024 Daniel Stenberg

(I wrote about this topic in my weekly email this week. This is the blog version, somewhat extended.)

Easy to read

Two contributing factors that make code hard to read are function length and function complexity. To keep source code easy to read, understand and debug we should strive towards keeping functions short and simple. Nothing ground-breaking in that conclusion.

I know, it sounds really simple and straight forward but in a living project that goes on for decades, code develops, moves and grows over time. What started out small and simple risk gradually turning into something else.

This of course because there are so many more factors involved that need to be given focus as well. Like security, bugfixes, performance, food on the table and getting more people involved.

Graphs graphs graphs

Last week I added two more graphs to the curl dashboard showing function complexity and function length growth in curl code over the decades: one plot for the worst function and one plot for the 99th percentile in each graph. For both graphs, the 99th percentile plots shrink gradually over time but the worst offenders grow. This means that there are a few functions that with attention could improve readability and code maintainability but that in general things are under control.

One of the main points for me with graphing the project from as many angles as possible is to unveil things like this. Areas that might need attention, and then keep a check on these areas going forward. Details like these are otherwise rather subtle and not easily detected when manually browsing around.

It has been said that whatever measurement you use to track engineering progress, that will then become the goal for what engineers work towards. I hope to combat this by measuring (and graphing) as many angles as possible of the curl project. To help push us in the right direction in as many different areas as possible.

Improve

I took it upon myself to improve the situation: to reduce the size of the largest function in the code base and to simplify the most complex one. Incidentally they were different functions: the largest function was the big switch handling curl_easy_setopt options, and the most complex one was the main curl tool function setting up a single transfer.

These two functions had simply just slowly and consistently been growing over time, in size and complexity. No one’s “fault” really and not with any specific plan or intention. The graph helped me decide to act and the pmccabe tool helped me identify them. We can of course argue about the specific method or number that pmccabe presents for complexity, but I think it at least is pretty good at actually identifying the correct functions and the exact particular score it sets is not terribly important.

Both pull-requests became > 2000 modified lines monsters, but they also had immediate and distinct effects on the graphs; which ideally should mean that the code readability is now a little better than before, making the functions easier to improve and work with going forward

Complexity

The single worst function in production code had gotten quite complex. I spent a work day on the case and look at the drop on the right edge of the graph below, made after my fix landed. Most of the job was to properly split the function into several smaller ones that made sense.

The single worst offender at this particular time was the function in the curl tool that sets up a single transfer job.

There are still some pretty complex ones remaining. Room for further improvements no doubt.

Function length

The worst offenders in terms of function size in curl have been of two kinds: state machines with many states and functions handling big switches for options.

In this particular case, this was the big function handling curl_easy_setopt(), and as we have over three hundred options having them all handled in a single function made it very big. The new setup splits that handling up into multiple smaller functions, one for each kind of input.

The largest one is now at over 1,500 lines. Still on the too large side of things but way better than before.

Going forward

Yes, I am a graphaholic and I seem to keep finding new ways to illustrate project status and development using plots on timelines. I am also most likely the biggest consumer of these graphs as I monitor them daily to make sure I have full control of how we are in the project, in every imaginable aspect.

I intend to try to continue simplifying a few more of the functions in the pmccabe toplist.

Let’s see what the graph shows in another three years.

cURL and libcurl, Development

UndefinedBehaviorSanitizer’s unexpected behavior

October 17, 2024 Daniel Stenberg 3 Comments

The transition from Ubuntu 22 to 24 for ubuntu-latest on GitHub actions started recently with the associated version bumps of a lot of applications. As expected.

One of the version bumps is for clang: it now uses clang 18 by default. clang 18 introduced some changes that turned out to be relevant for me and other curl developers. Yeah, surely for some others as well.

clang and gcc

In my daily developer life I just typically use gcc for building local stuff – mostly out of old habits. I rebuild and test curl dozens of times every day. In my normal work process I use a couple of different build combinations that enable a lot of third party dependencies and I almost always build curl and libcurl with debug enabled and only statically. It is a debug-friendly setup.

Of course I also have clang installed so that I can try out building with it when I want to, and I have a large set of alternative config setups that I use when I have a particular reason to check or debug such a build.

CI to the rescue

There are literally many millions of build combinations of curl, and we do some of the most important ones automatically for every pull request and commit in the source repository. They help us avoid regressions. Currently we do almost two hundred different jobs.

Two of those CI jobs build curl using clang and enable some sanitizers: address, memory, undefined and signed-integer-overflow and use those builds to run through the test suite to help us verify that everything still looks fine.

Since it gets done in the CI for every change, I don’t have to run it myself locally very often. We have thus been using the default clang version shipped in Ubuntu 22.04 for this for quite some time now.

Undefined behavior sanitizer

When the clang version for the Ubuntu jobs on GitHub was bumped up to version 18, the undefined behavior sanitizer job suddenly found plenty of new problems in curl.

In code that had been running without problems for a long time (decades in some cases) on countless systems and on almost every imaginable architecture. Unexpected.

Picky function prototypes

Here is the reason:

The sanitizer now keeps track of exactly how a function pointer prototype is declared and verifies that the function actually called via that pointer is using an identical prototype.

This is generally probably a good idea and a sound sanity check for most programs but since the checker insists on identical prototypes, I believe it goes beyond what is undefined behavior – I believe some discrepancies are handled just fine. For example a signed vs unsigned pointer or void vs char pointers. I am however not a compiler developer and neither am I an expert in the C language specifications so maybe I am wrong.

Example

A function pointer defined to call a function that returns a void and gets a single char pointer input

void (*name)(char *ptr);
typedef void (*name_func)(char *ptr);

Such a pointer can be made to point to a function that instead takes a void pointer:

void target(void *ptr)
{
   printf("Input %p\n", ptr);
}

This construct works perfectly fine in C:

char *data = "string";
name = (name_func)target;
name(data);

In libcurl we set function pointers (callbacks) via a setopt() style function, which cannot validate the pointer at compile time.

When the code example above is tested with the undefined behavior sanitizer and its -fsanitize=function check (I believe), it complains about the mismatching prototypes between the pointer and the actually called function.

How this became annoying

For the example above, the sanitizer report is most welcome, even if I think it goes beyond what is actually undefined behavior. It helps us clean up the code.

For libcurl, we have a CURL * type returned for a handle from curl_easy_init(). This handle is used as an input argument to multiple functions and it is also used as an input argument to several callbacks an application can tell libcurl to call, etc.

That type was originally done like this:

typedef void CURL;

But in 2016, we changed it to instead become:

#if defined(BUILDING_LIBCURL)
typedef struct Curl_easy CURL;
#else
typedef void CURL;
#endif

This made us get a more descriptive pointer for the type when we build libcurl. For convenience.

The function pointer is defined internally for libcurl as a struct pointer, but outside in the application land as a void pointer. This works great.

Until this new sanitizer check. Now it complaints loudly because the prototypes for the function called does not match the prototype for the function pointer. The struct vs the void pointers. The sanitizer stores and uses “resolved” typedefs in its checks, not the name of the types visible in code.

The fix

Since we can’t have build breakage in the CI jobs, I fixed this.

We are back to how we did it in the past. With a plain

typedef void CURL;

… even when we build libcurl. To make sure the pointer and the final function have the same prototypes. To hush up the undefined behavior sanitizer.

This is now in master and how the code in the pending curl 8.11.0 release will look.

Disabling the check is not enough

While we could disable this particular check in our CI jobs, that would not suffice since we want everyone to be able to run these tools against curl without any warnings or errors.

We also want application authors in general who use libcurl to be able to similarly run this sanitizer against their tools to not get error reports like this.

Is this a clang issue?

Maybe. I just can’t see how this could happen by mistake, and since it is a feature that has existed for quite a while now already I have not bothered to submit an issue or have any argument or discussion with the clang team. I have simply accepted that this is the way they want to play this and adapt accordingly.

A historic footnote

In 2016 I wanted to change the type universally to just

typedef struct Curl_easy CURL;

… as I thought we could do that without breaking neither API nor ABI. I still believe I was right, but the change still caused an “uproar” among some users who had already built code and done things based on the assumption that it was and would always remain a void pointer. Changing the type thus caused build errors at places to a level that made us retract that change and revert back to the #ifdef version showed above.

And now we had to retract even the #ifdef and thus we are back to the pre 2016 way.

Post-publish update

It has been pointed out to me that the way the C standard is phrased, this tool seems to be correct. More discussions around that can be found in a long OpenSSL issue from last year.

cURL and libcurl

My hacker station

March 3, 2023 Daniel Stenberg 2 Comments

My home office was featured over at Hacker Stations where I also detailed stuff in my workplace and offer a few more photos.

I have been working exclusively from home for nine years straight now.

cURL and libcurl

My 2023 dev machine

February 20, 2023 Daniel Stenberg 1 Comment

My desktop computer is my trusted work machine that I do the majority of all my (curl) development on. When the 15th computer I’ve owned through the times was ten years old the time was ripe to bump things up a notch.

Requirements

I don’t do games (as in: never) and I don’t do any other 3D stuff. I just need my two 4K monitors to display my desktops and browser windows fine.

In my ordinary days I compile C code and I run tests. CPU and memory will be used to build and test faster and to be able to run separate VM runtimes in parallel without problems. I rarely even build very large or complicated software projects. (The days of building Firefox are long gone…)

Ideally, this upgrade will last for a long time again so I’ve tried to push it a little to increase those chances.

New machine

This new baby is (of course) built from components and I’ve relied heavily on advice, research and help by my brother Björn for this.

CPU

I’m a sucker for maximum single-thread performance. Lots of things I do still run in single-threaded in a single core so I think this is good for me.

The Intel Core i7-13700K at 3.4 GHz is benchmarked at a CPU Mark that is over 7 times faster than the CPU of my old machine. 16 cores, Socket 1700 Raptor Lake. “13th gen”

On cpubenchmark.net, this model is currently ranked 4th among all current CPUs in single-thread performance.

CPU Cooling

I think I’m not alone in having past happy experiences with Noctua. This time I use the Noctua NH-U12A, which I have gotten reports does a good job for this CPU.

Motherboard

Something to host the CPU that just does the job. MSI PRO B660M-A DDR4 is a small board, but I don’t need anything more.

Turned out to require a little dance to make it accept my CPU since the BIOS it shipped with did not support it, so we had to insert an older CPU first just in order to upgrade the BIOS to make it boot with the intended CPU!

Graphics

My plan is to start trying out the built-in Intel video capabilities. Nothing extra. Lots of space in the box!

Memory

I don’t think I’ve experienced a situation when I have run out of my memory in my current 32GB setup, so my original plan was to go with 64GB in this new machine. However it turned out that the motherboard does not work with all four slots using my 3600MHz memories at full speed and I decided it is better to start out with 32 really fast gigabytes than 64GB at 2100MHz (which was the alternative)!

Corsair Vengeance RGB PRO SL / 3600MHz / DDR4 / CL18. Two 16GB modules installed makes it 32GB in total. I can go 2x32GB in a future when if this turns out to be too limited.

RGB-LEDs on the memory modules is apparently a thing now.

Should be >50% faster than my old memory.

Storage

I am not a data hoarder. On the disks in my current machine I use just a few hundred gigabytes. 2 TB will give me sufficient space to play with for a while. My old machine had a 3 TB spinning disk so this is less room than before, but I don’t expect that to be a problem. This storage is speced doing 14 times faster reads than my previous SSD.

The Samsung 980 Pro series SSD 2TB M.2 (MZ-V8P2T0)

Power Supply

The Kolink Enclave / 600W / 80+ Gold is nothing special. A modular and cheap alternative that my preferred supplier happened to offer. Again, I will not run any power hungry graphics cards.

Case

Me and friends have been happy with Fractal Design cases in the past and a friend of mine mentioned that he recently purchased this model and is very happy, so I went with the Fractal Design Define 7 Compact / Solid.

Internals with motherboard and CPU cooler in place. Not a lot of extra things in this…

This is a big case for a what is otherwise a very small computer (need). Partly because of the recommendation but also partly because that my preferred supplier did not offer any smaller Fractal Design case at the moment. At least there will be lots of air in the box.

This case looks almost identical to my old case which will make my machine upgrade at least physically impossible to detect in my home office once installed.

Front interface from left to right: Reset button, Audio I/O, 1x USB 3.1 Gen 2 Type-C, Power button, 2x USB 2.0, 2x USB 3.0,

Speed comparisons

Here’s how the new beast compares to the old box when doing a few of my regular every day tasks.

Building stuff

On a typical curl debug build of mine, identical setups. Run-time in seconds on the new machine vs the old.

Task	Old	New
build curl with make -sj	13.1	2.7
autoreconf -fi	13.2	5.6
configure	19.3	10.1
build in curl’s test directory	30.1	7.6
run the first 200 curl tests with valgrind	331	194

Downloading 100 GB from http://localhost to /dev/null with curl. Old: 2248MB/sec. New: 4281MB/sec.

Development

predef is our friend

July 13, 2022 Daniel Stenberg

For C programmers like me, who want to write portable programs that can get built and run on the widest possible array of machines and platforms, we often need to write conditional code. Code that use #ifdefs for particular conditions.

Such ifdefs expressions often need to check for a particular compiler, an operating system or perhaps even a specific version of of one those things.

Back in April 2002, Bjørn Reese created the predef project and started collecting this information on a page on Sourceforge. I found it a super useful idea and I have tried to contribute what I have learned through the years to the effort.

The collection of data has been maintained and slowly expanded over the years thanks to contributions from many friends.

The predef documents have turned out to be a genuine goldmine and a resource I regularly come back to time and time again, when I work on projects such as curl, c-ares and libssh2 and need to make sure they remain functional on a plethora of systems. I can only imagine how many others are doing the same for their projects. Or maybe should do the same…

Now, in July 2022, the team is moving the predef documents over to a brand new GitHub organization. The point being to making it more accessible and allow edits and future improvements using standard pull-requests to make ease the maintenance of them.

You find all the documentation here:

https://github.com/cpredef/predef

If you find errors, mistakes or omissions, we will of course be thrilled to see issues and pull requests filed.

cURL and libcurl

case insensitive string comparisons in C

May 19, 2022 Daniel Stenberg 4 Comments

Back in 2008, I had a revelation when it dawned on me that the POSIX function called strcasecmp() compares strings case insensitively, but locale dependent. Because of this, “file” and “FILE” is not actually a case insensitive match in Turkish but is a match in most other locales. curl would sometimes fail in mysterious ways due to this. Mysterious to the users, now we know why.

Of course this behavior was no secret. The knowledge about this problem was widespread already then. It was just me who hadn’t realized this yet.

A custom replacement

To work around that problem for curl, we immediately implemented our own custom comparison replacement function that doesn’t care about locales. Internet protocols work the same way no matter which locale the user happens to prefer.

We did not go the POSIX route. The POSIX function for case insensitive string comparisons that ignores the locale is called strcasecmp_l() but that uses a special locale argument and also doesn’t exist on non-POSIX platforms.

curl has used its custom set of functions since 7.19.1, released in early November 2008.

OpenSSL 3.0.3

Fast forward to May 2022. OpenSSL released their version 3.0.3. In the change-log for this release we learned that they now offer public functions for case insensitive string comparisons. Whatdoyouknow! They too have learned about the Turkish locale. Apparently to the degree that they feel they need to offer those functions in their already super-huge API set. Oh well, that is certainly their choice.

I can relate since we too have such functions in libcurl, but I have always regretted that we added them to the API since comparing strings is not libcurl’s core business. We did wrong then and we still live with the consequences several decades later.

OpenSSL however took the POSIX route and based their implementation on strcasecmp_l() and use a global variable for the locale and an elaborate system to initialize that global and even a way to make things work if string comparisons are needed before that global variable is initialized etc.

This new system was complicated to the degree that it broke the library on several platforms, which curl users running Windows 7 figured out almost instantly. curl with OpenSSL 3.0.3 simply does not work on Windows 7 – at all.

Reasons for not exposing a string compare API

Libraries should only provide functions that are within their core objective. Not fluffy might be useful things. Reasons for this include:

It adds to the complexity to users. Yet another function in the ever expanding set of function calls in the API.
It increases the documentation size even more and makes the real things harder to find somewhere in there.
It adds “attack surface” and areas where you can make errors and introduce security problems.
You get more work since now you have additional functions to keep ABI and API stable for all eternity and you have to spend developer time and effort on making sure they remain so.

Do a custom one for OpenSSL?

I think there is a software law that goes something like this

eventually, all C libraries implement their own case insensitive string comparison functions

When I proposed they should implement their own custom function in discussions in one of the issues about this OpenSSL problem, the suggestion was shot down fairly quickly because of how hard it is to implement a such function that is as fast as the glibc version.

In my ears, that sounds like they prefer to stick with an overworked and complicated error-prone system, because an underlying function is faster, rather than going with simplicity and functionality at the price of sightly slower performance. In fairness, they say that case insensitive string comparisons are “6-7%” of the time spent in some (to me unknown) performance test they referred to. I have no way or intention to argue with that.

I think maybe they couldn’t really handle that idea from an outsider and they might just need a little more time to figure out the right way forward on their own. Then go with simple.

I am of course not in the best position to say how they should act on this. I’m just a spectator here. I may be completely wrong.

Update (May 23)

In a separate PR (4 days after this blog post went live), OpenSSL suddenly implemented their own and it was deemed that it would not hurt performance noticeably. Merged on May 23. Almost like they followed my recommendation!

OpenSSL’s current tolower() implementation used in the comparison function is similar to curl’s old one so I suspect curl’s current function is a tad bit faster.

Custom vs glibc performance

glibc truly has really fast string comparison implementations, with optimized assembly versions for the common architectures. Versions written in plain C tend to be slower.

However, the API and way to use those functions to make them locale independent is horrific because of the way it forces the caller to provide a locale argument (which could be the “C” locale – the equivalent of no locale).

The curl custom function

That talk about the slowness of custom string functions made us start discussing this topic a little in the curl IRC channel and we bounced around some ideas of what things the curl function does not already do and what it could do and how it compares against the glibc assembly version.

Also: the string comparisons in curl are certainly not that performance critical as they seem to be in OpenSSL and while used a lot in curl they are not used in the most important performance critical transfer-data code paths.

Optimizations

Frank Gevaerts took the lead and after some rounds and discussions backed up with tests, he ended up with an updated function that is 1.6 to 1.7 times faster than before his work. We dropped non-ASCII support in curl a while ago, which also made this task more straight-forward.

The two improvements:

Use a lookup table for our own toupper() implementation instead of the previous simple condition + math.
Better end of loop handling: return immediately on mismatch, and a minor touch-up of the final check when the loop goes all the way to the end.

Measurements

The glibc assembler versions are still faster than curl’s custom functions and the exact speed improvements the above mentioned changes provide will of course depend both on platform and the test set.

Ships in 7.84.0

The faster libcurl functions will ship in curl 7.84.0. I doubt anyone will notice the difference!

Do we improve?

vulnerability severity distribution

Severity distribution in code

Current knowledge, applied to old data

C mistakes

Do we improve?

Disclaimer

Testing

C is not memory-safe

Readability

Narrow code and short names

Warning-free

Avoid “bad” functions

Buffer functions

Parsing functions

Monitor memory function use

Double-check multiplications

64-bit support guaranteed

Maximum string length

keep master golden

Always check for and act on errors

APIs and ABIs

Everyone can do it

Comments

Release candidates

Release cycle

Adding three release candidates

Technically

You can help!

ABI breakage

We took away FTP features

A near pitchfork situation

A never-breaking ABI

Effects of never breaking ABI

How

Deprecating

If no one can spot it, it did not happen

Forwards compatible

How long is never

Easy to read

Graphs graphs graphs

Improve

Complexity

Function length

Going forward

clang and gcc

CI to the rescue

Undefined behavior sanitizer

Picky function prototypes

Example

How this became annoying

The fix

Disabling the check is not enough

Is this a clang issue?

A historic footnote

Post-publish update

Requirements

New machine

CPU

CPU Cooling

Motherboard

Graphics

Memory

Storage

Power Supply

Case

Speed comparisons

Building stuff

A custom replacement

OpenSSL 3.0.3

Reasons for not exposing a string compare API

Do a custom one for OpenSSL?

Update (May 23)

Custom vs glibc performance

The curl custom function

Optimizations

Measurements

Ships in 7.84.0

curl, open source and networking