Tag Archives: Security

curl security moves again

tldr: curl goes back to HackerOne.

When we announced the end of the curl bug-bounty at the end of January 2026, we simultaneously moved over and started accepting curl security reports on GitHub instead of its previous platform.

This move turns out to have been a mistake and we are now undoing that part of the decision. The reward money is still gone: there is no bug-bounty and no money for vulnerability reports, but we return to accepting and handling curl vulnerability and security reports on HackerOne. Starting March 1st 2026, this is (again) the official place to report security problems to the curl project.

This zig-zagging is unfortunate but we do it with the best of intentions. In the curl security team we naively thought that since so many projects already use this setup, it should be good enough for us too, as we don’t have any particular special requirements. We were wrong. Now I instead question how other Open Source projects manage with it. It feels like an under-served area and use case for Open Source projects: proper, secure and efficient vulnerability reporting without a bug-bounty.

What we want from a security reporting system

To illustrate what we are looking for, I made a little list that should show that we’re not looking for overly crazy things.

  1. Incoming submissions are reports that identify security problems.
  2. The reporter needs an account on the system.
  3. Submissions start private; only accessible to the reporter and the curl security team.
  4. All submissions must be disclosed and made public once dealt with. Both correct and incorrect ones. This is important. We are Open Source. Maximum transparency is key.
  5. There should be a way to discuss the problem amongst security team members, the reporter and per-report invited guests.
  6. It should be possible to post security-team-only messages that the reporter and invited guests cannot see.
  7. For confirmed vulnerabilities, an advisory will be produced – something the system could help facilitate.
  8. If there’s a field for CVE, make it possible to provide our own. We are after all our own CNA.
  9. Closed and disclosed reports should be clearly marked as invalid/valid etc.
  10. Reports should have a tagging system so that they can be marked as “AI slop” or other terms, for statistical and metric reasons.
  11. It should be possible to ban/block abusive users from the program.
  12. Additional (customizable) requirements for the privilege of submitting reports are appreciated (rate limit, time since account creation, etc).

What’s missing in GitHub’s setup?

Here is a list of nits and missing features we stumbled over on GitHub that, had we figured them out ahead of time, might well have made us go about this a different way. This list might interest fellow maintainers having the same thoughts and ideas we had. I have provided this feedback to GitHub as well – to make sure they know.

  1. GitHub sends the whole report over email/notification with no way to disable this. SMTP and email are known to be insecure and cannot assure end-to-end protection. This risks leaking secrets early to the entire email chain.
  2. We can’t disclose invalid reports (and have them clearly marked as such).
  3. Per-repository default collaborators on GitHub Security Advisories are annoying to manage, as we now have to manually add the security team for each advisory or script it in a rather quirky workflow. https://github.com/orgs/community/discussions/63041
  4. We can’t edit the CVE number field! We are a CNA, we mint our own CVE records so this is frustrating. This adds confusion.
  5. We want to (optionally) get rid of the CVSS score + calculator in the form, as we actively discourage using those in curl CVE records.
  6. The lack of CI jobs in private forks means we will effectively not use such forks, but that is not a big obstacle for us because of our vulnerability working process. https://github.com/orgs/community/discussions/35165
  7. No “quote” in the discussions? That looks… like an omission.
  8. We want to use GitHub’s security advisories as the report to the project, not as the final advisory (we write that ourselves), which might get confusing: even for the confirmed ones, the project advisories (hosted elsewhere) are the official ones, not the ones on GitHub.
  9. No advisory count is displayed next to “Security” up in the tabs, like for Issues and Pull requests. This makes it hard to see progress/updates.
  10. When looking at an individual advisory, there is no direct button/link to go back to the list of current advisories.
  11. In an advisory, you can only “report content”; there is no direct “block user” option like for issues.
  12. There is no way to add team-only private comments, for when discussing abuse or details not intended for the reporter or other invited persons in the issue.
  13. There is no short (internal) identifier or name per issue, which makes it annoying and hard to refer to specific reports when discussing them in the security team. The existing identifiers are long and hard to differentiate from each other.
  14. Quite weirdly, you cannot get @nick completion help in comments for people who were added to the advisory by being in a team you added to the issue.
  15. There are no labels, like for issues and pull requests, which makes it impossible for us to, for example, mark the AI slop ones or other things, for statistics, metrics and future research.

Email?

Sure, we could switch to handling them all over email but that also has its set of challenges. Including:

  • Hard to keep track of the state of each issue when a number of them are managed in parallel. Even just to see how many cases are currently open or in need of attention.
  • Hard to publish and disclose the invalid ones, as they never cause an advisory to get written, and we want the initial report and the full follow-up discussion published.
  • Hard to adapt to or use a reputation system beyond just the boolean “these people are banned”. I suspect that we over time need to use more crowdsourced knowledge or reputation based on how the reporters have behaved previously or in relation to other projects.

Onward and upward

Since we dropped the bounty, the inflow tsunami has dried up substantially. Perhaps partly because of our switch over to GitHub? Perhaps it just takes a while for all the sloptimists to figure out where to send the reports now, and perhaps by going back to HackerOne we again open the gates for them? We will just have to see what happens.

We will keep iterating and tweaking the program, the settings and the hosting providers going forward, to improve: to make sure we ship a robust and secure set of products, and that the team doing so can keep doing exactly that.

Security problems?

If you suspect a security problem in curl or libcurl, report it here: https://hackerone.com/curl

The other forges don’t even try

GitLab, Codeberg and others are GitHub alternatives and competitors, but few of them offer this kind of security reporting feature. That makes them poor alternatives or replacements for us for this particular service.

Open Source security in spite of AI

The title of my closing keynote at FOSDEM on February 1, 2026.

As the last talk of the conference, at 17:00 on the Sunday, lots of people had already left and presumably many of those remaining were quite tired and ready to call it a day.

Still, the 1500 seats in Janson got occupied and there was even a group of people outside wanting to get in who had to be refused entry.

The video recording

Thanks to the awesome FOSDEM video team, the recording was made available remarkably quickly after the presentation.

You can also get the video off FOSDEM servers.

The slides

The 59-slide PDF version.

no strcpy either

Some time ago I mentioned that we went through the curl source code and eventually got rid of all strncpy() calls.

strncpy() is a weird function with a crappy API. It might not null terminate the destination and it pads the target buffer with zeroes. Quite frankly, most code bases are probably better off completely avoiding it because each use of it is a potential mistake.
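To illustrate both gotchas, here is a minimal standalone example (not curl code):

#include <stdio.h>
#include <string.h>

int main(void)
{
  char small[4];
  char big[16];

  /* gotcha 1: the source does not fit, so strncpy does NOT null
     terminate - 'small' ends up as {'h','e','l','l'} and printing it
     as a plain string would read past the end of the buffer */
  strncpy(small, "hello", sizeof(small));

  /* gotcha 2: the source is short, so strncpy pads the rest - this
     writes the three characters of "hi!" plus 13 zero bytes */
  strncpy(big, "hi!", sizeof(big));

  printf("%.*s\n", (int)sizeof(small), small); /* bounded print */
  printf("%s\n", big);
  return 0;
}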

In that particular rewrite, when we made strncpy calls extinct, we made sure we would either copy the full string properly or return an error. It is rare that copying a partial string is the right choice, and when it is, we can just as well memcpy it and handle the null terminator explicitly. This meant there was no case for using strlcpy or anything similar either.

But strcpy?

strcpy, however, has its valid uses and a less bad and confusing API. The main challenge with strcpy is that when using it, we specify neither the length of the target buffer nor the length of the source string.

This is normally not a problem because in a C program strcpy should only be used when we have full control of both.

But normally and always are not necessarily the same thing. We are all human and we all make mistakes. Using strcpy implies that at least one, maybe two, buffer size checks are done prior to the function invocation. In a good situation.

Over time however – let’s imagine we have code that lives on for decades – as the code is maintained, patched, improved and polished by many different authors with different mindsets and approaches, those size checks and the function invocation may drift apart. The further away from each other they get, the bigger the risk that something happens in between that nullifies one of the checks or changes the conditions for the strcpy.
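A hypothetical sketch of that drift, with a made-up normalize() helper (illustrative only, not curl code):

#include <string.h>

static char buffer[32];

/* hypothetical helper that may rewrite, and even lengthen, a string */
extern const char *normalize(const char *s);

/* version 1: the check and the copy are adjacent - clearly fine */
void store_v1(const char *input)
{
  if(strlen(input) < sizeof(buffer))
    strcpy(buffer, input);
}

/* years later: code has grown in between and the string is replaced
   after the check - the strcpy can now overflow */
void store_v2(const char *input)
{
  if(strlen(input) < sizeof(buffer)) {
    /* ... many lines added over the years ... */
    input = normalize(input); /* may return a longer string */
    strcpy(buffer, input);    /* the earlier check no longer protects this */
  }
}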

Enforce checks close to code

To make sure that the size checks cannot be separated from the copy itself, we the other day introduced a string copy replacement function that takes the target buffer, target size, source buffer and source string length as arguments, and performs the operation only if the copy can be made and the null terminator also fits.

This made it possible to implement the replacement using memcpy(). Now we can completely ban the use of strcpy in curl source code, like we already did with strncpy.

Using this function is a little more work and more cumbersome than strcpy since it needs more information, but we believe the upsides of this approach will outweigh the extra pain involved. I suppose we will see how that fares down the road. Let’s come back in a decade and see how things developed!

/* Copy the source string into the target buffer, but only if it fits
   together with its null terminator. If it does not fit, make the
   target a zero length string instead (when there is room for one). */
void curlx_strcopy(char *dest, size_t dsize,
                   const char *src, size_t slen)
{
  DEBUGASSERT(slen < dsize);
  if(slen < dsize) {
    memcpy(dest, src, slen);
    dest[slen] = 0;
  }
  else if(dsize)
    dest[0] = 0;
}

the strcopy source
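Used at a call site, it looks something like this (an illustrative snippet, not from the curl sources):

const char *host = "example.com";
char store[32];

/* copies the string plus its null terminator, or - if it does not
   fit - makes 'store' a zero length string */
curlx_strcopy(store, sizeof(store), host, strlen(host));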

AI slop

An additional minor positive side-effect of this change is of course that it should effectively prevent the AI chatbots from reporting strcpy uses in curl source code and insisting they are insecure, should anyone ask (as people apparently still do). It has been proven numerous times already that strcpy in source code is like a honey pot for generating hallucinated vulnerability claims.

Still, this will just make them find something else to make up a report about, so there is probably no net gain. AI slop is not a game we can win.

AIxCC curl details

At the AIxCC competition at DEF CON 33 earlier this year, teams competed against each other to find vulnerabilities in provided Open Source projects by using (their own) AI powered tools.

An added challenge was that the teams were also tasked with having their tooling generate patches for the found problems, and the competitors could have a go at poking holes in the patches, which, if successful, would lead to a reduced score for the patching team.

Injected vulnerabilities

In order to give the teams actual and perhaps even realistic flaws to find, the organizers injected flaws into existing source code. I was curious about exactly how this was done, as curl was one of the projects they used for this in the finals, so I had a look and figured I would let you know – should you also be curious.

Would your tools find these vulnerabilities?

Other C-based projects used for this in the finals included OpenSSL, little-cms, libexif, libxml2, libavif, freerdp, dav1d and wireshark.

The curl intro

First, let’s paste their description of the curl project here to enjoy their heart-warming words.

curl is a command-line tool and library for transferring data with URLs, supporting a vast array of protocols including HTTP, HTTPS, FTP, SFTP, and dozens of others. Written primarily in C, this Swiss Army knife of data transfer has been a cornerstone of internet infrastructure since 1998, powering everything from simple web requests to complex API integrations across virtually every operating system. What makes curl particularly noteworthy is its incredible protocol support–over 25 different protocols–and its dual nature as both a standalone command-line utility and a powerful library (libcurl) that developers can embed in their applications. The project is renowned for its exceptional stability, security focus, and backward compatibility, making it one of the most widely deployed pieces of software in the world. From IoT devices to major web services, curl quietly handles billions of data transfers daily, earning it a reputation as one of the most successful and enduring open source projects ever created.

Five curl “tasks”

There is a website providing (partial) information about all the challenges in the final, or as they call them: tasks. Their site for this is very flashy and cyber I’m sure, but I find it super annoying. It does not provide all the details, but enough to give us some basic insight into what the teams were up against.

Task 9

The organizers wrote a new protocol handler into curl for supporting the “totallyfineprotocl” (yes, with a typo) and within that handler code they injected a rather crude NULL pointer assignment. The result variable is an integer containing zero at that point in the code.
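The injected snippet is not reproduced here, but a minimal reconstruction of the described pattern could look like this (hypothetical code, not the actual task):

#include <stddef.h>

void totallyfine_handler(void)
{
  int result = 0; /* known to contain zero at this point */

  /* injected flaw: the zero integer becomes a pointer that is then
     written through - a guaranteed NULL pointer dereference */
  char *ptr = (char *)(size_t)result;
  *ptr = 1;
}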

Task 10

This task had two vulnerabilities injected.

The first one is an added parser in the HTTP code for the response header X-Powered-by: where the code copies the header field value into a fixed-size 64 byte buffer, so that if the contents are larger than that, it is a heap buffer overflow.
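In spirit, that flaw boils down to something like this sketch (hypothetical code, not the actual injected patch):

#include <stdlib.h>
#include <string.h>

/* 'value' is the attacker-controlled X-Powered-by header field value */
static void parse_powered_by(const char *value)
{
  char *buf = malloc(64); /* fixed-size heap buffer */
  if(buf) {
    strcpy(buf, value);   /* heap overflow when value is 64 bytes or more */
    free(buf);
  }
}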

The second one is curiously almost a duplicate of task 9, using code for a new protocol.

Task 20

Two vulnerabilities. The first one inserts a new authentication method into the DICT protocol code, containing a debug handler/message with a string format vulnerability. The curl internal sendf() function takes printf() formatting options.
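The classic pattern, shown here with plain printf rather than curl’s internal sendf (a sketch, not the actual injected code):

#include <stdio.h>

static void debug_echo(const char *input)
{
  /* flaw: attacker-controlled text used as the format string, so
     specifiers like %s or %n in 'input' read or write memory */
  printf(input);

  /* the safe version uses a fixed format string */
  printf("%s", input);
}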

The second is hard to understand based on the incomplete code they provide, but the gist of it is that the code uses an array of the seconds in text format, indexed with the given “current second” without taking leap seconds into account, which would access the stack out of bounds if tm->tm_sec is ever larger than 59.
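A reconstruction of the described pattern (hypothetical code, not the actual task):

#include <time.h>

static const char *second_str(const struct tm *tm)
{
  /* a stack array with sixty entries, "00" through "59" */
  const char *secname[60] = {
    "00", "01", "02", /* ... */ "58", "59"
  };

  /* tm_sec may legally be 60 to account for leap seconds, and then
     this indexes past the end of the stack array */
  return secname[tm->tm_sec];
}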

Task 24

Third time’s the charm? Here is the maybe not so sneaky NULL pointer dereference in a third made-up protocol handler, quite similar to the previous two.

Task 44

This task is puzzling to me because it is listed as “0 vulnerabilities” and there is no vulnerability details listed or provided. Is this a challenge no one cracked? A flaw on the site? A trick question?

Modern tools find these

Given what I have recently seen modern tools from Aisle and ZeroPath etc can deliver, I suspect lots of tools can find these flaws now. As seen above, they were all rather straightforward, not hidden or deeply layered very much. I think for future competitions they need to up their game. With the caveat of course that I did not look much at the tasks related to other projects; maybe they were harder?

Of course making the problems harder to find will also make more work for the organizers.

I suspect a real obstacle for the teams in finding these issues had to be the amount of other potential issues the tools also found and reported; some rightfully and some not quite as correctly. Remember how ZeroPath gave us over 600 potential issues on curl’s master repository just recently. I have no particular reason to think that other projects would have fewer, at least not ones of comparable size.

[Addition after first post] I was told that a general idea for how to inject proper and sensible bugs for the competition was to re-insert flaws from old CVEs, as they are genuine problems that existed in the project in the past. I don’t know why they ended up not doing this (for curl).

Reports?

I have unfortunately not seen much written in terms of reports and details from the competition from the competing teams. I am still waiting for details on some of their scans on curl.

CRA compliant curl

As the Cyber Resilience Act (CRA) is getting closer, and companies wanting to sell digital services and goods within the EU need to step up, tighten their procedures, improve their documentation and get control over their dependencies, I feel it could be timely to remind everyone:

We of course offer full support and fully CRA compliant curl versions to support customers.

curl is not a manufacturer as per the legislation’s terminology, so we as a project don’t have those requirements, but we always have our ducks in order and we will gladly assist and help manufacturers comply.

We have done internet transfers for the world for decades. Fast, secure, standards compliant, feature packed and rock solid. We make curl to empower the world’s digital infrastructure.

You can rely on us.

From suspicion to published curl CVE

Every curl security report starts out with someone submitting an issue to us on https://hackerone.com/curl. The reporter tells us what they suspect and what they think the problem is. This report is kept private, visible only to the curl security team and the reporter while we work on it.

In recent months we have gotten 3-4 security reports per week. The program has run for over six years now, with almost 600 reports accumulated.

On average, someone in the team makes a first response to that report already within the first hour.

Assess

The curl security team right now consists of seven long-time and experienced curl maintainers. We immediately start to analyze and assess the received issue and its claims. Most reports do not identify actual security problems and are quickly dismissed and closed. Some of them identify plain bugs that are not security issues; those we move over to the public bug tracker for further discussion.

This part can take anything from hours up to multiple days and usually involves several curl security team members.

If we think the issue might have merit, we ask follow-up questions, test reproducible code and discuss with the reporter.

Valid

A small fraction of the incoming reports are actually found to be valid security vulnerabilities. We work together with the reporter to reach a good understanding of exactly what is required for the bug to trigger and what the flaw can lead to. Together we set a severity for the problem (low, medium, high or critical) and we work out a first patch – which also helps make sure we understand the issue. Unless the problem is deemed serious, we tend to sync the publication of the new vulnerability with the next pending release. Our normal release cycle is eight weeks, so we are never farther than 56 days away from the next release.

Fix

For security issues we deem to be severity low or medium, we create a pull request for the problem in the public repository – but we do not mention the security angle in the public communication around it. This way, we also make sure that the fix gets added test exposure and time to get polished before the next pending release. Over the last five or so years, only two in about eighty confirmed security vulnerabilities have been rated higher than medium severity. Fixes for vulnerabilities we consider severity high or critical are instead merged into the git repository when there are approximately 48 hours left before the pending release – to limit the exposure time before they are announced properly. We need to merge them into the public repository before the release because our entire test infrastructure and verification system is based on public source code.

Advisory

Next, we write up a detailed security advisory that explains the problem, exactly what the mistake is, and how it can lead to something bad – including all the relevant details we can think of. This includes version ranges for affected curl versions, the exact git commit that introduced the problem as well as the commit that fixed it – plus credits to the reporter and to the patch author etc. We have the ambition to provide the best security advisories you can find in the industry. (We also provide them in JSON format etc on the site for the rare few users who care about that.) We of course want the original reporter involved as well, to make sure we get all the angles of the problem covered accurately.

CVE

As we are a CNA (CVE Numbering Authority), we reserve and manage CVE Ids for our own issues ourselves.

Pre-notify

About a week before the pending release, when we also will publish the CVE, we inform the distros@openwall mailing list about the issue, including the fix and when it is going to be released. This gives Open Source operating systems a little time to prepare their releases and adjust for the CVE we will publish.

Publish

On the release day we publish the CVE details and we ship the release. We then also close the HackerOne report and disclose it to the world. We disclose all HackerOne reports once closed for maximum transparency and openness. We also inform all the curl mailing lists and the oss-security mailing list about the new CVE. Sometimes we of course publish more than one CVE for the same release.

Bounty

Once the HackerOne report is closed and disclosed to the world, the vulnerability reporter can claim a bug bounty from the Internet Bug Bounty which pays the researcher a certain amount of money based on the severity level of the curl vulnerability.

(The original text for this blog post was previously provided for the interview I did with Help Net Security. Tweaked and slightly extended here.)

The team

The heroes in the curl security team, who usually do all this work in silence and without much ado, are currently (in no particular order):

  • Max Dymond
  • Dan Fandrich
  • Daniel Gustafsson
  • James Fuller
  • Viktor Szakats
  • Stefan Eissing
  • Daniel Stenberg

preparing for the worst

One of the mantras I keep repeating is how we in the curl project keep improving, keep polishing and keep tightening every bolt there is. No one can do everything right from day one, but given time and will, we can over time get a lot of things lined up in neat and tidy lines.

And yet new things creep up all the time that can be improved and taken yet a little further.

An exercise

Back in the spring of 2025 we had an exercise at our curl up meeting in Prague. Jim Fuller played out an imaginary life-like scenario for a bunch of curl maintainers. In this role-played major incident, we got to consider how we would behave and what we would do in the curl project if something like Heartbleed or a serious breach occurred.

It was a bit of an eye-opener for several of us. We realized we should probably get some more details written down and planned for.

Plan ahead

We of course arrange things and work on procedures and practices to the best of our abilities to make sure that there will never be any such major incident in the curl project. However, as we are all human and we all make mistakes, it would be foolish to think that we are somehow immune to incidents of the highest possible severity level. Rationally, we should just accept the fact that even though the risk is ideally really slim, it exists. It could happen.

What if the big bad happens

We have now documented some guidelines about what exactly constitutes a major incident, how it is declared, which roles we need to shoulder while it is ongoing – with a focus on both internal and external communication – and how we declare that the incident has ended. It is straightforward and quite simple.

Feel free to criticize or improve those documents if you find omissions or mistakes. I imagine that if we ever actually get to use these documented steps because of such a project-shaking event, we will find reasons to improve them. Until then we just need to apply our imagination and make sure they seem reasonable.

Death by a thousand slops

I have previously blogged about the relatively new trend of AI slop in vulnerability reports submitted to curl and how it hurts and exhausts us.

This trend does not seem to slow down. On the contrary: recently we have received not only more AI slop but also more human slop. The latter differs only in that we cannot immediately tell that an AI made it, even though we often still suspect it. The net effect is the same.

The general trend so far in 2025 has been way more AI slop than ever before (about 20% of all submissions), as we have averaged about two security report submissions per week. As of early July, about 5% of the submissions in 2025 have turned out to be genuine vulnerabilities. The valid-rate has decreased significantly compared to previous years.

We have run the curl Bug Bounty since 2019 and I have previously considered it a success based on the amount of genuine and real security problems we have gotten reported and thus fixed through this program. 81 of them to be exact, with over 90,000 USD paid in awards.

End of the road?

While we are not going to do anything rushed or in panic immediately, there are reasons for us to consider changing the setup. Maybe we need to drop the monetary reward?

I want us to use the rest of 2025 to evaluate and think. The curl bounty program continues to run and we deal with everything as before while we ponder what we can and should do to improve the situation. For the sanity of the curl security team members.

We need to reduce the amount of sand in the machine. We must do something to drastically reduce the temptation for users to submit low quality reports. Be it with AI or without AI.

The curl security team consists of seven team members. I encourage the others to also chime in to back me up (so that we act right in each case). Every report thus engages 3-4 persons. Perhaps for 30 minutes, sometimes up to an hour or three. Each.

I personally spend an insane amount of time on curl already; wasting three hours still leaves time for other things. My fellows however are not full time on curl. They might only have three hours per week for curl. Not to mention the emotional toll it takes to deal with these mind-numbing stupidities.

Times eight the last week alone.

Reputation doesn’t help

On HackerOne, users get their reputation lowered when we close reports as not applicable. That is really only a mild “threat” to experienced HackerOne participants. For new users on the platform it is a mostly pointless exercise, as they can just create a new account next week. Banning those users is a similarly toothless threat.

Besides, there seem to be so many that even if one goes away, there are a thousand more.

HackerOne

It is not super obvious to me exactly how HackerOne should change to help us combat this. It is however clear that we need them to do something: offer us more tools and knobs to tweak, to save us from drowning – if we are to keep the program with them.

I have yet again reached out. We will just have to see where that takes us.

Possible routes forward

People mention charging a fee for the right to submit a security vulnerability report (which could be paid back for a proper report). Sure, that would probably slow them down significantly, but it seems like a rather hostile move for an Open Source project that aims to be as open and available as possible. Not to mention that we have no current infrastructure set up for this – and neither does HackerOne. And managing money is painful.

Dropping the monetary reward would make it much less interesting for the general populace to do random AI queries in desperate attempts to report something that could generate income. It of course also removes the attraction for some professional and highly skilled security researchers, but maybe that is a hit we can/must take?

As a lot of these reporters seem to genuinely think they are helping out, apparently blatantly tricked by the marketing of the AI hype-machines, it is not certain that removing the money from the table will completely stop the flood. We need to be prepared for that as well. Let’s burn that bridge if we get to it.

The AI slop list

If you are still innocently unaware of what AI slop means in the context of security reports, I have collected a list of reports submitted to curl that helps showcase it. Here is a snapshot of the list from today:

  1. [Critical] Curl CVE-2023-38545 vulnerability code changes are disclosed on the internet. #2199174
  2. Buffer Overflow Vulnerability in WebSocket Handling #2298307
  3. Exploitable Format String Vulnerability in curl_mfprintf Function #2819666
  4. Buffer overflow in strcpy #2823554
  5. Buffer Overflow Vulnerability in strcpy() Leading to Remote Code Execution #2871792
  6. Buffer Overflow Risk in Curl_inet_ntop and inet_ntop4 #2887487
  7. bypass of this Fixed #2437131 [ Inadequate Protocol Restriction Enforcement in curl ] #2905552
  8. Hackers Attack Curl Vulnerability Accessing Sensitive Information #2912277
  9. (“possible”) UAF #2981245
  10. Path Traversal Vulnerability in curl via Unsanitized IPFS_PATH Environment Variable #3100073
  11. Buffer Overflow in curl MQTT Test Server (tests/server/mqttd.c) via Malicious CONNECT Packet #3101127
  12. Use of a Broken or Risky Cryptographic Algorithm (CWE-327) in libcurl #3116935
  13. Double Free Vulnerability in libcurl Cookie Management (cookie.c) #3117697
  14. HTTP/2 CONTINUATION Flood Vulnerability #3125820
  15. HTTP/3 Stream Dependency Cycle Exploit #3125832
  16. Memory Leak #3137657
  17. Memory Leak in libcurl via Location Header Handling (CWE-770) #3158093
  18. Stack-based Buffer Overflow in TELNET NEW_ENV Option Handling #3230082
  19. HTTP Proxy Bypass via CURLOPT_CUSTOMREQUEST Verb Tunneling #3231321
  20. Use-After-Free in OpenSSL Keylog Callback via SSL_get_ex_data() in libcurl #3242005
  21. HTTP Request Smuggling Vulnerability Analysis – cURL Security Report #3249936

more views on curl vulnerabilities

This is an intersection of two of my obsessions: graphs and vulnerability data for the curl project.

In order to track and follow every imaginable angle of development, progression and (possible) improvements in the curl project we track and log lots of metadata.

In order to educate and inform users about past vulnerabilities, but also as a means for the project team to find patterns and learn from past mistakes, we extract and document every detail.

Do we improve?

The grand question. Let’s get back to this a little later. Let’s first walk through some of the latest additions to the collection of graphs on the curl dashboard.

The data here is mostly based on the 167 published curl vulnerabilities to date.

vulnerability severity distribution

Twenty years ago, we got very few vulnerability reports. The ones we got were only for the most serious problems and lots of the smaller problems were just silently fixed without being considered anything else than bugs.

Over time, security awareness has become more widespread and nowadays many more problems are reported. Because people are more vigilant, more people are looking and problems are now more often considered security problems. In recent years also because we offer monetary rewards.

This development is clearly visible in this new graph showing the severity distribution among all confirmed curl vulnerabilities through time. It starts out with the first report being a critical one, adding only high severity ones for a few years until the first low appears in 2006. Today, we can see that almost half of all reports so far have been graded medium severity. The dates on the X-axis are when the reports were submitted to us.

Severity distribution in code

One of the tricky details with security reports is that they tend to identify a problem that has already existed in the code for quite some time. For a really long time even, in many cases. How long, you may ask? I know I did.

I created a graph to illustrate this data years ago already, but it was a little quirky and hard to figure out. What you learn after a while of trying to illustrate data over time as a graph is that sometimes you need to try a few different ways and layouts before it eventually “speaks” to you. This is one of those cases.

For every confirmed vulnerability report we receive, we backtrack and figure out exactly which release was the first one to ship the vulnerability. For the last decades we also identify the exact commit that brought it and of course the exact commit that fixed it. This way, we know the exact age of every vulnerability we ever had.

Hold on to something now, because here comes an information dense graph if there ever was one.

  • There is a dot in the graph for every known vulnerability
  • The X-axis is the date the vulnerability was fixed
  • The Y-axis is the number of years the flaw existed in code before we fixed it
  • The color of each dot indicates the severity level of the vulnerability (see the legend)

To guide the viewer, there are also a few diagonal lines. They show the release dates of a number of curl versions. I explain below how they help.

Now, look at the graph here and I’ll continue below.

Yes, you are reading it right. If you count the dots above the twenty-year line, you realize that no less than twelve of the flaws existed in code that long before being found and fixed. Above the fifteen-year line there are almost too many to even count.

The dots close to the “4.0” diagonal line show how many bugs, found throughout the decades, were introduced in code not long after the initial curl release. The other diagonal lines help us see around which particular versions other bugs were introduced.

The green dotted median line we see bouncing around is drawn where there are exactly as many older reports as newer ones. It hovered around seven years for several recent years but has fallen to about six lately. It is probably too early to tell whether this is a long-term evolution or just a temporary blip.

The average age is even higher, about eight years.

You can spot a cluster of fixed issues in 2016. It remains the year with the most vulnerabilities reported and fixed in curl: 24. Partly because of a security audit.

A key take-away here is that vulnerabilities linger a long time before they are found. It means that whatever we change in code today, we cannot see the exact effect on vulnerability frequency until many years into the future. We cannot even know exactly how long we need to wait before we can tell for sure.

Current knowledge, applied to old data

The older the project gets, the more we learn about mistakes we made in the past – and the more we realize that some of the past releases were quite riddled with vulnerabilities. Something nobody knew back then.

For every release ever made, from the first curl release in 1998, we increase a counter for every vulnerability we now know was present – with a different color depending on vulnerability severity.
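Conceptually, the counting behind it is simple. A sketch with a hypothetical data model:

#include <stddef.h>
#include <time.h>

struct release { time_t date; int count[4]; }; /* low/medium/high/critical */
struct vuln { time_t introduced, fixed; int severity; /* 0-3 */ };

/* a vulnerability was "present" in every release that shipped at or
   after the change that introduced it and before the change that
   fixed it */
static void count_present(struct release *rel, size_t nrel,
                          const struct vuln *v, size_t nv)
{
  size_t r, i;
  for(r = 0; r < nrel; r++)
    for(i = 0; i < nv; i++)
      if(v[i].introduced <= rel[r].date && rel[r].date < v[i].fixed)
        rel[r].count[v[i].severity]++;
}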

If we lay all this out in a graph, it takes on an interesting “mountain range” style look. At the end of 2013, we shipped a release that contained no less than (what we now know were) 87 security problems.

In this image we can spot that around 2017, the amount of high severity flaws present in the code decreased and they have been almost extinct since 2019. We also see how the two critical flaws thankfully only existed for brief periods.

However. Recalling that the median time for a vulnerability to exist before getting reported is six years, we know there is a high probability that at least the rightmost 6-10 years of the graph will look different when we redraw the same graph 6-10 years into the future. We simply do not know how different it will be.

Did we do anything differently in the project starting in 2017? I have not been able to find any major distinct thing that stands out. We still only had a dozen CI builds, but we started fuzzing curl that year. Maybe that is the change that is now visible?

C mistakes

curl is written in C and C is not a memory-safe language. People keep suggesting that we should rewrite it in other languages. In jest and for real. (Spoiler: we won’t rewrite it in any language.)

To get a feel for how much the language itself impacts our set of vulnerabilities, we analyze every flaw and assess whether it would likely have been avoided had we not used C. By manual review. This helps us satisfy our curiosity. Let me be clear that the mistakes are still ours and not the language’s: they are our mistakes that the language did not stop or prevent.

To also get a feel for how, or if, this mistake rate changes over time, I decided to use the same mountain layout as in the previous graph: iterate over all releases, count the vulnerabilities present in each, and this time separate them only into C mistakes and non-C mistakes. In the graph, the amount of C mistakes is shown in a red-brown shade.

C mistakes among the vulnerabilities present in code

The dotted line shows the share of the total that is C mistakes, and the Y axis for that is on the right side.

Again, since it takes six years to get half of the reports, we must treat at least the rightmost side of the graph as temporary, as it will change going forward.

The trend looks like we are reducing the share of C bugs though. I do not think anything suggests that such bugs would be harder to detect than others (quite the opposite actually), so even if we know the graph will change, we can probably say with some certainty that the C mistake rate has indeed been reduced over the last six, seven years. (See also writing C for curl on how we consciously work on this.)

Do we improve?

I think (hope?) we are, even if the graphs do not yet reliably show it. We can come back here in 2030 or so and verify. It would be annoying if we weren’t.

We do much more testing than ever: more test cases, more CI jobs with more build combinations, using more and better analyzer tools. Combined with concerted efforts to write better code that helps us reduce mistakes.

Detecting malicious Unicode

In a recent educational trick, curl contributor James Fuller submitted a pull request to the project in which he suggested a larger cleanup of a set of scripts.

In a later presentation, he could show us how not a single human reviewer in the team, nor any CI job, had spotted or remarked on one of the changes he included: he had replaced an ASCII letter with a Unicode alternative in a URL.

This was an eye-opener to several of us and we decided we needed to up our game. We are the curl project. We can do better.

GitHub

The replacement symbol looked identical to the ASCII version, so it was not possible to spot the change visually, but the diff viewer knows there is a difference.

In this GitHub website screenshot below I reproduced a similar case. The right-side version has the Latin letter ‘g’ replaced with the Armenian letter co. They appear to be the same.

The diff viewer says there is a difference, but as a human it is not possible to detect what it is. Is it a flaw? Does it matter? If done “correctly”, it would be done together with a real and expected fix.

The impact of changing one or more letters in a URL can of course be devastating depending on conditions.

When I flagged this rather big omission to GitHub people, I got barely any response at all, and I get the feeling the impact of this flaw is not understood and acknowledged. Or perhaps they are all just too busy implementing the next AI feature we don’t want.

Warnings

When we discussed this problem on Mastodon earlier this week, Viktor Szakats provided me with an example screenshot of a similar stunt done with Gitea, which quite helpfully highlights that there is something special about the replacement:

I have been told that some of the other source code hosting services also show similar warnings.

As a user, I would actually like to know even more than this, but at least it warns about the proposed change clearly enough that, if this happened, I would fetch the code manually and investigate before accepting such a change.

Detect

While we wait for GitHub to wake up and react (which I have no expectation will actually happen anytime soon), we have implemented checks to help us poor humans spot things like this. To detect malicious Unicode.

We have added a CI job that scans all files and validates every UTF-8 sequence in the git repository.

In the curl git repository most files and most content are plain old ASCII, so we can “easily” whitelist a small set of UTF-8 sequences and some specific files; the rest of the files are simply not allowed to use UTF-8 at all, as they will then fail the CI job and turn up red.
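The gist of such a check, as a minimal sketch (curl’s actual CI tooling is more elaborate, with per-file and per-sequence whitelists):

#include <stdio.h>

/* scan one file and fail on the first byte outside plain ASCII */
static int scan_ascii(const char *path)
{
  FILE *f = fopen(path, "rb");
  long offset = 0;
  int c;
  if(!f)
    return 1;
  while((c = fgetc(f)) != EOF) {
    if(c & 0x80) {
      fprintf(stderr, "%s: non-ASCII byte 0x%02x at offset %ld\n",
              path, (unsigned int)c, offset);
      fclose(f);
      return 1;
    }
    offset++;
  }
  fclose(f);
  return 0;
}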

In order to drive this change home, we went through all the test files in the curl repository and made sure that all UTF-8 occurrences were replaced by other kinds of escape sequences and similar. Some of them were used more or less by mistake and could easily be replaced by their ASCII counterparts.

The next time someone tries this stunt on us, it could be someone with less good intentions – but now, ideally, our CI will tell us.

Confusables

There are plenty of tools to find similar-looking characters in different Unicode sets. One of them is provided by the Unicode consortium themselves:

https://util.unicode.org/UnicodeJsps/confusables.jsp

Reactive

This was yet another security-related fix reacting to a demonstrated problem. I am sure there are plenty more problems which we have not yet thought about nor been shown, and therefore do not have adequate means to detect and act on automatically.

We want and strive to be proactive and tighten everything before malicious people exploit some weakness somewhere, but security remains a never-ending race where we can only do the best we can, while the other side works in silence and might at some future point attack us in new creative ways we had not anticipated.

That future unknown attack is a tricky thing.

Update

(At 17:30 on the same day of the original post) GitHub has told me they have raised this as a security issue internally and they are working on a fix.