curl 8.6.0

Numbers

the 254th release
7 changes
56 days (total: 9,448)

154 bug-fixes (total: 9,888)
257 commits (total: 31,684)
0 new public libcurl function (total: 93)
1 new curl_easy_setopt() option (total: 304)

0 new curl command line option (total: 258)
65 contributors, 40 new (total: 3,078)
36 authors, 18 new (total: 1,237)
1 security fix (total: 151)

Release presentation

Security

CVE-2024-0853: OCSP verification bypass with TLS session reuse. curl inadvertently kept the SSL session ID for connections in its cache even when the verify status (OCSP stapling) test failed. A subsequent transfer to the same hostname could then succeed if the session ID cache was still fresh, which then skipped the verify status check.

Changes

  • Markdown documentation. Most of the libcurl and command line documentation is now written using (basic) markdown instead of previous formats. Easier to read, easier to write.
  • CURLE_TOO_LARGE. A new libcurl error code for when “something” is growing too big to be allowed. Like a URL, a HTTP request or similar. Previously it would return out of memory for those situations which caused confusion to users.
  • CURLINFO_QUEUE_TIME_T. Applications can now ask libcurl how long a transfer was “queued” internally before it actually started.
  • CURLOPT_SERVER_RESPONSE_TIMEOUT_MS. A new millisecond version of the already existing option to allow applications higher resolution control.
  • Use GetAddrInfoExW on >= Windows 8. On current Windows versions libcurl will now do asynchronous name resolving by default without using threads, which should be less resource heavy.
  • libpsl detection failure in configure causes error. If configure cannot find libpsl it will require the user to say that it should not be used, or to fix the problem. To make people who build curl more aware of the PSL state of the build.
  • runtests supports -gl, When you invoke individual test cases on macOS, you can now ask to run it with lldb with -gl just as you have been able to run it with gdb using -g for decades. Helps debugging difficult cases.

Bugfixes

Here some of my favorite bugfixes from this cycle:

configure: add libngtcp2_crypto_boringssl detection. Previously it would only detect and build out of the box with the quictls version of ngtcp2 builds.

configure: when enabling QUIC, check that TLS supports QUIC. More efforts trying to detect wrong and invalid build combinations earlier, to avoid users ending up with broken builds.

all libcurl man page examples are verified in CI. Every man page example now compiles cleanly. This step made us detect and fix numerous tiny mistakes of the most annoying kind: when you copy code from docs and it does not work.

curl shows ipfs and ipns as supported “protocols”. In the regular --version output. Even if they are converted to https:// internally.

curl bsearches command line options. The command line parsing is now magnitudes faster. Of course it will not really be noticeable outside of the most extreme cases.

curl stopped supporting @filename style for --cookie. This syntax was never documented and was not used in any test case. It was risking to cause unwanted surprises.

curl –remove-on-error only removes “real” files. Mostly as a precaution for when users are unclever enough to run curl with elevated privileges and would save to a device or named pipe etc.

curl no longer sets the file comment on Amiga. It would truncate the URL weirdly and also risk leak credentials if such were used in the URL.

lib: reduced use of the download buffer all over. The download buffer has over time been abused for all kinds of buffer purposes. This cycle we have made a lot of such buffer use instead start use their own buffers. With a little luck, this will make us possible use a single download buffer for all transfer in a multi handle, thereby drastically reducing the amount of used memory when doing parallel transfers. With no behavior difference or performance degradation. Details on this will follow later.

lib: use memdup0 instead of malloc + memcpy. This was a common code pattern, and with this we reduce the number of mallocs and memcpys at the same time – which we think is good since they are known “problem functions” that are easy to mess up.

lib: various conversions from malloc to dynbuf. In similar spirit as the above, we continued to switch more functions away from using malloc and family to instead use the internal dynbuf API for managing dynamic buffers in a way that is less likely to cause memory related issues.

resolving: with modern c-ares, use its default timeout. It means tighter timeouts by default but also that this combo now also respect the timeout that can be specified in resolve.conf.

headers API: make sure the trailing newline is never stored. A header with no content on the right side of the colon would erroneously get its trailing newline stored as content

mprintf: overhaul, performance and bugfixes. Now the curl printf functions work even more similar to the glibc counterparts especially when provided illegal %-combinations and when using the <num>$ operator. Performance measurements on this new code also says this code now executes around 30% faster on commonly used format strings.

ftp: handle the PORT parsing without allocation. Minor cleanup.

http3: initial support for OpenSSL 3.2 QUIC stack. The forth QUIC backend in curl is here.

http: check for “Host:” case insensitively. If you would ask to disable this header with a different casing that what was compared, it would still send an empty header in the request.

http: only act on 101 responses when they are HTTP/1.1. If a HTTP response says another protocol version with a 101 response, it is now considered an illegal combination.

openldap: fix an LDAP crash. LDAP without TLS would crash on basic use.

openldap: fix STARTTLS. It was recently broken in a refactor.

Next

There are no revolutionary changes in the pipe, but there are a series of things we most likely are going to land in the next cycle making the next version number likely to become 8.7.0.

Coming: a curl distros meeting

The curl project arranges a two hour video conference meeting on March 21, 2024, with the aim of getting together people from curl and persons packaging curl for distributions. Linux distros and others. If you distribute curl to your users, consider yourself invited!

The main purpose is to improve curl and how curl is shipped to end users. The idea is to make it a multi-party conversation to see how we can improve collaboration, information exchange, testing, experiences etc so that we in the curl project can help the distros ship curl better, faster and more convenient. To the benefit for everyone who uses curl and libcurl packaged by distributions.

Of course, getting people working on related matters together and talking could also help us get to know each other a little better, to get some faces and voices attached to email addresses and by that, making cooperation smoother in the future.

The meeting is open for anyone interested to join and participate in.

The event starts at 16:00 UTC and might last up to two hours.

All details and specifics for this event is put in this wiki page:

https://github.com/curl/curl/wiki/curl-distro-discussion-2024

Welcome!

Logitech G915 TKL

On December 21 2023 I started using this new keyboard as my daily driver. I have used my previous one almost daily since August 2014. Back in 2015 I counted doing almost 7 million key-presses/year. I don’t think I’ve typed any less since then. I rather think there are signs that I have typed a lot more since, especially since I started working full-time on curl in 2019. My estimate is thus that my trusted old keyboard has served over 70 million key-presses. And it has done it very well.

I did not want to comment on the keyboard too early, but I also did not want too long so that I forget about the little details a transition like this can bring up. I have averaged around 30,000 key strokes per day since I started using the new keyboard (I used keyfreq to figure this out). I also toyed around on keybr.com (where I reach 84 words/minute typing speed – with a fair amount of typos), to get a feel for how efficient I can possibly become on this thing. Seems okay!

Keyboards belongs in this annoying category of products that are really hard to figure out and get a proper feel for ahead of time even if you would be able to try them out before purchase. They pretty much require that we use them for a while before we can tell for sure we like them and want to stay with them many hours every day for the next several years.

Based on a recommendation from my brother Björn, I went with Logitech G915 TKL – tactile keys flavored. “Carbon” they call this color. It apparently also is available in white. Without a recommendation, I would not have considered this brand.

RGB Lights

This thing is branded a “gaming keyboard” and then of course the default keyboard RGB lightning effects consist of moving and changing colors across all the keys. There are 10 color style presets to select from, but the keyboard goes back to the default after idle timeouts, making the default mode really important. The factory default mode really screams hey look at me!, and is mostly disturbing to use.

I have read comments from Linux users online who even returned their G915 keyboards due to this crazy scheme not being possible to change properly on device, and the unfortunate truth is that I had to install their control software on a Windows box nearby to set a new default coloring mode: a solid single color on all keys – including that big G logo-symbol in the top left. Resetting back to this mode is fine. I appreciate some light on the keys, especially when using the keyboard at night.

It could perhaps also be mentioned that their control software on Windows, G Hub , while looking fancy, was not easy to figure out how to do this with.

I like that the lights on the keyboard go off completely after idling long enough, so that my workplace goes almost completely dark when left alone.

TKL

Ever since I got my previous keyboard I have been interested in a Ten Key Less (TKL) model. I never use any of those keys on the numerical part and I figured it could be good ergonomics to reduce the travel distance for my right hand when moving from keys to the mouse and back. A move that I must do hundreds if not thousands of times per day.

Wireless

This keyboard is wireless but it is not a feature I care about. I use this quite stationary at my desk within cable distance and I find rather use a cable than having to recharge the device.

Layout

I’m using the traditional Swedish QWERTY layout, which some people legitimately will mention is not optimal for development compared to US keyboard layouts. This would be because of what key combos you need to write for parentheses, braces, forward-slash etc. But since the Swedish language has these commonly used extra letters (Å, Ä and Ö) and we have been using this keyboard layout for a long time in this country, I am used to it and where ever you go on Sweden and find a keyboard, almost every one of them will use this layout. It is the path of least resistance.

I have never been a fan of the ergonomic keyboard styles and layouts. Probably because I never really allowed myself enough time to get used to them.

Media keys

With my previous keyboard I had to press Fn plus an F-key to do media control, while this new one has a set of dedicated media keys above the F-keys as you can see on the image above. I can also control the volume with the scroll wheel thing on the top right. Pretty convenient actually.

I have grown to appreciate having media control keys on the keyboard, and I find that having dedicated keys like this is a step up as it saves me from doing combo-presses frequently. A little less finger gymnastics.

Switches

Logitech does not seem to have a specific brand name for their switches. They offer the same keyboard with three different styles: linear, clicky or tactile. I went with tactile and I’m satisfied. The feel is not substantially different than my previous cherry reds.

Sound

I had Cherry MX Red switches in my previous keyboard, and this new one uses Logitech switches without any distinct name (that I could find). I did not have any particular problem with the noise level before but I hoped that this new one would perhaps be a little quieter. For the sake of family members when I type away on my keyboard when they are asleep and maybe to be less audible when I do video recordings etc. My home office and this keyboard setup is not far away from where they sleep.

I think maybe this keyboard is a little more silent, but not by much. I asked family members if they had noticed any difference in sound levels since my keyboard switch but none of them had noticed any change.

Size

Length: 368 mm
Width: 150 mm
Height: 22 mm
Weight: 810 g

The actual size of the keyboard and keys is almost identical to my previous keyboard, I mean when excluding the removed numerical keyboard. The weight of the G915 makes it feel solid and it sits firmly and fixated on the table reliably even during most my intense typing sessions. I suppose it is also good for wireless use as it won’t feel as “flimsy” as regular all-plastic wireless ones tend to feel. But I have not used it much in wireless mode.

Feeling

It feels solid and like it can survive being used daily for a long time.

This is what my previous keyboard looked like.

curl docs format evolution

I trust you have figured out already that I have the highest ambitions for the curl documentation. I want everything documented in a clear, easy-to-read and easy-to-find manner. This takes a lot of work and is not something that happens without effort.

Part of providing best-in-class documentation has in my mind always been to provide and ship man pages. And to make sure that those latest up-to-date man pages are always offered to users as good-looking webpages on the curl site.

Over twenty years ago I decided to render the web versions of the man pages by creating my own tool for that purpose: roffit. We still use it on the site today. It is a simple tool that converts nroff formatted man pages to HTML and adds cross-referencing links to other man pages. There is no concept of links in nroff, but I decided to use a combination of text and markup to use as a signal and it has worked out fine since. In part of course because I have then also subsequently made sure to format the curl man pages in this roffit-friendly way.

curl.1

We started writing the curl man pages in the nroff format some time in the beginning of 1999. Mostly then because we did not know or have any tool that would convert nicely into nroff and later on to make sure we got things formatted the way we wanted. This approach works and can be used, but nroff is not a terribly human-friendly format and over time we decided that documenting several hundred command line options in a single nroff file was not ideal.

In November 2016 we revamped the system and started to generate the curl.1 man page at build time from multiple source files. Every command line option would then be documented in its own stand-alone file. Easier to read, easier to manage. Yet we controlled the output nroff format so that the web version would still look excellent. The individual files got .d extensions (for document) but were mostly still using nroff, and were mostly all appended to create the result.

Over time, we introduced more convenience formatting aids to the .d files so that we could write them using less and less nroff, and as a result they also grew more readable in their source format.

The .d file format never had its own name. It is just how we documented 250+ command line options for the curl command line tool.

The libcurl docs however remained authored in nroff. In 2014, the number of libcurl man pages exploded when we started doing a separate document for every libcurl option. We went from 59 man pages in the project to 270 between two releases. Since then we have added many more. Today in early 2024, we have a total of 496 individual man pages in the project. 493 of them concerns libcurl.

Markdown

When curl started, markdown did not exist – that format was created first in 2004. The early documentation we created in the project that was not nroff, we made using plain text files. It was first in 2014 when we slowly started to convert some of our text documents into markdown.

Markdown has several upsides compared to plain old text. In particular it is easier to render good-looking web versions of them, which we do on the curl web site and it offers cross-document linking etc. Over the years we have converted almost all our text documents into markdown. Markdown is also a user-friend format as it is easy to read and easy to edit for contributors, new and old.

curldown

The move away from using nroff for the libcurl man pages came with the introduction of the “curldown” document format in early 2024. Meant to make it easy to generate the same kind of nroff files we previously hand-edited, but using “markdown”. I use quotes around it because it is not a full-fledged markdown support, but it looks similar enough to trick casual users and we decided to use “.md” file extensions to make editors, GitHub and more treat and show them as such.

The file format is inspired by the one we created for the command line tool: with a header holding meta-data and then simplified markdown. Easy to read, easy to edit – for everyone. No more strange nroff syntax to copy and paste. The fact that GitHub render the sources files clean and good-looking is also attractive.

Switching to a markdown lookalike format had some other side-effects: suddenly a lot of CI jobs we have running caught these files and had a go at them: spellchecks, prose checks, checks for capitalized words after periods and more. This improved the language significantly.

Also, generating the nroff files rather than editing them has more benefits than just avoiding the baroque syntax: it also allows us to change styles and escape certain sequences through-out all documentation with just a few fixes in the scripts producing the output.

After this 47,000 lines added and removed patch was merged, we have a completely new source format system for the libcurl docs, that generates the same style of nroff output we had previously. With improved language.

But then why not…

With the huge switch to “curldown” for libcurl options, it felt wrong to keep the old curl .d files for the curl.1 man page, seeing as they were almost there already. I did the work and converted them over to the same markdown lookalike style I did for libcurl: they are now also .md files and now they too get spell-checked and scrutinized much more.

I took this move one step further: previously we generated the single curl.1 man page using a fixed header file and a footer file that we prepended and appended to the nroff generated from the (converted) .d documentation.

Now, we instead have a mainpage.idx file that lists which (markdown) file names to include and convert to nroff. This allows us to split the two big headers and footers into several smaller markdown files, each with its own header. Like “DESCRIPTION”, “URL”, and “GLOBBING”. Separating them makes them easier to manage for contributors and the index file makes it easy to change the order of sections etc.

We end up with a curl.1 man page that looks quite similar to before, but with improve language and easier format to edit the documentation going forward.

When I type this, we have 836 files with .md extensions in the curl git repository. At the time of the most recent release, we had 61.

Future

I’m sure it does not stop here. I already have some ideas on how to improve this setup further.

I’m considering doing something to completely avoid the nroff step for the website HTML versions of the man pages and instead go straight from markdown.

The curl tool has the manpage built-in, (shown with the -M option), and I would like to fix the build to do this without using nroff.

I have some more CI jobs to add to verify and check the language in the markdown files. Mostly to make sure the language is consistent.

curl is a CNA

The curl project has been accepted as a CVE Numbering Authority (CNA) for vulnerabilities in all products directly made or managed by the project. If I’m counting correctly, we are the 351st CNA.

The official announcement from Mitre states: curl is now a CVE Numbering Authority (CNA) for all products made and managed by the curl project. This includes curl, libcurl, and trurl.

In plain English, this means that we will reserve and manage our own CVEs in the future directly against the CVE database with no middle man, and also that we have a scope for CVEs that is our territory: curl and libcurl. No one else can now register CVEs for our products – without involving us. (There’s an appeals process so someone can still actually file CVEs for issues even if we say no, but at least there’s a process where both sides will argue their points.)

We do not particularly want to be a CNA but we hope that this move will make it harder to file more stupid curl CVEs in the future.

emails I received, the collection

I have since at least 2009 posted occasional emails I received on this blog. Often they are emails from people who found my email address somewhere, thinking I am involved in the product or service where they found me. In cars, games, traces after breaches, apps and more.

They range from plain weird to fun to scary and threatening.

Somehow they help me remember that the world we live in is super complicated.

The ones I have shown here on the blog are only a small subset of those that I have received. And of all the ones I have received, I have not kept all, or at least not stored them in ways that makes it easy for me to find them now.

Also: for products such as Cisco Anyconnect and Mobdro I have received dozens of emails so I just selected a few examples to include

Out of the ones that I could find, I have dug up seventy-four different messages, converted them to markdown and added them to a brand new GitHub repository: bagder/emails.

I plan to add future email correspondence that meet the criteria as well.

Enjoy!

The top image for this blog post is a joke back from when I received the subject: urgent warning email which talked about the “hacking ring”.

PSL in curl

PSL in this context stands for Public Suffix List. It is a list of known public suffixes in domain names. A public suffix is a name under which Internet users can (or historically could) directly register names – other than the top-level domains. One of the most commonly known and used such suffixes is “co.uk”, but the complete list is rather long and exhaustive.

Public suffixes are important when working with HTTP cookies, because the only way an HTTP client can accurately deny setting super cookies is to know about every existing PSL domain and deny servers to set cookies for those domains.

The use of PSL is recommend in RFC 6265: “If feasible, user agents SHOULD use an up-to-date public suffix list” (and yes, there mere fact that it speaks of a list and not the list of course highlights how problematic this is)

A super cookie is a cookie that is set for a “too wide” domain, making it apply across multiple sites in ways cookies are not supposed to function. If we allow a super cookie set for “co.uk”, that cookie could get sent to a huge amount of different domains.

If you ask me, this is one of the ugliest parts of cookie functionality. And there are a few such parts to select from.

The exact list of names present in the PSL varies over time. New names are added, old names are removed. This of course makes aging clients over time increasingly likely to unwittingly accept cookies for domain names that have since been added to the list. Or block cookies for domains that are no longer in the list!

curl, cookies and PSL

curl has supported cookies since 1998, but it has only supported PSL since 2015 (7.46.0). PSL support in curl is powered by the libpsl library.

Support for this external library is entirely optional in the build and lots of users deliberately decide to build curl without it. Leaving it out reduces footprint, avoids an extra dependency etc. Lots of users who build curl won’t use cookies or do not care about the risk for super cookies.

When you invoke curl -V, to get to see the versions and features of your currently installed curl tool, look for the PSL keyword among the features to learn whether your curl build supports Public Suffix List handling.

cookies are cross-client

Since cookies are sent to clients / browsers typically in a completely client-agnostic way, setting cookies with “illegal” domains make them get discarded and not used. This makes sites tend to not try super cookies for normal scenarios as those cookies will just be rejected by most clients. This fact sometimes save the non-PSL compliant clients.

Of course, servers can also just send a flood of cookies to the client trying different domain attributes – as clients will just silent discard the illegal ones and keep the legal ones and then use them later according to their own best abilities.

Force a decision

Based on the realization that people have been building and shipping curl without PSL support perhaps without fully understanding what they have ducked for, I have just merged a change to curl’s configure script. This change will ship in curl 8.6.0.

The configure script checks for libpsl by default (and has done so for a long time) and will use the library if detected, but starting now it will exit with an error if the detection fails to highlight this fact in a much more noticeable way. The users who want to keep building without it can just add --without-libpsl to hush up the message. Previously, a failed detection only worked as a hint to configure that it should build curl without PSL support – completely fine, but maybe not always done so with the user being totally knowledgeable of what that means.

This change still does not necessarily mean that the person getting the configure error understands PSL in depth, but it forces them to make a decision: Willingly build without it, or fix the build to use it. Your choice!

Not a security issue (in curl)

This (now past) lenient take, to allow users to build curl easily without PSL support, was reported as a curl security problem.

I disagree with that take, as I believe it is the responsibility of everyone building curl to make sure that the sum of the components is what they think it should be – the curl build procedure is not designed for nor attempting to make sure that the final product is hardened security wise (whatever that could be). PSL support is optionally built-in – by design.

This said: users who mistakenly use curl without PSL support might of course end up in what could be seen as a potential security problem. I do however not think that it is a security problem in the curl project itself.

Funding Stefan’s curl work

The curl fund sponsors curl development in Q1 2024 . The curl fund consists entirely and only of money donated to the project by companies and individuals.

Thank you sponsors!

These two funded projects are first out in 2024.

Support for OpenSSL’s QUIC API

Designated lead developer: Stefan Eissing

In OpenSSL‘s recent 3.2 release, they introduced support for client-side QUIC. OpenSSL has taken a different API approach than what just about all other TLS libraries have done (including all the OpenSSL forks): they implement a full QUIC stack.

The result is that curl’s QUIC (and by extension HTTP/3) support that already works with three different backends does not work with OpenSSL, but only with the OpenSSL family of forks (BoringSSL, libressl, AWS-LC and quictls), wolfSSL and GnuTLS. Also, OpenSSL just shipped this, while the other TLS libraries have shipped their QUIC APIs for years already.

This design choice of OpenSSL significantly hampers HTTP/3 adoption with curl (and others) since OpenSSL remains the by far most popular TLS library to use with curl.

To improve this situation going forward, Stefan Eiussing will work on adding support for OpenSSL’s QUIC in curl using nghttp3 on top. Experience from other QUIC libraries says that they usually don’t have all the methods we need on their first attempt, and since this is OpenSSL’s first attempt and nobody from them has asked us for feedback, we expect a high risk of at least minor problems. Finding those issues earlier is better than later, in case we need to work with the OpenSSL team to improve things.

Ideally, no major problems are identified and we have HTTP/3 support with OpenSSL in official curl builds later in the year.

Refactoring the HTTP Handler

Designated lead developer: Stefan Eissing

The second half of Stefan’s assignment is to work on generally improving the state of HTTP and handling the different HTTP versions in the libcurl source code.

We want to make curl’s HTTP handler focus on generic HTTP processing of requests and responses only and separate the serialization on the wire into version specific parts. This puts HTTP 1.x, 2 and 3 implementations on par with each other. Other 1.x implementations can then be configured in (e.g. the Rust based “hyper” module) without needing to replace the core HTTP handling of curl.

Completion

Stefan has already started. We expect and hope these two projects to be mostly done and landed by the end of Q1 2024. Stefan has an established presence in the project and has already previously done lots of good development work. We are in good hands with Stefan behind the wheel for this.

I personally will of course assist, help out, review, participate in design decisions and generally cheer on, while continuing to do my regular “chores” and curl work: security, releases, maintenance, customer and user support for example.

My upcoming FOSDEM 2024

I attended FOSDEM for the first time back in 2010. I have since been back and attended every single physical version of the conference since then (remember that it skipped a few years in the COVID days).

FOSDEM is my favorite conference no doubt.

I did a presentation in the embedded dev room in 2010. I have in fact talked in front of audiences almost every year and some years I did it more than once.

Stickers and coasters

If you are interested in getting a curl sticker or two, this is a great opportunity. I will bring a senseless amount of curl stickers in different flavors and sizes so that hopefully everyone who wants one can get one. Find me at FOSDEM to get one. Or find the wolfSSL stand (K building level 1), where I will stock up stickers and also spend time every once in a while.

You can find me at the wolfSSL booth following my talks. Saturday 11:30 – 13:00 and Sunday 11:00 – 13:00.

I will also bring fancy PCB-style coasters. You must have a commit merged in curl’s source repository to be eligible to one of those beauties receive from me.

Feel free to email or DM me on Mastodon or something for syncing.

Broom not included: curling the modern way

That’s the title of my talk in the Network devroom. It is scheduled to take place at 10:50 on Saturday February 3rd. In room UB5.230.

The talk is set to take 20 minutes (including questions). My presentation abstract from when I still naively thought I could get 40 minutes describes the talk like this:

Everyone uses curl, the Swiss army knife of Internet transfers. Earlier this year we celebrated curl’s 25th birthday, and while this tool has provided a solid set of command line options for decades, new ones are added over time This talk is a look at some of the most powerful and interesting additions to curl done in recent years. The perhaps lesser known curl tricks that might enrich your command lines, extend your “tool belt” and make you more productive. Also trurl, the recently created companion tool for URL manipulations you maybe did not realize you want.

I will have to make some tough decisions on what of all this that I can actually include…

You too could have made curl

This talk has been accepted on the main track, but is not yet scheduled.

The talk is 50 minutes. It happens at 10:00 Sunday February 4th in room K1.105 (La Fontaine).

Daniel has taken the curl project to run in some 20 billion installations. He talks about what it takes to succeed with Open Source: patience, time, ups and downs, cooperation, fighting your impostor syndrome – all while having fun. There’s no genius or magic trick behind successful open source. You can do it. The talk will of course be spiced up with anecdotes, experiences and stories from Daniel’s 25 years of leading the curl project.

More

I proposed a talk titled “HTTP/3 – why and where are we” in the Web Performance devroom but it was not accepted.

I will update this post with more info as such becomes available.

The I in LLM stands for intelligence

I have held back on writing anything about AI or how we (not) use AI for development in the curl factory. Now I can’t hold back anymore. Let me show you the most significant effect of AI on curl as of today – with examples.

Bug Bounty

Having a bug bounty means that we offer real money in rewards to hackers who report security problems. The chance of money attracts a certain amount of “luck seekers”. People who basically just grep for patterns in the source code or maybe at best run some basic security scanners, and then report their findings without any further analysis in the hope that they can get a few bucks in reward money.

We have run the bounty for a few years by now, and the rate of rubbish reports has never been a big problem. Also, the rubbish reports have typically also been very easy and quick to detect and discard. They have rarely caused any real problems or wasted our time much. A little like the most stupid spam emails.

Our bug bounty has resulted in over 70,000 USD paid in rewards so far. We have received 415 vulnerability reports. Out of those, 64 were ultimately confirmed security problems. 77 of the report were informative, meaning they typically were bugs or similar. Making 66% of the reports neither a security issue nor a normal bug.

Better crap is worse

When reports are made to look better and to appear to have a point, it takes a longer time for us to research and eventually discard it. Every security report has to have a human spend time to look at it and assess what it means.

The better the crap, the longer time and the more energy we have to spend on the report until we close it. A crap report does not help the project at all. It instead takes away developer time and energy from something productive. Partly because security work is consider one of the most important areas so it tends to trump almost everything else.

A security report can take away a developer from fixing a really annoying bug. because a security issue is always more important than other bugs. If the report turned out to be crap, we did not improve security and we missed out time on fixing bugs or developing a new feature. Not to mention how it drains you on energy having to deal with rubbish.

AI generated security reports

I realize AI can do a lot of good things. As any general purpose tool it can also be used for the wrong things. I am also sure AIs can be trained and ultimately get used even for finding and reporting security problems in productive ways, but so far we have yet to find good examples of this.

Right now, users seem keen at using the current set of LLMs, throwing some curl code at them and then passing on the output as a security vulnerability report. What makes it a little harder to detect is of course that users copy and paste and include their own language as well. The entire thing is not exactly what the AI said, but the report is nonetheless crap.

Detecting AI crap

Reporters are often not totally fluent in English and sometimes their exact intentions are hard to understand at once and it might take a few back and fourths until things reveal themselves correctly – and that is of course totally fine and acceptable. Language and cultural barriers are real things.

Sometimes reporters use AIs or other tools to help them phrase themselves or translate what they want to say. As an aid to communicate better in a foreign language. I can’t find anything wrong with that. Even reporters who don’t master English can find and report security problems.

So: just the mere existence of a few give-away signs that parts of the text were generated by an AI or a similar tool is not an immediate red flag. It can still contain truths and be a valid issue. This is part of the reason why a well-formed crap report is harder and takes longer to discard.

Exhibit A: code changes are disclosed

In the fall of 2023, I alerted the community about a pending disclosure of CVE-2023-38545. A vulnerability we graded severity high.

The day before that issue was about to be published, a user submitted this report on Hackerone: Curl CVE-2023-38545 vulnerability code changes are disclosed on the internet

That sounds pretty bad and would have been a problem if it actually was true.

The report however reeks of typical AI style hallucinations: it mixes and matches facts and details from old security issues, creating and making up something new that has no connection with reality. The changes had not been disclosed on the Internet. The changes that actually had been disclosed were for previous, older, issues. Like intended.

In this particular report, the user helpfully told us that they used Bard to find this issue. Bard being a Google generative AI thing. It made it easier for us to realize the craziness, close the report and move on. As can be seen in the report log, we did have to not spend much time on researching this.

Exhibit B: Buffer Overflow Vulnerability

A more complicated issue, less obvious, done better but still suffering from hallucinations. Showing how the problem grows worse when the tool is better used and better integrated into the communication.

On the morning of December 28 2023, a user filed this report on Hackerone: Buffer Overflow Vulnerability in WebSocket Handling. It was morning in my time zone anyway.

Again this sounds pretty bad just based on the title. Since our WebSocket code is still experimental, and thus not covered by our bug bounty it helped me to still have a relaxed attitude when I started looking at this report. It was filed by a user I never saw before, but their “reputation” on Hackerone was decent – this was not their first security report.

The report was pretty neatly filed. It included details and was written in proper English. It also contained a proposed fix. It did not stand out as wrong or bad to me. It appeared as if this user had detected something bad and as if the user understood the issue enough to also come up with a solution. As far as security reports go, this looked better than the average first post.

In the report you can see my first template response informing the user their report had been received and that we will investigate the case. When that was posted, I did not yet know how complicated or easy the issue would be.

Nineteen minutes later I had looked at the code, not found any issue, read the code again and then again a third time. Where on earth is the buffer overflow the reporter says exists here? Then I posted the first question asking for clarification on where and how exactly this overflow would happen.

After repeated questions and numerous hallucinations I realized this was not a genuine problem and on the afternoon that same day I closed the issue as not applicable. There was no buffer overflow.

I don’t know for sure that this set of replies from the user was generated by an LLM but it has several signs of it.

Ban these reporters

On Hackerone there is no explicit “ban the reporter from further communication with our project” functionality. I would have used it if it existed. Researchers get their “reputation” lowered then we close an issue as not applicable, but that is a very small nudge when only done once in a single project.

I have requested better support for this from Hackerone. Update: this function exists, I just did not look at the right place for it…

Future

As these kinds of reports will become more common over time, I suspect we might learn how to trigger on generated-by-AI signals better and dismiss reports based on those. That will of course be unfortunate when the AI is used for appropriate tasks, such as translation or just language formulation help.

I am convinced there will pop up tools using AI for this purpose that actually work (better) in the future, at least part of the time, so I cannot and will not say that AI for finding security problems is necessarily always a bad idea.

I do however suspect that if you just add an ever so tiny (intelligent) human check to the mix, the use and outcome of any such tools will become so much better. I suspect that will be true for a long time into the future as well.

I have no doubts that people will keep trying to find shortcuts even in the future. I am sure they will keep trying to earn that quick reward money. Like for the email spammers, the cost of this ends up in the receiving end. The ease of use and wide access to powerful LLMs is just too tempting. I strongly suspect we will get more LLM generated rubbish in our Hackerone inboxes going forward.

Discussion

Hacker news

Credits

Image by Haider Mahmood from Pixabay

tech, open source and networking