June 15, 2023 10:00 AM PST (19:00 CEST, 17:00 UTC)
Register Here
This will be an overview by Daniel Stenberg of the new WebSocket support in curl and in particular how to use this API with libcurl in your applications. It will be followed by a live Q&A.
The session will be recorded and made available after the fact.
This webinar is done as a registration-only event on Zoom. If this is problematic for you, a separate second version of this webinar will be done over Twitch at a later date.
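For the impatient, here is a minimal sketch of what the connect-only style WebSocket API in recent libcurl versions looks like. The endpoint URL is a placeholder and error handling is kept to a bare minimum; the webinar covers the details.

```c
#include <curl/curl.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    /* placeholder endpoint - replace with a real WebSocket server */
    curl_easy_setopt(curl, CURLOPT_URL, "wss://example.com/socket");
    /* 2L asks libcurl to do the WebSocket upgrade but leave the
       sending/receiving of frames to the application */
    curl_easy_setopt(curl, CURLOPT_CONNECT_ONLY, 2L);
    if(curl_easy_perform(curl) == CURLE_OK) {
      const char *msg = "hello";
      size_t sent = 0;
      /* send a single text frame */
      curl_ws_send(curl, msg, strlen(msg), &sent, 0, CURLWS_TEXT);

      char buf[256];
      size_t nread = 0;
      const struct curl_ws_frame *meta = NULL;
      /* read one frame back - real code would loop and inspect the meta data */
      if(curl_ws_recv(curl, buf, sizeof(buf), &nread, &meta) == CURLE_OK)
        printf("received %u bytes\n", (unsigned int)nread);
    }
    curl_easy_cleanup(curl);
  }
  return 0;
}
```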
There is something about having your product installed in over twenty billion instances all over the world, and even beyond it. In my case it helps me remain focused on and committed to working on the security aspects of curl. Ideally, we will never have our heartbleed moment.
Security is also a generally growing concern in the world around us and Open Source security perhaps especially so. This is one reason why NVD making things up is such a big problem.
The National Vulnerability Database (NVD) has a global presence. They host and share information about security vulnerabilities. If you search for a CVE ID using your favorite search engine, it is likely that the first result you get is a link to NVD’s page with information about that specific CVE. They take it upon themselves to educate the world about security issues. A job that certainly is needed, but also one that puts a responsibility and requirement on them to be accurate. When they get things wrong they help distribute misinformation. Misinformation can make people draw the wrong conclusions or act in wrong, incomplete or exaggerated ways.
Low or Medium severity issues
There are well-known, recognized and reputable Open Source projects who by policy never issue CVEs for security vulnerabilities they rank as severity low or medium. (I will not identify such projects here because it is not the point of this post.)
Such a policy successfully avoids the risk that NVD will greatly inflate their issues, since the published ones can already only be high or critical. But is it helping the users and the ecosystem at large?
In the curl project we have a policy that makes us register a CVE for every single reported or self-detected problem that can have a security impact, whether triggered on purpose or by mistake. This includes a fair number of low and medium issues. The share of low and medium issues among all issues increases over time as we keep finding issues, while the really bad ones are less frequently reported.
As we have all data recorded and stored, we can visualize this development over time. Below is a graph showing the curl vulnerability and severity level trends since 2010.
Out of the 145 published curl security vulnerabilities so far, 28% have been rated severity high or critical while 104 of them were set low or medium by the curl security team.
I think this trend is easy to explain. It is because of two separate developments:
We as a project have matured and have learned over time how to test better and write code in ways that minimize risk, and we have been around long enough that a series of truly bad flaws have already been found (and fixed). We make fewer serious bugs these days.
Since 2010, lots more people look for security problems and these days we are much better at identifying problems as security related and we have better tools, while just a few years ago the same problem would often have become merely “a bug fix”.
Deciding severity
When a security problem is reported to curl, the curl security team and the reporter collaborate. First to make sure we understand the full extent of the problem and its security impact. What can happen and what is required for that badness to trigger? Further, we assess how likely it is that this can be done on purpose or by mistake and how common those situations and required configurations might be. We know curl, we know the code, but we also often go back and double-check exactly what the documentation says and promises to better assess what users should be expected to know and do, what is not expected from them, etc. And we re-read the involved code again and again.
curl is currently a little over 160,000 lines of feature-packed C code (excluding blank lines). It might not always be straightforward to a casual observer exactly how everything is glued together, even if we try to also document internals to help you find deeper knowledge.
I think it is fair to say that it requires a certain amount of experience and time spent with the code to be able to fully understand a curl security issue and what impact it might have. I believe it is difficult or next to impossible for someone without knowledge of how it works to just casually read our security advisories, second-guess our assessments and make their own.
Yet this is exactly what NVD does. They don’t even ask us for help or for clarifications of anything. They think they can assess the severity of our problems without knowing curl or fully understanding the reported issues.
A case to prove my point
In March 2023 we published a security advisory for the problem commonly referred to as CVE-2023-27536.
This is potentially a security problem, probably never hurts anyone and is in fact quite unlikely to ever cause a problem. But it might. So after deliberating we accepted it and ranked it severity low.
Bear with me here. I’ll spend two paragraphs revealing some details from the internal libcurl engine:
The problem is of a kind we have had several times in the past: curl has a connection pool, and when a user makes a subsequent request with this particular option modified (compared to how it was set when the previous connection was established), curl would wrongly reuse the first connection, thinking the two had the exact same properties.
The second transfer would then accidentally get the wrong rights because it was set up differently. Still, both the first connection and the second one would need the correct credentials; they would just differ in what “GSSAPI delegation” is allowed.
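To make the pattern a little more concrete, here is a minimal sketch (placeholder URL, assuming a Kerberos/Negotiate setup) of two transfers on the same easy handle where only CURLOPT_GSSAPI_DELEGATION changes in between. In affected versions, the second transfer could wrongly pick up the pooled connection from the first one.

```c
#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    curl_easy_setopt(curl, CURLOPT_URL, "https://krb.example.com/"); /* placeholder */
    curl_easy_setopt(curl, CURLOPT_HTTPAUTH, (long)CURLAUTH_NEGOTIATE);
    curl_easy_setopt(curl, CURLOPT_USERPWD, ":"); /* use the current Kerberos ticket */

    /* first transfer: delegation not allowed */
    curl_easy_setopt(curl, CURLOPT_GSSAPI_DELEGATION, (long)CURLGSSAPI_DELEGATION_NONE);
    curl_easy_perform(curl);

    /* second transfer: delegation allowed - in affected versions the pooled
       connection from the first transfer could be reused anyway */
    curl_easy_setopt(curl, CURLOPT_GSSAPI_DELEGATION, (long)CURLGSSAPI_DELEGATION_FLAG);
    curl_easy_perform(curl);

    curl_easy_cleanup(curl);
  }
  return 0;
}
```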
NVD ranks this
The person or team at NVD whose job it is to make up stuff for security vulnerabilities ranked this as CRITICAL 9.8. Almost as bad as it gets apparently. 10 is the max as you might recall.
When realizing this, at the end of May, I first fell off my chair in shock at this insanity, but after a quick recovery I emailed them (again) and complained (yet again) about setting this severity for *27536. I used the word “ridiculous” in my email to describe their actions. Why and who benefits from them scaremongering the world like this? It makes no sense. On the contrary, this is bad for everyone.
As a reaction to my complaint, someone at NVD went back and agreed to revise the CVSS string they had set and suddenly it was “only” ranked HIGH 7.2. I say “someone” because they never communicate with names and never sign their emails, so I never know who it is I am talking to. They are just “NVD”.
I objected to their new CVSS string as well. It is just not a high severity security problem!
In my new argument I changed two particular details in the CVSS string (compared to the one they insisted was good) and presented arguments for that. For your pleasure, I include my exact wording below. (Some emphasis is added here for display purposes.)
How I motivated a downgrade
I could possibly live with: AV:N/AC:H/PR:H/UI:N/S:U/C:H/I:N/A:N (4.4) - even if that means Medium and we argue Low.
These are two changes and my motivations:
Attack complexity high - because this requires that you actually have a working first communication and then do a second one that is slightly changed; you would expect the second to be different, but in reality it accidentally reuses the first connection and therefore gives different/elevated rights.
It is a super-niche and almost impossible attack and there has been no report ever of anyone having suffered from this or even the existence of an application that actually would enable it to happen.
It is more likely to only happen by mistake by an application, but it also seems unlikely to ever be used by an application in a way that would trigger it, since having the same user credentials with different values for GSS delegation and assuming different access levels seems … weird.
This almost impossible chance of occurring is the primary reason we think this is a Low severity. With CVSS, it seems impossible to reach Low.
Privileges Required high - because the only way you can trigger this flaw is by having full privileges for the *same* user credentials that are later used again but with a changed GSS API delegation setting. While the previous connection is still live in the connection pool.
It would also only be an attack or a flaw if that second transfer actually assumes it has different access properties, which is debatable whether users of the API would expect or not.
CVSS still sucks
CVSS is a crap system, so using this single-dimension number it seems next to impossible to actually get a Low severity report.
NVD wants “public sources”
NVD does not just take my word for how curl works. I mean, I only wrote a large chunk of it and am probably the single human that knows most about its internals and how it works. I also wrote the patch for this issue, I wrote the connection pool logic and I understand the problem exactly. Nope, just because I say so does not make it true.
My claims above about this issue can of course be verified by reading the publicly available source code and you can run tests to reproduce my claims. Not to mention that the functionality in question is documented.
But no.
They decided to agree to one of my proposed changes, which further downgraded the severity to MEDIUM 5.9. Quite far away from their initial stance. I think it is at least a partial victory.
For the second change to the CVSS string I requested, they demanded that I provide more information. In their words:
There is no publicly available information about the CVE that clarifies your statement so we must request clarification from you and additionally have this detail added to the HackerOne report or some other public interface for transparency purposes prior to making changes to the CVSS vector.
… which just emphasizes exactly what I have stated already in this post. They set a severity on this without understanding the issue, with no knowledge of the feature that gets this wrong and without clues about what is actually necessary to trigger this flaw in the first place.
For people intimately familiar with curl internals, we actually don’t have to spell out all these facts in excruciating detail. We know how the connection pool works, how the reuse of connections should work and what it means when curl gets it wrong. We have also had several other issues in this area in the past. (It is a tricky area to get right.)
But it does not make this CVE more than a Low severity issue.
Conclusion
This issue is now stuck at MEDIUM 5.9 at NVD. Much less bad than where they started. Admittedly, whether it says Low or Medium possibly does not make a huge difference out there in the world.
I think it is outrageous that I need to struggle and argue for such a big and renowned organization to do right. I can’t do this for every CVE we have reported because it takes serious time and energy, but at the same time I have zero expectation of them getting this right. I can only assume that they are equally lost and bad when assessing security problems in other projects as well.
A completely broken and worthless system. That people seem to actually use.
It is certainly tempting to join the projects that do not report Low or Medium issues at all. If we stopped doing that, at least NVD could not cry wolf and foolishly claim they are critical.
My response
That is a ridiculous request.
I'm stating *verifiable facts* about the flaw and how curl is vulnerable to it. The publicly available information this is based on is the actual source code which is openly available. You can also verify my claims by running code and checking what happens and then you'd see that my statements match what the code does.
The fact that you assess the severity of this (and other) CVE without understanding the basic facts of how it works and what the vulnerability is, just emphasizes how futile your work is: it does not work. If you do not even bother to figure these things out then of course you cannot set a sensible severity level or CVSS score. Now I understand your failures much better.
We in the curl project's security team already know how curl works, we understand this vulnerability and we set the severity accordingly. We don't need to restate known facts. curl functionality is well documented and its source code has always been open and public.
If you have questions after having read that, feel free to reach out to the curl security team and we can help you. You reach us at security@curl.se
I recommend that you (NVD) always talk to us before you set CVSS scores for curl issues so that we can help guide you through them. I think that could make the world a better place and it would certainly benefit a world of curl users who trust the info you provide.
/ Daniel
Several years ago when someone highlighted the fact for me that curl was credited in the ending sequence of the megahit game Grand Theft Auto V, I got a brief moment of acknowledgement from my kids that I might be doing cool stuff before they forgot and moved on.
Later I would find curl credits in more games and it started to become somewhat of a pattern. I collect screenshotted curl credits and awesome people help me point them out. Many of them are from games.
Finding out that they use curl is rarely straightforward. They virtually never tell us beforehand. Many list the curl license somewhere, sometimes you can find the DLL, and some actually include a mention in on-screen credits displays.
Fortnite uses libcurl
PUBG: Battlegrounds uses libcurl
ROBLOX uses libcurl
As this little subset shows, some of the most popular games in the world use libcurl. We rarely get to know exactly for what purpose they use libcurl, so we are left with guesses and assumptions. Modern games do a fair amount of internet transfers and what better library to do that with?
Game consoles bundle it
libcurl is also shipped as an OS component in several game consoles.
Nintendo Switch
Valve Steam Deck
Sony Playstation 5
Microsoft Xbox 360
List of curl credits
libcurl (often mentioned as plain curl) has been frequently used in games for almost twenty years already. Doom 3 from 2004 uses libcurl, just as well as Diablo IV, released just days ago in June 2023.
Doom 3 from 2004 used libcurl
Diablo IV from 2023 uses libcurl
The site mobygames.com maintains a database of people getting credited in games. The entry for me, since my name is the one used in the curl license, right now lists me (curl, really) credited in no fewer than 136 games – and the two games listed immediately above are not even mentioned there, so there are reasons to suspect others are missing as well. Also, as mentioned before, the fact that a game is using curl is sometimes a well hidden secret.
What for?
I have not been closely involved with any of the makers of these games. I don’t have insights or special knowledge about exactly what they use libcurl for. Whatever download or upload purposes they have I guess.
Why curl?
Because curl is Capable, Ubiquitous, Reliable and Libre. It knows how to transfer data in many different ways, is feature-packed and it performs well. It runs everywhere so the API works on all platforms. It has proven itself stable and solid for decades without breaking APIs or ABIs. It being available cheap (at no purchase cost at least) is probably also a strong contributing factor, combined with the others.
The curl test suite was born in November 2000. We wrote our own custom system, dedicated for us.
In May 2001 we changed the file format for individual tests and this is still the format we use today. During the twenty-two years that have passed we have added some 1600 test cases to the collection, and we make sure that they can run on virtually any platform and that each test case itself specifies what curl features it requires to work, so that builds with those features disabled can skip those tests.
Only a thorough test suite provides the confidence we need to promise users that we keep existing behaviors, while we still can and do repeatedly rewrite, refactor and replace large chunks of the internals.
Synchronous in a single thread
In 2000 we all had single cores and single CPUs. We made the test suite run the tests one by one, in a serial fashion. Some are quick, some take a little longer. While CPUs certainly have grown significantly faster over the lifetime of curl, the number of test cases has also grown.
Today, on my fast modern machine, running all test cases in the main test suite takes about 10 minutes. If we run them with valgrind enabled (it then invokes all curl related commands and functions with the valgrind tool to monitor that it doesn’t do any serious memory violations or leaks), the same process takes close to 30 minutes.
This might not sound terribly bad, but it is also not unusual to run the tests on slower machines that take two or maybe even five times longer to complete. If you want to run the tests on a few different build combinations to make sure they are all happy, you may need to rerun the set a number of times. It all adds up.
This is a rather inefficient use of time and available system resources. In researching and measuring the current state of curl testing, Dan Fandrich figured out that in a normal test round the CPU is idle 80% of the time! And that’s just one core.
Illustration from Dan’s “curl Parallel Testing Proposal”
Going parallel
In March 2023, Dan brought his curl Parallel Testing proposal (11 page PDF) to us, outlining an idea on how to convert the current single-threaded serial test runner into one that runs many separate worker processes and can run several test cases in parallel.
The general idea being that even on a single-core machine, running tests in parallel has the chance to speed up the process a lot. Because of that 80% number if nothing else.
Most (curl) developers of course also have machines with several or even many cores, making parallelism an even better idea.
We all loved the idea, gave Dan our thumbs up and arranged to fund his work on this improvement.
Port numbers
curl does Internet transfers, and for testing curl we have a set of test servers implemented that curl can talk to and get responses back from. The specific tests control exactly how these servers respond and act for each test, to make sure that curl speaks the protocols correctly and consistently in both good and bad situations.
A challenge with this is that the test suite actually has to fire up and run actual networking servers on the local machine for this purpose. Each such server has to listen to a dedicated TCP or UDP port for as long as the tests are still going.
Luckily, we reworked the port number use for test servers recently. Using fixed port numbers for test servers was problematic already with single threaded tests because you could not run a separate test case in a different shell on the same machine etc. They would also sometimes collide with other random services running on developers’ machines.
Since August 2020 all test servers listen on random port numbers. A fundamental criterion for being able to run tests in parallel.
Landed
After a lot of hard work to refactor the test internals, it can now fire up N worker processes, where each such process can run its own set of test servers, then make sure the main scheduler hands out test cases to all of the workers and collects and outputs the test results from all of them. On June 5, Dan merged the commits to master that made it possible for all of us to start test (!) driving this.
First impressions
Dan recommends maybe 7 workers per core, but the number might be limited by how much system memory you have, since every such worker may end up running a fairly large set of test servers. It also depends on whether you run the tests with or without valgrind.
I ran a first simple test shot on my machine using 80 workers. A full valgrind enabled round with 1606 tests completed in 87 seconds. That is more than twenty times faster than previously.
Some further polish needed
There are still some issues left that make the parallel test setup a little shakier than the normal serial style, so we do not yet enable this by default for people. We will work on fixing those issues and iron out the last wrinkles so that we can soon get everyone onboard on this.
This is the second follow-up patch release in the 8.1.x series due to regressions and bugs that are too annoying to leave lingering around.
Release video
Numbers
the 219th release
0 changes
7 days (total: 9,202)
14 bug-fixes (total: 9,045)
22 commits (total: 30,429)
0 new public libcurl function (total: 91)
0 new curl_easy_setopt() option (total: 302)
0 new curl command line option (total: 251)
13 contributors, 3 new (total: 2,888)
5 authors, 2 new (total: 1,150)
0 security fixes (total: 145)
Bugfixes
configure: quote the assignments for run-compiler
A regression introduced in the previous release made configure fail if the $CC shell variable was set to something other than just a single command name. This now quotes the variable correctly.
configure: without pkg-config and no custom path, use -lnghttp2
Installations without pkg-config where nghttp2 is installed in a default directory would get a link error in the build.
http2: fix EOF handling on uploads with auth negotiation
This was a regression when using HTTP/2 for doing multi-phase authentication methods with POST, like for example Digest.
http3: send EOF indicator early as possible
By better tracking the amount of upload data, curl can avoid a superfluous final zero-length DATA packet and instead send the EOF sooner.
libcurl.m4: remove trailing ‘dnl’ that causes this to break autoconf
The configure macro we ship for other projects to use to detect installed libcurl version now works better.
libssh: when keyboard-interactive auth fails, try password
When an SSH server allows multiple auth methods, and curl tried keyboard-interactive, it would wrongly skip trying the password method – if built to use libssh. This bug has been present ever since libssh support shipped.
For a widely used, widely distributed open source project such as curl, we often have little to no relationship at all with our users, and therefore it is hard to get feedback and learn what works and what is less good.
Our best and primary way is thus simply to ask users every year how they use curl.
user survey
For the tenth consecutive year, we put together a survey and we ask everyone we know and can reach who has used curl or its library within the last year to donate a few minutes of their precious time and give us their honest opinions.
The survey is anonymous but hosted by Google. We do not care who you are, but we want to know how you think curl works for you.
The survey will remain online for submissions during 14 days: from Thursday May 25 2023 until midnight (CEST) Wednesday June 7 2023. Please tell your friends about it!
user survey
Post survey analysis
On June 5 the painstaking work of analyzing the results and putting together a summary and presentation begins. It usually takes me a few weeks to complete. Once that is done, the results will be shared for the entire world to enjoy.
Then we see what the curl project should take home and do as a direct result of what users say. Updating procedures, writing documentation and adding features to the roadmap are among the things that can happen and have happened after previous surveys.
Only 6 days since the previous release we are again here with a curl release. It turned out 8.1.0 had some rather nasty regressions that we felt were urgent enough to warrant another round on the dance floor. So here goes curl 8.1.1. A bugfix release.
Release presentation
Numbers
the 218th release
0 changes
6 days (total: 9,195)
25 bug-fixes (total: 9,031)
40 commits (total: 30,407)
0 new public libcurl function (total: 91)
0 new curl_easy_setopt() option (total: 302)
0 new curl command line option (total: 251)
19 contributors, 10 new (total: 2,885)
13 authors, 6 new (total: 1,148)
0 security fixes (total: 145)
Bugfixes
Some of the highlights of this release include…
cmake: avoid list(PREPEND)
A use of a too-new cmake feature snuck into the build in the last release, which caused trouble for people using older cmake versions.
cmake: repair cross compiling
A recently added cmake check did not have the correct precautions added for cross-compiling which broke such builds.
configure: generate a script to run the compiler
The configure script has an elaborate check that verifies that provided libraries can be used at run-time. This turned out to be complicated when the compiler itself uses libraries, since the check works by setting LD_LIBRARY_PATH and that path also affects the compiler!
http2: double http request parser max line length
The last word is probably not said about this logic, but capping the max request header line size to 4KB was too narrow and caused application breakages. Now the limit is at 8KB.
http2: increase stream window size to 10 MB
It turned out that even though we have a flexible HTTP/2 window concept, download performance could suffer and now we have bumped the window size again significantly.
http2: upload improvements
In particular, uploads that are aborted prematurely by a reset, for example when a 404 is returned before the entire upload is done, could cause issues.
rename struct ‘http_req’ to ‘httpreq’
The development branch of FreeBSD (14) introduced a struct in one of the public headers whose name collided with an internal struct libcurl uses. The bug exists in FreeBSD’s header, but we renamed ours anyway to work around the problem while the FreeBSD team fixes their end.
better error message when URLs fail to parse
We have a fairly elaborate identification of exactly what fails when the URL parser rejects a URL, and the error message now exposes that to help users better understand what curl does not like.
urlapi: allow numerical parts in the host name
The URL parser was far too strict in rejecting host names because they were “invalid IPv4” when in fact they should have been treated as host names instead. Probably the worst regression added in 8.1.0. In fact, the URL parser basically cannot refuse a host name for not being a valid IPv4 address, since it can then get passed through to the name resolver which can still find it in /etc/hosts etc.
We are back with the first release since that crazy March day when we did two releases on the same day. First 8.0.0 shipped that bumped the major version for the first time in decades. Then curl 8.0.1 followed just hours after, due to a serious mess-up in the factory lines.
Release video presentation
Numbers
the 217th release
3 changes
58 days (total: 9,189)
185 bug-fixes (total: 9,006)
322 commits (total: 30,367)
0 new public libcurl function (total: 91)
0 new curl_easy_setopt() option (total: 302)
1 new curl command line option (total: 251)
64 contributors, 35 new (total: 2,875)
37 authors, 17 new (total: 1,142)
4 security fixes (total: 145)
Security
We disclose four new curl security vulnerabilities today, three of them at severity Low and one of them at Medium. This also means that 3,840 USD was awarded as bug bounties in this release cycle.
UAF in SSH sha256 fingerprint check
[CVE-2023-28319] libcurl offers a feature to verify an SSH server’s public key using a SHA-256 hash. When this check fails, libcurl would free the memory for the fingerprint before it returns an error message containing the (now freed) hash.
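For context, this is roughly how an application opts in to that host key check; the URL and the base64 hash below are placeholders, not values from the actual report.

```c
#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    /* placeholder SFTP URL */
    curl_easy_setopt(curl, CURLOPT_URL, "sftp://example.com/file.txt");
    /* base64-encoded SHA-256 hash of the expected host key (placeholder).
       In affected versions, a mismatch here could lead to the freed hash
       being used while composing the error message. */
    curl_easy_setopt(curl, CURLOPT_SSH_HOST_PUBLIC_KEY_SHA256,
                     "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=");
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
  }
  return 0;
}
```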
siglongjmp race condition
[CVE-2023-28320] libcurl provides several different backends for resolving host names, selected at build time. If it is built to use the synchronous resolver, it uses alarm() and siglongjmp() to make slow name resolves time out.
When doing this, libcurl used a global buffer that was not mutex protected and a multi-threaded application might therefore crash or otherwise misbehave.
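A common way for multi-threaded applications to steer clear of this signal-based timeout machinery altogether is the long-standing CURLOPT_NOSIGNAL option. A minimal sketch (placeholder URL):

```c
#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/"); /* placeholder */
    /* tell libcurl not to use signals (alarm/siglongjmp) for timeouts,
       which is the usual advice for multi-threaded applications that
       use the synchronous resolver */
    curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1L);
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
  }
  return 0;
}
```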
IDN wildcard match
[CVE-2023-28321] curl supports matching of wildcard patterns when listed as “Subject Alternative Name” in TLS server certificates. curl can be built to use its own name matching function for TLS rather than one provided by a TLS library. This private wildcard matching function would match IDN (International Domain Name) hosts incorrectly and could as a result accept patterns that otherwise should mismatch.
more POST-after-PUT confusion
[CVE-2023-28322] When doing HTTP(S) transfers, libcurl might erroneously use the read callback (CURLOPT_READFUNCTION) to ask for data to send, even when the CURLOPT_POSTFIELDS option has been set, if the same handle previously was used to issue a PUT request which used that callback.
This flaw may surprise the application and cause it to misbehave and either send off the wrong data or use memory after free or similar in the second transfer.
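A minimal sketch of the handle-reuse pattern described above, with a placeholder URL and a hypothetical read callback; in affected versions the second transfer could end up calling the callback instead of sending the postfields data.

```c
#include <curl/curl.h>
#include <string.h>

/* hypothetical read callback feeding the PUT body once */
static size_t read_cb(char *buf, size_t size, size_t nitems, void *userdata)
{
  int *done = (int *)userdata;
  const char *data = "PUT body";
  size_t len = strlen(data);
  if(*done || size * nitems < len)
    return 0;               /* signal end of upload data */
  memcpy(buf, data, len);
  *done = 1;
  return len;
}

int main(void)
{
  int done = 0;
  CURL *curl = curl_easy_init();
  if(curl) {
    /* first transfer: a PUT using the read callback */
    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/upload"); /* placeholder */
    curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);
    curl_easy_setopt(curl, CURLOPT_READFUNCTION, read_cb);
    curl_easy_setopt(curl, CURLOPT_READDATA, &done);
    curl_easy_perform(curl);

    /* second transfer, reusing the handle: a POST with CURLOPT_POSTFIELDS.
       In affected versions libcurl could still use the read callback here
       instead of the postfields data. */
    curl_easy_setopt(curl, CURLOPT_UPLOAD, 0L);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "name=value");
    curl_easy_perform(curl);

    curl_easy_cleanup(curl);
  }
  return 0;
}
```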
Changes
This release has only three real changes. One bigger and two smaller:
HTTP/2 over proxy
libcurl can now negotiate and use HTTP/2 when it is told to use an HTTPS proxy (details in the CURLOPT_PROXYTYPE man page), and the command line tool can of course switch it on using the --proxy-http2 option. Explained more in this blog post.
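A minimal sketch of how an application would opt in, assuming the CURLPROXY_HTTPS2 proxy type introduced for this feature; the URLs are placeholders.

```c
#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/"); /* placeholder */
    /* placeholder HTTPS proxy; CURLPROXY_HTTPS2 asks libcurl to attempt
       HTTP/2 with the proxy and fall back to HTTP/1 if it cannot */
    curl_easy_setopt(curl, CURLOPT_PROXY, "https://proxy.example.com:443");
    curl_easy_setopt(curl, CURLOPT_PROXYTYPE, (long)CURLPROXY_HTTPS2);
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
  }
  return 0;
}
```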
refuse to resolve the .onion TLD
When a host name ending with .onion is passed on to the name resolver functions, they will cause an error and will not be resolved. Like RFC 7686 tells us.
The official counter says we did more than 180 bugfixes in this release cycle. Here follow some of my favorites:
checksrc fixes
We made it better at checking the code style for three distinct code situations – and then updated the source code accordingly.
cmake fixes
bring in the network library on Haiku
do not add zlib headers for OpenSSL
make config version 8 compatible with 7
set SONAME for SunOS too
only do transfer-encoding compression if asked to
Transfer encodings other than “chunked” are rarely used. Up until now libcurl would still activate automatic decompression if such was used, even if it was not asked for by the application.
bring back support for SFTP path ending in /~
A regression made a URL ending in /~ no longer produce a directory listing, because the URL does not end with a slash. Now we bring back that behavior, even if it goes a little against standard behavior.
never allocate dynbufs larger than “too big”
The general dynamic buffer system no longer allocates more memory than what the specific buffer is allowed to grow to. An optimization.
various gskit compile errors in OS400
Makes curl build fine there again.
enforce a maximum DNS cache size independent of timeout value
The DNS cache entries are purged on age only (default 60 seconds). With this new code, libcurl caps the maximum total number of DNS cache entries at 30,000.
remove PENDING + MSGSENT handles from the main linked list
Transfers not yet activated and transfers that are already completed are now moved off the main linked list. For performance.
runtests: prepare for parallel
Lots of cleanups and smaller fixes have been merged during this cycle in preparation for the pending introduction of parallel tests.
verify socketpair with a random value
The custom socketpair implementation used for platforms without a native one, was changed to use truly random values when verifying that the pipe works.
Fix ‘Location:’ formatting for early VTE terminals
The special terminal highlighting of the URL that is shown in the Location: header is now disabled for some terminals that can’t display it properly.
urlapi polish
Several different bugs and improvements were made. Including:
cleanups and performance improvements
detect and error on illegal IPv4 addresses
prevent setting invalid schemes
URL encoding for the URL missed the fragment
enhanced WebSocket en-/decoding
Parts of the WebSocket parser code were rewritten to fix bugs.
This post is slightly delayed just because I happened to be traveling when I realized we had climbed above this mark, so it took me an extra day to get an appropriate photograph made. The kind of photograph a moment like this calls for. Unfortunately without a beer this time.
Daniel celebrating
Steady growth
The star growth rate for curl seems to have been fairly linear over many years now.
Meaningless
Yes, GitHub stars are a totally meaningless metric that says nothing real about a project like curl. The number does not reflect usage, does not reflect popularity and does not reflect the number of installs out in the real world. It is just a vanity number.
It started as just a test to see if I could use the existing advisory data we have for all curl CVEs to date and provide that as JSON. Maybe, I thought, if we provide it good enough it can be used to populate other databases automatically or even get queried easier by tools.
Information
In the curl project we have published 141 vulnerability advisories so far, each with its own registered CVE ID. We make an effort to describe all the flaws as thoroughly as possible, but also with easy overviews and tables so that users can quickly see, for example, which curl versions are vulnerable to which flaws. It is our going the extra meticulous mile that makes it extra annoying when others override what we conclude.
More machine friendly
In a recent push I decided that all the info we have and provide could and should be offered in a more machine friendly format for whoever wants it.
After a first few test shots, fellow curl team mates pointed out to me that there is an existing effort called the Open Source Vulnerability format, an openly developed JSON schema designed for pretty much exactly what I set out to do. I agreed that it made sense to follow something existing rather than make up my own format.
Can be improved
Of course we ran into some minor snags with this schema and there are still details in it that I think can be improved and we are discussing with the OSV team to see if there is merit to our ideas or not. Still, even without any changes we can now offer our data using this established format.
The two primary things I want to improve are how we provide project identification (whose issue is this) and how we convey the severity level of the issue.
Different sets
As of now, we offer a set of different ways to get the CVE data as JSON.
1. Everything all at once
On the fixed URL https://curl.se/docs/vuln.json, we provide a JSON array holding a number of JSON objects. One object per existing CVE. Right now, this is 349KB of data. (If you ask for it compressed it will be smaller!)
This URL will always contain the entire set and it will automatically update in the future as we publish new CVEs. It also automatically updates when we correct or change any of the previously published advisories.
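Fittingly, fetching the whole set can of course be done with libcurl itself. A minimal sketch that dumps the JSON to stdout and asks for a compressed response:

```c
#include <curl/curl.h>
#include <stdio.h>

/* write callback that simply dumps the received JSON to stdout */
static size_t write_cb(char *data, size_t size, size_t nmemb, void *userp)
{
  (void)userp;
  return fwrite(data, size, nmemb, stdout);
}

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    curl_easy_setopt(curl, CURLOPT_URL, "https://curl.se/docs/vuln.json");
    /* an empty string asks for all encodings libcurl supports,
       which shrinks the transfer considerably */
    curl_easy_setopt(curl, CURLOPT_ACCEPT_ENCODING, "");
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
  }
  return 0;
}
```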
2. Single object per issue
If you prefer to get the exact metadata for a single specific curl CVE, you can get the JSON for an issue by replacing the .html extension with .json for any CVE documented on the website. You will also find a menu option in the “related box” for each documented CVE that links to it.
3. All issues for a specific curl version
On the curl website we already offer a way for users to get a list of all vulnerabilities a certain specific release is known to be affected by. Of course always updated with the latest publications.
For example, curl version 7.87.0 has all its vulnerability information detailed on https://curl.se/docs/vuln-7.87.0.html. When I write this, there are eight known vulnerabilities for this version.
Screenshot of the website displaying vulnerability information for curl 7.87.0
Again, either by clicking the JSON link there on the page under the table, or simply by replacing html by json in the URL, the user can get a listing of all the CVEs this version is vulnerable to, as a JSON array with a number of JSON objects inside. In this case, eight objects as of now.
While I expect the format might still change a little bit going forward, and not all issues have all the metadata provided just yet (for example, the git commit ranges are still lacking on a number of issues from before 2017), here is an illustration screenshot of jq displaying the JSON object for CVE-2022-27780.
Object details
The database_specific object near the top holds metadata that we have and believe belongs with the issue, but that has no established field defined in the JSON schema. Since I think the data still adds value to users, I decided to put it into this section, which is designed and meant exactly for this kind of extension.
You can see that we set an “id” that is the CVE ID with a CURL- prefix. This is just us catering to the conditions of OSV and the JSON schema. We apparently need our own ID and provide the actual CVE ID as an alias, so we “fake” this by simply prepending CURL- to the CVE ID. We don’t use any private IDs when we work with vulnerabilities: we only deal with public issues and we only deal with issues that are CVE worthy, so it seems unnecessary to involve anything else.