Category Archives: cURL and libcurl

curl and/or libcurl related

This is how I git

Every now and then I get questions on how to work with git in a smooth way when developing, bug-fixing or extending curl – or how I do it. After all, I work on open source full time which means I have very frequent interactions with git (and GitHub). Simply put, I work with git all day long. On ordinary days, I issue git commands several hundred times.

I have a very simple approach and way of working with git in curl. This is how it works.

command line

I use git almost exclusively from the command line in a terminal. To help me see which branch I’m working in, I have this little bash helper script.

brname () {
  # name of the currently checked out branch, if we are in a git repo at all
  a=$(git rev-parse --abbrev-ref HEAD 2>/dev/null)
  if [ -n "$a" ]; then
    echo " [$a]"
  else
    echo ""
  fi
}
PS1="\u@\h:\w\$(brname)$ "

That gives me a prompt that shows the username, host name, current working directory and the currently checked out git branch.

In addition: I use Debian’s bash command line completion for git which is also really handy. It allows me to use tab to complete things like git commands and branch names.

git config

I of course also have my customized ~/.gitconfig file to provide me with some convenient aliases and settings. My most commonly used git aliases are:

st = status --short -uno
ci = commit
ca = commit --amend
caa = commit -a --amend
br = branch
co = checkout
df = diff
lg = log -p --pretty=fuller --abbrev-commit
lgg = log --pretty=fuller --abbrev-commit --stat
up = pull --rebase
latest = log @^{/RELEASE-NOTES:.synced}..

The ‘latest’ one is for listing all changes done to curl since the most recent RELEASE-NOTES “sync”. The others should hopefully be rather self-explanatory.

The config also sets gpgsign = true, enables mailmap and a few other things.
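Laid out in the actual ~/.gitconfig, those pieces end up in sections roughly like this (a trimmed sketch, reusing a few of the aliases from above):

[alias]
  st = status --short -uno
  ci = commit
  up = pull --rebase
  latest = log @^{/RELEASE-NOTES:.synced}..
[commit]
  gpgsign = true
[log]
  mailmap = true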

master is clean and working

The main curl development is done in the single curl/curl git repository (primarily hosted on GitHub). We keep the master branch the bleeding edge development tree and we work hard to always keep that working and functional. We do our releases off the master branch when that day comes (every eight weeks) and we provide “daily snapshots” from that branch, put together – yeah – daily.

When merging fixes and features into master, we avoid merge commits and use rebases and fast-forward as much as possible. This makes the branch very easy to browse, understand and work with – as it is 100% linear.

Work on a fix or feature

When I start something new, like work on a bug or trying out someone’s patch or similar, I first create a local branch off master and work in that. That is, I don’t work directly in the master branch. Branches are easy and quick to do and there’s no reason to shy away from having loads of them!

I typically name the branch prefixed with my GitHub user name, so that when I push them to the server it is noticeable who is the creator (and I can use the same branch name locally as I do remotely).

$ git checkout -b bagder/my-new-stuff-or-bugfix

Once I've gotten somewhere, I commit to the branch. It can end up being one or more commits before I consider myself "done for now" with what I set out to do.

I try not to leave the tree with any uncommitted changes – like if I take off for the day or even just leave for food or an extended break. This puts the repository in a state that allows me to easily switch over to another branch when I get back – should I feel the need to. Plus, it’s better to commit and explain the change before the break rather than having to recall the details again when coming back.

Never stash

"git stash" is therefore not a command I ever use. I instead create a new branch and commit the (temporary?) work in there as a potential new line of work.

Show it off and get reviews

Yes I am the lead developer of the project but I still maintain the same work flow as everyone else. All changes, except the most minuscule ones, are done as pull requests on GitHub.

At some point I'm happy with the functionality in my local branch: the bug seems to be fixed, or the feature seems to do what it's supposed to do, and the test suite runs fine locally.

I then clean up the commit series with "git rebase -i" (or, if it is a single commit, I can instead use just "git commit --amend").
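In practice, that interactive rebase is run against the branch point, typically just master:

$ git rebase -i master

In the editor that pops up, each commit can be kept as pick or marked squash, fixup or reword until the series looks the way it should.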

The commit series should be a set of logical changes related to this work and nothing more than necessary – but kept as separate commits if they are logically separate. Each commit also gets its own proper commit message. Unrelated changes should be split out into their own separate branches and subsequent separate pull requests.

git push origin bagder/my-new-stuff-or-bugfix

Make the push a pull request

On GitHub, I then make the newly pushed branch into a pull request (aka “a PR”). It will then become visible in the list of pull requests on the site for the curl source repository, it will be announced in the #curl IRC channel and everyone who follows the repository on GitHub will be notified accordingly.

Perhaps most importantly, a pull request kicks off a flood of CI jobs that will build and test the code in numerous different combinations and on several platforms, and the results of those tests will trickle in over the coming hours. When I write this, we have around 90 different CI jobs – per pull request – and something like 8 different code analyzers will scrutinize the change to see if there are any obvious flaws in there.

CI jobs per platform over time. Graph snapped on November 5, 2020

A branch in the actual curl/curl repo

Most contributors working on curl would not do it like me and create the branch in the curl repository itself, but would rather do it in their own fork instead. The difference isn't that big and I could of course also do it that way.

After push, switch branch

As it will take some time for the full CI results of the PR to come in (generally a few hours), I switch over to the next branch with work on my agenda. On a normal workday I can easily move across ten different branches, polish them and submit updates in their respective pull requests.

I can go back to the master branch again with "git checkout master" and there I can "git pull" to get everything from upstream – like when my fellow developers have pushed stuff in the meantime.

PR comments or CI alerts

If a reviewer or a CI job finds a mistake in one of my PRs, that becomes visible on GitHub and I get to work handling it: either fixing the bug or discussing with the reviewer what a better approach might be.

Unfortunately, flaky CI jobs are a part of life, so very often one or two red markers end up in the list of CI jobs that can be ignored, as the test failures in them are due to problems in the setup and not actual mistakes in the PR…

To get back to my branch for that PR again, I “git checkout bagder/my-new-stuff-or-bugfix“, and fix the issues.

I normally start out by doing follow-up commits that repair the immediate mistake and push them on the branch:

git push origin bagder/my-new-stuff-or-bugfix

If the number of fixup commits gets large, or if the follow-up fixes aren’t small, I usually end up doing a squash to reduce the number of commits into a smaller, simpler set, and then force-push them to the branch.

The reason for that is to make the patch series easy to review, read and understand. When a commit series has too many commits that change the previous commits, it becomes hard to review.
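Sketched out with the example branch name from above, such a squash round could look like this (--force-with-lease shown here as one slightly safer way to do the force-push):

$ git rebase -i master
$ git push --force-with-lease origin bagder/my-new-stuff-or-bugfix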

Ripe to merge?

When the pull request is ripe for merging (independently of who authored it), I switch over to the master branch again and I merge the pull request's commits into it. In special cases I cherry-pick specific commits from the branch instead. When all the stuff that should be there has been yanked into master properly, I push the changes to the remote.

Usually, and especially if the pull request wasn't done by me, I also go over the commit messages and polish them somewhat before I push everything. Commit messages should follow our style and mention not only which PR it closes but also which issue it fixes, and properly give credit to the bug reporter and all the helpers – using the right syntax so that our automatic tools can pick them up correctly!

As already mentioned above, I merge fast-forward or rebased into master. No merge commits.
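Command-wise, that merge is roughly this (a sketch, assuming the branch is already rebased on top of the latest master so it can fast-forward):

$ git checkout master
$ git pull
$ git merge --ff-only bagder/my-new-stuff-or-bugfix
$ git push origin master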

Never merge with GitHub!

There's a button on GitHub that says "rebase and merge" that could theoretically be used for merging pull requests. I never use that (and if I could, I'd disable/hide it). The reasons are simply:

  1. I don’t feel that I have the proper control of the commit message(s)
  2. I can’t select to squash a subset of the commits, only all or nothing
  3. I often want to clean up the author parts too before the push, which the UI doesn't allow

The downside of not using the merge button is that the message in the PR says "closed by [hash]" instead of "merged in…", which causes confusion for a fair number of users who don't realize that it actually means the same thing! I consider this a (long-standing) GitHub UX flaw.

Post merge

If the branch has nothing more that needs to be kept around, I delete the local branch again with "git branch -d [name]" and I remove it remotely too – since it was completely merged, there's no reason to keep the work version around.
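For the example branch used above, that amounts to:

$ git branch -d bagder/my-new-stuff-or-bugfix
$ git push origin --delete bagder/my-new-stuff-or-bugfix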

At any given point in time, I have some 20-30 different local branches alive using this approach, so things I work on over time all live in their own branches, and submissions from various people that haven't been merged into master yet also exist in branches of various maturity levels. Out of those local branches, the number of concurrent pull requests I have in progress can be anywhere between just a few and ten or twelve.

RELEASE-NOTES

Not strictly related, but in order to keep interested people informed about what's happening in the tree, we sync the RELEASE-NOTES file every once in a while. Maybe every 5-7 days or so. It thus becomes a file that explains what we've worked on since the previous release, and it stays well-maintained and ready by the time release day comes.

To sync it, all I need to do is:

$ ./scripts/release-notes.pl

That makes the script add suggested updates to the file, so I then load it into my editor, remove the separation marker and all entries that don't actually belong there (the script adds all commits as entries since it can't judge their importance).

When it looks okay, I run a cleanup round to sort the entries and remove unused references from the file…

$ ./scripts/release-notes.pl cleanup

Then I make sure to get a fresh list of contributors…

$ ./scripts/contributors.sh

… and paste that updated list into the RELEASE-NOTES. Finally, I get refreshed counters for the numbers at the top of the file by running

$ ./scripts/delta

Then I commit the update (which needs to have the commit message "RELEASE-NOTES: synced") and push it to master. Done!
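That last step is an ordinary commit and push, along the lines of:

$ git commit -m "RELEASE-NOTES: synced" RELEASE-NOTES
$ git push origin master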

The most up-to-date version of RELEASE-NOTES is then always made available on https://curl.se/dev/release-notes.html

Credits

Picture by me, taken from the passenger seat on a helicopter tour in 2011.

The journey to a curl domain

Good things come to those who wait?

When I created and started hosting the first websites for curl I didn’t care about the URL or domain names used for them, but after a few years I started to think that maybe it would be cool to register a curl domain for its home. By then it was too late to find an available name under a “sensible” top-level domain and since then I’ve been on the lookout for one.

Yeah, I host it

So yes, I've administered every machine that has ever hosted the curl web site, going all the way back to the time before we called the tool curl. I'm also doing most of the edits and polish of the web content, even though I'm crap at web stuff like CSS and design. So yeah, I consider it my job to care for the site and make sure it runs smoothly and that it has a proper (domain) name.

www.fts.frontec.se

The first ever curl web page was hosted on “www.fts.frontec.se/~dast/curl” in the late 1990s (snapshot). I worked for the company with that domain name at the time and ~dast was the location for my own personal web content.

curl.haxx.nu

The curl website moved to its first "own home", curl.haxx.nu, in August 1999 (snapshot) when we registered our first domain – the .nu top-level domain was available to us at a time when .se wasn't.

curl.haxx.se

We switched from curl.haxx.nu to curl.haxx.se in the summer of 2000 (when we finally were allowed to register our name in the .se TLD) (snapshot).

The name “haxx” in the domain has been the reason for many discussions and occasional concerns from users and overzealous blocking-scripts over the years. I’ve kept the curl site on that domain since it is the name of one of the primary curl sponsors and partly because I want to teach the world that a particular word in a domain is not a marker for badness or something like that. And of course because we have not bought or been provided a better alternative.

Haxx is still the name of the company I co-founded back in 1997 so I’m also the admin of the domain.

curl.se

I've looked for and contacted owners of curl under many different TLDs over the years, but most have never responded and none has been open to giving up their domains. I've always paid extra attention to curl.se because it is in the Swedish TLD, the same one we use for Haxx and the one for the country where I live.

The curling background

The first record on archive.org of anyone using the domain curl.se for web content is dated August 2003 when the Swedish curling team “Härnösands CK” used it. They used the domain and website for a few years under this name. It can be noted that it was team Anette Norberg, which subsequently won two Olympic gold medals in the sport.

In September 2007 the site was renamed, still being about the sport curling but under the name "the curling girls" in Swedish (curlingtjejerna), which remained for just 1.5 years until it changed again. "curling team Folksam" then populated the site with content about the sport and that team until they let the domain expire in 2012. (Out of these three different curling-oriented sites, the first one is the only one that still seems to be around, but now of course on another domain.)

Ads

In early August 2012 the domain was registered to a new owner. I can’t remember why, but I missed the chance to get the domain then.

August 28, 2012 marks the first date when curl.se is recorded to suddenly host a bunch of links to casino, bingo and gambling sites. It seems that whoever bought the domain wanted to surf on the good name and possible incoming links built up by the previous owners. For several years this crap was what the website showed. I doubt many users were ever charmed by the content or clicked on many links. It was ugly and over-packed with no real content, just links and ads.

The last archive.org capture of the ad-filled site was done on October 2nd 2016. Since then, there’s been no web content on the domain that I’ve found. But the domain registration kept getting renewed.

Failed to purchase

In August 2019, I noticed that the domain was about to expire, and I figured it could be a sign that the owner was not too interested in keeping it anymore. I contacted the owner via a registrar and offered to buy it. The only response I ever got was that my monetary offer was "too low". I tried to up my bid, but I never got any further responses from the owner, and after a while I noticed that the domain registration was again renewed for another year. I went back to waiting.

Expired again

In September 2020 the domain was again up for expiration and I contacted the owner again, this time asking for a price for which they would be willing to sell the domain. Again no response, but this time the domain actually went all the way to expiry and deletion, which eventually made it available “on the market” for everyone interested to compete for the purchase.

I entered the race with the help of a registrar that would attempt to buy the name when it got released. When a domain name is "released" like this, it becomes a race between all the potential buyers who want it. It is a 4-letter domain that is an English word and easily pronounceable. I knew there was a big risk others would also be trying to get it.

In the early morning of October 19th 2020, the curl.se domain was released and in the race of getting the purchase… I lost. Someone else got the domain before me. I was sad. For a while, until I got the good news…

Donated!

It turned out my friend Bartek Tatkowski had snatched the domain! After getting all the administrative things in order, Bartek graciously donated the domain to me, and at 15:00 on October 30, 2020 I could enter my own name servers into the dedicated input fields for the domain and configure it properly in our master and secondary DNS servers.

curl.se is the new home

Starting on November 4, 2020, curl.se is the new official home site for the curl project. The curl.haxx.se name will of course keep working for a long time to come, and I figure we can basically never shut it down as there are so many references to it spread out over the world. I intend to eventually provide redirects for most things from the old name to the new.

What about a www prefix? The jury is still out on whether or how we should use that. The initial update of the site (November 4) uses a www.curl.se host name in links, but I've not set up any automatic redirects to or from that. As the site is CDNed, and we can't use CNAMEs on the apex domain (curl.se), we instead use anycast IPs for it – the net difference to users should be zero. (Fastly is a generous sponsor of the curl project.)

I have also owned libcurl.se for a few years and I'll make sure that using this name also takes you to the right place.

Why not curl.dev?

People repeatedly ask me this. Names in the .dev domain are expensive. Registering curl.dev goes for 400 USD right now; curl.se costs 10 USD/year. I see little to no reason to participate in that business and I don't think spending donated money on such a venture is a responsible use of our funds.

Credits

Image by truthseeker08 from Pixabay. Domain by Bartek Tatkowski.

HSTS your curl

HTTP Strict Transport Security (HSTS) is a standard HTTP response header for sites to tell the client that for a specified period of time into the future, that host is not to be accessed with plain HTTP but only using HTTPS. Documented in RFC 6797 from 2012.

The idea is of course to reduce the risk for man-in-the-middle attacks when the server resources might be accessible via both HTTP and HTTPS, perhaps due to legacy or just as an upgrade path. Every access to the HTTP version is then a risk that you get back tampered content.

Browsers preload

These headers have been supported by the popular browsers for years already, and they also have a system set up for preloading a set of sites. Sites that exist in their preload list never get accessed over HTTP since the browser knows of their HSTS state already when it is fired up for the first time.

The entire .dev top-level domain is even in that preload list so you can in fact never access a web site on that top-level domain over HTTP with the major browsers.

With the curl tool

Starting in curl 7.74.0, curl has experimental support for HSTS. Experimental means it isn’t enabled by default and we discourage use of it in production. (Scheduled to be released in December 2020.)

You instruct curl to understand HSTS and to load/save a cache with HSTS information using --hsts <filename>. The HSTS cache saved into that file is then updated on exit, and if you do repeated invocations with the same cache file, it will effectively avoid clear-text HTTP accesses for as long as the HSTS headers tell it to.
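A minimal example of what that can look like (example.com used just as a placeholder):

$ curl --hsts hsts.txt https://example.com/
$ curl --hsts hsts.txt http://example.com/

If the first (HTTPS) response carries an HSTS header, the host gets stored in hsts.txt, and the second command is then upgraded to HTTPS automatically instead of going out as clear text.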

I envision that users will simply use a small HSTS cache file for specific use cases, rather than anyone ever really wanting to have or use a "complete" preload list of domains such as the one the browsers use, as that's a huge list of sites and for most use cases completely unnecessary to load and handle.

With libcurl

Possibly, this feature is more useful and appreciated by applications that use libcurl for HTTP(S) transfers. With libcurl the application can set a file name to use for loading and saving the cache but it also gets some added options for more flexibility and powers. Here’s a quick overview:

CURLOPT_HSTS – lets you set a file name to read/write the HSTS cache from/to.

CURLOPT_HSTS_CTRL – enable HSTS functionality for this transfer

CURLOPT_HSTSREADFUNCTION – this callback gets called by libcurl when it is about to start a transfer and lets the application preload HSTS entries – as if they had been read over the wire and been added to the cache.

CURLOPT_HSTSWRITEFUNCTION – this callback gets called repeatedly when libcurl flushes its in-memory cache and allows the application to save the cache somewhere and similar things.
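To show roughly how the pieces fit together, here is a minimal sketch of an application enabling the file-backed HSTS cache (the file name is made up, error handling is omitted, and since the feature is experimental the details may still change – check the man pages for the current state):

#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    /* switch on HSTS handling for this transfer */
    curl_easy_setopt(curl, CURLOPT_HSTS_CTRL, (long)CURLHSTS_ENABLE);
    /* read the HSTS cache from this file at start, save it back on cleanup */
    curl_easy_setopt(curl, CURLOPT_HSTS, "hsts-cache.txt");
    /* a plain http:// URL gets upgraded if the host is in the cache */
    curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/");
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
  }
  return 0;
}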

Feedback?

I trust you understand that I’m very very keen on getting feedback on how this works, on the API and your use cases. Both negative and positive. Whatever your thoughts are really!

Everything curl in Chinese

The other day we celebrated everything curl turning 5 years old, and not too long after that I got myself this printed copy of the Chinese translation in my hands!

This version of the book is available for sale on Amazon and the translation was done by the publisher.

The book's full contents are available on GitHub and you can read the English version online on ec.haxx.se.

If you would be interested in starting a translation of the book into another language, let me know and I'll help you get started. Currently the English version consists of 72,798 words so it's by no means an easy feat to translate! My two other, smaller books, http2 explained and HTTP/3 explained, have been translated into twelve(!) and ten languages this way (and there might be more languages coming!).

A collection of printed works authored by yours truly!
Inside the Chinese version – yes I can understand some headlines!

Unfortunately I don’t read Chinese so I can’t tell you how good the translation is!

Working open source

I work full time on open source and this is how.

Background

I started learning how to program in my teens, well over thirty years ago and I’ve worked as a software engineer and developer since the early 1990s. My first employment as a developer was in 1993. I’ve since worked for and with lots of companies and I’ve worked on a huge amount of (proprietary) software products and devices over many years. Meaning: I certainly didn’t start my life open source. I had to earn it.

When I was 20 years old I did my (then mandatory) military service in Sweden. After having endured that, I applied to the university while at the same time I was offered a job at IBM. I hesitated, but took the job. I figured I could always go to university later – but life took other turns and I never did. I didn’t do a single day of university. I haven’t regretted it.

I learned to code in the mid 80s on a Commodore 64 and software development has been one of my primary hobbies ever since. One thing it taught me well, that I still carry with me, is to spend a few hours per day in front of my home computer.

And then I shipped curl

In the spring of 1998 I renamed my little pet project of the time again and I released the first ever curl release. I have told this story many times, but since then I have spent two hours or so of my spare time on that project – every day for over twenty years. While still working as a software engineer by day.

Over time, curl gradually grew popular and attracted more users. There was no sudden moment in time where I struck gold and everything took off. It was just slowly gaining ground while me and my fellow project members kept improving and polishing curl. At some point in time I happened to notice that curl and libcurl would appear in more and more acknowledgements and in open source license collections in products and devices.

It was still just a spare time project.

Proprietary Software for years

I’d like to emphasize that I worked as a contract and consultant developer for many years (over 20!), primarily on proprietary software and custom solutions, before I managed to land myself a position where I could primarily write open source as part of my job.

Mozilla

In 2014 I joined Mozilla and got the opportunity to work on the open source project Firefox for a living – and doing it entirely from my home. This was the first time in my career I actually spent most of my days on code that was made public and available to the world. They even allowed me to spend a part of my work hours on curl, even if that didn’t really help them and curl was not a fundamental part of any Mozilla work or products. It was still great.

I landed that job at Mozilla largely thanks to my many years and long experience with portable network coding and with running a successful open source project at this level.

My work setup with Mozilla made it possible for me to spend even more time on curl, apart from the (still going) two daily spare time hours. Nobody at Mozilla cared much about (my work with) curl and no one there even asked me about it. I worked on Firefox for a living.

For anyone wanting to do open source as part of their work, getting a job at a company that already does a lot of open source is probably the best path forward. Even if that might not be easy either, and it might also mean that you would have to accept working on some open source projects that you might not yourself be completely sold on.

In late 2018 I quit Mozilla, in part because I wanted to try to work with curl "for real" (and in part for other reasons that I'll leave out here). curl was then already over twenty years old and was used more than ever before.

wolfSSL

I now work for wolfSSL. We sell curl support and related services to companies. Companies pay wolfSSL, wolfSSL pays me a salary and I get food on the table. This works as long as we can convince enough companies that this is a good idea.

The vast majority of curl users out there of course don't pay anything and will never pay anything. We just need a small number of companies to do it – and it seems to be working. We help customers use curl better, we make curl better for them and we make them ship better products this way. It's a win-win. And I can work on open source all day long thanks to this.

My open source life-style

A normal day in the work week, I get up before 7 in the morning and I have breakfast with my family members: my wife and my two kids. Then I grab my first cup of coffee for the day and take the thirteen steps up the stairs to my “office”.

I sit down in front of my main development (Linux) machine with two 27″ screens and get to work.

Photo of my work desk from a few years ago but it looks very similar still today.

What work and in what order?

I lead the curl project. It means many questions and decisions fall to me to have an opinion about or a say on, and it's a priority for me to make sure that I unblock such situations as soon as possible so that developers wanting to do things with curl can continue doing that.

Thus, I read and respond to email about curl all hours I'm awake and have network access. Of course, incoming messages rarely actually require immediate responses, so I can queue them up and deal with them later. I also try to read and assess all new incoming curl issues as soon as possible to see if there's something urgent I should deal with immediately, or otherwise figure out how to deal with them going forward.

I usually have a set of bugs or features to work on, so when there's no alarming email or GitHub issue left, I context-switch over to the curl source code tree and the particular branch I'm working on right now. I typically have 20-30 different branches of development in various stages and maturity going on. If I get stuck on something, or if I create a pull request for one of them that needs time to get all the CI jobs done, I switch over to one of the others.

Customers and their needs of course have priority when I decide what to work on. The exception would perhaps be security vulnerabilities or other really serious bugs being reported, but thankfully they are rare. But after that, I go by ear and work on what I think is fun and what I think users might appreciate.

If I want to go forward with something, for my own sake or for a customer’s, and that entails touching or improving other software in other projects, then I don’t shy away from submitting pull requests for them – or at least filing an issue.

Spare time open source

Yes, I still spend my spare time hours on open source, mostly curl. This means I often end up spending 50-55 hours per week on curl and curl related activities. But I don’t count or measure work hours and I rarely have to report any to anyone. This is a work of love.

Lots of people will say that they don’t have time because of life, family, kids etc. I have of course been very fortunate over the years to have had the opportunity and ability to spend all this time on what I want to do, but let’s not forget that people in general spend lots of time on their hobbies; on watching TV, on playing computer games and on socializing with friends and why not: to sleep. If you cut down on all of those things (yes, including the sleeping) there could very well be opportunities. It’s often a question of priorities. I’ve made spare time development a priority in my life.

curl support?

Any company that uses curl or libcurl – and they are plenty – could benefit from buying support from us instead of wasting their own time and resources. We at wolfSSL are probably much better at curl already and we can find and fix the issues much faster, which ends up cheaper and better long-term.

Credits

The top photo is taken by Anja Stenberg, my wife. It’s me in a local forest, summer 2020.

A server transition

The main physical server (we call it giant) is one we've been using at Haxx for a very long time to host sites and services for 20+ domains and even more mailing lists. The machine has been colocated in an ISP server room for over a decade and has served us very well, but it has started to show its age.

Some of the more known sites and services it hosts are perhaps curl, c-ares, libssh2 and this blog (my entire daniel.haxx.se site). Some of these services are however primarily accessed via fronting CDN servers.

giant is a physical Dell PowerEdge 1850 server from 2005, which has undergone upgrades of CPU, disks and memory through the years.

giant featured an Intel X3440 Xeon CPU at 2.53GHz with 8GB of RAM when decommissioned.

New host

The new host is of course entirely virtual and we’ve finally taken the step into the modern world of VPSes. The new machine is hosted by the same provider as before but as an entirely new instance.

We’ve upgraded the OS, all packages and we’ve remodeled how we run the web services and all our jobs and services from before have been moved into this new fresh server in an attempt to leave some of the worst legacies behind.

The former server will not be used anymore and will be powered down and sent for recycling.

Glitches in this new world

We've tried really hard to make this transition transparent, and ideally not many users will notice anything or have a reason to bother about this, but of course we also realize that we probably have not managed this to 100% perfection. If you detect something on any of the services we run that used to work or exist but doesn't anymore, do let us know so that we become aware of it and can work on a fix!

This site (daniel.haxx.se) already moved weeks ago and nobody noticed. The curl site changed on October 23 and is much more likely to show glitches because of all the many more scripts and automatic things set up for it. Both sites are served via Fastly, so ordinary users will not detect or spot that there's a new host in the back end.

Three years since the Polhem prize

Today, exactly three years ago, I received flowers, money and a gold medal at a grand prize ceremony that will forever live on in my mind and memory. I was awarded the Polhem Prize for my decades of work on curl. The prize itself was handed over to me by no one else than the Swedish king himself. One of the absolute top honors I can imagine in my little home country.

In some aspects, my life is divided into the life before this event and the life after. The prize has even resulted in little me being presented on a poster in the Technical Museum in Stockholm. The medal itself still sits on my work desk, and if I just stop staring at my monitors for a moment and glance a little over to the left – I can see it. I think the prize made my surroundings, my family and friends get a slightly different view and realization of what I actually do all these hours in front of my screens.

In the three years since I received the prize, we've increased the total number of contributors and authors in curl by 50%. We've done over 3,700 commits and 25 releases since then. Upwards and onward.

Life moved on. It was not “peak curl”. There was no “prize curse” that left us unable to keep up the pace and development. It was possibly a “peak life moment” there for me personally. As an open source maintainer, I can’t imagine many bigger honors or awards to come my way ever again, but I’m not complaining. I got the prize and I still smile when I think about it.

curl 7.73.0 – more options

In international curling competitions, each team is given 73 minutes to complete all of its throws. Welcome to curl 7.73.0.

Release presentation

Numbers

the 195th release
9 changes
56 days (total: 8,2XX)

135 bug fixes (total: 6,462)
238 commits (total: 26,316)
3 new public libcurl functions (total: 85)
1 new curl_easy_setopt() option (total: 278)

2 new curl command line options (total: 234)
63 contributors, 31 new (total: 2,270)
35 authors, 17 new (total: 836)
0 security fixes (total: 95)
0 USD paid in Bug Bounties (total: 2,800 USD)

Changes

We have to look back almost two years to find a previous release packed with more new features than this! The nine changes we’re bringing to the world this time are…

--output-dir

Tell curl where to store the file when you use -o or -O! Requested by users for a long time. Now available!
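For example (with a made-up URL), this downloads the file but stores it as /tmp/file.tar.gz:

$ curl --output-dir /tmp -O https://example.com/file.tar.gz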

find .curlrc in $XDG_CONFIG_HOME

If this environment variable is set and there’s a .curlrc in there, it will now be used in preference to the other places curl will look for it.

--help has categories

The huge wall of text that --help previously gave us is now history. “curl --help” will now only show a few important options and the rest is provided if you tell curl which category you want to list help for. Requested by users for a long time.
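It ends up working something like this (the help output itself lists the available category names):

$ curl --help          (the short list of the most important options)
$ curl --help http     (all options in the http category)
$ curl --help all      (the complete, old-style wall of text)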

API for meta-data about easy options

With three new functions, libcurl now provides an API for getting meta-data and information about all existing easy options that an application (possibly via a binding) can set for a transfer in libcurl.
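The three functions are curl_easy_option_by_name, curl_easy_option_by_id and curl_easy_option_next (see their man pages for details). As a small sketch, an application or binding generator could walk the full list of known options like this:

#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
  /* iterate over every easy option this libcurl knows about */
  const struct curl_easyoption *opt = NULL;
  while((opt = curl_easy_option_next(opt))) {
    printf("%s (CURLoption %d)\n", opt->name, (int)opt->id);
  }
  return 0;
}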

CURLE_PROXY

We've introduced a new error code, and what makes this a little extra noteworthy, besides just being a new way to signal a particular error, is that applications that receive this error code can also query libcurl for extended error information about what exactly in the proxy use or handshake failed. The extended explanation is extracted with the new CURLINFO_PROXY_ERROR.

MQTT by default

This is the first curl release where MQTT support is enabled by default. We’ve had it marked experimental for a while, which had the effect that virtually nobody actually used it, tried it or even knew it existed. I’m very eager to get some actual feedback on this…

SFTP commands: atime and mtime

The "quote" functionality in curl, which can send custom commands to the SFTP server to perform various instructions, now also supports atime and mtime for setting the access and modification times of a remote file.

knownhosts “fine replace” instruction

The "known hosts" feature is an SSH "thing" that lets an application get told when libcurl connects to a host that isn't already listed in the known hosts file, and decide what to do about it. This new feature lets the application return CURLKHSTAT_FINE_REPLACE, which makes libcurl treat the host as "fine" and then replace the host key for that host in the known hosts file.

Elliptic curves

Assuming you use the supported backend, you can now select which curves libcurl should use in the TLS handshake, with CURLOPT_SSL_EC_CURVES for libcurl applications and with --curves for the command line tool.
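With the command line tool it can look like this (the accepted curve names and list format depend on the TLS backend in use, so treat this as an illustration only):

$ curl --curves X25519 https://example.com/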

Bug-fixes

As always, here follows a small selection of the many bug-fixes done this cycle.

buildconf: invoke ‘autoreconf -fi’ instead

Born in May 2001, this script was introduced to generate configure and friends so that you can run configure when building code straight from git (even if it was CVS back in 2001).

Starting now, the script just runs ‘autoreconf -fi‘ and no extra custom magic is needed.

checksrc: now even stricter

Our homegrown code style checker got better and now also verifies:

  1. Mandatory space after do and before while in "do { } while"
  2. No space allowed after an exclamation mark in if(!true) expressions
  3. // comments are not allowed even on column 0

cmake: remove scary warning

The text that was previously output, saying that the cmake build has glitches has been removed. Not really because it has gotten much better, but to not scare people away, let them proceed and use cmake and then report problems and submit pull requests that fix them!

curl: lots of dynbuf in the tool

The libcurl internal system for doing “dynamic buffers” (strings) is now also used by the curl tool and a lot of custom realloc logic has been converted over.

curl: retry and delays in parallel mode

The --retry option didn’t work when doing parallel transfers, and when fixed, the retry delay option didn’t work either so that was also fixed!

etag: save and use the full received contents

We changed the handling of saving and loading etags a little bit, so that it now stores and loads the full etag contents and not just the bytes between double quotes. This has the added benefit that the etag functionality now also works with weak etags and when working with servers that don’t really follow the syntax rules for etags.

ftp: a 550 SIZE response returns CURLE_REMOTE_FILE_NOT_FOUND

Getting a 550 response back from a SIZE command will now make curl return at once saying the file is missing. Previously it would put that into the generic “SIZE doesn’t work” pool of errors, but that could lead to some inconsistent return codes.

ftp: make a 552 response return CURLE_REMOTE_DISK_FULL

Turns out that when uploading data to an FTP server, there’s a dedicated response code for the server to tell the client when the disk system is full (there was not room enough to store the file) and since there was also an existing libcurl return code for it, this new mapping thus made perfect sense…

ftp: separate FTPS from FTP over “HTTPS proxy”

The FTP code would previously wrongly consider an HTTPS proxy to equal "using TLS", as if it was using TLS for its control channel, which made it wrongly use commands that are reserved for TLS connections.

libssh2: SSH protocols over HTTPS proxy

Some adjustments were necessary to make sure SCP and SFTP work when spoken over an HTTPS proxy. Also note that this fix is only done for the libssh2 backend (primarily because I didn't find out how to do it in the others when I did a quick check).

pingpong: use dynbuf sending commands

I blogged about this malloc reducing adventure separately. The command sending function in curl for FTP, IMAP, POP3 and SMTP now reuses the same memory buffer much better over a transfer’s lifetime.

runtests: add %repeat[]% and %hex[]hex% for test files

One of the really nice benefits of "preprocessing" the test files and generating another file that is actually used for running tests is that we now can provide meta-instructions to generate content when the test starts up. For example, we had tests with a megabyte or several hundred kilobytes of text for test purposes, and now we can replace those with just a single %repeat% instruction. This makes test cases much smaller and much easier to read. %hex% can generate binary output given a hex string.

runtests: curl’s version as %VERSION

The test suite now knows which version of curl it tests, which means it can properly verify the User-Agent: headers in curl's outgoing HTTP requests. We previously had a filter system that would filter out such headers before comparing. Yours truly then got the pleasure of updating no less than 619 test cases to no longer filter off that header for the comparison and instead use %VERSION!

win32: drop support for WinSock version 1

Apparently version 2 has been supported in Windows since Windows 95…

Future

Unless we’ve screwed up somewhere, the next release will be 7.74.0 and ship on December 9, 2020. We have several pending pull-requests already in the queue with new features and changes!

rust in curl with hyper

tldr: work has started to make Hyper work as a backend in curl for HTTP.

curl and its data transfer core, libcurl, are all written in C. The language C is known and infamous for not being memory safe and for being easy to mess up, which as a result can accidentally cause security problems.

At the same time, C compilers are very widely used and available and you can compile C programs for virtually every operating system and CPU out there. A C program can be made far more portable than code written in just about any other programming language.

curl is a piece of “insecure” C code installed in some ten billion installations world-wide. I’m saying insecure within quotes because I don’t think curl is insecure. We have our share of security vulnerabilities of course, even if I think the rate of them getting found has been drastically reduced over the last few years, but we have never had a critical one and with the help of busloads of tools and humans we find and fix most issues in the code before they ever land in the hands of users. (And “memory safety” is not the single explanation for getting security issues.)

I believe that curl and libcurl will remain in wide use for a long time ahead: curl is an established component and companion in scripts and setups everywhere. libcurl is almost a de facto standard in places for doing internet transfers.

A rewrite of curl to another language is not considered. Porting an old, established and well-used code base such as libcurl, which to a far degree has gained its popularity and spread due to a stable API, not breaking the ABI and not changing behavior of existing functionality, is a massive and daunting task. To the degree that so far it hasn’t been attempted seriously and even giant corporations who have considered it, have backpedaled such ideas.

Change, but not change

This preface above might make it seem like we’re stuck with exactly what we have for as long as curl and libcurl are used. But fear not: things are more complicated, or perhaps brighter, than it first seems.

What's important to users of libcurl needs to be kept intact: the API, the ABI, the behavior and all the documented options and features remain. We also need to continuously add stuff and keep up with the world going forward.

But we can change the internals! Refactor as the kids say.

Backends, backends, backends

Already today, you can build libcurl to use different “backends” for TLS, SSH, name resolving, LDAP, IDN, GSSAPI and HTTP/3.

A “backend” in this context is a piece of code in curl that lets you use a particular solution, often involving a specific third party library, for a certain libcurl functionality. Using this setup you can, for example, opt to build libcurl with one or more out of thirteen different TLS libraries. You simply pick the one(s) you prefer when you build it. The libcurl API remains the same to users, it’s just that some features and functionality might differ a bit. The number of TLS backends is of course also fluid over time as we add support for more libraries in the future, or even drop support for old ones as they fade away.

When building curl, you can right now make it use up to 33 different third party libraries for different functions. Many of them are of course mutually exclusive, so no single build can use all 33.

Said differently: you can improve your curl and libcurl binaries without changing any code, by simply rebuilding it to use another backend combination.
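As a sketch of what that looks like in practice, the backend choice is made at build time with configure (or cmake) options, for example (exact flag names vary between curl versions and backends, so check the install documentation for your version):

$ ./configure --without-ssl --with-gnutls
$ ./configure --without-ssl --with-wolfssl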

Green boxes are possible third-party dependencies curl can be told to use. No Hyper in this map yet…

libcurl as a glorified switch

With an extensive set of backends that use third party libraries, the job of libcurl to a large extent becomes to act as a switch between the provided stable external API and the particular third party library that does the heavy lifting.

API <=> glue code in C <=> backend library

libcurl as the rock, with a door and the entry rules written in stone. The backends can come and go, change and improve, but the applications outside the entrance won’t notice that. They get a stable API and ABI that they know and trust.

Safe backends

This setup provides a foundation and infrastructure to offer backends written in other languages as part of the package. As long as those libraries have APIs that are accessible to libcurl, libraries used by the backends can be written in any language – but since we’re talking about memory safety in this blog post the most obvious choices would probably be one of the modern and safe languages. For example Rust.

With a backend library written in Rust, libcurl would lean on such a component to do the low-level protocol work and, presumably, by doing so increase the chances of the implementation being safe and secure.

Two of the already supported third party libraries in the world map image above are written in Rust: quiche and Mesalink.

Hyper as a backend for HTTP

Hyper is an HTTP library written in Rust. It is meant to be fast, accurate and safe, and it supports both HTTP/1 and HTTP/2.

As another step into this world of an ever-growing number of backends to libcurl, work has begun to make sure curl (optionally) can get built to use Hyper.

This work is generously funded by ISRG, perhaps best known as the organization behind Let's Encrypt. Thanks!

Many challenges remain

I want to emphasize that this is early days. We know what we want to do and we know basically how to do it, but from there to actually getting it done and providing it as source code to the world is a fair bit of work that hasn't been done yet. I've set out to do it.

Hyper didn't have a C API; they're working on making one so that C-based applications such as curl can actually use it. I do my best at providing feedback from my point of view, but as I'm not really into Rust much I can't assist much with the implementation parts there.

Once there’s an early/alpha version of the API to try out, I will first make sure curl can get built to use Hyper, and then start poking on the code to start using it.

In that work I expect to have to go back to the API with questions, feedback and perhaps documentation suggestions. I also anticipate challenges in switching libcurl internals over to using this. Mostly small ones, but possibly also larger ones.

I have created a git branch and I make my work on this public and accessible early on, to let everyone who wants to keep up with the development. A first milestone will be the ability to run a single curl test case (any test case) successfully – unmodified. The branch is here: https://github.com/curl/curl/tree/bagder/hyper – beware that it will be rebased frequently.

There’s no deadline for this project and I don’t yet have any guesses as when there will be anything to test.

Rust itself is not there yet

This project is truly ground work for future developers to build upon, as some of the issues dealt with here should benefit others as well down the road. For example, it immediately became obvious that Rust in general encourages aborting on out-of-memory conditions, while that is a big no-no when the code is used in a system library (such as curl).

I'm a bit vague on the details here because it's not my expertise, but Rust itself can't even properly clean up its memory in that situation and just returns an error when it hits such a condition. Clearly something to fix before a libcurl with Hyper could claim identical behavior and never leak memory.

By default?

Will Hyper be used by default in a future curl build near you?

We’re going to work on the project to make that future a possibility with the mindset that it could benefit users.

Whether it truly happens will involve many different factors (for example maturity, feature set, memory footprint, performance, portability and on-disk footprint…) and in particular it will depend a lot on the people who build and ship the curl packages you use – which isn't the curl project itself, as we only ship source code. I'm thinking of Linux and operating system distributions etc.

When it might happen we can’t tell yet as we’re still much too early in this process.

Still a lot of C

This is not converting curl to Rust.

Don’t be fooled into believing that we are getting rid of C in curl by taking this step. With the introduction of a Hyper powered backend, we will certainly reduce the share of C code that is executed in a typical HTTP transfer by a measurable amount (for those builds), but curl is much more than that.

It’s not even a given that the Hyper backend will “win” the competition for users against the C implementation on the platforms you care about. The future is not set.

More backends in safe languages?

Sure, why not? There are efforts to provide more backends written in Rust. Gradually, we might move into a future where less and less of the final curl and libcurl executable code is compiled from C.

How and if that will happen will of course depend on a lot of factors – in particular funding of the necessary work.

Can we drive the development in this direction even further? I think it is much too early to speculate on that. Let’s first see how these first few episodes into the coming decades turn out.

Related

ISRG’s blog post: Memory Safe ‘curl’ for a More Secure Internet and the hacker news discussion.

Credits

Image by Peter H from Pixabay