Category Archives: Open Source

Open Source, Free Software, and similar

curl is 23 years old today

curl’s official birthday was March 20, 1998. That was the day the first ever tarball was made available that could build a tool named curl. I put it together and I called it curl 4.0 since I kept the version numbering from the previous names I had used for the tool. Or rather, I bumped it up from 3.12 which was the last version I used under the previous name: urlget.

Of course curl wasn’t created out of thin air exactly that day. The history can be traced back a little over a year earlier: On November 11, 1996 there was a tool named httpget released. It was developed by Rafael Sagula and this was the project I found and started contributing to. httpget 0.1 was less than 300 lines of a single C file. (The earliest code I still have source to is httpget 1.3, found here.)

I’ve said it many times before, but I started poking at this project because I wanted a small tool to download currency rates regularly from a web site so that I could offer them in my IRC bot’s currency exchange service.

Small, quick decisions made back then would later make a serious impact on my life and shape it. curl has been one of my main hobbies ever since – and for the last few years also my full-time job.

On that exact same November day in 1996, the first Wget release shipped (1.4.0). That project had also existed under another name prior to its release – and looking back, I don’t think I knew about it when I went with httpget for my task. Possibly I found it and dismissed it because of its size. The Wget 1.4.0 tarball was 171 KB.

After a short while, I took over as maintainer of httpget and expanded its functionality further. It was subsequently renamed urlget when I added support for Gopher and FTP (driven by the fact that I found currency rates hosted on such servers as well). In the spring of 1998 I added support for FTP upload as well, the name of the tool was again misleading, and I needed to rename it once more.

Naming things is really hard. I wanted a short word in classic Unix style. I didn’t spend an awful lot of time, as I thought of a fun word pretty soon. The tool works on URLs and it is an Internet client-side tool. ‘c’ for client and URL made ‘cURL’ seem pretty apt and fun. And short. Very “unixy”.

Already back then I wanted curl to be a citizen of the Unix tradition of pipes, stdout and so on. I wanted curl to work mostly like the cat command but for URLs, so by default it would send the contents of the URL to stdout in the terminal. Just like cat does. It would then let us “see” the contents of that URL. The letter C is pronounced as see, so “see URL” also worked. In my pun-liking mind I didn’t need more. (But I still pronounce it “kurl”!)

This is the original logo, created in 1998 by Henrik Hellerstedt

I packaged curl 4.0 and made it available to the world on that Friday. Then at 2,200 lines of code. In the curl 4.8 release that I did a few months later, the THANKS file mentions 7 contributors who had helped out. It took us almost seven years to reach a hundred contributors. Today, that file lists over 2,300 names and we add a few hundred new entries every year. This is not a solo project!

Nothing particular happened

curl was not a massive success or hit. A few people found it and 14 days after that first release I uploaded 4.1 with a few bug-fixes and a multi-decade tradition had started: keep on shipping updates with bug-fixes. “ship early and often” is a mantra we’ve stuck with.

Later in 1998 when we had done more than 15 releases, the web page featured this excellent statement:

Screenshot from the curl web site in December 1998

300 downloads!

I never had any world-conquering ideas or blue sky visions for the project and tool. I just wanted it to do Internet transfers well, fast and reliably, and that’s what I worked on making a reality.

To better provide good Internet transfers to the world, we introduced the library libcurl, shipped for the first time in the summer of 2000, and that enabled the project to take off on another level. libcurl has over time developed into a de-facto Internet transfer API.

Today, on its 23rd birthday, that is still mostly how I view the main focus of my work on curl and what I’m here to do. I believe that if I’ve managed to reach some level of success with curl over time, it is primarily because of one particular quality. A single word:

Persistence

We hold out. We endure and keep polishing. We’re here for the long run. It took me two years (counting from the precursors) to reach 300 downloads. It took another ten or so years until it was really widely available and used.

In 2008, the curl website served about 100 GB of data every month. This month it serves 15,600 GB – which interestingly is 156 times more data, 156 months later! But most users of course never download anything from our site; they get curl from their distro or operating system provider.

curl was adopted in Red Hat Linux in late 1998, became a Debian package in May 1999 and shipped in Mac OS X 10.1 in August 2001. Today, it is also shipped by default in Windows 10 and in iOS and Android devices. Not to mention the game consoles: Nintendo Switch, Xbox and Sony PS5.

Amusingly, libcurl is used by the two major mobile OSes but not provided as an API by them, so lots of apps, including many extremely high-volume apps, bundle their own libcurl build: YouTube, Skype, Instagram, Spotify, Google Photos, Netflix etc. This means that most smartphone users today have many separate curl installations on their phones.

Further, libcurl is used by some of the most played computer games of all time: GTA V, Fortnite, PUBG mobile, Red Dead Redemption 2 etc.

libcurl powers media players and set-top boxes such as Roku and Apple TV, used by maybe half a billion TVs.

curl and libcurl ship in virtually every Internet server, and libcurl is the default transfer engine in PHP, which is found on almost 80% of the world’s nearly two billion websites.

Cars are Internet-connected now, and libcurl is used in virtually every modern car to transfer data to and from the vehicle.

Then add media players, kitchen and medical devices, printers, smart watches and lots of “smart” IoT things. Practically speaking, just about every Internet-connected device in existence runs curl.

I’m convinced I’m not exaggerating when I claim that curl exists in over ten billion installations world-wide.

Alone and strong

A few times over the years I’ve tried to see if curl could join an umbrella organization, but none has accepted us, and I think it has all been for the best in the end. We are completely alone and independent of organizations and companies. We do exactly as we please and we’re not following anyone else’s rules. Over the last few years, sponsorships and donations have really accelerated and we’re in a good position to pay sizable bug-bounty rewards and more.

The fact that I and wolfSSL offer commercial curl support has only made curl stronger, I believe: it lets me spend even more time working on curl and it makes more companies feel safer going with curl, which in the end makes it better for all of us.

Those 300 lines of code in late 1996 have grown to 172,000 lines in March 2021.

Future

Our most important job is to “not rock the boat”. To provide the best and most solid Internet transfer library you can find, on as many platforms as possible.

But to remain attractive we also need to keep up with the times and adapt to new protocols and new habits as they emerge: support new protocol versions, enable better ways to do things and, over time, deprecate the bad things in responsible ways that don’t hurt users.

In the short term I think we want to work on making sure HTTP/3 works, make the Hyper backend really good and see where the rustls backend goes.

After 23 years we still don’t have any grand blue sky vision or road map items to guide us much. We go where Internet and our users lead us. Onward and upward!

The curl roadmap

23 curl numbers

Over the last few days ahead of this birthday, I’ve tweeted 23 “curl numbers” from the project using the #curl23 hashtag. Those twenty-three numbers and facts are included below.

2,200 lines of code by March 1998 have grown to 170,000 lines in 2021 as curl is about to turn 23 years old

14 different TLS libraries are supported by curl as it turns 23 years old

2,348 contributors have helped out making curl to what it is as it turns 23 years old

197 releases done so far as curl turns 23 years

6,787 bug-fixes have been logged as curl turns 23 years old

10,000,000,000 installations world-wide make curl one of the world’s most widely distributed 23 year-olds

871 committers have provided code to make curl a 23 year old project

935,000,000 is the official curl docker image pull-counter (at a rate of 83 pulls/second) as curl turns 23 years old

22 car brands – at least – run curl in their vehicles when curl turns 23 years old

100 CI jobs run for every commit and pull-request in curl project as it turns 23 years old

15,000 spare time hours have been spent by Daniel on the curl project as it turns 23 years old

2 of the top-2 mobile operating systems bundle and use curl in their device operating systems as curl turns 23

86 different operating systems are known to have run curl as it turns 23 years old

250,000,000 TVs run curl as it turns 23 years old

26 transport protocols are supported as curl turns 23 years old

36 different third party libraries can optionally be built to get used by curl as it turns 23 years old

22 different CPU architectures have run curl as it turns 23 years old

4,400 USD have been paid out in total for bug-bounties as curl turns 23 years old

240 command line options when curl turns 23 years

15,600 GB data is downloaded monthly from the curl web site as curl turns 23 years old

60 libcurl bindings exist to let programmers transfer data easily using any language as curl turns 23 years old

1,327,449 is the total word count for all the relevant RFCs to read for curl’s operations as curl turns 23 years old

1 founder and lead developer has stuck around in the project as curl turns 23 years old

Credits

Image by AnnaER from Pixabay

half of curl’s vulnerabilities are C mistakes

I spent a lot of time and effort digging up the numbers and facts for this post!

Lots of people keep referring to the awesome summary put together by a friendly pseudonymous “Tim”, which says that “53 out of 95” (55.7%) security flaws in curl could’ve been prevented if curl had been written in Rust. This usually comes up in discussions around how insecure C is and what to do about it. I’ve blogged about this topic before, but things change, the world changes and my own view on these matters keeps getting refined.

I did my own count: how many of the current 98 published security problems in curl are related to it being written in C?

Possibly due to the slightly different question, possibly because I’ve categorized one or two vulnerabilities differently, possibly because I’m biased as heck – but my count ends up at:

51 out of 98 security vulnerabilities are due to C mistakes

That’s still 52%. (You can inspect my analysis and submit issues/pull-requests against the vuln.pm file.) And yes, those are 51 flaws that could’ve been avoided if curl had been written in a memory safe language. This contradicts what I’ve said in the past, but I will also show you below that the numbers have changed – and that I was still right back then!

Let me also say right away that if you check out the curl security section, you will find very detailed descriptions of all the vulnerabilities. Using those, you can draw your own conclusions and easily write your own blog posts on this topic!

This post is not meant as a discussion around how we can rewrite C code into other languages to avoid these problems. This is an introspection of the C related vulnerabilities in curl. curl will not be rewritten but will continue to support backends written in other languages.

It seems hard to draw definite conclusions based on the CVEs and C mistakes in curl’s history, due to the relatively small amount of data to analyze. I’m not convinced this is enough data to actually spot real trends; it might be mostly random coincidence.

98 flaws out of 6,682

The curl changelog counts a total of 6,682 bug-fixes at the time of this writing. That makes vulnerabilities 1.46% of all known curl bugs fixed through curl’s entire life-time, starting in March 1998.

Looking at recent curl development: the last three years. Since January 1st 2018, we’ve fixed 2,311 bugs and reported 26 vulnerabilities. Out of those 26 vulnerabilities, 18 (69%) were due to C mistakes. 18 out of 2,311 is 0.78% of the bug-fixes.

We’ve not reported a single C-based vulnerability in curl since September 2019, but we have reported six others. And fixed over a thousand other bugs. (There’s another vulnerability pending announcement, a 99th one, to become public on March 31, but that is also not a C mistake.)

This is not due to lack of trying. Since a few years back, we’re one of the few small open source projects that pay several hundred dollars for any reported and confirmed security flaw.

The share of C based security issues in curl is an extremely small fraction of the grand total of bugs. The security flaws are of course the most fatal and serious ones, though – all bugs are certainly not equal.

But also: not all vulnerabilities are equal. Very few curl vulnerabilities have had a severity level over medium and none has been marked critical.

Unfortunately we don’t have a “severity” noted for very many of the past vulnerabilities, as we only started that practice in 2019. I’ve spent time and effort backtracking and filling them in for the 2018 ones, but it’s a tedious job and I probably will not update the remainder soon, if at all.

51 flaws due to C

Let’s dive in to see how they look.

Here’s a little pie chart with the five different C mistake categories that have caused the 51 vulnerabilities. The categories here are entirely my own. No surprises here really. The two by far most common C mistakes that caused vulnerabilities are reading or writing outside a buffer.

Buffer overread – reading outside the buffer size/boundary. Very often due to a previous integer overflow.

Buffer overflow – code wrote more data into a buffer than it was allocated to hold.

Use after free – code used a memory area that had already been freed.

Double free – freeing a memory pointer that had already been freed.

NULL mistakes – NULL pointer dereference and NUL byte mistake.
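
To make the two most common categories concrete, here is a deliberately contrived C sketch of my own – not actual curl code – showing how a length taken from untrusted input can turn into both an overread and an overflow, and what the guarded version looks like:

#include <string.h>

/* Contrived illustration, not curl source: a length field received from
   the network is trusted blindly. */
static void bad_copy(char *dst, size_t dstsize,
                     const char *src, size_t srcsize, size_t claimed)
{
  /* BUG: 'claimed' may exceed srcsize (buffer overread) as well as
     dstsize (buffer overflow) */
  memcpy(dst, src, claimed);
  (void)dstsize;
  (void)srcsize;
}

static int good_copy(char *dst, size_t dstsize,
                     const char *src, size_t srcsize, size_t claimed)
{
  if(claimed > srcsize || claimed > dstsize)
    return 1; /* reject the input instead of reading/writing out of bounds */
  memcpy(dst, src, claimed);
  return 0;
}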

Addressing the causes

I’ve previously described a bunch of the counter-measures we’ve introduced in the project to combat some of the most common mistakes we’ve made. We continue to enforce those rules in the project.

Two of the main methods we’ve introduced, both mentioned in that post, are that A) we created a generic dynamic buffer system in curl that we now try to use everywhere, to avoid new code that handles buffers manually, and B) we enforce length restrictions on virtually all input strings – to avoid risking integer overflows.
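
To illustrate the idea behind A) – with names I made up for this post, not curl’s actual internal API – a size-capped dynamic buffer can be sketched like this:

#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of a bounded dynamic buffer, in the spirit of what
   is described above. Not curl's real implementation. */
struct dynbuf {
  char *ptr;
  size_t len;
  size_t max;   /* hard upper limit, set once at init time */
};

static void db_init(struct dynbuf *b, size_t max)
{
  b->ptr = NULL;
  b->len = 0;
  b->max = max;
}

static int db_addn(struct dynbuf *b, const char *data, size_t n)
{
  char *p;
  if(n > b->max - b->len)        /* enforce the cap, no integer overflow */
    return 1;
  p = realloc(b->ptr, b->len + n + 1);
  if(!p)
    return 1;
  memcpy(p + b->len, data, n);
  b->len += n;
  p[b->len] = '\0';
  b->ptr = p;
  return 0;
}

static void db_free(struct dynbuf *b)
{
  free(b->ptr);
  b->ptr = NULL;
  b->len = 0;
}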

Areas

When I did the tedious job of re-analyzing every single security vulnerability anyway, I also assigned an “area” to each existing curl CVE: the area of curl in which the problem originated or belonged. If we look at where the C related issues were found, can we spot a pattern? I think not.

“internal” is the number one area, meaning generic code that affected multiple protocols or, in several cases, was entirely protocol independent.

HTTP was the second largest area, but that might just reflect the fact that it is by far the most commonly used protocol in curl – and probably the protocol with the largest amount of protocol-specific code. There were a total of 21 vulnerabilities reported in that area, and 8 out of 21 is 38% C mistakes – way below the total average.

Otherwise I think we can conclude that the mistakes were distributed all over, rather nondiscriminatory…

C mistake history

As curl is an old project now and we have a long history to look back at, we can see how we have done in this regard throughout that history. I think it shows quite clearly that age hasn’t prevented C related mistakes from slipping in. Even though we are experienced C programmers and aged developers, we still let such flaws slip in. Or at least we fail to find such old mistakes until long after they went in – the reported vulnerabilities in the project have usually been present in the source code for many years by the time they are found.

The fact is that we only started to take proper and serious counter-measures against such mistakes in the last few years and while the graph below shows that we’ve improved recently, I don’t think we yet have enough data to show that this is a true trend and not just a happenstance or a temporary fluke.

The blue line in the graph shows how big the accumulated share of all security vulnerabilities due to C mistakes has been over time. It shows the total went below 50% in 2012, only to go above 50% again in 2018, and we haven’t come back down below it since…

The red line shows the percentage share over the last twelve months at each point in time. It illustrates that we have had several series of vulnerabilities reported over the years that were all C mistakes, and that this has happened rather recently too. In the twelve months leading up to the very last reported vulnerability, there was not a single C mistake among them.

Finding the flaws takes a long time

C mistakes might be easier to find and detect in source code. valgrind, fuzzing, static code analyzers and sanitizers can find them. Logical problems cannot as easily be detected using tools.

I decided to check if this seems to be the case in curl and if it is true, then C mistakes should’ve lingered in the code for a shorter time until found than other mistakes.

I had a script go through the 98 existing vulnerabilities and calculate the average time the flaws were present in the code until reported, splitting out the C mistake ones from the ones not caused by C mistakes. It revealed a (small) difference:

C mistake vulnerabilities are found, on average, in 80% of the time that other mistakes need to get found. Or put the other way around: mistakes that were not C mistakes took 25% longer to get reported – on average. I’m not convinced the difference is very significant. C mistakes still ship in code for 2,421 days – on average – until reported. Looking at the last 10 C mistake vulnerabilities, the average is slightly lower at 2,108 days (76% of the time the 10 most recent non C mistakes needed to be found). Non C mistakes take 3,030 days to get reported on average.

Reproducibility

All facts I claim and provide in this blog post can be double-checked and verified using available public data and freely available scripts.

Discuss

Hacker news

Lobste.rs

Reddit

“I will slaughter you”

You might know that I’ve posted funny emails I’ve received on my blog several times in the past. The kind of emails people send me when they experience problems with some device they own (like a car) and they contact me because my email address happens to be visible somewhere.

People sometimes say I should get a different email address or use another one in the curl license file, but I’ve truly never had a problem with these emails, as they mostly remind me of the tough challenges that modern technical life brings to people, and they give me insights into what things run curl.

But not all of these emails are “funny”.

Category: not funny

Today I received the following email

From: Al Nocai <[redacted]@icloud.com>
Date: Fri, 19 Feb 2021 03:02:24 -0600
Subject: I will slaughter you

That subject.

As an open source maintainer for over twenty years, I know flame wars and personal attacks; I have a fairly thick skin and I don’t let words get to me easily. It took me a minute to absorb and realize this was actually meant as a direct physical threat. It found its way through and got to me. This level of aggressiveness is not what I’m prepared for.

Attached to this email were seven images and no text at all. The images all look like screenshots from a phone, and the first one clearly shows source code I wrote and my copyright line:

The other images showed other source code and related build/software info of other components, but I couldn’t spot how they were associated with me in any way.

No explanation, just that subject and the seven images and I was left to draw my own conclusions.

I presume the name in the email is made up and the email account is probably a throw-away one. The time zone used in the Date: string might imply US central standard time but could of course easily be phony as well.

How I responded

Normally I don’t respond to these confused emails because the distance between me and the person writing them is usually almost interplanetary. This time though, it was so far beyond what’s acceptable to me and in any decent society that I couldn’t just let it slide. After I took a little pause and walked around my house for a few minutes to cool off, I wrote a really angry reply and sent it off.

This was a totally and completely utterly unacceptable email and it hurt me deep in my soul. You should be ashamed and seriously reconsider your manners.

I have no idea what your screenshots are supposed to show, but clearly something somewhere is using code I wrote. Code I have written runs in virtually every Internet connected device on the planet and in most cases the users download and use it without even telling me, for free.

Clearly you don’t deserve my code.

I don’t expect that it will be read or make any difference.

Update below, added after my initial post.

Al Nocai’s response

Contrary to my expectations above, he responded. It’s not even worth commenting but for transparency I’ll include it here.

I do not care. Your bullshit software was an attack vector that cost me a multimillion dollar defense project.

Your bullshit software has been used to root me and multiple others. I lost over $15k in prototyping alone from bullshit rooting to the charge arbitrators.

I have now since October been sandboxed because of your bullshit software so dipshit google kids could grift me trying to get out of the sandbox because they are too piss poor to know shat they are doing.

You know what I did to deserve that? I tried to develop a trade route in tech and establish project based learning methodologies to make sure kids aren’t left behind. You know who is all over those god damn files? You are. Its sickening. I got breached in Oct 2020 through federal server hijacking, and I owe a great amount of that to you.

Ive had to sit and watch as i reported:

  1. fireeye Oct/2020
  2. Solarwinds Oct/2020
  3. Zyxel Modem Breach Oct/2020
  4. Multiple Sigover attack vectors utilizing favicon XML injection
  5. JS Stochastic templating utilizing comparison expressions to write to data registers
  6. Get strong armed by $50billion companies because i exposed bullshit malware

And i was rooted and had my important correspondence all rerouted as some sick fuck dismantled my life with the code you have your name plastered all over. I cant even leave the country because of the situation; qas you have so effectively built a code base to shit all over people, I dont give a shit how you feel about this.

You built a formula 1 race car and tossed the keys to kids with ego problems. Now i have to deal with Win10 0-days because this garbage.

I lost my family, my country my friends, my home and 6 years of work trying to build a better place for posterity. And it has beginnings in that code. That code is used to root and exploit people. That code is used to blackmail people.

So no, I don’t feel bad one bit. You knew exactly the utility of what you were building. And you thought it was all a big joke. Im not laughing. I am so far past that point now.

/- Al

Al continues

Nine hours after I first published this blog post, Al replied again with two additional emails. His third and fourth emails to me.

Email 3:

https://davidkrider.com/i-will-slaughter-you-daniel-haxx-se/
Step up. You arent scaring me. What led me here? The 5th violent attempt on my life. Apple terms of service? gtfo, thanks for the platform.

Amusingly he has found a blog post about my blog post.

Email 4:

There is the project: MOUT Ops Risk Analysis through Wide Band Em Spectrum analysis through different fourier transforms.
You and whoever the fuck david dick rider is, you are a part of this.
Federal server breaches-
Accomplice to attempted murder-
Fraud-
just a few.

I have talked to now: FBI FBI Regional, VA, VA OIG, FCC, SEC, NSA, DOH, GSA, DOI, CIA, CFPB, HUD, MS, Convercent, as of today 22 separate local law enforcement agencies calling my ass up and wasting my time.

You and dick ridin’ dave are respinsible. I dont give a shit, call the cops. I cuss them out wheb they call and they all go silent.

I’ve kept his peculiar formatting and typos. In email 4 there was also a PDF file attached named BustyBabes 4.pdf. It is apparently a 13 page document about the “NERVEBUS NERVOUS SYSTEM” described in the first paragraph as “NerveBus Nervous System aims to be a general utility platform that provides comprehensive and complex analysis to provide the end user with cohesive, coherent and “real-time” information about the environment it monitors.”. There’s no mention of curl or my name in the document.

Since I don’t know the status of this document I will not share it publicly, but here’s a screenshot of the front page:

Related

This topic on hacker news and reddit.

I have reported the threat to the Swedish police (where I live).

This person would later apologize.

Transfers vs connections spring cleanup

Warning: this post is full of libcurl internal architectural details and not much else.

Within libcurl there are two primary objects being handled: transfers and connections. The transfer objects are struct Curl_easy and the connection counterparts are struct connectdata.

This separation and architecture is as old as libcurl, even if the internal struct names have changed a little through the years. A transfer is associated with zero or one connection object, and there’s a pool with potentially several previously used, live connections stored for possible future reuse.

A simplified schematic picture could look something like this:

One transfer, one connection and three idle connections in the pool.

Transfers to connections

These objects are protocol agnostic so they work like this no matter which scheme was used for the URL you’re transferring with curl.

Before the introduction of HTTP/2 into curl, which landed for the first time in September 2013, there was also a fixed relationship: one transfer always used (none or) one connection and that connection was used by a single transfer. libcurl stored the association in both objects. The transfer object got a pointer to the current connection and the connection object got a pointer to the current transfer.

Multiplexing shook things up

Lots of code in libcurl passed around the connection pointer (conn) because well, it was convenient. We could find the transfer object from that (conn->data) just fine.

When multiplexing arrived with HTTP/2, we could start doing multiple transfers that share a single connection. Since we passed the conn pointer as input to so many functions internally, we had to make sure we updated the conn->data pointer in lots of places so that it always pointed to the transfer currently driving the connection.

This was always awkward and a source of agony and bugs over the years. At least twice I started working on cleaning this up, but it quickly became a really large job that was hard to do in a single big blow, and I abandoned the work. Both times.
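
In a rough sketch (made-up code, not the real libcurl internals), the old shape looked something like this, and every place that switched which transfer drove the connection had to remember to refresh the back-pointer:

/* Illustrative sketch only, not actual libcurl code. */
struct Curl_easy;               /* a transfer */
struct connectdata {            /* a connection */
  struct Curl_easy *data;       /* "the" transfer using this connection -
                                   with multiplexing there can be many, so
                                   this had to be kept up to date manually */
};

static void proto_send(struct connectdata *conn)
{
  struct Curl_easy *data = conn->data;  /* stale if a caller forgot to
                                           update conn->data first */
  (void)data;  /* ... act on behalf of that transfer ... */
}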

Third time’s the charm

This architectural “wart” kept bugging me and on January 8, 2021 I emailed the curl-library list to start a more organized effort to clean this up:

Conclusion: we should stop using ‘conn->data’ in libcurl

Status: there are 939 current uses of this pointer

Mission: reduce the use of this pointer, aiming to reach a point in the future when we can remove it from the connection struct.

Little by little

With the help of fellow curl maintainer Patrick Monnerat I started to submit pull requests that would remove the use of this pointer.

Little by little we changed functions and logic to be anchored on the transfer rather than the connection (data->conn is still fine, as that can only ever be NULL or a single connection). I made a wiki page to keep an updated count of the number of references. After the first ten pull requests we were down to just over a hundred from the initial 919 – yeah, the mail quote says 939, but it turned out the grep pattern was slightly wrong!
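
The new shape, again only as a sketch of the principle: functions take the transfer, and the connection is reached through it, so there is no back-pointer to keep in sync.

/* Illustrative sketch only, not actual libcurl code. */
struct connectdata;
struct Curl_easy {
  struct connectdata *conn;     /* NULL or exactly one connection */
};

static void proto_send(struct Curl_easy *data)
{
  struct connectdata *conn = data->conn;  /* always unambiguous */
  (void)conn;  /* ... use the connection for this transfer ... */
}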

We decided to hold off a bit when we got closer to the 7.75.0 release so that we wouldn’t risk doing something close to the ship date that would jeopardize it. Once the release had been pushed out the door we could continue the journey.

Gone!

As of today, February 16 2021, the internal pointer formerly known as conn->data doesn’t exist anymore in libcurl and therefore it can’t be used and this refactor is completed. It took at least 20 separate commits to get the job done.

I hope this new order will help us make fewer mistakes, as we don’t have to update this pointer anymore.

I’m very happy we could do this revamp without it affecting the API or ABI in any way. These are all just internal artifacts that are not visible to the outside.

One of a thousand little things

This is just a tiny detail, but the internals of a project like curl consist of a thousand little details and this is one way we make sure the code remains in good shape. We identify improvements and we perform them. One by one. We never stop and we’re never done. Together we take this project into the future and help the world do Internet transfers properly.

curl –fail-with-body

That’s --fail-with-body, using two dashes in front of the name.

This is a brand new command line option added to curl, to appear in the 7.76.0 release. This option works like --fail but with one little addition, and I’m hoping the name implies it well enough: it also provides the response body. The --fail option has turned out to be surprisingly popular, but users have often repeated the request to also make it possible to get the body stored. --fail makes curl stop immediately after having received the response headers – if the response code says so.

--fail-with-body will instead first save the body per normal conventions and then return an error if the HTTP response code was 400 or larger.

To be used like this:

curl --fail-with-body -o output https://example.com/404.html

If the page is missing on that HTTPS server, curl will return exit code 22 and save the error message response in the file named ‘output’.

Not complicated at all. But has been requested many times!

This is curl’s 238th command line option.

Webinar: curl, Hyper and Rust

On February 11th, 2021 18:00 UTC (10am Pacific time, 19:00 Central Europe) we invite you to participate in a webinar we call “curl, Hyper and Rust”. To join us at the live event, please register via the link below:

https://www.wolfssl.com/isrg-partner-webinar/

What is the project about, how will this improve curl and Hyper, how was it done, what lessons can be learned, what more can we expect in the future and how can newcomers join in and help?

Participating speakers in this webinar are:

Daniel Stenberg. Founder and lead developer of curl.

Josh Aas, Executive Director at ISRG / Let’s Encrypt.

Sean McArthur, Lead developer of Hyper.

The event went on for 60 minutes, including the Q&A session at the end.

Recording

Questions?

If you already have a question you want to ask, please let us know ahead of time. Either in a reply here on the blog, or as a reply on one of the many tweets that you will see about this event from me and my fellow “webinarees”.

curl 7.75.0 is smaller

There’s been another 56 day release cycle and here’s another curl release to chew on!

Release presentation

Numbers

the 197th release
6 changes
56 days (total: 8,357)

113 bug fixes (total: 6,682)
268 commits (total: 26,752)
0 new public libcurl function (total: 85)
1 new curl_easy_setopt() option (total: 285)

2 new curl command line option (total: 237)
58 contributors, 30 new (total: 2,322)
31 authors, 17 new (total: 860)
0 security fixes (total: 98)
0 USD paid in Bug Bounties (total: 4,400 USD)

Security

No new security advisories this time!

Changes

We added --create-file-mode to the command line tool. To be used for the protocols where curl needs to tell the remote server what “mode” to use for the file when created. libcurl already supported this, but now we expose the functionality to the tool users as well.

The --write-out option got five new “variables” to use. Detailed in this separate blog post.

The CURLOPT_RESOLVE option got an extended format that now allows entries to be inserted that get timed-out after the standard DNS cache expiry time-out.
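
In libcurl terms, that could be used roughly like the sketch below. If I remember the extended syntax right, a leading ‘+’ on the entry is what marks it as subject to the normal DNS cache timeout – treat that detail as my assumption and check the CURLOPT_RESOLVE man page for the exact format.

#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    struct curl_slist *host = NULL;
    /* classic form: this entry never expires */
    host = curl_slist_append(host, "example.com:443:127.0.0.1");
    /* assumed 7.75.0 extended form: '+' makes the entry expire after the
       standard DNS cache timeout */
    host = curl_slist_append(host, "+example.com:80:127.0.0.1");
    curl_easy_setopt(curl, CURLOPT_RESOLVE, host);
    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/");
    curl_easy_perform(curl);
    curl_slist_free_all(host);
    curl_easy_cleanup(curl);
  }
  return 0;
}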

gophers:// – the protocol GOPHER done over TLS – is now supported by curl.

As a new experimentally supported HTTP backend, you can now build curl to use Hyper. It is not quite up to 100% parity in features just yet.

AWS HTTP v4 Signature support. This is an authentication method for HTTP used by AWS and some other services. See CURLOPT_AWS_SIGV4 for libcurl and --aws-sigv4 for the curl tool.
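
A minimal libcurl sketch of how this might be used – the endpoint, credentials and the exact provider/region/service string below are placeholders of mine, so check the CURLOPT_AWS_SIGV4 documentation for the real format:

#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    /* hypothetical S3-style endpoint, for illustration only */
    curl_easy_setopt(curl, CURLOPT_URL,
                     "https://mybucket.s3.us-east-1.amazonaws.com/file");
    /* the access key and secret go in the regular user:password slot */
    curl_easy_setopt(curl, CURLOPT_USERPWD, "ACCESS_KEY:SECRET_KEY");
    /* ask libcurl to do AWS V4 request signing */
    curl_easy_setopt(curl, CURLOPT_AWS_SIGV4, "aws:amz:us-east-1:s3");
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
  }
  return 0;
}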

Bug-fixes

Some of the notable things we’ve fixed this time around…

Reduced struct sizes

In my ongoing struggles to remove “dead weight” and ensure that curl can run on devices as small as possible, I’ve trimmed down the size of several key structs in curl. The memory foot-print of libcurl is now smaller than it has been for a very long time.

Reduced conn->data references

While not exactly a bug-fix in itself, this is a step in a larger refactor of libcurl where we work on removing all references back from connections to the transfer. The grand idea is that transfers can point to connections, but since a connection can be owned and used by many transfers, we should remove all code that references back to a transfer from the connection. To simplify the internals. We’re not quite there yet.

Silly writeout time units bug

Many users found out that when asking the curl tool to output timing information with -w, I accidentally made it show microseconds instead of seconds in 7.74.0! This is fixed and we’re back to the way it always was now…

CURLOPT_REQUEST_TARGET works with HTTP proxy

The option that lets the user set the “request target” of a HTTP request to something custom (like for example “*” when you want to issue a request using the OPTIONS method) didn’t work over proxy!
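
For reference, a minimal sketch of how a custom request target is set with libcurl (the OPTIONS-asterisk case mentioned above):

#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/");
    /* send "OPTIONS * HTTP/1.1" instead of a regular GET */
    curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, "OPTIONS");
    curl_easy_setopt(curl, CURLOPT_REQUEST_TARGET, "*");
    /* with this fix, the custom target is used also when the request
       goes through an HTTP proxy set with CURLOPT_PROXY */
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
  }
  return 0;
}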

CURLOPT_FAILONERROR fails after all headers

Often used via the tool’s --fail flag, this is a feature that makes libcurl stop and return an error if the HTTP response code is 400 or larger. Starting in this version, curl will still read and parse all the response headers before it stops and exits. This allows curl to act on and store contents from the other headers, which can be used for features in combination with --fail.
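
In libcurl the option is a plain on/off flag; a minimal sketch:

#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    CURLcode res;
    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/missing");
    /* fail on HTTP responses >= 400; starting with 7.75.0 all response
       headers are still read and parsed before libcurl gives up */
    curl_easy_setopt(curl, CURLOPT_FAILONERROR, 1L);
    res = curl_easy_perform(curl);
    if(res == CURLE_HTTP_RETURNED_ERROR) {
      /* the server responded with a 4xx/5xx code */
    }
    curl_easy_cleanup(curl);
  }
  return 0;
}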

Proxy-Connection duplicated headers

In some circumstances, providing a custom “Proxy-Connection:” header for a HTTP request would still get curl’s own added header in the request as well, making the request get sent with a duplicate set!

CONNECT chunked encoding race condition

There was a bug in the code that handles proxy responses, when the body of the CONNECT responses was using chunked-encoding. curl could end up thinking the response had ended before it actually had…

proper User-Agent header setup

Back in 7.71.0 we fixed a problem with the user-agent header and made it get stored correctly in the transfer struct, from previously having been stored in the connection struct.

That caused a regression that we have now fixed. The previous code had a mistake that made the user-agent header not get used when a connection was re-used or multiplexed, which from an outside perspective made it appear to go missing in a random fashion…

add support for %if [feature] conditions in tests

Thanks to the new preprocessor we added for test cases a few releases ago, we could now take the next step and offer conditionals in the test cases, so that tests can run and behave differently depending on features and parameters. Previously, we have resorted to only making tests require a certain feature set to be able to run, and otherwise skip them completely if the feature set could not be satisfied, but with this new ability we can make tests more flexible and thus work with a wider variety of features.

if IDNA conversion fails, fallback to Transitional

A user reported that curl failed to get the data when given a funny URL, while it worked fine in browsers (and wget):

The host name consists of a heart and a fox emoji in the .ws top-level domain. This is yet another URLs-are-weird issue and how to do International Domain Names with them is indeed a complicated matter, but starting now curl falls back and tries a more conservative approach if the first take fails and voilà, now curl too can get the heart-fox URL just fine… Regular readers of my blog might remember the smiley URLs of 2015, which were similar.

urldata: make magic first struct field

We provide types for the most commonly used handles in the libcurl API as typedef’ed void pointers. The handles are typically declared like this:

CURL *easy;
CURLM *multi;
CURLSH *shared;

… but since they’re typedefed void-pointers, the compiler cannot helpfully point out if a user passes in the wrong handle to the wrong libcurl function and havoc can ensue.

Starting now, all three of these handles have a “magic” struct field in the same relative place within their structs, so that libcurl can much better detect when the wrong kind of handle is passed to a function. Instead of going bananas or even crashing, libcurl can properly and politely return an error code saying the input was wrong.
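
The principle, sketched here with made-up names rather than curl’s actual structs, is simply that the marker sits first in every handle type so any handle pointer can be sanity checked:

/* Illustrative sketch of the "magic first field" idea, not curl source. */
#define EASY_MAGIC  0xc0ffee01
#define MULTI_MAGIC 0xc0ffee02

struct easy_handle  { unsigned int magic; /* ... the rest ... */ };
struct multi_handle { unsigned int magic; /* ... the rest ... */ };

/* Since the magic is at the same offset in every handle type, a function
   can cheaply verify that it was handed the right kind of pointer and
   return an error instead of crashing. */
static int is_easy_handle(void *handle)
{
  struct easy_handle *h = handle;
  return h && h->magic == EASY_MAGIC;
}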

Since you’d ask: using void-pointers like that was a mistake and we regret it. There are better ways to accomplish the same thing, but the train has left. When we’ve tried to correct this situation there have been cracks in the universe and complaints have been shouted through the ether.

SECURE_RENEGOTIATION support for wolfSSL

Turned out we didn’t support this before and it wasn’t hard to add…

openssl: lowercase the host name before using it for SNI

The SNI (Server Name Indication) field is data set in TLS that tells the remote server which host name we want to connect to, so that it can present the client with the correct certificate etc since the server might serve multiple host names.

The spec clearly says that this field should contain the DNS name and that it is case insensitive – like DNS names are. Turns out it wasn’t even hard to find multiple servers which misbehave and return the wrong cert if the given SNI name is not sent lowercase! The popular browsers typically always send the SNI names like that… In curl we keep the host name internally exactly as it was given to us in the URL.

With a silent protest that nobody cares about, we went ahead and made curl also lowercase the host name in the SNI field when using OpenSSL.

I did not research how all the other TLS libraries curl can use behave when setting SNI. This same change is probably going to have to be done in more places, or perhaps we need to surrender and do the lowercasing once and for all at a central place… Time will tell!

Future!

We have several pull requests in the queue suggesting changes, which means the next release is likely to be named 7.76.0 and the plan is to ship that on March 31st, 2021.

Send us your bug reports and pull requests!

What if GitHub is the devil?

Some critics think the curl project shouldn’t use GitHub. The reasons for being against GitHub hosting tend to be one or more of:

  1. it is an evil proprietary platform
  2. it is run by Microsoft and they are evil
  3. GitHub is American thus evil

Some have insisted on craziness like “we let GitHub hold our source code hostage”.

Why GitHub?

The curl project switched to GitHub (from Sourceforge) almost eleven years ago and it’s been a smooth ride ever since.

We’re on GitHub not only because it provides a myriad of practical features and is a stable and snappy service for hosting and managing source code. GitHub is also a developer hub for millions of developers who already have accounts and are familiar with the GitHub style of developing, the terms and the tools. By being on GitHub, we reduce friction from the contribution process and we maximize the ability for others to join in and help. We lower the bar. This is good for us.

I like GitHub.

Self-hosting is not better

Providing even close to the same uptime and snappy response times with a self-hosted service is a challenge, and it would take someone time and energy to volunteer that work – time and energy we can now instead spend on developing the project. As a small independent open source project, we don’t have any “infrastructure department” that would do it for us. And trust me: we already have enough infrastructure management to deal with without having to add to that pile.

… and by running our own hosted version, we would lose the “network effect” and convenience for people that already are on and know the platform. We would also lose the easy integration with cool services like the many different CI and code analyzer jobs we run.

Proprietary is not the issue

While git is open source, GitHub is a proprietary system. But the thing is that even if we would go with a competitor and get our code hosting done elsewhere, our code would still be stored on a machine somewhere in a remote server park we cannot physically access – ever. It doesn’t matter if that hosting company uses open source or proprietary code. If they decide to switch off the servers one day, or even just selectively block our project, there’s nothing we can do to get our stuff back out from there.

We have to work to minimize the risk of that happening, and the effects of it if it still does.

A proprietary software platform holds our code just as much hostage as any free or open source software platform would, simply by the fact that we let someone else host it. They run the servers our code is stored on.

If GitHub takes the ball and goes home

No matter which service we use, there’s always a risk that they will turn off the light one day and not come back – or just change the rules or licensing terms that would prevent us from staying there. We cannot avoid that risk. But we can make sure that we’re smart about it, have a contingency plan or at least an idea of what to do when that day comes.

If GitHub shuts down immediately and we get zero warning to rescue anything at all from them, what would be the result for the curl project?

Code. We would still have the entire git repository with all code, all source history and all existing branches up until that point. We’re hundreds of developers who pull that repository frequently, and many automatically, so there’s a very distributed backup all over the world.

CI. Most of our CI setup is done with yaml config files in the source repo. If we transition to another hosting platform, we could reuse them.

Issues. Bug reports and pull requests are stored on GitHub and a sudden exit would definitely make us lose some of them. We do daily “extractions” of all issues and pull-requests, so a lot of meta-data could still be saved and preserved. I don’t think this would be a terribly hard blow either: we move long-standing bugs and ideas over to documents in the repository, so the currently open ones could likely be resubmitted again in the near future.

There’s no doubt that it would be a significant speed bump for the project, but it would not be worse than that. We could bounce back on a new platform and development would go on within days.

Low risk

It’s a rare thing that a service suddenly, with no warning and no heads-up, just goes black and leaves projects completely stranded. In most cases, we get alerts and notifications and get a chance to transition cleanly and orderly.

There are alternatives

Sure there are alternatives. Both pure GitHub alternatives that look similar and provide similar services, and projects that would allow us to run similar things ourselves and host locally. There are many options.

I’m not looking for alternatives. I’m not planning to switch hosting anytime soon! As mentioned above, I think GitHub is a net positive for the curl project.

Nothing lasts forever

We’ve switched services several times before and I’m expecting that we will change again in the future, for all sorts of hosting and related services that we provide to the work and to the developers and participants within the project. Nothing lasts forever.

When a service we use goes down or just turns sour, we will figure out the best possible replacement and take the jump. Then we patch up all the cracks the jump may have caused and continue the race into the future. Onward and upward. The way we know and the way we’ve done for over twenty years already.

Credits

Image by Elias Sch. from Pixabay

Updates

After this blog post went live, some users remarked that I’m “disingenuous” in the list of reasons at the top, the ones people have presented to me. This, because I don’t mention the moral issues staying on GitHub presents – like for example previously reported workplace conflicts and their association with hideous American immigration authorities.

This is rather the opposite of disingenuous. This is the truth. Not a single person has ever asked me to leave GitHub for those reasons. Not me personally, and nobody has raised it with the wider project either.

These are good reasons to discuss and consider if a service should be used. Have there been violations of “decency” significant enough that should make us leave? Have we crossed that line in the sand? I’m leaning to “no” now, but I’m always listening to what curl users and developers say. Where do you think the line is drawn?

curl your own error message

The --write-out (or -w for short) curl command line option is a gem for shell script authors looking for more information from a curl transfer. Experienced users know that this option lets you extract things such as detailed timings, the response code, transfer speeds and sizes of various kinds. A while ago we even made it possible to output JSON.

Maybe the best resource to learn more about it is the dedicated section in Everything curl. You’ll like it!

Now even more versatile

In curl 7.75.0 (released on February 3, 2021) we introduce five new variables for this option, and I’ll elaborate on some of the fun new things you can do with these!

These new variables were invented when we received a bug report that pointed out that when a user transfers many URLs in parallel and one or some of them fail, the error message doesn’t identify exactly which of the URLs failed. We should improve the error messages to fix this!

Or wait a minute. What if we provide enough details for --write-out to let the user customize the error message completely by themselves and thus get exactly the info they want?

onerror

Using this, you can specify a message only to get written if the transfer ends in error. That is a non-zero exit code. An example could look like this:

curl -w '%{onerror}failed\n' $URL -o saved -s

… if the transfer is OK, it says nothing. If it fails, the text on the right side of the “onerror” variable gets output. And that text can of course contain other variables!

This command line uses -s for “silent” to make it inhibit the regular built-in error message.

url

To help craft a good error message, maybe you want the URL included that was used in the transfer?

curl -w '%{onerror}%{url} failed\n' $URL

urlnum

If you get more than one URL in the command line, it might be helpful to get the index number of the used URL. This is of course especially useful if you for example work with the same URL multiple times in the same command line and just one of them fails!

curl -w '%{onerror}URL %{urlnum} failed\n' $URL $URL

exitcode

The regular built-in curl error message shows the exit code, as it helps diagnose exactly what the problem was. Include that in the error message like:

curl -w '%{onerror}%{url} got %{exitcode}\n' $URL

errormsg

This is the human readable explanation for the problem. The error message. Mimic the default curl error message like this:

curl -w '%{onerror}curl: %{exitcode} %{errormsg}\n' $URL

stderr

We already provide this “variable” from before, which allows you to make sure the output message is sent to stderr instead of stdout, which then makes it even more like a real error message:

curl -w '%{onerror}%{stderr}curl: %{exitcode} %{errormsg}\n' $URL

More

These new variables work fine after %{onerror}, but they of course also work just as well when there was no error, and they work perfectly fine whether you use -Z for parallel transfers or do them serially, one after the other.

More on less curl memory

tldr: curl uses 30K of dynamic memory for downloading a large HTTP file, plus the size of the download buffer.

Back in September 2020 I wrote about my work to trim curl allocations done for FTP transfers. Now I’m back again on the memory use in curl topic, from a different angle.

This time, I learned about the awesome tool pahole, which can (among other things) show structs and their sizes from a built library – and when embracing this fun toy, I ran some scripts on a range of historic curl releases to get a sense of how we’re doing over time – memory size and memory allocations wise.

The task I set out for myself was: figure out how the sizes of key structs in curl have changed over time, and correlate that with the number and size of allocations done at run-time – to make sure that trimming down the size of a specific struct doesn’t just move the allocation to another one instead, thus nullifying the gain. I want to make sure we’re not slowly degrading – and if we are, we should at least know about it!

Also: we keep developing curl at a fairly good pace and we’re adding features in almost every release. Some growth is to be expected and should be tolerated, I think. We also keep the build process very configurable so users with particular needs and requirements can switch off features and thus also save memory.

Memory sizes in modern computing

Of course systems are growing every year and machines ship with more and more RAM, which also goes for the smallest machines. But there is still a vast number of systems out there with limited memory that want good Internet transfers as well. Also, keeping sizes down allows applications and systems to scale better: a 10% decrease in size can imply a 10% increase in the number of possible parallel transfers. curl, and especially libcurl, is still today in 2021 frequently used on machines with limited amounts of available memory. Sometimes in the few-megabytes-of-RAM range.

Fixed configuration

In the tests I did for this, I used the exact same configuration and build config for all versions tested. The sizes and behavior will vary greatly depending on config, but I tried to use a fairly complete and typical build to see how code and memory use is for “most” users. I ran everything on my x86_64 Debian Linux dev machine. My focus is on curl versions from the last 3-4 years. I figured going back to ancient times won’t help here.

Key structs

struct Curl_easy – this is the “easy handle”, what is allocated by curl_easy_init() and is the anchor for every transfer done with libcurl, no matter which API you’re using. An application creates one of these for each concurrent transfer it wants to do or keep around. Some applications allocate hundreds or even thousands of these.

struct Curl_multi – this is the “multi handle”, allocated with curl_multi_init(). This handle is created by applications as a holder of many concurrent transfers so applications typically do not have a very large amount of these.

struct connectdata – this is an internal struct that isn’t visible externally to applications. It is the holder of connection related data for a connection to a specific server. The connection pool curl uses to handle persistent connections will hold a number of these structs in memory after the transfer has completed, to allow subsequent reuse. The size of the connection pool is customizable. A busy application doing lots of transfers might end up with a sizeable number of connections in the pool, so the size of this struct adds up.
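
In API terms, the first two structs are what sit behind the handles an application creates, while connectdata stays purely internal. A minimal sketch of that relationship:

#include <curl/curl.h>

int main(void)
{
  /* one Curl_easy per transfer... */
  CURL *easy = curl_easy_init();
  /* ...and typically one Curl_multi holding all concurrent transfers */
  CURLM *multi = curl_multi_init();
  int running = 0;

  curl_easy_setopt(easy, CURLOPT_URL, "https://example.com/");
  curl_multi_add_handle(multi, easy);

  /* a real application would loop and wait on sockets here; this is only
     to show how the objects relate */
  curl_multi_perform(multi, &running);

  curl_multi_remove_handle(multi, easy);
  curl_easy_cleanup(easy);
  curl_multi_cleanup(multi);
  return 0;
}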

Dynamic allocations

In early curl history, the download and upload buffers for transfers were part of the Curl_easy struct, which made it fairly large.

In curl 7.53.0 (February 2017) the download buffer was made dynamically sized and has since then been allocated separately. Before that transition, curl 7.52.0 had a Curl_easy struct that was 36584 bytes, which included both the download and the upload buffers. In 7.58.0 the size was down to 21264 bytes, since the download buffer was then allocated separately and was also allowed to be much larger than the previously fixed 16KB size.

The 16KB upload buffer was moved out of the Curl_easy handle in the 7.62.0 release (October 2018) to be allocated on demand – which of course especially benefits everyone who doesn’t do uploads at all… The size of this struct was then down to 6208 bytes.

In curl 7.71.0 we also made the download buffer allocated on demand, and immediately freed after the transfer completes. This makes applications that keep handles around for reuse use significantly less memory. Applications are generally encouraged to keep the handles around to better facilitate connection reuse.
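
Keeping a handle around for reuse simply means calling curl_easy_perform() on it again; a minimal sketch:

#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/first");
    curl_easy_perform(curl);

    /* same handle, new URL: the connection can be reused from the pool,
       and since 7.71.0 the download buffer only exists while a transfer
       is actually in progress */
    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/second");
    curl_easy_perform(curl);

    curl_easy_cleanup(curl);
  }
  return 0;
}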

Struct size development

Curl_easy

The size in bytes of struct Curl_easy the last few years:

 7.52.0 36584
 7.58.0 21264
 7.61.0 21344
 7.62.0 6208
 7.63.0 6216
 7.64.0 6312
 7.65.0 5976
 7.66.0 6024
 7.67.0 6040
 7.68.0 6040
 7.69.0 6040
 7.70.0 6080
 7.71.0 6448
 7.72.0 6472
 7.73.0 6464
 7.74.0 6512

Current git: 5272 bytes (-19% from last release). With this, the struct is smaller than it has ever been before.

How did we make this extra reduction? Primarily, I noticed that we had a DoH related struct in the handle by default, which was turned into an on-demand allocation. DoH is still rare and that data only needs to be allocated during the name resolving phase.

Curl_multi

The size in bytes of struct Curl_multi has remained very stable over the last few years and it keeps being very small. Notable is that when we removed pipelining support in 7.65.0, it took away 96 bytes from this struct.

 7.50.0 384
 7.52.0 384
 7.58.0 480
 7.59.0 488
 7.60.0 488
 7.61.0 512
 7.62.0 512
 7.63.0 512
 7.64.0 512
 7.65.0 416
 7.66.0 416
 7.67.0 424
 7.68.0 432
 7.69.0 416
 7.70.0 416
 7.71.0 416
 7.72.0 416
 7.73.0 416
 7.74.0 416

Current git: 416 bytes.

With this, we’re smaller than we were in the beginning of 2018.

connectdata

The size in bytes of struct connectdata. It’s been both up and down.

 7.50.0 1904 
 7.52.0 2104 
 7.58.0 2112 
 7.59.0 2112 
 7.60.0 2112 
 7.61.0 2128 
 7.62.0 2152 
 7.63.0 2160 
 7.64.0 2160 
 7.65.0 1944 
 7.66.0 1960 
 7.67.0 1976 
 7.68.0 1976 
 7.69.0 2600 
 7.70.0 2608 
 7.71.0 2608 
 7.72.0 2624 
 7.73.0 2640 
 7.74.0 2656

Current git: 1472 bytes (-44% from last release)

The size bump in 7.69.0 was the insertion of a new struct for state data when doing SOCKS connections non-blocking, and the corresponding decrease again for the pending release is the removal of the buffer from that struct. With this, we’re down to a size we had a very long time ago.

Run-time memory use

To make sure that we don’t just move memory to other on-demand buffers that we need to allocate anyway, I ran a script over a lot of curl versions and counted the number of allocations needed and the peak amount of memory allocated, for a plain 512MB download over HTTP from localhost. The counted allocations were only the ones done by curl code (malloc, calloc, realloc, strdup etc).

Number of allocations

There are many reasons to allocate memory and while we want to keep the number down, lots of factors of course need to be taken into account.

In the list below you’ll see that clearly we had some mistake in 7.52.0 and perhaps some more versions, as it did over 32,000 allocations. The situation was fixed in or before 7.58.0 and I haven’t bothered to go back to check exactly what it was.

 7.52.0 32883 
 7.58.0 82 
 7.59.0 82 
 7.60.0 82 
 7.61.0 82 
 7.62.0 86 
 7.63.0 87 
 7.64.0 87 
 7.65.0 82
 7.66.0 101 
 7.67.0 107 
 7.68.0 111 
 7.69.0 113 
 7.70.0 113 
 7.71.0 99 
 7.72.0 99 
 7.73.0 96 
 7.74.0 96

Current git: 96 allocations.

We do more allocations than some years back, but I think it is still reasonable growth.

Peak memory allocations

Here’s some developments to look closer at!

If we start out looking at the oldest versions in my test, we can see that they allocate less than 100KB – but we need to take into account the fact that back then we used a fixed 16KB download buffer. In curl 7.54.1 we bumped the default buffer size the curl tool uses to 100K, which in the table below is visible in the 7.58.0 numbers.

 7.50.0 84473 
 7.52.0 85329 
 7.58.0 174243 
 7.59.0 174315 
 7.60.0 174339 
 7.61.0 174531 
 7.62.0 143886 
 7.63.0 143928 
 7.64.0 144128 
 7.65.0 143152 
 7.66.0 168188 
 7.67.0 173365 
 7.68.0 168575
 7.69.0 169167
 7.70.0 169303 
 7.71.0 136573
 7.72.0 136765 
 7.73.0 136875 
 7.74.0 137043

Current git: 131680 bytes.

The gain in 7.62.0 was mostly the removal of the default allocation of the upload buffer, which isn’t used in this test…

The current size tells me several things. We’re at a memory consumption level that is probably at its lowest point in the last decade – while at the same time having more features and being better than ever before. If we deduct the download buffer we have 29280 additional bytes allocated. Compare this to 7.50.0 which allocated 68089 bytes on top of the download buffer!

If I change my curl to use the smallest download buffer size allowed by libcurl (1KB) instead of the default 100KB, it ends up peaking at: 30304 bytes. That’s 44% of the memory needed by 7.50.0.

In my opinion, this is very good.

It might also be worth reiterating that this is with a full featured libcurl build. We can shrink even further if we switch off undesired features or just go tiny-curl.

I hope this goes without saying, but of course all of this work has been done with the API and ABI still intact.

Graphs?

You know I like graphs, but for now I decided this blog post and analysis was enough. I’m going to think about how we can perhaps get this info surfaced in a more regular and automated way in the future. Not sure it is worth spending a lot of effort on, though.

Reproduce

  1. build curl with the --enable-debug option to configure. Don’t use the threaded resolver – I use the c-ares one, because it otherwise breaks the memdebug system.
  2. Run your command line with tracing enabled and then run memanalyze on the log:
#!/bin/sh
export CURL_MEMDEBUG=/tmp/curlmem.log
./src/curl -v localhost/512M -o /dev/null
./tests/memanalyze.pl -v /tmp/curlmem.log

To get the struct sizes, just run pahole on the static libcurl lib after the build:

pahole -s lib/.libs/libcurl.a > sizes.txt

Credits

The photo was taken by me, in Siem Reap, Cambodia. “A smaller transport”

Updates

After the initial posting of this article I optimized the structs even further so the numbers have been updated since then to reflect the state of what’s in git a week later.