Category Archives: cURL and libcurl

curl and/or libcurl related

More on less curl memory

tldr: curl uses 30K of dynamic memory for downloading a large HTTP file, plus the size of the download buffer.

Back in September 2020 I wrote about my work to trim curl allocations done for FTP transfers. Now I’m back again on the memory use in curl topic, from a different angle.

This time, I learned about the awesome tool pahole, which can (among other things) show structs and their sizes from a built library. While embracing this fun toy, I ran some scripts on a range of historic curl releases to get a sense of how we're doing over time, in terms of memory size and memory allocations.

The task I set for myself was: figure out how the sizes of key structs in curl have changed over time, and correlate that with the number and size of allocations done at run-time. This is to make sure that memory trimmed from a specific struct doesn't simply get allocated somewhere else instead, which would nullify the gain. I want to make sure we're not slowly degrading, and if we are, we should at least know about it!

Also: we keep developing curl at a fairly good pace and we're adding features in almost every release. Some growth is to be expected and should be tolerated, I think. We also keep the build process very configurable so users with particular needs and requirements can switch off features and thus also save memory.

Memory sizes in modern computing

Of course systems grow every year and machines ship with more and more RAM, and that goes for the smallest machines too. But there is still a vast number of systems out there with limited memory that want good Internet transfers as well. Also, keeping sizes down allows applications and systems to scale better: a 10% decrease in size can imply a 10% increase in the number of possible parallel transfers. curl, and especially libcurl, is still today in 2021 frequently used on machines with limited amounts of available memory, sometimes in the range of just a few megabytes of RAM.

Fixed configuration

In the tests I did for this, I used the exact same build configuration for all versions tested. The sizes and behavior will vary greatly depending on config, but I tried to use a fairly complete and typical build to see how code and memory use looks for "most" users. I ran everything on my x86_64 Debian Linux dev machine. My focus is on curl versions from the last 3-4 years; I figured going back to ancient times wouldn't help here.

Key structs

struct Curl_easy – this is the “easy handle”, what is allocated by curl_easy_init() and is the anchor for every transfer done with libcurl, no matter which API you’re using. An application creates one of these for each concurrent transfer it wants to do or keep around. Some applications allocate hundreds or even thousands of these.

struct Curl_multi – this is the “multi handle”, allocated with curl_multi_init(). This handle is created by applications as a holder of many concurrent transfers so applications typically do not have a very large amount of these.

struct connectdata – this is an internal struct that isn’t visible externally to applications. It is the holder of connection related data for a connection to a specific server. The connection pool curl uses to handle persistent connections will hold a number of these structs in memory after the transfer has completed, to allow subsequent reuse. The size of the connection pool is customizable. A busy application doing lots of transfers might end up with a sizeable number of connections in the pool, so the size of this struct adds up.
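To put these structs in context, here is a minimal sketch (the URL is just an example) of how they map onto the public handles an application creates and frees:

#include <curl/curl.h>

int main(void)
{
  /* one easy handle per concurrent transfer (backed by struct Curl_easy) */
  CURL *easy = curl_easy_init();
  /* one multi handle driving any number of transfers (struct Curl_multi) */
  CURLM *multi = curl_multi_init();
  int running = 1;

  curl_easy_setopt(easy, CURLOPT_URL, "https://curl.se/");
  curl_multi_add_handle(multi, easy);

  /* connections used by transfers end up in the connection pool
     (struct connectdata instances) for possible later reuse */
  while(running) {
    curl_multi_perform(multi, &running);
    if(running)
      curl_multi_poll(multi, NULL, 0, 1000, NULL);
  }

  curl_multi_remove_handle(multi, easy);
  curl_easy_cleanup(easy);
  curl_multi_cleanup(multi);
  return 0;
}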

Dynamic allocations

In early curl history, the download and upload buffers for transfers were part of the Curl_easy struct, which made it fairly large.

In curl 7.53.0 (February 2017) the download buffer was made dynamically sized and has since then been allocated separately. Before that transition, curl 7.52.0 had a Curl_easy struct that was 36584 bytes, which included both the download and the upload buffers. In 7.58.0 the size was down to 21264 bytes since the download buffer was by then allocated separately and could also be made much larger than the previously fixed 16KB size.

The 16KB upload buffer was moved out of the Curl_easy handle in the 7.62.0 release (October 2018) to be allocated on demand, which of course especially benefits everyone who doesn't do uploads at all… The size of this struct was then down to 6208 bytes.

In curl 7.71.0 we also made the download buffer allocated on demand, and immediately freed after the transfer completes. This makes applications that keep handles around for reuse use significantly less memory. Applications are generally encouraged to keep the handles around to better facilitate connection reuse.

Struct size development

Curl_easy

The size in bytes of struct Curl_easy the last few years:

 7.52.0 36584
 7.58.0 21264
 7.61.0 21344
 7.62.0 6208
 7.63.0 6216
 7.64.0 6312
 7.65.0 5976
 7.66.0 6024
 7.67.0 6040
 7.68.0 6040
 7.69.0 6040
 7.70.0 6080
 7.71.0 6448
 7.72.0 6472
 7.73.0 6464
 7.74.0 6512

Current git: 5272 bytes (-19% from last release). With this, the struct is smaller than it has ever been before.

How did we make this extra reduction? Primarily, I noticed that we kept a DoH related struct in the handle by default, which was turned into an on-demand allocation. DoH is still rare and that data only needs to exist during the name resolving phase.

Curl_multi

The size in bytes of struct Curl_multi has remained very stable over the last few years and it keeps being very small. Notable is that when we removed pipelining support in 7.65.0, it took away 96 bytes from this struct.

 7.50.0 384
 7.52.0 384
 7.58.0 480
 7.59.0 488
 7.60.0 488
 7.61.0 512
 7.62.0 512
 7.63.0 512
 7.64.0 512
 7.65.0 416
 7.66.0 416
 7.67.0 424
 7.68.0 432
 7.69.0 416
 7.70.0 416
 7.71.0 416
 7.72.0 416
 7.73.0 416
 7.74.0 416

Current git: 416 bytes.

With this, we’re smaller than we were in the beginning of 2018.

connectdata

The size in bytes of struct connectdata. It’s been both up and down.

 7.50.0 1904 
 7.52.0 2104 
 7.58.0 2112 
 7.59.0 2112 
 7.60.0 2112 
 7.61.0 2128 
 7.62.0 2152 
 7.63.0 2160 
 7.64.0 2160 
 7.65.0 1944 
 7.66.0 1960 
 7.67.0 1976 
 7.68.0 1976 
 7.69.0 2600 
 7.70.0 2608 
 7.71.0 2608 
 7.72.0 2624 
 7.73.0 2640 
 7.74.0 2656

Current git: 1472 bytes (-44% from last release)

The size bump in 7.69.0 was the insertion of a new struct for state data when doing SOCKS connections non-blocking, and the corresponding decrease again for the pending release is the removal of the buffer from that struct. With this, we’re down to a size we had a very long time ago.

Run-time memory use

To make sure that we don't just move memory to other on-demand buffers that we need to allocate anyway, I ran a script with a lot of curl versions and counted the number of allocations needed and the peak amount of memory allocated, for a plain 512MB download over HTTP from localhost. The counted allocations were only those done by curl code (malloc, calloc, realloc, strdup etc).

Number of allocations

There are many reasons to allocate memory and while we want to keep the number down, lots of factors of course need to be taken into account.

In the list below you'll see that we clearly had some mistake in 7.52.0, and perhaps in some more versions, as it did over 32,000 allocations. The situation was fixed in or before 7.58.0 and I haven't bothered to go back to check exactly what it was.

 7.52.0 32883 
 7.58.0 82 
 7.59.0 82 
 7.60.0 82 
 7.61.0 82 
 7.62.0 86 
 7.63.0 87 
 7.64.0 87 
 7.65.0 82
 7.66.0 101 
 7.67.0 107 
 7.68.0 111 
 7.69.0 113 
 7.70.0 113 
 7.71.0 99 
 7.72.0 99 
 7.73.0 96 
 7.74.0 96

Current git: 96 allocations.

We do more allocations than some years back, but I think the growth is still reasonable.

Peak memory allocations

Here are some developments to look closer at!

If we start out looking at the oldest versions in my test, we can see that they allocate less than 100KB in total, but we need to take into account the fact that back then we used a fixed 16KB download buffer. In curl 7.54.1 we bumped the default buffer size the curl tool uses to 100K, which in the table below is visible in the 7.58.0 allocation.

 7.50.0 84473 
 7.52.0 85329 
 7.58.0 174243 
 7.59.0 174315 
 7.60.0 174339 
 7.61.0 174531 
 7.62.0 143886 
 7.63.0 143928 
 7.64.0 144128 
 7.65.0 143152 
 7.66.0 168188 
 7.67.0 173365 
 7.68.0 168575
 7.69.0 169167
 7.70.0 169303 
 7.71.0 136573
 7.72.0 136765 
 7.73.0 136875 
 7.74.0 137043

Current git: 131680 bytes.

The gain in 7.62.0 was mostly the removal of the default allocation of the upload buffer, which isn’t used in this test…

The current size tells me several things. We’re at a memory consumption level that is probably at its lowest point in the last decade – while at the same time having more features and being better than ever before. If we deduct the download buffer we have 29280 additional bytes allocated. Compare this to 7.50.0 which allocated 68089 bytes on top of the download buffer!

If I change my curl to use the smallest download buffer size allowed by libcurl (1KB) instead of the default 100KB, it ends up peaking at: 30304 bytes. That’s 44% of the memory needed by 7.50.0.

In my opinion, this is very good.
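For a libcurl-using application, the corresponding knob is the CURLOPT_BUFFERSIZE option. A minimal sketch of a download using the 1KB minimum, roughly mirroring what I did with the tool (the URL is just an example):

#include <curl/curl.h>

/* sketch: download with the smallest allowed buffer (1KB) to save memory,
   possibly at some cost in transfer speed */
static CURLcode small_buffer_download(const char *url)
{
  CURL *easy = curl_easy_init();
  CURLcode result;
  if(!easy)
    return CURLE_OUT_OF_MEMORY;
  curl_easy_setopt(easy, CURLOPT_URL, url);
  curl_easy_setopt(easy, CURLOPT_BUFFERSIZE, 1024L); /* minimum allowed */
  result = curl_easy_perform(easy);
  curl_easy_cleanup(easy);
  return result;
}

Call it with for example small_buffer_download("http://localhost/512M").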

It might also be worth reiterating that this is with a full featured libcurl build. We can shrink even further if we switch off undesired features or just go tiny-curl.

I hope this goes without saying, but of course all of this work has been done with the API and ABI still intact.

Graphs?

You know I like graphs, but for now I decided this blog post and analysis was enough. I'm going to think about how we can perhaps produce this info in a more regular and automated way in the future. Not sure it is worth spending a lot of effort on though.

Reproduce

  1. build curl with the --enable-debug option to configure. Don’t use the threaded resolver – I use the c-ares one, because it otherwise breaks the memdebug system.
  2. Run your command line with tracing enabled and then run memanalyze on the log:
#!/bin/sh
# tell the debug build to log all allocations to this file
export CURL_MEMDEBUG=/tmp/curlmem.log
# do the transfer (a local 512MB file, output discarded)
./src/curl -v localhost/512M -o /dev/null
# summarize the number of allocations and the peak memory use
./tests/memanalyze.pl -v /tmp/curlmem.log

To get the struct sizes, just run pahole on the static libcurl lib after the build:

pahole -s lib/.libs/libcurl.a > sizes.txt

Credits

The photo was taken by me, in Siem Reap, Cambodia. “A smaller transport”

Updates

After the initial posting of this article I optimized the structs even further so the numbers have been updated since then to reflect the state of what’s in git a week later.

everything.curl.dev

The online version of the curl book “everything curl” has been moved to the address shown in the title:

everything.curl.dev

This, after I did a very unscientific and highly self-selective poll on twitter on January 18, 2021.

The old name (which you can see was the least selected option in the poll) will now redirect to the new host name, and so will everything.curl.se.

curl.dev

I have been the owner of this domain for a little while now, but we haven't yet figured out what to do with it. This is the first use of curl.dev for real content.

If you have ideas of how we can improve curl’s web presence with this domain, please let me know! I do not want to move the official curl web site again from its new home at curl.se, that’s not what I would call a productive idea.

Food on the table while giving away code

I founded the curl project early 1998 but had already then been working on the code since November 1996. The source code was always open, free and available to the world. The term “open source” actually wasn’t even coined until early 1998, just weeks before curl was born.

In the beginning of course, the first few years or so, this project wasn’t seen or discovered by many and just grew slowly and silently in a dusty corner of the Internet.

Already when I shipped the first versions I wanted the code to be open and freely available. For years I had seen the cool free software put out in the world by others and I wanted my work to help build this communal treasure trove.

License

When I started this journey I didn’t really know what I wanted with curl’s license and exactly what rights and freedoms I wanted to give away and it took a few years and attempts before it landed.

The early versions were GPL licensed, but as I learned about resistance from proprietary companies and thought about it further, I changed the license to be more commercially friendly and to match my conviction better. I ended up with MIT after a brief experimental time using MPL. (It was easy to change the license back then because I owned all the copyrights at that point.)

To be exact: we actually have a slightly modified MIT license with some very subtle differences. The reason for the changes has been forgotten and we didn't get those commits logged in the "big transition" to Sourceforge that we did in late 1999… The end result is that this is now often recognized as "the curl license", even though it is in effect the MIT license.

The license says everyone can use the code for whatever purpose and nobody is required to ship any source code to anyone, but they cannot claim they wrote it themselves and the license/use of the code should be mentioned in documentation or another relevant location.

As licenses go, this has to be one of the most frictionless ones there is.

Copyright

Open source relies on a solid copyright law and the copyright owners of the code are the only ones who can license it away. For a long time I was the sole copyright owner in the project. But since I had decided to stick to the license, I saw no particular downside in allowing contributors (of significant contributions) to retain their copyrights on the parts they brought, rather than using copyright assignment as a fence that makes contributing harder.

Today, in early 2021, I count 1441 copyright strings in the curl source code git repository. 94.9% of them have my name.

I never liked how some projects require copyright assignments or license agreements etc to be able to submit code or patches. Partly because of the huge administrative burden it adds to the project, but also because of the significant friction and barrier to entry they create for new contributors and the imbalance they create: some get more rights than others. I've always worked on making it easy and smooth for newcomers to start contributing to curl. It doesn't happen by accident.

Spare time

In many ways, running a spare time open source project is easy. You just need a steady income from a “real” job and sufficient spare time, and maybe a server to host stuff on for the online presence.

The challenge is of course to keep developing it, adding things people want, to help users with problems and to address issues in a timely manner. Especially if you happen to be lucky and the number of users increases and the project grows in popularity.

I ran curl as a spare time project for decades. Over the years it became more and more common that users who submitted bug reports or asked for help about things were actually doing that during their paid work hours, because they used curl in a commercial setting, which sometimes made the situation almost absurd: the ones who actually got paid to work with curl were asking the unpaid developers to help them out.

I changed employers several times. I started my own company and worked as my own boss for a while. I worked for Mozilla on network stuff in Firefox for five years. But curl remained a spare time project because I couldn't figure out how to turn it into a job without risking the project or my personal finances.

Earning a living

For many years it was a pipe dream for me to be able to work on curl as a real job. But how do I actually take the step from a spare time project to doing it full time? I give away all the code for free, and it is a solid and reliable product.

The initial seeds were planted when I met and got to know Larry (wolfSSL CEO) and some of the other good people at wolfSSL back in the early 2010s. This, because wolfSSL is a company that writes open source libraries and offers commercial support for them, proving that it can work as a business model. Larry always told me he thought there was a possibility waiting here for me with curl.

Apart from the business angle, if I were able to work more on curl it could really benefit the curl project, and then of course indirectly everyone who uses it.

It was still a step to take. When I gave up on Mozilla in 2018, it just took a little thinking before I decided to try it. I joined wolfSSL to work on curl full time. A dream came true and finally curl was not just something I did “on the side”. It only took 21 years from first curl release to reach that point…

I’m living the open source dream, working on the project I created myself.

Food for free code

We sell commercial support for curl and libcurl. Companies and users that need a helping hand or swift assistance with their problems can get it from us – and with me here I dare to claim that there’s no company anywhere else with the same ability. We can offload engineering teams with their curl issues. Up to 24/7 level!

We also offer custom curl development, debugging help, porting to new platforms and basically any other curl related activity you need. See more on the curl product page on the wolfSSL site.

curl (mostly in the shape of libcurl) runs in ten billion installations: some five, six billion mobile phones and tablets – used by several of the most downloaded apps in existence, in virtually every website and Internet server. In a billion computer games, a billion Windows machines, half a billion TVs, half a billion game consoles and in a few hundred million cars… curl has been made to run on 82 operating systems on 22 CPU architectures. Very few software components can claim a wider use.

“Isn’t it easier to list companies that are not using curl?”

Wide use and being recognized does not put food on the table. curl is also totally free to download, build and use. It is very solid and stable. It performs well, is well documented, well tested and "battle hardened". It "just works" for most users.

Pay for support!

How do I convince companies that they should get a curl support contract with me?

Paying customers get to influence what I work on next. Not only distant road-mapping but also how to prioritize short term bug-fixes etc. We have a guaranteed response-time.

You get your issues first in line to get fixed. Customers also won't risk getting their issues added to the known bugs document and put in the attic to be forgotten. We can help customers make sure their applications use libcurl correctly and in the best possible way.

I try to emphasize that by getting support from us, customers can offload some of those tasks from their own engineers, and because we are faster and better on curl related issues, that is a pure economic net gain. For all of us.

This is not an easy sell.

Sure, curl is used by thousands of companies everywhere, but most of them do it because it’s free (in all meanings of the word), functional and available. There’s a real challenge in identifying those that actually use it enough and value the functionality enough that they realize they want to improve their curl foo.

Most of our curl customers first purchased support when they faced a complicated issue or problem they couldn't fix themselves. This fact gives me a weird (from the wider curl community's point of view) incentive not to fix some problems too fast, because doing so works against my ability to gain new customers!

We need paying customers for this to be sustainable. When wolfSSL has a sustainable curl business, I get paid and the work I do in curl benefits all the curl users; paying as well as non-paying.

Dual license

There's clearly a business in releasing open source under a strong copyleft license such as GPL and then, as long as you keep the copyrights, offering customers the option to purchase that same code under another, more proprietary-friendly license. The code is still open source and anyone doing totally open things can still use it freely and at no cost.

We’ve shipped tiny-curl to the world licensed under GPLv3. Tiny-curl is a curl branch with a strong focus on the tiny part: the idea is to provide a libcurl more suitable for smaller systems, the ones that can’t even run a full Linux but rather use an RTOS.

Consider it a sort of experiment: are users interested in getting a smaller curl onto their products, and are they interested in paying for licensing? So far, tiny-curl supports two separate RTOSes to which the "normal" curl has not been ported.

Keeping things separate

Maybe you don’t realize this, but I work hard to keep separate things compartmentalized. I am not curl, curl is not wolfSSL and wolfSSL is not me. But we all overlap greatly!

The Daniel + curl + wolfSSL trinity

I work for wolfSSL. I work on curl. wolfSSL offers commercial curl support.

Reserved features

One idea that we haven't explored much yet is the ability to make and offer "reserved features" to paying customers only. This would of course serve as another motivation for companies to become curl support customers.

Such reserved features would still have to be sensible for the curl project and most likely we would provide them as specials for paying customers for a period of time and then merge them into the "real" open source curl project. It is very important to note that this will not in any way make the "regular curl" worse or a lesser citizen in any way. It would rather be like a separate product, a curl+ with extra stuff on top of vanilla curl.

Since we haven’t ventured into this area yet, we haven’t worked out all the details. Chances are we will wander into this territory soon.

Other food-generators

I do occasional speaking gigs on curl and HTTP related topics but even if I charge for them this activity never brings much more than some extra pocket money. I do it because it’s fun and educational.

It has been suggested that I should create a web shop to sell curl branded merchandise in, like t-shirts, mugs, etc but I think that grossly over-estimates the user interest and how much margin I could put on mundane things just because they'd have a curl logo glued on them. Also, I would have a difficult time mentally selling curl items and claiming the profit personally. I'd rather keep giving away curl stash (mostly stickers) for free as a means to market the project and long term encourage users into buying support.

Donations

We receive money to the curl project through donations, most of them via our opencollective account. It is important to note that even if I'm a key figure in the project, this is not my money and it's not my project. Donated money is spent on project related expenses, which so far primarily means our bug bounty program. We've avoided spending donated money on direct curl development, and especially such that I could provide or benefit from myself, as that would totally blur the boundaries. I'm not ruling out taking that route in the future though, as long as, and only if, it is to the project's benefit.

Donations via GitHub to me personally sponsor me personally and end up in my pockets. That's not curl money, but I spend it mostly on curl development, equipment etc and it means I don't have to think twice when sending curl stickers to fans and friends all over the world. It contributes to the food on my table and I like to think that an occasional beer I drink is sponsored by friends out there!

The future I dream of

We get a steady number of companies paying for support at a level that allows us to also pay for a few more curl engineers besides myself.

Credits

Image by Khusen Rustamov from Pixabay

Age is just a number or two

Kjell, a friend of mine, mailed me a zip file this morning saying he’d found an earlier version of “urlget” lying around. Meaning: an older version than what we provide on the curl download page. urlget was the name we used for the command line tool before we changed the name to curl in March 1998.

I've been reckless with preserving some of the source code and keeping track of early history, so this made me curious and I glanced through the source code for urlget 2.4, shipped in October 1997. Kjell had found a project of his own into which he'd imported the urlget sources, as that was from before the days when curl was also a library.

In this source code I also found the original URL to the home page for urlget and its predecessor, httpget: http://www.inf.ufrgs.br/~sagula/urlget.html

I don’t know if I have this info stored somewhere else too, but the important thing here is that it then struck me that I hadn’t checked the Internet Archive for what it has archived for this URL!

The earliest archived version of the urlget page is from February 16 1998. I checked, but there’s no archived version from slightly earlier when the tool was named just httpget. I did however find source code from httpget that was older than I had saved from before: httpget 1.3! 320 lines of hopelessly naive code. From April 14, 1997.

That date is also the only one that has content as the next archived one is just a redirect to the first curl web site over at http://www.fts.frontec.se/~dast/curl/

Two birthdays?

I used this newly found gem to update the curl history page with exact dates for some of the earliest releases and events, as it was previously not very specific there as I hadn’t kept notes.

I could now also once and for all note that the first release of HttpGet (version 0.1) was done on November 11, 1996. My personal participation in the project began some days or weeks after that, as it is recorded that I provided improvements in the HttpGet 0.2 release that was done on December 17 the same year.

I’ve always counted the age of curl from March 20, 1998 which is when I first released something under the name “curl”, but since we released it as curl 4.0 that is certainly a sign that the time up to that point could possibly also be counted into its age.

It's not terribly important when to start the count.

What's more fun with the particular HttpGet 0.1 release date, is that it is the exact same date Wget was released the first time under that name! It had previously been developed and released under a different name ("geturl") and exactly on November 11, 1996 Hrvoje Nikšić released Wget 1.4.0 to the world.

Why not go with Wget?

People sometimes ask me why I didn't use wget to get currencies that winter day back in 1996 when I found HttpGet and started to work on that HTTP client. The fact is that not only were search engines and software hosting alternatives clearly inferior back in those days, so that finding software could be difficult, but wget was also very new to the world. I didn't learn about the existence of wget until many months later, although I can't recall exactly when or how.

I also think, looking back at myself at that time, that if I had found wget then, I would probably have thought it overkill for my use case and opted to use something else anyway. I mean, I was "just getting an HTTP page" and the wget package was 171KB compressed, while HttpGet 1.3 was still less than 8K in a single source code file… I'm not saying that way of thinking was right!

The curl year 2020

As we’re approaching the end of the year, I just want to sum up the curl year with a few words.

2020 has been another glorious year in the curl project. We’ve seen a series of accomplishments and introductions of new things during this the year of the plague.

Accomplishments

I personally have done more commits in the git repository this year than in any year since 2004 (890 so far).

The total number of commits done in git this year is the largest since 2014 (1445 plus some).

The number of published curl related CVEs is the lowest since 2013 (6). For the ones we announced, we could reward record amounts in our bug bounty program!

139 authors wrote commits that were merged (so far).

We did nine curl releases, out of which two unfortunately were quicker “panic releases” that patched up problems in the previous release.

Seven changes to remember

We’ve logged no less than 905 bug-fixes and 30 changes in the releases of this year, but the seven perhaps most memorable things we’ve introduced in 2020 are…

Videos

This year I’ve introduced the concept of doing a “release presentation” for every release. Those are videos where I go through and discuss the changes, the security releases and some interesting bug-fixes. Each release links to those from the changelog page on the website.

New home

This is the year when we finally got ourselves a curl domain. curl.se is our new home.

What didn’t happen

We cancelled curl up 2020 due to Covid-19. It was planned to happen in Berlin. We did it purely online instead. We’re not planning any new physical curl up for 2021 either. Let’s just wait and see what happens with the pandemic next year and hope that we might be able to go back and have a physical meetup in 2022…

It is a curl world

curl supports NASA

Not everyone understands how open source is made. I received the following email from NASA a while ago.

Subject: Curl Country of Origin and NDAA Compliance

Hello, my name is [deleted] and I am a Supply Chain Risk Management Analyst at NASA. As such, I ensure that all NASA acquisitions of Covered Articles comply with Section 208 of the Further Consolidated Appropriations Act, 2020, Public Law 116-94, enacted December 20, 2019. To do so, the Country of Origin (CoO) information must be obtained from the company that develops, produces, manufactures, or assembles the product(s). To do so, please provide an email response or a formal document (a PDF on company letterhead is preferred, but a simple statement is sufficient) specifically identifying the country, or countries, in which Curl is developed and maintained

If the country of origin is outside the United States, please provide any information you may have stating that testing is performed in the United States prior to supplying products to customers. Additionally, if available, please identify all authorized resellers of the product in question.

Lastly, please confirm that Curl is not developed by, contain components developed by, or receive substantial influence from entities prohibited by Section 889 of the 2019 NDAA. These entities include the following companies and any of their subsidiaries or affiliates:

Hytera Communications Corporation
Huawei Technologies Company
ZTE Corporation
Dahua Technology Company
Hangzhou Hikvision Digital Technology Company

Finally, we have a time frame of 5 days for a response.
Thank you,

My answer

Okay, I first considered going with strong sarcasm in my reply due to the complete lack of understanding, and the implied threat in that last line. What would happen if I didn't respond in time?

Then it struck me that this could be my chance to once and for all get a confirmation if curl is already actually used in space or not. So I went with informative and a friendly tone.

Hi [name],

I will answer to these questions below to the best of my ability, and maybe you can answer something for me?

curl (https://curl.se) is an open source project that creates two products, curl the command line tool and libcurl the library. I am the founder, lead developer and core maintainer of the project. To this date, I have done about 57% of the 26,000 changes in the source code repository. The remaining 43% have been done by 841 different volunteers and contributors from all over the world. Their names can be extracted from our git repository: https://github.com/curl/curl

You can also see that I own most, but not all, copyrights in the project.

I am a citizen of Sweden and I’ve been a citizen of Sweden during the entire time I’ve done all and any work on curl. The remaining 841 co-authors are from all over the world, but primarily from western European countries and the US. You could probably say that we live primarily “on the Internet” and not in any particular country.

We don’t have resellers. I work for an American company (wolfSSL) where we do curl support for customers world-wide.

Our testing is done universally and is not bound to any specific country or region. We test our code substantially before release.

Me knowingly, we do not have any components or code authored by people at any of the mentioned companies.

So finally my question: can you tell me anything about where or for what you use curl? Is it used in anything in space?

Regards,
Daniel

Used in space?

Of course my attempt was completely in vain and the answer back was very brief and it just said…

“We are using curl to support NASA’s mission and vision.”

Credits

Space ship image by Elias Sch. from Pixabay

the critical curl

Google has, as part of their involvement in the Open Source Security Foundation (OpenSSF), come up with a "Criticality Score" for open source projects.

It is a score between 0 (least critical) and 1 (most critical).

The input variables are:

  • time since project creation
  • time since last update
  • number of committers
  • number of organizations among the top committers
  • number of commits per week the last year
  • number of releases the last year
  • number of closed issues the last 90 days
  • number of updated issues the last 90 days
  • average number of comments per issue the last 90 days
  • number of project mentions in the commit messages

The best way to figure out exactly how to calculate the score based on these variables is to check out their github page.

The top-10 C based projects

The project has run the numbers on projects hosted on GitHub (which admittedly seriously limits the results) and they host these generated lists of the 200 most critical projects written in various languages.

Checking out the top list for C based projects, we can see the top 10 projects with the highest criticality scores being:

  1. git
  2. Linux (raspberry pi)
  3. Linux (torvalds version)
  4. PHP
  5. OpenSSL
  6. systemd
  7. curl
  8. u-boot
  9. qemu
  10. mbed-os

What now then?

After having created the scoring system and generated lists, step 3 is said to be to "Use this data to proactively improve the security posture of these critical projects".

Now I think we have a pretty strong effort on security already in curl and Google helped us strengthen it even more recently, but I figure we can never have too much help or focus on improving our project.

Credits

Image by Thaliesin from Pixabay

curl 7.74.0 with HSTS

Welcome to another curl release, 56 days since the previous one.

Release presentation

Numbers

the 196th release
1 change
56 days (total: 8,301)

107 bug fixes (total: 6,569)
167 commits (total: 26,484)
0 new public libcurl function (total: 85)
6 new curl_easy_setopt() option (total: 284)

1 new curl command line option (total: 235)
46 contributors, 22 new (total: 2,292)
22 authors, 8 new (total: 843)
3 security fixes (total: 98)
1,600 USD paid in Bug Bounties (total: 4,400 USD)

Security

This time around we have no less than three vulnerabilities fixed and as shown above we've paid 1,600 USD in reward money this time, out of which the reporter of the CVE-2020-8286 issue got a new record amount: 900 USD. The second one didn't get any reward simply because it was not claimed. In this single release we doubled the number of vulnerabilities we've published this year!

The six announced CVEs during 2020 still mean this has been a better year than each of the six previous years (2014-2019) and we have to go all the way back to 2013 to find a year with fewer CVEs reported.

I'm very happy and proud that we as a small independent open source project can reward these skilled security researchers like this. Much thanks to our generous sponsors of course.

CVE-2020-8284: trusting FTP PASV responses

When curl performs a passive FTP transfer, it first tries the EPSV command and if that is not supported, it falls back to using PASV. Passive mode is what curl uses by default.

A server response to a PASV command includes the (IPv4) address and port number for the client to connect back to in order to perform the actual data transfer.

This is how the FTP protocol is designed to work.

A malicious server can use the PASV response to trick curl into connecting back to a given IP address and port, and this way potentially make curl extract information about services that are otherwise private and not disclosed, for example doing port scanning and service banner extractions.

If curl operates on a URL provided by a user (which by all means is an unwise setup), a user can exploit that and pass in a URL to a malicious FTP server instance without needing any server breach to perform the attack.

There’s no really good solution or fix to this, as this is how FTP works, but starting in curl 7.74.0, curl will default to ignoring the IP address in the PASV response and instead just use the address it already uses for the control connection. In other words, we will enable the CURLOPT_FTP_SKIP_PASV_IP option by default! This will cause problems for some rare use cases (which then have to disable this), but we still think it’s worth doing.
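For the rare use cases that really do need the old behavior, it can be switched back per transfer. A minimal sketch:

#include <curl/curl.h>

/* sketch: an application that really needs to trust the IP address in the
   PASV response can opt back out of the new 7.74.0 default */
static void trust_pasv_address(CURL *easy)
{
  curl_easy_setopt(easy, CURLOPT_FTP_SKIP_PASV_IP, 0L);
}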

CVE-2020-8285: FTP wildcard stack overflow

libcurl offers a wildcard matching functionality, which allows a callback (set with CURLOPT_CHUNK_BGN_FUNCTION) to return information back to libcurl on how to handle a specific entry in a directory when libcurl iterates over a list of all available entries.

When this callback returns CURL_CHUNK_BGN_FUNC_SKIP, to tell libcurl to not deal with that file, the internal function in libcurl then calls itself recursively to handle the next directory entry.

If there's a sufficient amount of file entries and if the callback returns "skip" enough times, libcurl runs out of stack space. The exact amount will of course vary with platforms, compilers and other environmental factors.

The content of the remote directory is not kept on the stack, so it seems hard for the attacker to control exactly what data that overwrites the stack – however it remains a Denial-Of-Service vector as a malicious user who controls a server that a libcurl-using application works with under these premises can trigger a crash.
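For context, this is roughly what such a setup looks like in an application using the documented wildcard API. A hedged sketch, where the URL is made up and the callback is deliberately simplistic:

#include <curl/curl.h>

/* sketch: a "begin chunk" callback that skips everything but regular files */
static long chunk_bgn(const void *transfer_info, void *userdata, int remains)
{
  const struct curl_fileinfo *info = transfer_info;
  (void)userdata;
  (void)remains;
  if(info->filetype != CURLFILETYPE_FILE)
    return CURL_CHUNK_BGN_FUNC_SKIP; /* the return code discussed above */
  return CURL_CHUNK_BGN_FUNC_OK;
}

static void setup_wildcard(CURL *easy)
{
  curl_easy_setopt(easy, CURLOPT_URL, "ftp://example.com/files/*");
  curl_easy_setopt(easy, CURLOPT_WILDCARDMATCH, 1L);
  curl_easy_setopt(easy, CURLOPT_CHUNK_BGN_FUNCTION, chunk_bgn);
}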

CVE-2020-8286: Inferior OCSP verification

libcurl offers “OCSP stapling” via the CURLOPT_SSL_VERIFYSTATUS option. When set, libcurl verifies the OCSP response that a server responds with as part of the TLS handshake. It then aborts the TLS negotiation if something is wrong with the response. The same feature can be enabled with --cert-status using the curl tool.

As part of the OCSP response verification, a client should verify that the response is indeed set out for the correct certificate. This step was not performed by libcurl when built or told to use OpenSSL as TLS backend.

This flaw would allow an attacker, who perhaps could have breached a TLS server, to provide a fraudulent OCSP response that would appear fine, instead of the real one. Like if the original certificate actually has been revoked.
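For reference, enabling the check in an application looks like this; a minimal sketch where the URL is just an example:

#include <curl/curl.h>

/* sketch: request and verify a stapled OCSP response as part of the
   TLS handshake; the transfer fails if the check fails */
static CURLcode fetch_with_ocsp_check(const char *url)
{
  CURL *easy = curl_easy_init();
  CURLcode result;
  if(!easy)
    return CURLE_OUT_OF_MEMORY;
  curl_easy_setopt(easy, CURLOPT_URL, url);
  curl_easy_setopt(easy, CURLOPT_SSL_VERIFYSTATUS, 1L);
  result = curl_easy_perform(easy);
  curl_easy_cleanup(easy);
  return result;
}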

Change

There’s really only one “change” this time, and it is an experimental one which means you need to enable it explicitly in the build to get to try it out. We discourage people from using this in production until we no longer consider it experimental but we will of course appreciate feedback on it and help to perfect it.

The change in this release introduces no less than 6 new easy setopts for the library and one command line option: support for HTTP Strict-Transport-Security, also known as HSTS. This is a system for HTTPS hosts to tell clients to never attempt to contact them over insecure methods (ie clear text HTTP).

One entry-point to the libcurl options for HSTS is the CURLOPT_HSTS_CTRL man page.
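Assuming a build with the experimental HSTS support switched on, wiring it up in an application looks roughly like this (the cache file path is just an example):

#include <curl/curl.h>

/* sketch: enable the HSTS cache and back it with a file so that learned
   entries survive between runs (needs an HSTS-enabled build) */
static void enable_hsts(CURL *easy)
{
  curl_easy_setopt(easy, CURLOPT_HSTS_CTRL, (long)CURLHSTS_ENABLE);
  curl_easy_setopt(easy, CURLOPT_HSTS, "/home/user/.curl-hsts");
}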

Bug-fixes

Yet another release with over one hundred bug-fixes accounted for. I’ve selected a few interesting ones that I decided to highlight below.

enable alt-svc in the build by default

We landed the code and support for alt-svc: headers in early 2019 marked as “experimental”. We feel the time has come for this little baby to grow up and step out into the real world so we removed the labeling and we made sure the support is enabled by default in builds (you can still disable it if you want).
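As far as the libcurl API goes, applications still opt in to the alt-svc cache per handle. A minimal sketch, where the cache file name is just an example:

#include <curl/curl.h>

/* sketch: opt in to alt-svc, persisting the cache to a file of the
   application's choosing */
static void enable_altsvc(CURL *easy)
{
  curl_easy_setopt(easy, CURLOPT_ALTSVC, "/home/user/.curl-altsvc");
  curl_easy_setopt(easy, CURLOPT_ALTSVC_CTRL, (long)CURLALTSVC_H1);
}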

8 cmake fixes bring cmake closer to autotools level

In curl 7.73.0 we removed the "scary warning" from the cmake build that warned users that the cmake build setup might be inferior. The goal was to get more people to use it, and then by extension help out to fix it. The trick might have worked and we've gotten several improvements to the cmake build in this cycle. Moreover, we've gotten a whole slew of new bug reports on it as well, so now we have a list of known cmake issues in the KNOWN_BUGS document, ready for interested contributors to dig into!

configure now uses pkg-config to find OpenSSL when cross-compiling

Just one of those tiny weird things. At some point in the past someone had trouble building OpenSSL cross-compiled when pkg-config was used so it got disabled. I don’t recall the details. This time someone had the reversed problem so now the configure script was fixed again to properly use pkg-config even when cross-compiling…

curl.se is the new home

You know it.

curl: only warn not fail, if not finding the home dir

The curl tool attempts to find the home dir of the user who invokes the command, in order to look for some files there. For example the .curlrc file. More importantly, when using SSH related protocols it is somewhat important to find the file ~/.ssh/known_hosts. So important that the tool would abort if it was not found. Still, a command line can work without it under various circumstances, in particular if -k is used, so bailing out like that was nothing but wrong…

curl_easy_escape: limit output string length to 3 * max input

In general, libcurl enforces an internal string length limit that prevents any string from growing larger than 8MB. This is done to prevent mistakes or abuse. Due to a mistake, the string length limit was enforced wrongly in the curl_easy_escape function, which could make the limit a third of the intended size: 2.67 MB.
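For reference, typical curl_easy_escape() use looks like this; a minimal sketch:

#include <curl/curl.h>
#include <stdio.h>

/* sketch: URL-encode a string with curl_easy_escape() and free the result
   with curl_free() */
static void print_escaped(CURL *easy, const char *input)
{
  char *escaped = curl_easy_escape(easy, input, 0); /* 0 = use strlen() */
  if(escaped) {
    printf("%s\n", escaped); /* "data to escape" becomes "data%20to%20escape" */
    curl_free(escaped);
  }
}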

only set USE_RESOLVE_ON_IPS for Apple’s native resolver use

This define is set internally when the resolver function is used even when a plain IP address is given. On macOS for example, the resolver functions are used to do some conversions and thus this is necessary, while for other resolver libraries we avoid the resolver call when we can convert the IP number to binary internally more effectively.

By mistake, we had enabled this "call getaddrinfo() anyway" logic even when curl was built to use c-ares on macOS.

fix memory leaks in GnuTLS backend

We used two functions to extract information from the server certificate that didn’t properly free the memory after use. We’ve filed subsequent bug reports in the GnuTLS project asking them to make the required steps much clearer in their documentation so that perhaps other projects can avoid the same mistake going forward.

libssh2: fix transport over HTTPS proxy

SFTP file transfers didn’t work correctly since previous fixes obviously weren’t thorough enough. This fix has been confirmed fine in use.

make curl --retry work for HTTP 408 responses too

Again. We made the --retry logic work for 408 once before, but for some inexplicable reason the support for that was accidentally dropped when we introduced parallel transfer support in curl. Regression fixed!

use OPENSSL_init_ssl() with >= 1.1.0

Initializing the OpenSSL library the correct way is a task that sounds easy but has always been a source of problems and misunderstandings, and it has never been properly documented. It is a long and boring story that has been going on for a very long time. This time, we add yet another chapter to this novel as we start using this function call when OpenSSL 1.1.0 or later (or BoringSSL) is used in the build. Hopefully, this is one of the last chapters in this book.

"scheme-less URLs" no longer accept blank port number

curl operates on “URLs”, but as a special shortcut it also supports URLs without the scheme. For example just a plain host name. Such input isn’t at all by any standards an actual URL or URI; curl was made to handle such input to mimic how browsers work. curl “guesses” what scheme the given name is meant to have, and for most names it will go with HTTP.

Further, a URL can provide a specific port number using a colon and a port number following the host name, like “hostname:80” and the path then follows the port number: “hostname:80/path“. To complicate matters, the port number can be blank, and the path can start with more than one slash: “hostname://path“.

curl’s logic that determines if a given input string has a scheme present checks the first 40 bytes of the string for a :// sequence and if that is deemed not present, curl determines that this is a scheme-less host name.

This means [39-letter string]:// as input is treated as a URL with a scheme and a scheme that curl doesn’t know about and therefore is rejected as an input, while [40-letter string]:// is considered a host name with a blank port number field and a path that starts with double slash!

In 7.74.0 we remove that potentially confusing difference. If the URL is determined to not have a scheme, it will not be accepted if it also has a blank port number!
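The same kind of scheme guessing is also exposed through libcurl's URL parsing API, which is a handy way to see what curl makes of a given input. A hedged sketch:

#include <curl/curl.h>
#include <stdio.h>

/* sketch: parse an input string with scheme guessing enabled and print the
   scheme curl decided on */
static void show_guessed_scheme(const char *input)
{
  CURLU *h = curl_url();
  char *scheme = NULL;
  if(!curl_url_set(h, CURLUPART_URL, input, CURLU_GUESS_SCHEME) &&
     !curl_url_get(h, CURLUPART_SCHEME, &scheme, 0)) {
    printf("%s => %s\n", input, scheme); /* e.g. "example.com" => "http" */
    curl_free(scheme);
  }
  curl_url_cleanup(h);
}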

I am an 80 column purist

I write and prefer code that fits within 80 columns in curl and other projects – and there are reasons for it. I’m a little bored by the people who respond and say that they have 400 inch monitors already and they can use them.

I too have multiple large high resolution screens – but writing wide code is still a bad idea! So I decided I’ll write down my reasoning once and for all!

Narrower is easier to read

There’s a reason newspapers and magazines have used narrow texts for centuries and in fact even books aren’t using long lines. For most humans, it is simply easier on the eyes and brain to read texts that aren’t using really long lines. This has been known for a very long time.

Easy-to-read code is easier to follow and understand which leads to fewer bugs and faster debugging.

Side-by-side works better

I never run windows full sized on my screens for anything except watching movies. I frequently have two or more editor windows next to each other, sometimes also with one or two extra terminal/debugger windows next to those. To make this feasible and still have the code readable, it needs to fit “wrapless” in those windows.

Sometimes reading a code diff is easier side-by-side and then too it is important that the two can fit next to each other nicely.

Better diffs

Having code grow vertically rather than horizontally is beneficial for diff, git and other tools that work on changes to files. It reduces the risk of merge conflicts and it makes the merge conflicts that still happen easier to deal with.

It encourages shorter names

A side effect of strictly not allowing anything beyond column 80 is that it becomes really hard to use those terribly annoying 30+ letter java-style names for functions and identifiers. Function names, and especially local ones, should be short. Long names are really hard to read and make it really hard to spot the difference between other functions with similarly long names where just a sub-word within is changed.

I know especially Java people object to this as they’re trained in a different culture and say that a method name should rather include a lot of details of the functionality “to help the user”, but to me that’s a weak argument as all non-trivial functions will have more functionality than what can be expressed in the name and thus the user needs to know how the function works anyway.

I don’t mean 2-letter names. I mean long enough to make sense but not be ridiculous lengths. Usually within 15 letters or so.

Just a few spaces per indent level

To make this work, and yet allow a few indent levels, the code basically has to have small indent levels, so I prefer to have it set to two spaces per level.

Many indent levels is wrong anyway

If you use a lot of indent levels it gets really hard to write code that still fits within the 80 column limit. That's a subtle way to suggest that you should not write functions that need or use that many indent levels. Such code should rather be split out into multiple smaller functions, where each function then won't need that many levels!
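As an illustration (made-up example code, not taken from curl), this is the kind of shape that falls out: two-space indents, lines well below 80 columns, and the deeper logic split out into a small helper:

#include <stddef.h>

/* sketch only: clamp negative values to zero */
static int positive(int value)
{
  return (value > 0) ? value : 0;
}

/* sketch only: sum the positive values, one indent level per construct */
static long sum_positive(const int *values, size_t count)
{
  long sum = 0;
  size_t i;
  for(i = 0; i < count; i++)
    sum += positive(values[i]);
  return sum;
}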

Why exactly 80?

Once upon a time it was of course because terminals had that limit and these days the exact number 80 is not a must. I just happen to think that the limit has worked fine in the past and I haven't found any compelling reason to change it since.

It also has to be a hard and fixed limit, because if we allow a few places to go beyond the limit we end up on a slippery slope and code slowly grows wider over time. I've seen it happen in many projects with "soft enforcement" of code column limits.

Enforced by a tool

In curl, we have ‘checksrc’ which will yell errors at any user trying to build code with a too long line present. This is good because then we don’t have to “waste” human efforts to point this out to contributors who offer pull requests. The tool will point out such mistakes with ruthless accuracy.

Credits

Image by piotr kurpaska from Pixabay

The curl web infrastructure

The purpose of the curl web site is to inform the world about what curl and libcurl are and provide as much information as possible about the project, the products and everything related to that.

The web site has existed in some form for as long as the project has, but it has of course developed and changed over time.

Independent

The curl project is completely independent and stands free from influence from any parent or umbrella organization or company. It is not even a legal entity, just a bunch of random people cooperating over the Internet. And a bunch of awesome sponsors to help us.

This means that we have no one that provides the infrastructure or marketing for us. We need to provide, run and care for our own servers and anything else we think we should offer our users.

I still do a lot of the work in curl and the curl web site and I work full time on curl, for wolfSSL. This might of course “taint” my opinions and views on matters, but doesn’t imply ownership or control. I’m sure we’re all colored by where we work and where we are in our lives right now.

Contents

Most of the web site is static content: generated HTML pages. They are served super-fast and very lightweight by any web server software.

The web site source exists in the curl-www repository (hosted on GitHub) and the web site syncs itself with the latest repository changes several times per hour. The parts of the site that aren't static mostly consist of smaller scripts that run either on demand at the time of a request or at an interval in a background cronjob. That is part of the reason why pushing an update to the web site's repository can take a little while until it shows up on the live site.

There’s a deliberate effort at not duplicating information so a lot of the web pages you can find on the web site are files that are converted and “HTMLified” from the source code git repository.

“Design”

Some people say the curl web site is “retro”, others that it is plain ugly. My main focus with the site is to provide and offer all the info, and have it be accurate and accessible. The look and the design of the web site is a constant battle, as nobody who’s involved in editing or polishing the web site is really interested in or particularly good at design, looks or UX. I personally have done most of the editing of it, including CSS etc and I can tell you that I’m not good at it and I don’t enjoy it. I do it because I feel I have to.

I get occasional offers to "redesign" the web site, but the general problem is that those offers almost always involve rebuilding the entire thing using some current web framework, not just fixing the looks, layout or hierarchy. By replacing everything like that, we'd have a lot of problems getting the existing information in there. And again, the information is more important than the looks.

The curl logo is designed by a proper designer however (Adrian Burcea).

If you want to help out designing and improving the web site, you’d be most welcome!

Who

I’ve already touched on it: the web site is mostly available in git so “anyone” can submit issues and pull-requests to improve it, and we are around twenty persons who have push rights that can then make a change on the live site. In reality of course we are not that many who work on the site any ordinary month, or even year. During the last twelve month period, 10 persons authored commits in the web repository and I did 90% of those.

How

Technically, we build the site with traditional makefiles and we generate the web contents mostly by preprocessing files using a C-like preprocessor called fcpp. This is an old and rather crude setup that we’ve used for over twenty years but it’s functional and it allows us to have a mostly static web site that is also fairly easy to build locally so that we can work out and check improvements before we push them to the git repository and then out to the world.

The web site is of course only available over HTTPS.

Hosting

The curl web site is hosted on an origin VPS server in Sweden. The machine is maintained primarily by me and is paid for by Haxx. The exact hosting is not terribly important because users don't really interact with our server directly… (Also, as they're not sponsors, we're just ordinary customers so I won't mention their name here.)

CDN

A few years ago we experienced repeated server outages simply because our own infrastructure did not handle the load very well, and in particular not the traffic spikes that could occur when I would post a blog post that would suddenly reach a wide audience.

Enter Fastly. Now, when you go to curl.se (or daniel.haxx.se) you don’t actually reach the origin server we admin, you will instead reach one of Fastly’s servers that are distributed across the world. They then fetch the web contents from our origin, cache it on their edge servers and send it to you when you browse the site. This way, your client speaks to a server that is likely (much) closer to you than the origin server is and you’ll get the content faster and experience a “snappier” web site. And our server only gets a tiny fraction of the load.

Technically, this is achieved by the name curl.se resolving to a number of IP addresses that are anycasted. Right now, that’s 4 IPv4 addresses and 4 IPv6 addresses.

The fact that the CDN servers cache content “a while” is another explanation to why updated contents take a little while to “take effect” for all visitors.

DNS

When we just recently switched the site over to curl.se, we also adjusted how we handle DNS.

I run our own main DNS server where I control and admin the zone and the contents of it. We then have four secondary servers to help us really up our reliability. Out of those four secondaries, three are sponsored by Kirei and are anycasted. They should be both fast and reliable for most of the world.

With the help of fabulous friends like Fastly and Kirei, we hope that the curl web site and services shall remain stable and available.

DNS enthusiasts have remarked that we don’t do DNSSEC or registry-lock on the curl.se domain. I think we have reason to consider and possibly remedy that going forward.

Traffic

The curl web site is just the home of our little open source project. Most users out there in the world who run and use curl or libcurl will not download it from us. Most curl users get their software installation from their Linux distribution or operating system provider. The git repository and all issues and pull-requests are done on GitHub.

Relevant here is that we have no logging and we run no ads or any analytics. We do this for maximum user privacy and partly because of laziness, since handling logging from the CDN system is work. Therefore, I only have aggregated statistics.

In this autumn of 2020, over a normal 30 day period, the web site serves almost 11 TB of data to 360 million HTTP requests. The traffic volume is up from 3.5 TB the same time last year. 11 terabytes per 30 days equals about 4 megabytes per second on average.

Without logs we cannot know what people are downloading – but we can guess! We know that the CA cert bundle is popular and we also know that in today’s world of containers and CI systems, a lot of things out there will download the same packages repeatedly. Otherwise the web site is mostly consisting of text and very small images.

One interesting specific pattern on the server load that’s been going on for months: every morning at 05:30 UTC, the site gets over 50,000 requests within that single minute, during which 10 gigabytes of data is downloaded. The clients are distributed world wide as I see the same pattern on access points all over. The minute before and the minute after, the average traffic rate remains at 200MB/minute. It makes for a fun graph:

An eight hour zoomed in window of bytes transferred from the web site. UTC times.

Our servers suffer somewhat from being the target of weird clients like qqgamehall that continuously "hammer" the site with requests at a high frequency, many months after we started always returning an error to them. One effect they have is that they make the admin dashboard constantly show a very high error rate.

Software

The origin server runs Debian Linux and Apache httpd. It has a reverse proxy based on nginx. The DNS server is bind. The entire web site is built with free and open source. Primarily: fcpp, make, roffit, perl, curl, hypermail and enscript.

If you curl the curl site, you can see in response headers that Fastly uses Varnish.