Category Archives: cURL and libcurl

curl and/or libcurl related

Reducing mallocs for fun

September 24, 2020 Daniel Stenberg 2 Comments

Everyone needs something fun to do in their spare time. And digging deep into curl internals is mighty fun!

One of the things I do in curl every now and then is to run a few typical command lines and count how much memory is allocated and how many memory allocation calls that are made. This is good project hygiene and is a basic check that we didn’t accidentally slip in a malloc/free sequence in the transfer path or something.

We have extensive memory checks for leaks etc in the test suite so I’m not worried about that. Those things we detect and fix immediately, even when the leaks occur in error paths – thanks to our fancy “torture tests” that do error injections.

The amount of memory needed or number of mallocs used is more of a boiling frog problem. We add one now, then another months later and a third the following year. Each added malloc call is motivated within the scope of that particular change. But taken all together, does the pattern of memory use make sense? Can we make it better?

How?

Now this is easy because when we build curl debug enabled, we have a fancy logging system (we call it memdebug) that logs all calls to “fallible” system functions so after the test is completed we can just easily grep for them and count. It also logs the exact source code and line number.

cd tests
./runtests -n [number]
egrep -c 'alloc|strdup' log/memdump

Let’s start

Let me start out with a look at the history and how many allocations (calloc, malloc, realloc or strdup) we do to complete test 103. The reason I picked 103 is somewhat random, but I wanted to look at FTP and this test happens to do an “active” transfer of content and makes a total of 10 FTP commands in the process.

The reason I decided to take a closer look at FTP this time is because I fixed an issue in the main ftp source code file the other day and that made me remember the Curl_pp_send() function we have. It is the function that sends FTP commands (and IMAP, SMTP and POP3 commands too, the family of protocols we refer to as the “ping pong protocols” internally because of their command-response nature and that’s why it has “pp” in the name).

When I reviewed the function now with my malloc police hat on, I noticed how it made two calls to aprintf(). Our printf version that returns a freshly malloced area – which can even cause several reallocs in the worst case. But this meant at least two mallocs per issued command. That’s a bit unnecessary, isn’t it?

What about a few older versions

I picked a few random older versions, checked them out from git, built them and counted the number of allocs they did for test 103:

7.52.1: 141
7.68.0: 134
7.70.0: 137
7.72.0: 123

It’s been up but it has gone down too. Nothing alarming, Is that a good amount or a bad amount? We shall see…

Cleanup step one

The function gets printf style arguments and sends them to the server. The sent command also needs to append CRLF to the data. It was easy to make sure the CRLF appending wouldn’t need an extra malloc. That was just sloppy of us to have there in the first place. Instead of mallocing the new printf format string with CRLF appended, it could use one in a stack based buffer. I landed that as a first commit.

This trimmed off 10 mallocs for test 103.

Step two, bump it up a notch

The remaining malloc allocated the memory block for protocol content to send. It can be up to several kilobytes but is usually just a few bytes. It gets allocated in case it needs to be held on to if the entire thing cannot be sent off over the wire immediately. Remember, curl is non-blocking internally so it cannot just sit waiting for the data to get transferred.

I switched the malloc’ed buffer to instead use a ‘dynbuf’. That’s our internal “dynamic buffer” system that was introduced earlier this year and that we’re gradually switching all internals over to use instead of doing “custom” buffer management in various places. The internal API for dynbuf is documented here.

The internal API Curl_dyn_addf() adds a printf()-style string at the end of a “dynbuf”, and it seemed perfectly suitable to use here. I only needed to provide a vprintf() alternative since the printf() format was already received by Curl_pp_sendf()… I created Curl_dyn_vaddf() for this.

This single dynbuf is kept for the entire transfer so that it can be reused for subsequent commands and grow only if needed. Usually the initial 32 bytes malloc should be sufficient for all commands.

Not good enough

It didn’t help!

Counting the mallocs showed me with brutal clarity that my job wasn’t done there. Having dug this deep already I wasn’t ready to give this up just yet…

Why? Because Curl_dyn_addf() was still doing a separate alloc of the printf string that it then appended to the dynamic buffer. But okay, having our own printf() implementation in the code has its perks.

Add a printf() string without extra malloc

Back in May 2020 when I introduced this dynbuf thing, I converted the aprintf() code over to use dynbuf to truly unify our use of dynamically growing buffers. That was a main point with it after all.

As all the separate individual pieces I needed for this next step were already there, all I had to do was to add a new entry point to the printf() code that would accept a dynbuf as input and write directly into that (and grow it if needed), and then use that new function (Curl_dyn_vprintf) from the Curl_dyn_addf().

Phew. Now let’s see what we get…

There are 10 FTP commands that previously did 2 mallocs each: 20 mallocs were spent in this function when test 103 was executed. Now we are down to the ideal case of one alloc in there for the entire transfer.

Test 103 after polish

The code right now in master (to eventually get released as 7.73.0 in a few weeks), now shows a total of 104 allocations. Down from 123 in the previous release, which not entirely surprising is 19 fewer and thus perfectly matching the logic above.

All tests and CI ran fine. I merged it. This is a change that benefits all transfers done with any of the “ping pong protocols”. And it also makes the code easier to understand!

Compared to curl 7.52.1, this is a 26% reduction in number of allocation; pretty good, but even compared to 7.72.0 it is still a 15% reduction.

More?

There is always more to do, but there’s also a question of diminishing returns. I will continue to look at curl’s memory use going forward too and make sure everything is motivated and reasonable. At least every once in a while.

I have some additional ideas for further improvements in the memory use area to look into. We’ll see if they pan out…

Don’t count on me to blog about every such finding with this level of detail! If you want to make sure you don’t miss any of these fine-tunes in the future, follow the curl github repo.

Credits

Image by Julio César Velásquez Mejía from Pixabay

cURL and libcurl, Security

a Google grant for libcurl work

September 23, 2020 Daniel Stenberg 1 Comment

Earlier this year I was the recipient of a monetary Google patch grant with the expressed purpose of improving security in libcurl.

This was an upfront payout under this Google program describing itself as “an experimental program that rewards proactive security improvements to select open-source projects”.

I accepted this grant for the curl project and I intend to keep working fiercely on securing curl. I recognize the importance of curl security as curl remains one of the most widely used software components in the world, and even one that is doing network data transfers which typically is a risky business. curl is responsible for a measurable share of all Internet transfers done over the Internet an average day. My job is to make sure those transfers are done as safe and secure as possible. It isn’t my only responsibility of course, as I have other tasks to attend to as well, but still.

Do more

Security is already and always a top priority in the curl project and for myself personally. This grant will of course further my efforts to strengthen curl and by association, all the many users of it.

What I will not do

When security comes up in relation to curl, some people like to mention and propagate for other programming languages, But curl will not be rewritten in another language. Instead we will increase our efforts in writing good C and detecting problems in our code earlier and better.

Proactive counter-measures

Things we have done lately and working on to enforce everywhere:

String and buffer size limits – all string inputs and all buffers in libcurl that are allowed to grow now have a maximum allowed size, that makes sense. This stops malicious uses that could make things grow out of control and it helps detecting programming mistakes that would lead to the same problems. Also, by making sure strings and buffers are never ridiculously large, we avoid a whole class of integer overflow risks better.

Unified dynamic buffer functions – by reducing the number of different implementations that handle “growing buffers” we reduce the risk of a bug in one of them, even if it is used rarely or the spot is hard to reach with and “exercise” by the fuzzers. The “dynbuf” internal API first shipped in curl 7.71.0 (June 2020).

Realloc buffer growth unification – pretty much the same point as the previous, but we have earlier in our history had several issues when we had silly realloc() treatment that could lead to bad things. By limiting string sizes and unifying the buffer functions, we have reduced the number of places we use realloc and thus we reduce the number of places risking new realloc mistakes. The realloc mistakes were usually in combination with integer overflows.

Code style – we’ve gradually improved our code style checker (checksrc.pl) over time and we’ve also gradually made our code style more strict, leading to less variations in code, in white spacing and in naming. I’m a firm believer this makes the code look more coherent and therefore become more readable which leads to fewer bugs and easier to debug code. It also makes it easier to grep and search for code as you have fewer variations to scan for.

More code analyzers – we run every commit and PR through a large number of code analyzers to help us catch mistakes early, and we always remove detected problems. Analyzers used at the time of this writing: lgtm.com, Codacy, Deepcode AI, Monocle AI, clang tidy, scan-build, CodeQL, Muse and Coverity. That’s of course in addition to the regular run-time tools such as valgrind and sanitizer builds that run the entire test suite.

Memory-safe components – curl already supports getting built with a plethora of different libraries and “backends” to cater for users’ needs and desires. By properly supporting and offering users to build with components that are written in for example rust – or other languages that help developers avoid pitfalls – future curl and libcurl builds could potentially avoid a whole section of risks. (Stay tuned for more on this topic in a near future.)

Reactive measures

Recognizing that whatever we do and however tight ship we run, we will continue to slip every once in a while, is important and we should make sure we find and fix such slip-ups as good and early as possible.

Raising bounty rewards. While not directly fixing things, offering more money in our bug-bounty program helps us get more attention from security researchers. Our ambition is to gently drive up the reward amounts progressively to perhaps multi-thousand dollars per flaw, as long as we have funds to pay for them and we mange keep the security vulnerabilities at a reasonably low frequency.

More fuzzing. I’ve said it before but let me say it again: fuzzing is really the top method to find problems in curl once we’ve fixed all flaws that the static analyzers we use have pointed out. The primary fuzzing for curl is done by OSS-Fuzz, that tirelessly keeps hammering on the most recent curl code.

Good fuzzing needs a certain degree of “hand-holding” to allow it to really test all the APIs and dig into the dustiest corners, and we should work on adding more “probes” and entry-points into libcurl for the fuzzer to make it exercise more code paths to potentially detect more mistakes.

See also my presentation testing curl for security.

cURL and libcurl

My first 15,000 curl commits

September 18, 2020 Daniel Stenberg 5 Comments

I’ve long maintained that persistence is one of the main qualities you need in order to succeed with your (software) project. In order to manage to ship a product that truly conquers the world. By continuously and never-ending keeping at it: polishing away flaws and adding good features. On and on and on.

Today marks the day when I landed my 15,000th commit in the master branch in curl’s git repository – and we don’t do merge commits so this number doesn’t include such. Funnily enough, GitHub can’t count and shows a marginally lower number.

This is of course a totally meaningless number and I’m only mentioning it here because it’s even and an opportunity for me to celebrate something. To cross off an imaginary milestone. This is not even a year since we passed 25,000 total number of commits. Another meaningless number.

15,000 commits equals 57% of all commits done in curl so far and it makes me the only committer in the curl project with over 10% of the commits.

The curl git history starts on December 29 1999, so the first 19 months of commits from the early curl history are lost. 15,000 commits over this period equals a little less than 2 commits per day on average. I reached 10,000 commits in December 2011, so the latest 5,000 commits were done at a slower pace than the first 10,000.

I estimate that I’ve spent more than 15,000 hours working on curl over this period, so it would mean that I spend more than one hour of “curl time” per commit on average. According to gitstats, these 15,000 commits were done on 4,271 different days.

We also have other curl repositories that aren’t included in this commit number. For example, I have done over 4,400 commits in curl’s website repository.

With these my first 15,000 commits I’ve added 627,000 lines and removed 425,000, making an average commit adding 42 and removing 28 lines. (Feels pretty big but I figure the really large ones skew the average.)

The largest time gap ever between two of my commits in the curl tree is almost 35 days back in June 2000. If we limit the check to “modern times”, as in 2010 or later, there was a 19 day gap in July 2015. I do take vacations, but I usually keep up with the most important curl development even during those.

On average it is one commit done by me every 12.1 hours. Every 15.9 hours since 2010.

I’ve been working full time on curl since early 2019, up until then it was a spare time project only for me. Development with pull-requests and CI and things that verify a lot of the work before merge is a recent thing so one explanation for a slightly higher commit frequency in the past is that we then needed more “oops” commits to rectify mistakes. These days, most of them are done in the PR branches that are squashed when subsequently merged into master. Fewer commits with higher quality.

curl committers

We have merged commits authored by over 833 authors into the curl master repository. Out of these, 537 landed only a single commit (so far).

We are 48 authors who ever wrote 10 or more commits within the same year. 20 of us committed that amount of commits during more than one year.

We are 9 authors who wrote more than 1% of the commits each.

We are 5 authors who ever wrote 10 or more commits within the same year in 10 or more years.

Our second-most committer (by commit count) has not merged a commit for over seven years.

To reach curl’s top-100 committers list right now, you only need to land 6 commits.

can I keep it up?

I intend to stick around in the curl project going forward as well. If things just are this great and life remains fine, I hope that I will be maintaining roughly this commit speed for years to come. My prediction is therefore that it will take longer than another twenty years to reach 30,000 commits.

I’ve worked on curl and its precursors for almost twenty-four years. In another twenty-four years I will be well into my retirement years. At some point I will probably not be fit to shoulder this job anymore!

I have never planned long ahead before and I won’t start now. I will instead keep focused on keeping curl top quality, an exemplary open source project and a welcoming environment for newcomers and oldies alike. I will continue to make sure the project is able to function totally independently if I’m present or not.

The 15,000th commit?

So what exactly did I change in the project when I merged my 15,000th ever change into the branch?

It was a pretty boring and non-spectacular one. I removed a document (RESOURCES) from the docs/ folder as that has been a bit forgotten and now is just completely outdated. There’s a much better page for this provided on the web site: https://curl.haxx.se/rfc/

Celebrations!

I of coursed asked my twitter friends a few days ago on how this occasion is best celebrated:

I showed these results to my wife. She approved.

cURL and libcurl

store the curl output over there

September 10, 2020 Daniel Stenberg

tldr: --output-dir [directory] comes in curl 7.73.0

The curl options to store the contents of a URL into a local file, -o (--output) and -O (--remote-name) were part of curl 4.0, the first ever release, already in March 1998.

Even though we often get to hear from users that they can’t remember which of the letter O’s to use, they’ve worked exactly the same for over twenty years. I believe the biggest reason why they’re hard to keep apart is because of other tools that use similar options for maybe not identical functionality so a command line cowboy really needs to remember the exact combination of tool and -o type.

Later on, we also brought -J to further complicate things. See below.

Let’s take a look at what these options do before we get into the new stuff:

`--output [file]`

This tells curl to store the downloaded contents in that given file. You can specify the file as a local file name for the current directory or you can specify the full path. Example, store the the HTML from example.org in "/tmp/foo":

curl -o /tmp/foo https://example.org

`--remote-name`

This option is probably much better known as its short form: -O (upper case letter o).

This tells curl to store the downloaded contents in a file name name that is extracted from the given URL’s path part. For example, if you download the URL "https://example.com/pancakes.jpg" users often think that saving that using the local file name “pancakes.jpg” is a good idea. -O does that for you. Example:

curl -O https://example.com/pancakes.jpg

The name is extracted from the given URL. Even if you tell curl to follow redirects, which then may go to URLs using different file names, the selected local file name is the one in the original URL. This way you know before you invoke the command which file name it will get.

`--remote-header-name`

This option is commonly used as -J (upper case letter j) and needs to be set in combination with --remote-name.

This makes curl parse incoming HTTP response headers to check for a Content-Disposition: header, and if one is present attempt to parse a file name out of it and then use that file name when saving the content.

This then naturally makes it impossible for a user to be really sure what file name it will end up with. You leave the decision entirely to the server. curl will make an effort to not overwrite any existing local file when doing this, and to reduce risks curl will always cut off any provided directory path from that file name.

Example download of the pancake image again, but allow the server to set the local file name:

curl -OJ https://example.com/pancakes.jpg

(it has been said that “-OJ is a killer feature” but I can’t take any credit for having come up with that.)

Which directory

So in particular with -O, with or without -J, the file is download in the current working directory. If you want the download to be put somewhere special, you had to first ‘cd’ there.

When saving multiple URLs within a single curl invocation using -O, storing those in different directories would thus be impossible as you can only cd between curl invokes.

Introducing `--output-dir`

In curl 7.73.0, we introduce this new command line option --output-dir that goes well together with all these output options. It tells curl in which directory to create the file. If you want to download the pancake image, and put it in $HOME/tmp no matter which your current directory is:

curl -O --output-dir $HOME/tmp https://example.com/pancakes.jpg

And if you allow the server to select the file name but still want it in $HOME/tmp

curl -OJ --output-dir $HOME/tmp https://example.com/pancakes.jpg

Create the directory!

This new option also goes well in combination with --create-dirs, so you can specify a non-existing directory with --output-dir and have curl create it for the download and then store the file in there:

curl --create-dirs -O --output-dir /tmp/receipes https://example.com/pancakes.jpg

Ships in 7.73.0

This new option comes in curl 7.73.0. It is curl’s 233rd command line option.

You can always find the man page description of the option on the curl website.

Credits

I (Daniel) wrote the code, docs and tests for this feature.

Image by Alexas_Fotos from Pixabay

cURL and libcurl

curl help remodeled

September 4, 2020 Daniel Stenberg

curl 4.8 was released in 1998 and contained 46 command line options. curl --help would list them all. A decent set of options.

When we released curl 7.72.0 a few weeks ago, it contained 232 options… and curl --help still listed all available options.

What was once a long list of options grew over the decades into a crazy long wall of text shock to users who would enter this command and option, thinking they would figure out what command line options to try next.

–help me if you can

We’ve known about this usability flaw for a while but it took us some time to figure out how to approach it and decide what the best next step would be. Until this year when long time curl veteran Dan Fandrich did his presentation at curl up 2020 titled –help me if you can.

Emil Engler subsequently picked up the challenge and converted ideas surfaced by Dan into reality and proper code. Today we merged the refreshed and improved --help behavior in curl.

Perhaps the most notable change in curl for many users in a long time. Targeted for inclusion in the pending 7.73.0 release.

help categories

First out, curl --help will now by default only list a small subset of the most “important” and frequently used options. No massive wall, no shock. Not even necessary to pipe to more or less to see proper.

Then: each curl command line option now has one or more categories, and the help system can be asked to just show command line options belonging to the particular category that you’re interested in.

For example, let’s imagine you’re interested in seeing what curl options provide for your HTTP operations:

$ curl --help http
Usage: curl [options…]
http: HTTP and HTTPS protocol options
--alt-svc Enable alt-svc with this cache file
--anyauth Pick any authentication method
--compressed Request compressed response
-b, --cookie Send cookies from string/file
-c, --cookie-jar Write cookies to after operation
-d, --data HTTP POST data
--data-ascii HTTP POST ASCII data
--data-binary HTTP POST binary data
--data-raw HTTP POST data, '@' allowed
--data-urlencode HTTP POST data url encoded
--digest Use HTTP Digest Authentication
[...]

list categories

To figure out what help categories that exists, just ask with curl --help category, which will show you a list of the current twenty-two categories: auth, connection, curl, dns, file, ftp, http, imap, misc, output, pop3, post, proxy, scp, sftp, smtp, ssh, telnet ,tftp, tls, upload and verbose. It will also display a brief description of each category.

Each command line option can be put into multiple categories, so the same one may be displayed in both in the “http” category as well as in “upload” or “auth” etc.

`--help all`

You can of course still get the old list of every single command line option by issuing curl --help all. Handy for grepping the list and more.

“important”

The meta category “important” is what we use for the options that we show when just curl --help is issued. Presumably those options should be the most important, in some ways.

Credits

Code by Emil Engler. Ideas and research by Dan Fandrich.

cURL and libcurl

Enabling better curl bindings

August 28, 2020 Daniel Stenberg 1 Comment

I think it is fair to say that libcurl is a library that is very widely spread, widely used and powers a sizable share of Internet transfers. It’s age, it’s availability, it’s stability and its API contribute to it having gotten to this position.

libcurl is in a position where it could remain for a long time to come, unless we do something wrong and given that we stay focused on what we are and what we’re here for. I believe curl and libcurl might still be very meaningful in ten years.

Bindings are key

Another explanation is the fact that there are a large number of bindings to libcurl. A binding is a piece of code that allows libcurl to be used directly and conveniently from another programming language. Bindings are typically authored and created by enthusiasts of a particular language. To bring libcurl powers to applications written in that language.

The list of known bindings we feature on the curl web sites lists around 70 bindings for 62 something different languages. You can access and use libcurl with (almost) any language you can dream of. I figure most mortals can’t even name half that many programming languages! The list starts out with Ada95, Basic, C++, Ch, Cocoa, Clojure, D, Delphi, Dylan, Eiffel, Euphoria and it goes on for quite a while more.

Keeping bindings in sync is work

The bindings are typically written to handle transfers with libcurl as it was working at a certain point in time, knowing what libcurl supported at that moment. But as readers of this blog and followers of the curl project know, libcurl keeps advancing and we change and improve things regularly. We add functionality and new features in almost every new release.

This rather fast pace of development offers a challenge to binding authors, as they need to write the binding in a very clever way and keep up with libcurl developments in order to offer their users the latest libcurl features via their binding.

With libcurl being the foundational underlying engine for so many applications and the number of applications and services accessing libcurl via bindings is truly uncountable – this work of keeping bindings in sync is not insignificant.

If we can provide mechanisms in libcurl to ease that work and to reduce friction, it can literally affect the world.

“easy options” are knobs and levers

Users of the libcurl knows that one of the key functions in the API is the curl_easy_setopt function. Using this function call, the application sets specific options for a transfer, asking for certain behaviors etc. The URL to use, user name, authentication methods, where to send the output, how to provide the input etc etc.

At the time I write this, this key function features no less than 277 different and well-documented options. Of course we should work hard at not adding new options unless really necessary and we should keep the option growth as slow as possible, but at the same time the Internet isn’t stopping and as the whole world is developing we need to follow along.

Options generally come using one of a set of predefined kinds. Like a string, a numerical value or list of strings etc. But the names of the options and knowing about their existence has always been knowledge that exists in the curl source tree, requiring each bindings to be synced with the latest curl in order to get knowledge about the most recent knobs libcurl offers.

Until now…

Introducing an easy options info API

Starting in the coming version 7.73.0 (due to be released on October 14, 2020), libcurl offers API functions that allow applications and bindings to query it for information about all the options this libcurl instance knows about.

curl_easy_option_next lets the application iterate over options, to either go through all of them or a set of them. For each option, there’s details to extract about it that tells what kind of input data that option expects.

curl_easy_option_by_name allows the application to look up details about a specific option using its name. If the application instead has the internal “id” for the option, it can look it up using curl_easy_option_by_id.

With these new functions, bindings should be able to better adapt to the current run-time version of the library and become less dependent on syncing with the latest libcurl source code. We hope this will make it easier to make bindings stay in sync with libcurl developments.

Legacy is still legacy

Lots of bindings have been around for a long time and many of them of course still want to support libcurl versions much older than 7.73.0 so jumping onto this bandwagon of new fancy API for this will not be an instant success or take away code needed for that long tail of old version everyone wants to keep supporting.

We can’t let the burden of legacy stand in the way for improvement and going forward. At least if you find that you are lucky enough to have 7.73.0 or later installed, you can dynamically figure out these things about options. Maybe down the line the number of legacy versions will shrink. Maybe if libcurl still is relevant in ten years none of the pre curl 7.73.0 versions need to be supported anymore!

Credits

Lots of the discussions and ideas for this API come from Jeroen Ooms, author of the R binding for libcurl.

Image by Rudy and Peter Skitterians from Pixabay

cURL and libcurl

tiny-curl 7.72.0 – Micrium

August 27, 2020 Daniel Stenberg

You remember my tiny-curl effort to port libcurl to more Real-time operating systems? Back in May 2020 I announced it in association with me porting tiny-curl to FreeRTOS.

Today I’m happy to bring you the news that tiny-curl 7.72.0 was just released. Now it also builds and runs fine on the Micrium OS.

Timed with this release, I changed the tiny-curl version number to use the same as the curl release on which this is based on, and I’ve created a new dedicated section on the curl web site for tiny-curl:

https://curl.haxx.se/tiny/

Head over there to download.

Why tiny-curl

With tiny-curl you get an HTTPS-focused small library, that typically fits in 100Kb storage, needing less than 20Kb of dynamic memory to run (excluding TLS and regular libc needs).

You want to go with libcurl even in these tiny devices because your other options are all much much worse. Lots of devices in this category (I call it “devices that are too small to run Linux“) basically go with some default example HTTP code from the OS vendor or similar and sure, that can often be built into a much smaller foot-print than libcurl can but you also get something that is very fragile and error prone. With libcurl, and tiny-curl, instead you get:

the same API on all systems – porting your app over now or later becomes a smooth ride
a secure and safe library that’s been battle-proven, tested and checked a lot
the best documented HTTP library in existence
commercial support is readily available

tiny and upward

tiny-curl comes already customized as small as possible, but you always have the option to enable additional powers and by going up slightly in size you can also add more features from the regular libcurl plethora of powerful offerings.

cURL and libcurl

curl ping pong

August 20, 2020 Daniel Stenberg

Pretend that a ping pong ball represents a single curl installation somewhere in the world. Here’s a picture of one to help you get an image in your head.

Moving on with this game, you get one ball for every curl installation out there and your task is to put all those balls on top of each other. Okay, that’s hard to balance but for this game we can also pretend you have glue enough to make sure they stay like this. A tower of ping pong balls.

You soon realize that this is quite a lot of work. The balls keep pouring in.

How fast can you build?

If you manage to do this construction work non-stop at the rate of one ball per second (which seems like it maybe would be hard after a while but let’s not make that ruin the fun), it will keep you occupied for no less than a little bit over 317 years. (That also assumes the number of curl installations doesn’t grow significantly in the mean time.)

That’s a lot of ping pong balls. Ten billion of them, give or take.

Assuming you have friends to help you build this tower you can probably build it faster. If you can instead sustain a rate of 1000 balls per second, you’d be done in less than four months.

One official ping pong ball weighs 2.7 grams. It makes a total of 27,000 tonnes of balls. That’s quite some pressure on such a small surface. You better make sure to build the tower on something solid. The heaviest statue in the world is the Statue of Liberty in New York, clocking in at 24,500 tonnes.

That takes a lot of balls

But wait, the biggest ping pong ball manufacturer in the world (Double Happiness, in China – yes it’s really called that) “only” produces 200 million balls per year. It would take them 50 years to make balls for this tower. You clearly need to engage many factories.

You can get 100 balls for roughly 10 USD on Amazon right now. Maybe not the best balls to play with, but I think they might still suit this game. That’s a billion US dollars for the balls. Maybe you’d get a discount, but you’d also drastically increase demand, so…

How tall is that?

A tower of ten billion ping pong balls, how tall is that? It would reach the moon.

The diameter of a ping pong ball is 40 mm (it was officially increased from 38 mm back in 2000). This makes 25 balls per meter of tower. Conveniently aligned for our game here.

10,000,000,000 balls / 25 balls per meter = 400,000,000 meters. 400,000 km.

Distance from earth to moon? 384,400 km. The fully built tower is actually a little taller than the average distance to the moon! Here’s another picture to help you get an image in your head. (Although this image is not drawn to scale!)

A challenge will of course be to keep this very thin tower steady when that tall. Winds and low temperatures should be challenging. And there’s the additional risk of airplanes or satellites colliding with it. Or even just birds interfering in the lower altitudes. I suspect there are also laws prohibiting such a construction.

Never mind

Come to think of it. This was just a mind game. Forget about it now. Let’s move on with our lives instead. We have better things to do.

cURL and libcurl

curl 7.72.0 – more compression

August 19, 2020 Daniel Stenberg 2 Comments

Welcome to another release, seven weeks since we did the patch release 7.71.1. This time we add a few new subtle features so the minor number is bumped yet again. Details below.

Release presentation video

Numbers

the 194th release
3 changes
49 days (total: 8,188)
100 bug fixes (total: 6,327)
134 commits (total: 26,077)
0 new public libcurl function (total: 82)
0 new curl_easy_setopt() option (total: 277)
0 new curl command line option (total: 232)
52 contributors, 29 new (total: 2,239)
30 authors, 14 new (total: 819)
1 security fix (total: 95)
500 USD paid in Bug Bounties (total: 2,800 USD)

Security

CVE-2020-8132: “libcurl: wrong connect-only connection”. This a rather obscure issue that we’ve graded severity Low. There’s a risk that an application that’s using libcurl to do connect-only connections (ie not doing the full transfer with libcurl, just using it to setup the connection) accidentally sends or reads data over the wrong connection, as libcurl could mix them up internally in rare circumstances.

We rewarded 500 USD to the reporter of this security flaw.

Features

This is the first curl release that supports zstd compression. zstd is a yet another way to compressed content data over HTTP and if curl supports it, it can then automatically decompress it on the fly. zstd is designed to compress better and faster than gzip and if I understand the numbers shown, it is less CPU intensive than brotli. In pure practical terms, curl will ask for this compression in addition to the other supported algorithms if you tell curl you want compressed content. zstd is still not widely supported by browsers.

For clients that supports HTTP/2 and server push, libcurl now allows the controlling callback (“should this server push be accepted?”) to return an error code that will tear down the entire connection.

There’s a new option for curl_easy_getinfo called CURLINFO_EFFECTIVE_METHOD that lets the application ask libcurl what the most resent request method used was. This is relevant in case you’ve allowed libcurl to follow redirects for a POST where it might have changed the method as a result of what particular HTTP response the server responded with.

Bug-fixes

Here are a collection of bug-fixes I think stood out a little extra in this cycle.

cmake: fix windows xp build

I just love the fact that someone actually tried to build curl for Windows XP, noticed it failed in doing so and provided the fix to make it work again…

curl: improve the existing file check with -J

There were some minor mistakes in the code that checks if the file you get when you use -J already existed. That logic has now been tightened. Presumably not a single person ever actually had an actual problem with that before either, but…

ftp: don’t do ssl_shutdown instead of ssl_close

We landed an FTPS regression in 7.71.1 where we accidentally did the wrong function call when closing down the data connection. It could make consecutive FTPS transfers terribly slow.

http2: repair trailer handling

We had another regression reported where HTTP trailers when using HTTP/2 really didn’t work. Obviously not a terribly well-used feature…

http2: close the http2 connection when no more requests may be sent

Another little HTTP/2 polish: make sure that connections that have received a GOAWAY is marked for closure so that it gets closed sooner rather than later as no new streams can be created on it anyway!

multi_remove_handle: close unused connect-only connections

“connect-only connections” are those where the application asks libcurl to just connect to the site and not actually perform any request or transfer. Previously when that was done, the connection would remain in the multi handle until it was closed and it couldn’t be reused. Starting now, when the easy handle that “owns” the connection is removed from the multi handle the associated connect-only connection will be closed and removed. This is just sensible.

ngtcp2: adapt to changes

ngtcp2 is a QUIC library and is used in one of the backends curl supports for HTTP/3. HTTP/3 in curl is still marked experimental and we aim at keeping the latest curl code work with the latest QUIC libraries – since they’re both still “pre-beta” versions and don’t do releases yet. So, if you find that the HTTP/3 build fails, make sure you use the latest git commits of all the h3 components!

quiche: handle calling disconnect twice

If curl would call the QUIC disconnect function twice, using the quiche backend, it would crash hard. Would happen if you tried to connect to a host that didn’t listen to the UDP port at all for example…

setopt: unset NOBODY switches to GET if still HEAD

We recently fixed a bug for storing the HTTP method internally and due to refactored code, the behavior of unsetting the CURLOPT_NOBODY option changed slightly. There was never any promise as to what exactly that would do – but apparently several users had already drawn conclusions and written applications based on that. We’ve now adapted somewhat to that presumption on undocumented behavior by documenting better what it should do and by putting back some code to back it up…

http2: move retrycount from connect struct to easy handle

Yet another HTTP/2 fix. In a recent release we fixed a problem that materialized when libcurl received a GOAWAY on a stream for a HTTP/2 connection, and it would then instead try a new connection to issue the request over and that too would get a GOAWAY. libcurl will do these retry attempts up to 5 times but due to a mistake, the counter was stored wrongly and was cleared when each new connection was made…

url: fix CURLU and location following

libcurl supports two ways of setting the URL to work with. The good old string to the entire URL and the option CURLOPT_CURLU where you provide the handle to an already parsed URL. The latter is of course a much newer option and it turns out that libcurl didn’t properly handle redirects when the URL was set with this latter option!

Coming up

There are already several Pull Requests waiting in line to get merged that add new features and functionality. We expect the next release to become 7.73.0 and ship on October 14, 2020. Fingers crossed.

cURL and libcurl

Video: Landing code in curl

August 13, 2020 Daniel Stenberg

A few hours ago I ended my webinar on how to get your code contribution merged into curl. Here’s the video of it:

Here are the slides.

How?

Let’s start

What about a few older versions

Cleanup step one

Step two, bump it up a notch

Not good enough

Add a printf() string without extra malloc

Test 103 after polish

More?

Credits

Do more

What I will not do

Proactive counter-measures

Reactive measures

curl committers

can I keep it up?

The 15,000th commit?

Celebrations!

--output [file]

--remote-name

--remote-header-name

Which directory

Introducing --output-dir

Create the directory!

Ships in 7.73.0

Credits

–help me if you can

help categories

list categories

--help all

“important”

Credits

Bindings are key

Keeping bindings in sync is work

“easy options” are knobs and levers

Introducing an easy options info API

Legacy is still legacy

Credits

Why tiny-curl

tiny and upward

How fast can you build?

That takes a lot of balls

How tall is that?

Never mind

Release presentation video

Numbers

Security

Features

Bug-fixes

cmake: fix windows xp build

curl: improve the existing file check with -J

ftp: don’t do ssl_shutdown instead of ssl_close

http2: repair trailer handling

http2: close the http2 connection when no more requests may be sent

multi_remove_handle: close unused connect-only connections

ngtcp2: adapt to changes

quiche: handle calling disconnect twice

setopt: unset NOBODY switches to GET if still HEAD

http2: move retrycount from connect struct to easy handle

url: fix CURLU and location following

Coming up

curl, open source and networking

`--output [file]`

`--remote-name`

`--remote-header-name`

Introducing `--output-dir`

`--help all`