Category Archives: Open Source

Open Source, Free Software, and similar

store the curl output over there

tldr: --output-dir [directory] comes in curl 7.73.0

The curl options to store the contents of a URL into a local file, -o (--output) and -O (--remote-name) were part of curl 4.0, the first ever release, already in March 1998.

Even though we often get to hear from users that they can’t remember which of the letter O’s to use, they’ve worked exactly the same for over twenty years. I believe the biggest reason why they’re hard to keep apart is because of other tools that use similar options for maybe not identical functionality so a command line cowboy really needs to remember the exact combination of tool and -o type.

Later on, we also brought -J to further complicate things. See below.

Let’s take a look at what these options do before we get into the new stuff:

--output [file]

This tells curl to store the downloaded contents in that given file. You can specify the file as a local file name for the current directory or you can specify the full path. Example, store the the HTML from example.org in "/tmp/foo":

curl -o /tmp/foo https://example.org

--remote-name

This option is probably much better known as its short form: -O (upper case letter o).

This tells curl to store the downloaded contents in a file name name that is extracted from the given URL’s path part. For example, if you download the URL "https://example.com/pancakes.jpg" users often think that saving that using the local file name “pancakes.jpg” is a good idea. -O does that for you. Example:

curl -O https://example.com/pancakes.jpg

The name is extracted from the given URL. Even if you tell curl to follow redirects, which then may go to URLs using different file names, the selected local file name is the one in the original URL. This way you know before you invoke the command which file name it will get.

--remote-header-name

This option is commonly used as -J (upper case letter j) and needs to be set in combination with --remote-name.

This makes curl parse incoming HTTP response headers to check for a Content-Disposition: header, and if one is present attempt to parse a file name out of it and then use that file name when saving the content.

This then naturally makes it impossible for a user to be really sure what file name it will end up with. You leave the decision entirely to the server. curl will make an effort to not overwrite any existing local file when doing this, and to reduce risks curl will always cut off any provided directory path from that file name.

Example download of the pancake image again, but allow the server to set the local file name:

curl -OJ https://example.com/pancakes.jpg

(it has been said that “-OJ is a killer feature” but I can’t take any credit for having come up with that.)

Which directory

So in particular with -O, with or without -J, the file is download in the current working directory. If you want the download to be put somewhere special, you had to first ‘cd’ there.

When saving multiple URLs within a single curl invocation using -O, storing those in different directories would thus be impossible as you can only cd between curl invokes.

Introducing --output-dir

In curl 7.73.0, we introduce this new command line option --output-dir that goes well together with all these output options. It tells curl in which directory to create the file. If you want to download the pancake image, and put it in $HOME/tmp no matter which your current directory is:

curl -O --output-dir $HOME/tmp https://example.com/pancakes.jpg

And if you allow the server to select the file name but still want it in $HOME/tmp

curl -OJ --output-dir $HOME/tmp https://example.com/pancakes.jpg

Create the directory!

This new option also goes well in combination with --create-dirs, so you can specify a non-existing directory with --output-dir and have curl create it for the download and then store the file in there:

curl --create-dirs -O --output-dir /tmp/receipes https://example.com/pancakes.jpg

Ships in 7.73.0

This new option comes in curl 7.73.0. It is curl’s 233rd command line option.

You can always find the man page description of the option on the curl website.

Credits

I (Daniel) wrote the code, docs and tests for this feature.

Image by Alexas_Fotos from Pixabay

curl help remodeled

curl 4.8 was released in 1998 and contained 46 command line options. curl --help would list them all. A decent set of options.

When we released curl 7.72.0 a few weeks ago, it contained 232 options… and curl --help still listed all available options.

What was once a long list of options grew over the decades into a crazy long wall of text shock to users who would enter this command and option, thinking they would figure out what command line options to try next.

–help me if you can

We’ve known about this usability flaw for a while but it took us some time to figure out how to approach it and decide what the best next step would be. Until this year when long time curl veteran Dan Fandrich did his presentation at curl up 2020 titled –help me if you can.

Emil Engler subsequently picked up the challenge and converted ideas surfaced by Dan into reality and proper code. Today we merged the refreshed and improved --help behavior in curl.

Perhaps the most notable change in curl for many users in a long time. Targeted for inclusion in the pending 7.73.0 release.

help categories

First out, curl --help will now by default only list a small subset of the most “important” and frequently used options. No massive wall, no shock. Not even necessary to pipe to more or less to see proper.

Then: each curl command line option now has one or more categories, and the help system can be asked to just show command line options belonging to the particular category that you’re interested in.

For example, let’s imagine you’re interested in seeing what curl options provide for your HTTP operations:

$ curl --help http
Usage: curl [options…]
http: HTTP and HTTPS protocol options
--alt-svc Enable alt-svc with this cache file
--anyauth Pick any authentication method
--compressed Request compressed response
-b, --cookie Send cookies from string/file
-c, --cookie-jar Write cookies to after operation
-d, --data HTTP POST data
--data-ascii HTTP POST ASCII data
--data-binary HTTP POST binary data
--data-raw HTTP POST data, '@' allowed
--data-urlencode HTTP POST data url encoded
--digest Use HTTP Digest Authentication
[...]

list categories

To figure out what help categories that exists, just ask with curl --help category, which will show you a list of the current twenty-two categories: auth, connection, curl, dns, file, ftp, http, imap, misc, output, pop3, post, proxy, scp, sftp, smtp, ssh, telnet ,tftp, tls, upload and verbose. It will also display a brief description of each category.

Each command line option can be put into multiple categories, so the same one may be displayed in both in the “http” category as well as in “upload” or “auth” etc.

--help all

You can of course still get the old list of every single command line option by issuing curl --help all. Handy for grepping the list and more.

“important”

The meta category “important” is what we use for the options that we show when just curl --help is issued. Presumably those options should be the most important, in some ways.

Credits

Code by Emil Engler. Ideas and research by Dan Fandrich.

Enabling better curl bindings

I think it is fair to say that libcurl is a library that is very widely spread, widely used and powers a sizable share of Internet transfers. It’s age, it’s availability, it’s stability and its API contribute to it having gotten to this position.

libcurl is in a position where it could remain for a long time to come, unless we do something wrong and given that we stay focused on what we are and what we’re here for. I believe curl and libcurl might still be very meaningful in ten years.

Bindings are key

Another explanation is the fact that there are a large number of bindings to libcurl. A binding is a piece of code that allows libcurl to be used directly and conveniently from another programming language. Bindings are typically authored and created by enthusiasts of a particular language. To bring libcurl powers to applications written in that language.

The list of known bindings we feature on the curl web sites lists around 70 bindings for 62 something different languages. You can access and use libcurl with (almost) any language you can dream of. I figure most mortals can’t even name half that many programming languages! The list starts out with Ada95, Basic, C++, Ch, Cocoa, Clojure, D, Delphi, Dylan, Eiffel, Euphoria and it goes on for quite a while more.

Keeping bindings in sync is work

The bindings are typically written to handle transfers with libcurl as it was working at a certain point in time, knowing what libcurl supported at that moment. But as readers of this blog and followers of the curl project know, libcurl keeps advancing and we change and improve things regularly. We add functionality and new features in almost every new release.

This rather fast pace of development offers a challenge to binding authors, as they need to write the binding in a very clever way and keep up with libcurl developments in order to offer their users the latest libcurl features via their binding.

With libcurl being the foundational underlying engine for so many applications and the number of applications and services accessing libcurl via bindings is truly uncountable – this work of keeping bindings in sync is not insignificant.

If we can provide mechanisms in libcurl to ease that work and to reduce friction, it can literally affect the world.

“easy options” are knobs and levers

Users of the libcurl knows that one of the key functions in the API is the curl_easy_setopt function. Using this function call, the application sets specific options for a transfer, asking for certain behaviors etc. The URL to use, user name, authentication methods, where to send the output, how to provide the input etc etc.

At the time I write this, this key function features no less than 277 different and well-documented options. Of course we should work hard at not adding new options unless really necessary and we should keep the option growth as slow as possible, but at the same time the Internet isn’t stopping and as the whole world is developing we need to follow along.

Options generally come using one of a set of predefined kinds. Like a string, a numerical value or list of strings etc. But the names of the options and knowing about their existence has always been knowledge that exists in the curl source tree, requiring each bindings to be synced with the latest curl in order to get knowledge about the most recent knobs libcurl offers.

Until now…

Introducing an easy options info API

Starting in the coming version 7.73.0 (due to be released on October 14, 2020), libcurl offers API functions that allow applications and bindings to query it for information about all the options this libcurl instance knows about.

curl_easy_option_next lets the application iterate over options, to either go through all of them or a set of them. For each option, there’s details to extract about it that tells what kind of input data that option expects.

curl_easy_option_by_name allows the application to look up details about a specific option using its name. If the application instead has the internal “id” for the option, it can look it up using curl_easy_option_by_id.

With these new functions, bindings should be able to better adapt to the current run-time version of the library and become less dependent on syncing with the latest libcurl source code. We hope this will make it easier to make bindings stay in sync with libcurl developments.

Legacy is still legacy

Lots of bindings have been around for a long time and many of them of course still want to support libcurl versions much older than 7.73.0 so jumping onto this bandwagon of new fancy API for this will not be an instant success or take away code needed for that long tail of old version everyone wants to keep supporting.

We can’t let the burden of legacy stand in the way for improvement and going forward. At least if you find that you are lucky enough to have 7.73.0 or later installed, you can dynamically figure out these things about options. Maybe down the line the number of legacy versions will shrink. Maybe if libcurl still is relevant in ten years none of the pre curl 7.73.0 versions need to be supported anymore!

Credits

Lots of the discussions and ideas for this API come from Jeroen Ooms, author of the R binding for libcurl.

Image by Rudy and Peter Skitterians from Pixabay

tiny-curl 7.72.0 – Micrium

You remember my tiny-curl effort to port libcurl to more Real-time operating systems? Back in May 2020 I announced it in association with me porting tiny-curl to FreeRTOS.

Today I’m happy to bring you the news that tiny-curl 7.72.0 was just released. Now it also builds and runs fine on the Micrium OS.

Timed with this release, I changed the tiny-curl version number to use the same as the curl release on which this is based on, and I’ve created a new dedicated section on the curl web site for tiny-curl:

https://curl.haxx.se/tiny/

Head over there to download.

Why tiny-curl

With tiny-curl you get an HTTPS-focused small library, that typically fits in 100Kb storage, needing less than 20Kb of dynamic memory to run (excluding TLS and regular libc needs).

You want to go with libcurl even in these tiny devices because your other options are all much much worse. Lots of devices in this category (I call it “devices that are too small to run Linux“) basically go with some default example HTTP code from the OS vendor or similar and sure, that can often be built into a much smaller foot-print than libcurl can but you also get something that is very fragile and error prone. With libcurl, and tiny-curl, instead you get:

  • the same API on all systems – porting your app over now or later becomes a smooth ride
  • a secure and safe library that’s been battle-proven, tested and checked a lot
  • the best documented HTTP library in existence
  • commercial support is readily available

tiny and upward

tiny-curl comes already customized as small as possible, but you always have the option to enable additional powers and by going up slightly in size you can also add more features from the regular libcurl plethora of powerful offerings.

curl ping pong

Pretend that a ping pong ball represents a single curl installation somewhere in the world. Here’s a picture of one to help you get an image in your head.

Moving on with this game, you get one ball for every curl installation out there and your task is to put all those balls on top of each other. Okay, that’s hard to balance but for this game we can also pretend you have glue enough to make sure they stay like this. A tower of ping pong balls.

You soon realize that this is quite a lot of work. The balls keep pouring in.

How fast can you build?

If you manage to do this construction work non-stop at the rate of one ball per second (which seems like it maybe would be hard after a while but let’s not make that ruin the fun), it will keep you occupied for no less than a little bit over 317 years. (That also assumes the number of curl installations doesn’t grow significantly in the mean time.)

That’s a lot of ping pong balls. Ten billion of them, give or take.

Assuming you have friends to help you build this tower you can probably build it faster. If you can instead sustain a rate of 1000 balls per second, you’d be done in less than four months.

One official ping pong ball weighs 2.7 grams. It makes a total of 27,000 tonnes of balls. That’s quite some pressure on such a small surface. You better make sure to build the tower on something solid. The heaviest statue in the world is the Statue of Liberty in New York, clocking in at 24,500 tonnes.

That takes a lot of balls

But wait, the biggest ping pong ball manufacturer in the world (Double Happiness, in China – yes it’s really called that) “only” produces 200 million balls per year. It would take them 50 years to make balls for this tower. You clearly need to engage many factories.

You can get 100 balls for roughly 10 USD on Amazon right now. Maybe not the best balls to play with, but I think they might still suit this game. That’s a billion US dollars for the balls. Maybe you’d get a discount, but you’d also drastically increase demand, so…

How tall is that?

A tower of ten billion ping pong balls, how tall is that? It would reach the moon.

The diameter of a ping pong ball is 40 mm (it was officially increased from 38 mm back in 2000). This makes 25 balls per meter of tower. Conveniently aligned for our game here.

10,000,000,000 balls / 25 balls per meter = 400,000,000 meters. 400,000 km.

Distance from earth to moon? 384,400 km. The fully built tower is actually a little taller than the average distance to the moon! Here’s another picture to help you get an image in your head. (Although this image is not drawn to scale!)

A challenge will of course be to keep this very thin tower steady when that tall. Winds and low temperatures should be challenging. And there’s the additional risk of airplanes or satellites colliding with it. Or even just birds interfering in the lower altitudes. I suspect there are also laws prohibiting such a construction.

Never mind

Come to think of it. This was just a mind game. Forget about it now. Let’s move on with our lives instead. We have better things to do.

curl 7.72.0 – more compression

Welcome to another release, seven weeks since we did the patch release 7.71.1. This time we add a few new subtle features so the minor number is bumped yet again. Details below.

Release presentation video

Numbers

the 194th release
3 changes
49 days (total: 8,188)

100 bug fixes (total: 6,327)
134 commits (total: 26,077)
0 new public libcurl function (total: 82)
0 new curl_easy_setopt() option (total: 277)

0 new curl command line option (total: 232)
52 contributors, 29 new (total: 2,239)
30 authors, 14 new (total: 819)
1 security fix (total: 95)
500 USD paid in Bug Bounties (total: 2,800 USD)

Security

CVE-2020-8132: “libcurl: wrong connect-only connection”. This a rather obscure issue that we’ve graded severity Low. There’s a risk that an application that’s using libcurl to do connect-only connections (ie not doing the full transfer with libcurl, just using it to setup the connection) accidentally sends or reads data over the wrong connection, as libcurl could mix them up internally in rare circumstances.

We rewarded 500 USD to the reporter of this security flaw.

Features

This is the first curl release that supports zstd compression. zstd is a yet another way to compressed content data over HTTP and if curl supports it, it can then automatically decompress it on the fly. zstd is designed to compress better and faster than gzip and if I understand the numbers shown, it is less CPU intensive than brotli. In pure practical terms, curl will ask for this compression in addition to the other supported algorithms if you tell curl you want compressed content. zstd is still not widely supported by browsers.

For clients that supports HTTP/2 and server push, libcurl now allows the controlling callback (“should this server push be accepted?”) to return an error code that will tear down the entire connection.

There’s a new option for curl_easy_getinfo called CURLINFO_EFFECTIVE_METHOD that lets the application ask libcurl what the most resent request method used was. This is relevant in case you’ve allowed libcurl to follow redirects for a POST where it might have changed the method as a result of what particular HTTP response the server responded with.

Bug-fixes

Here are a collection of bug-fixes I think stood out a little extra in this cycle.

cmake: fix windows xp build

I just love the fact that someone actually tried to build curl for Windows XP, noticed it failed in doing so and provided the fix to make it work again…

curl: improve the existing file check with -J

There were some minor mistakes in the code that checks if the file you get when you use -J already existed. That logic has now been tightened. Presumably not a single person ever actually had an actual problem with that before either, but…

ftp: don’t do ssl_shutdown instead of ssl_close

We landed an FTPS regression in 7.71.1 where we accidentally did the wrong function call when closing down the data connection. It could make consecutive FTPS transfers terribly slow.

http2: repair trailer handling

We had another regression reported where HTTP trailers when using HTTP/2 really didn’t work. Obviously not a terribly well-used feature…

http2: close the http2 connection when no more requests may be sent

Another little HTTP/2 polish: make sure that connections that have received a GOAWAY is marked for closure so that it gets closed sooner rather than later as no new streams can be created on it anyway!

multi_remove_handle: close unused connect-only connections

“connect-only connections” are those where the application asks libcurl to just connect to the site and not actually perform any request or transfer. Previously when that was done, the connection would remain in the multi handle until it was closed and it couldn’t be reused. Starting now, when the easy handle that “owns” the connection is removed from the multi handle the associated connect-only connection will be closed and removed. This is just sensible.

ngtcp2: adapt to changes

ngtcp2 is a QUIC library and is used in one of the backends curl supports for HTTP/3. HTTP/3 in curl is still marked experimental and we aim at keeping the latest curl code work with the latest QUIC libraries – since they’re both still “pre-beta” versions and don’t do releases yet. So, if you find that the HTTP/3 build fails, make sure you use the latest git commits of all the h3 components!

quiche: handle calling disconnect twice

If curl would call the QUIC disconnect function twice, using the quiche backend, it would crash hard. Would happen if you tried to connect to a host that didn’t listen to the UDP port at all for example…

setopt: unset NOBODY switches to GET if still HEAD

We recently fixed a bug for storing the HTTP method internally and due to refactored code, the behavior of unsetting the CURLOPT_NOBODY option changed slightly. There was never any promise as to what exactly that would do – but apparently several users had already drawn conclusions and written applications based on that. We’ve now adapted somewhat to that presumption on undocumented behavior by documenting better what it should do and by putting back some code to back it up…

http2: move retrycount from connect struct to easy handle

Yet another HTTP/2 fix. In a recent release we fixed a problem that materialized when libcurl received a GOAWAY on a stream for a HTTP/2 connection, and it would then instead try a new connection to issue the request over and that too would get a GOAWAY. libcurl will do these retry attempts up to 5 times but due to a mistake, the counter was stored wrongly and was cleared when each new connection was made…

url: fix CURLU and location following

libcurl supports two ways of setting the URL to work with. The good old string to the entire URL and the option CURLOPT_CURLU where you provide the handle to an already parsed URL. The latter is of course a much newer option and it turns out that libcurl didn’t properly handle redirects when the URL was set with this latter option!

Coming up

There are already several Pull Requests waiting in line to get merged that add new features and functionality. We expect the next release to become 7.73.0 and ship on October 14, 2020. Fingers crossed.

Using fixed port numbers for curl tests is now history!

Test suite basics

The curl test suite fires up a whole bunch of test servers for the various supported protocols, and then command lines using curl or libcurl-using dedicated test apps are run against those servers to make sure curl is acting exactly as it is supposed to.

The curl test suite was introduced back in the year 2000 and has of course grown and been developed substantially since then.

Each test server is invoked and handles a specific protocol or set of protocols, so to make sure curl’s 24 transfer protocols are tested, a lot of test server instances are spun up. The test servers normally don’t shut down after each individual test but instead keep running in case there are more tests for that protocol for speedier operations. When all tests are completed, all the test servers are shut down again.

Port numbers

The protocols curl communicates with are all done over TCP or UDP, and therefore each test server needs to listen to a dedicated port so that the tests can be invoked and curl can use the protocols correctly.

We started the test suite once by using port number 8990 as a default “base port”, and then each subsequent server it invokes gets the next port number. A full round can use up to 30 or so ports.

The script needed to know what port number to use so that it could invoke the tests to use the correct port number. But port numbers are both a scarce resource and a resource that can only be used by one server at a time, so if you would simultaneously invoke the curl test suite a second time on the same machine, it would cause havoc since it would attempt to use the same port numbers again.

Not to mention that when various users run the test suite on their machines, on servers or in CI jobs, just assuming or hoping that we could use a range of 30 fixed port numbers is error-prone and it would occasionally cause us trouble.

Dynamic port numbers

A few months ago, in April 2020, I started the work on changing how the curl test suite uses port numbers for its test servers. Each used test server would instead get fired up on a totally random port number, and then feed back that number to the test script that then would use the actually used port number to run the tests against.

It took some rearranging of internal logic to make sure things still worked and that the comparison of the generated protocol etc still would work. Then it was “just” a matter of converting over all test servers to this new dynamic concept and made sure that there was no old-style assumptions lingering around in the test cases.

Some of the more tricky changes for this new paradigm was the protocol parts that use the port number in data where curl base64 encodes the chunk and sends it to the server.

Reaching the end of that journey

The final fixed port number in the curl test suite was removed when we merged PR 5783 on August 7 2020. Now, every test in curl that needs a port number uses a dynamic port number.

Now we can run “make test” from multiple shells on the same machine in parallel without problems.

It can allow us to improve the test suite further and maybe for example run multiple tests in parallel in order to reduce the total execution time. It will avoid future port collisions. A small, but very good step forward.

Yay.

Credits

Image by Gerd Altmann from Pixabay

Upcoming Webinar: curl: How to Make Your First Code Contribution

When: Aug 13, 2020 10:00 AM Pacific Time (US and Canada) (17:00 UTC)
Length: 30 minutes

Abstract: curl is a wildly popular and well-used open source tool and library, and is the result of more than 2,200 named contributors helping out. Over 800 individuals wrote at least one commit so far.

In this presentation, curl’s lead developer Daniel Stenberg talks about how any developer can proceed in order to get their first code contribution submitted and ultimately landed in the curl git repository. Approach to code and commits, style, editing, pull-requests, using github etc. After you’ve seen this, you’ll know how to easily submit your improvement to curl and potentially end up running in ten billion installations world-wide.

Register in advance for this webinar:
https://us02web.zoom.us/webinar/register/WN_poAshmaRT0S02J7hNduE7g

curl ootw: –path-as-is

Previous options of the week.

--path-as-is is a boolean option that was added in curl 7.42.0.

Path normalization in URLs

I hope it isn’t a surprise to you that curl works on URLs. It’s one of the fundamental pillars of curl. The “URLs” curl work with are actually called “URIs” in the IETF specs and the primary specification for them is RFC 3986. (But also: my URL is not your URL…)

A URL can be split up into several different components, which is typically done by the “URL parser” in a program like curl. For example , we can identify a scheme, a host name and a path.

When a program is given a URL, and the program has identified the path part of that URL – it is supposed to “Remove Dot Segments” (to use the wording from RFC 3986) before that path is used.

Remove Dot Segments

Let me show you this with an example to make it clear. Ponder that you pass this URL to curl: "https://example.org/hello/../to/../your/../file". Those funny dot-dot sequences in there is traditional directory traversal speak for “one directory up”, while a single "./" means in the same directory.

RFC 3986 says these sequences should be removed, so curl will iterate and remove them accordingly. A sequence like "word/../" will effectively evaluate to nothing. The example URL above will be massaged into the final version: "https://example.org/file" and so curl will ask the server for just /file.

Compare the HTTP requests

Seen as pure HTTP 1.1, the result of the command line used without --path-as-is:

GET /file HTTP/1.1
Host: example.org
user-agent: curl/7.71.0
accept: */*

Same command line, with --path-as-is:

GET /hello/../to/../your/../file HTTP/1.1
Host: example.org
user-agent: curl/7.71.1
accept: */*

Trick thy server

HTTP servers have over the years been found to have errors and mistakes in how they handle paths and a common way to exploit such flaws has been to pass on exactly this kind of dot-dot sequences to servers.

The very minute curl started removing these sequences (as the spec tells us) security researcher objected and asked for ways to tell curl to not do this. Enter --path-as-is. Use this option to make curl send the path exactly as provided in the URL, without removing any dot segments.

Related options

Other curl options that allow you to customize HTTP request details include --header, --request and --request-target.