curl ootw: –quote

Previous command line options of the week.

This option is called -Q in its short form, --quote in its long form. It has existed for as long as curl has existed.

Quote?

The name for this option originates from the traditional unix command ‘ftp’, as it typically has a command called exactly this: quote. The quote command for the ftp client is a way to send an exact command, as written, to the server. Very similar to what --quote does.

FTP, FTPS and SFTP

This option was originally made for supported only for FTP transfers but when we added support for FTPS, it worked there too automatically.

When we subsequently added SFTP support, even such users occasionally have a need for this style of extra commands so we made curl support it there too. Although for SFTP we had to do it slightly differently as SFTP as a protocol can’t actually send commands verbatim to the server as we can with FTP(S). I’ll elaborate a bit more below.

Sending FTP commands

The FTP protocol is a command/response protocol for which curl needs to send a series of commands to the server in order to get the transfer done. Commands that log in, changes working directories, sets the correct transfer mode etc.

Asking curl to access a specific ftp:// URL more or less converts into a command sequence.

The --quote option provides several different ways to insert custom FTP commands into the series of commands curl will issue. If you just specify a command to the option, it will be sent to the server before the transfer takes places – even before it changes working directory.

If you prefix the command with a minus (-), the command will instead be send after a successful transfer.

If you prefix the command with a plus (+), the command will run immediately before the transfer after curl changed working directory.

As a second (!) prefix you can also opt to insert an asterisk (*) which then tells curl that it should continue even if this command would cause an error to get returned from the server.

The actually specified command is a string the user specifies and it needs to be a correct FTP command because curl won’t even try to interpret it but will just send it as-is to the server.

FTP examples

For example, remove a file from the server after it has been successfully downloaded:

curl -O ftp://ftp.example/file -Q '-DELE file'

Issue a NOOP command after having logged in:

curl -O ftp://user:password@ftp.example/file -Q 'NOOP'

Rename a file remotely after a successful upload:

curl -T infile ftp://upload.example/dir/ -Q "-RNFR infile" -Q "-RNTO newname"

Sending SFTP commands

Despite sounding similar, SFTP is a very different protocol than FTP(S). With SFTP the access is much more low level than FTP and there’s not really a concept of command and response. Still, we’ve created a set of command for the --quote option for SFTP that lets the users sort of pretend that it works the same way.

Since there is no sending of the quote commands verbatim in the SFTP case, like curl does for FTP, the commands must instead be supported by curl and get translated into their underlying SFTP binary protocol bits.

In order to support most of the basic use cases people have reportedly used with curl and FTP over the years, curl supports the following commands for SFTP: chgrp, chmod, chown, ln, mkdir, pwd, rename, rm, rmdir and symlink.

The minus and asterisk prefixes as described above work for SFTP too (but not the plus prefix).

Example, delete a file after a successful download over SFTP:

curl -O sftp://example/file -Q '-rm file'

Rename a file on the target server after a successful upload:

curl -T infile sftp://example/dir/ -Q "-rename infile newname"

SSH backends

The SSH support in curl is powered by a third party SSH library. When you build curl, there are three different libraries to select from and they will have a slightly varying degree of support. The libssh2 and libssh backends are pretty much feature complete and have been around for a while, where as the wolfSSH backend is more bare bones with less features supported but at much smaller footprint.

Related options

--request changes the actual command used to invoke the transfer when listing directories with FTP.

curl 7.69.0 ssh++ and tls–

There has been 56 days since the previous release. As always, download the latest version over at curl.haxx.se.

Perhaps the best news this time is the complete lack of any reported (or fixed) security issues?

Numbers

the 189th release
3 changes
56 days (total: 8,020)

123 bug fixes (total: 5,911)
233 commits (total: 25,357)
0 new public libcurl function (total: 82)
1 new curl_easy_setopt() option (total: 270)

1 new curl command line option (total: 230)
69 contributors, 39 new (total: 2,127)
32 authors, 15 new (total: 771)
0 security fixes (total: 93)
0 USD paid in Bug Bounties

Contributors

Let me first highlight these lovely facts about the community effort that lies behind this curl release!

During the 56 days it took us to produce this particular release, 69 persons contributed to what it is. 39 friends in this crowd were first-time contributors. That’s more than one newcomer every second day. Reporting bugs and providing code or documentation are the primary ways people contribute.

We landed commits authored by 32 individual humans, and out of those 15 were first-time authors! This means that we’ve maintained an average of well over 5 first-time authors per month for the last several years.

The making of curl is a team effort. And we have a huge team!

Changes

In this release cycle we finally removed all traces of support for PolarSSL in the TLS related code. This library doesn’t get updates anymore and has effectively been superseded by its sibling mbedTLS – which we already support since years back. There’s really no reason for anyone to hang on to PolarSSL anymore. Move over to a modern library. wolfSSL, mbedTLS and BearTLS are all similar in spirit and focus. curl now supports 13 different TLS libraries.

We added support for a command line option (--mail-rcpt-allowfails) as well as a libcurl option (CURLOPT_MAIL_RCPT_ALLLOWFAILS) that allows an application to tell libcurl that recipients are fine to fail, as long as at least one is fine. This makes it much easier for applications to send mails to a series of addresses, out of which perhaps a few will fail immediately.

We landed initial support for wolfSSH as a new SSH backend in curl for SFTP transfers.

My top-11 favorite bug-fixes

As usual we’ve landed over a hundred bug-fixes, where most are minor. Here are eleven of the fixes I think stand out a little:

SOCKS connects are now non-blocking

For a very long time we’ve had this outstanding issue that libcurl did the connection phase to SOCKS proxies in a blocking fashion. This of course had unfortunate side-effects if you do many parallel transfers and maybe use slow or remote SOCKS proxies as then each such connection would starve out the others for the duration of the connect handshake.

Now that’s history and libcurl performs SOCKS connection establishment totally non-blocking!

Improved alt-svc parsing

Turns out I had misunderstood the spec a little and the parser needed to be fixed to better deal with some of the real-world Alt-Svc: response headers out there! A fine side effect of more people trying out our early HTTP/3 support, as this is the first major use case of this header in the wild.

Atomic cookie and alt-svc file saves

When saving files to disk, libcurl would previously simply open the file, write all the contents to it and then close it. This caused issues for multi-threaded libcurl users that potentially could start to try to use the saved file before it was done saving, and thus would end up reading partial file. This concerns both cookie and alt-svc usage.

Starting now, libcurl will always save these files into a temporary file with a random suffix while writing data to them, and then when everything is complete, rename the file over to the actual and proper file name. This will make the saving (appear) atomic to all consumers of such files, even in multi-threaded scenarios.

HTTP/2 stream pauses

An application using libcurl to receive a download can tell libcurl to pause the transfer at any given moment. Typically this is used by applications that for some reason consumes the incoming data slowly or has to use small local buffer for it or similar. The transfer is then typically “unpaused” again within shortly and the data can continue flowing in.

When this kind of pause was done for a HTTP/2 stream over a connection that also had other streams going, libcurl previously didn’t actually pause the transfer for real but only “faked” it to the application and instead buffered the incoming data in memory.

In 7.69.0, libcurl will now actually pause HTTP/2 streams for real, even if it might still need to buffer up to a full HTTP/2 window size of data in memory. The HTTP/2 window size is now also reduced to 32MB to at least limit the worst case buffer need to that amount. We might need to come back to this in a future to provide better means for applications to deal with the window size and buffering requirements…

Increased Expect: threshold

Previously, libcurl would add the Expect: header if more than 1024 bytes were sent in the body. Now that limit is raised to 1 MB instead.

HTTP 417 treatment

A 417 HTTP response code from a server when curl has issued a request using the Expect: header means the request should be redone without that header. Starting now, curl does so!

openssl: make CURLINFO_CERTINFO not truncate x509v3 fields

This feature that lets an application extract certificate information from a site curl communicates to had an unfortunate truncating issue that made very long fields occasionally not get returned in full!

smtp: Support UTF-8 based host names

curl SMTP support is now improved when it comes to using IDN names in the URL host name and also in the email addresses provided to some of the SMTP related options and commands.

include: remove non-curl prefixed defines

The public libcurl include files now finally define or provide not a single symbol that isn’t correctly prefixed with either curl or libcurl (in various case versions). This was a cleanup that’s been postponed much too long but a move that should make the headers less likely to ever collide or cause problems in combination with other headers or projects. Keeping within one’s naming scope is important for a good ecosystem citizen.

curl_global_init polish

We made the function assume the EINTR flag by default and we moved away the IPv6 check. Two small parts in an ongoing work to eventually make it thread-safe.

CIFuzz and more Azure jobs

We bumped the number of CI jobs per commit significantly this release cycle, up from 61 to 72.

  1. CIFuzz is a great new addition which runs the commit/PR through the OSS-Fuzz fuzzers for a brief time to at verify that it doesn’t at least trigger a problem already then. For now we have that time limit set to 10 minutes.
  2. A lot of Marc Hörsken’s Windows builds have been moved over from his more or less custom buildbot setup over to Azure Pipelines.

Next?

File your pull requests. We welcome them!

curl ootw: –next

(previously posted options of the week)

--next has the short option alternative -:. (Right, that’s dash colon as the short version!)

Working with multiple URLs

You can tell curl to send a string as a POST to a URL. And you can easily tell it to send that same data to three different URLs. Like this:

curl -d data URL1 URL2 URL2

… and you can easily ask curl to issue a HEAD request to two separate URLs, like this:

curl -I URL1 URL2

… and of course the easy, just GET four different URLs and send them all to stdout:

curl URL1 URL2 URL3 URL4

Doing many at once is beneficial

Of course you could also just invoke curl two, three or four times serially to accomplish the same thing. But if you do, you’d lose curl’s ability to use its DNS cache, its connection cache and TLS resumed sessions – as they’re all cached in memory. And if you wanted to use cookies from one transfer to the next, you’d have to store them on disk since curl can’t magically keep them in memory between separate command line invokes.

Sometimes you want to use several URLs to make curl perform better.

But what if you want to send different data in each of those three POSTs? Or what if you want to do both GET and POST requests in the same command line? --next is here for you!

--next is a separator!

Basically, --next resets the command line parser but lets curl keep all transfer related states and caches. I think it is easiest to explain with a set of examples.

Send a POST followed by a GET:

curl -d data URL1 --next URL2

Send one string to URL1 and another string to URL2, both as POST:

curl -d one URL1 --next -d another URL2

Send a GET, activate cookies and do a HEAD that then sends back matching cookies.

curl -b empty URL1 --next -I URL1

First upload a file with PUT, then download it again

curl -T file URL1 --next URL2

If you’re doing parallel transfers (with -Z), curl will do URLs in parallel only until the --next separator.

There’s no limit

As with most things curl, there’s no limit to the number of URLs you can specify on the command line and there’s no limit to the number of --next separators.

If your command line isn’t long enough to hold them all, write them in a file and point them out to curl with -K, --config.

Imagining a thread-safe curl_global_init

libcurl is thread-safe

That’s the primary message that we push and that’s important to remember. You can write a multi-threaded application that does concurrent Internet transfers with libcurl in as many threads as you like and they fly just fine.

But

But there are nuances and details of course and the devil is always in those. The main obstacle that then and again causes problems for users is the curl_global_init() function. But how come?

curl_global_init

Back in the day, libcurl developers realized that when we work with in particular a lot of third party TLS libraries, they feature init functions that need to be called first, before any other function in those libraries are called. And they typically are all marked as not thread-safe, we have to call those functions knowing that no other thread calls them. This was the case for GnuTLS (before version 3.3.0) and it was the case for OpenSSL (until they shipped version 1.1.1) etc.

In order for libcurl to adhere to those restrictions that weren’t our own inventions, we added a function to libcurl called curl_global_init() that then in itself inherited those non-thread safe characteristics. We documented the function as not thread-safe.

Time passed, and as we now had a function that is a global initialization function that is also marked not thread-safe, it was an attractive point to add more and other functionality for the library. Other global initializations that then weren’t thread-safe either – as that wasn’t any point in doing anyway since the entire thing wasn’t thread-safe to begin with.

The problems

Having the global init function not being thread-safe has caused problems to users, mostly in use cases where for example they use libcurl in a plugin-like cases where you can’t know if you’re the only user in the process.

We’ve then mostly been longing for better days and blamed the third party libraries that forced us into this corner.

Third parties shaped up, we didn’t

One day in recent times when we looked at what third party libraries a typical libcurl user uses in a modern system, we see that they’ve all fixed their init functions! OpenSSL and GnuTLS that once were part of the original reasons for this function have fixed their issues. They no longer have thread-unsafe init functions.

But libcurl still does! 🙁

While we were initially pushed into this unfortunate corner because of limitations in third party libraries, we had added our own init functions into that function that aren’t thread-safe and now, even though the third party libraries had done the right thing over time, we found ourselves no longer able to put the blame on others. Now we need to clean up our own backyard!

Fix it!

In libcurl 7.69.0 we’ve started this journey with two distinct changes. The goal is to make the function thread-safe under the condition that libcurl is built with only thread-safe dependencies, and we should make configure etc check if that’s the case.

1: EINTR handling

Since libcurl 7.30.0, we’ve provided a flag in the curl_global_init() function to let libcurl users ask for EINTR to actually abort internal loops. Starting now, that flag has no meaning and this is now default behavior. No need to store this state globally anywhere.

2: Working IPv6

At least in the past, it has been common with systems that are IPv6-capable at build-times but that can’t actually create IPv6 sockets and therefor they can’t actually use IPv6. This was previously checked for, once, in the global init and then IPv6 is disabled for everyone. Without a global state, we’ve been forced to move this check and it is now instead done for every created multi handle. A minuscule performance hit for thread safety.

Left to do until completely thread-safe

The transition isn’t completed. The low hanging fruit has been picked, here are some remaining issues to solve:

When is it thread-safe?

Since curl can be built with a number of different third party libraries, including version old versions, we need to make the configure script know what versions of what libraries that are safe so that it can tell. But how are libcurl application authors supposed to know? Can we figure how a way to tell them?

curl_version*

Both curl_version() and curl_version_info() store information in static buffers and return information pointing to that memory. They’re currently setup in the global init so they work safely from multiple threads today, but we probably need to create new, alternative versions of them, that instead allocate heap memory to return the info in. Or possibly store the info in memory associated with a handle.

Update: Patrick Monnerat made me realize that a possibly even better way to fix them is to make sure they generate the same output in a way that repeated or concurrent invokes are fine.

Reference counter

There’s a counter counting calls to curl_global_init() so that the corresponding number of calls to curl_global_cleanup() is required before things are actually cleaned up.

This is a hard nut to crack without a global context and no mutex locks. I haven’t yet figured out how to solve this. If you have ideas, I’m listening!

When?

There’s no fixed time schedule for when these remaining nits are supposed to be fixed, but I hope to work on them going forward and I will appreciate all the help I can get and if things just progress, I would imagine we can end 2020 with a libcurl with these flaws fixed!

Oh, and we also really need to make sure that we don’t simultaneously come up with or think of new thread unsafe functionality for the init function..

Credits

Top image by Andreas Lischka from Pixabay