Category Archives: cURL and libcurl

curl and/or libcurl related

curl 7.69.0 ssh++ and tls--

There have been 56 days since the previous release. As always, download the latest version over at curl.haxx.se.

Perhaps the best news this time is the complete lack of any reported (or fixed) security issues?

Numbers

the 189th release
3 changes
56 days (total: 8,020)

123 bug fixes (total: 5,911)
233 commits (total: 25,357)
0 new public libcurl functions (total: 82)
1 new curl_easy_setopt() option (total: 270)

1 new curl command line option (total: 230)
69 contributors, 39 new (total: 2,127)
32 authors, 15 new (total: 771)
0 security fixes (total: 93)
0 USD paid in Bug Bounties

Contributors

Let me first highlight these lovely facts about the community effort that lies behind this curl release!

During the 56 days it took us to produce this particular release, 69 persons contributed to what it is. 39 friends in this crowd were first-time contributors. That’s more than one newcomer every second day. Reporting bugs and providing code or documentation are the primary ways people contribute.

We landed commits authored by 32 individual humans, and out of those 15 were first-time authors! This means that we’ve maintained an average of well over 5 first-time authors per month for the last several years.

The making of curl is a team effort. And we have a huge team!

Changes

In this release cycle we finally removed all traces of support for PolarSSL in the TLS related code. This library doesn’t get updates anymore and has effectively been superseded by its sibling mbedTLS – which we have supported for years already. There’s really no reason for anyone to hang on to PolarSSL anymore. Move over to a modern library. wolfSSL, mbedTLS and BearSSL are all similar in spirit and focus. curl now supports 13 different TLS libraries.

We added support for a command line option (--mail-rcpt-allowfails) as well as a libcurl option (CURLOPT_MAIL_RCPT_ALLLOWFAILS) that allows an application to tell libcurl that it is fine if some recipients fail, as long as at least one succeeds. This makes it much easier for applications to send mails to a series of addresses, out of which perhaps a few will fail immediately.
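
Here is a minimal sketch of what using the new libcurl option could look like in an application. The host and addresses are made up, and a real program would typically provide the mail body with its own read callback:

#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    struct curl_slist *rcpt = NULL;
    CURLcode res;

    rcpt = curl_slist_append(rcpt, "<good@example.com>");
    rcpt = curl_slist_append(rcpt, "<probably-bad@example.com>");

    curl_easy_setopt(curl, CURLOPT_URL, "smtp://mail.example.com");
    curl_easy_setopt(curl, CURLOPT_MAIL_FROM, "<sender@example.com>");
    curl_easy_setopt(curl, CURLOPT_MAIL_RCPT, rcpt);
    /* accept failing recipients, as long as at least one is fine */
    curl_easy_setopt(curl, CURLOPT_MAIL_RCPT_ALLLOWFAILS, 1L);
    /* upload the mail body; the default read callback reads from stdin */
    curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);

    res = curl_easy_perform(curl);
    if(res != CURLE_OK)
      fprintf(stderr, "failed: %s\n", curl_easy_strerror(res));

    curl_slist_free_all(rcpt);
    curl_easy_cleanup(curl);
  }
  return 0;
}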

We landed initial support for wolfSSH as a new SSH backend in curl for SFTP transfers.

My top-11 favorite bug-fixes

As usual we’ve landed over a hundred bug-fixes, where most are minor. Here are eleven of the fixes I think stand out a little:

SOCKS connects are now non-blocking

For a very long time we’ve had this outstanding issue that libcurl did the connection phase to SOCKS proxies in a blocking fashion. This of course had unfortunate side effects if you do many parallel transfers and use slow or remote SOCKS proxies, as each such connection would then starve out the others for the duration of the connect handshake.

Now that’s history and libcurl performs SOCKS connection establishment totally non-blocking!

Improved alt-svc parsing

Turns out I had misunderstood the spec a little and the parser needed to be fixed to better deal with some of the real-world Alt-Svc: response headers out there! A fine side effect of more people trying out our early HTTP/3 support, as this is the first major use case of this header in the wild.

Atomic cookie and alt-svc file saves

When saving files to disk, libcurl would previously simply open the file, write all the contents to it and then close it. This caused issues for multi-threaded libcurl users that could potentially start to use the saved file before it was done saving, and thus would end up reading a partial file. This concerns both cookie and alt-svc usage.

Starting now, libcurl will always save these files into a temporary file with a random suffix while writing data to them, and then when everything is complete, rename the file over to the actual and proper file name. This will make the saving (appear) atomic to all consumers of such files, even in multi-threaded scenarios.
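
The pattern itself is the classic write-then-rename. A rough sketch of the idea (simplified; libcurl’s actual code picks a random suffix and error-checks the I/O more carefully):

#include <stdio.h>
#include <string.h>

static int save_atomically(const char *filename, const char *data)
{
  char tmpname[1024];
  FILE *out;

  /* write everything into a temporary file first */
  snprintf(tmpname, sizeof(tmpname), "%s.tmp", filename);
  out = fopen(tmpname, "wb");
  if(!out)
    return 1;
  fwrite(data, 1, strlen(data), out);
  fclose(out);

  /* then move it into place in one step, so readers only ever see
     the old file or the complete new one */
  return rename(tmpname, filename) ? 1 : 0;
}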

HTTP/2 stream pauses

An application using libcurl to receive a download can tell libcurl to pause the transfer at any given moment. Typically this is used by applications that for some reason consume the incoming data slowly, or have to use a small local buffer for it, or similar. The transfer is then typically “unpaused” again shortly after and the data can continue flowing in.

When this kind of pause was done for a HTTP/2 stream over a connection that also had other streams going, libcurl previously didn’t actually pause the transfer for real but only “faked” it to the application and instead buffered the incoming data in memory.

In 7.69.0, libcurl will now actually pause HTTP/2 streams for real, even if it might still need to buffer up to a full HTTP/2 window size of data in memory. The HTTP/2 window size is now also reduced to 32MB to at least limit the worst case buffer need to that amount. We might need to come back to this in the future to provide better means for applications to deal with the window size and buffering requirements…
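
For reference, this is roughly how an application drives the pause machinery: the write callback returns CURL_WRITEFUNC_PAUSE when it can’t take more data, and curl_easy_pause() resumes the transfer later. A small sketch, where the busy flag stands in for whatever condition the real application has:

#include <curl/curl.h>

/* write callback that pauses the transfer while the application is busy */
static size_t write_cb(char *ptr, size_t size, size_t nmemb, void *userp)
{
  int *busy = (int *)userp;
  (void)ptr;
  if(*busy)
    return CURL_WRITEFUNC_PAUSE; /* tell libcurl to pause this transfer */
  /* ... otherwise consume the size * nmemb delivered bytes here ... */
  return size * nmemb;
}

/* and later, when the application is ready for more data again: */
/* curl_easy_pause(handle, CURLPAUSE_CONT); */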

Increased Expect: threshold

Previously, libcurl would add the Expect: header if more than 1024 bytes were sent in the body. Now that limit is raised to 1 MB instead.

HTTP 417 treatment

A 417 HTTP response code from a server when curl has issued a request using the Expect: header means the request should be redone without that header. Starting now, curl does so!

openssl: make CURLINFO_CERTINFO not truncate x509v3 fields

This feature, which lets an application extract certificate information from a site curl communicates with, had an unfortunate truncation issue that occasionally made very long fields not get returned in full!

smtp: Support UTF-8 based host names

curl SMTP support is now improved when it comes to using IDN names in the URL host name and also in the email addresses provided to some of the SMTP related options and commands.

include: remove non-curl prefixed defines

The public libcurl include files now finally don’t define or provide a single symbol that isn’t correctly prefixed with either curl or libcurl (in various case versions). This was a cleanup that had been postponed much too long, but a move that should make the headers less likely to ever collide or cause problems in combination with other headers or projects. Keeping within one’s naming scope is important for being a good ecosystem citizen.

curl_global_init polish

We made the function assume the EINTR flag by default and we moved away the IPv6 check. Two small parts in an ongoing work to eventually make it thread-safe.

CIFuzz and more Azure jobs

We bumped the number of CI jobs per commit significantly this release cycle, up from 61 to 72.

  1. CIFuzz is a great new addition which runs the commit/PR through the OSS-Fuzz fuzzers for a brief time, to verify that it doesn’t immediately trigger any problems. For now we have that time limit set to 10 minutes.
  2. A lot of Marc Hörsken’s Windows builds have been moved from his more or less custom buildbot setup over to Azure Pipelines.

Next?

File your pull requests. We welcome them!

curl ootw: --next

(previously posted options of the week)

--next has the short option alternative -:. (Right, that’s dash colon as the short version!)

Working with multiple URLs

You can tell curl to send a string as a POST to a URL. And you can easily tell it to send that same data to three different URLs. Like this:

curl -d data URL1 URL2 URL3

… and you can easily ask curl to issue a HEAD request to two separate URLs, like this:

curl -I URL1 URL2

… and of course the easy, just GET four different URLs and send them all to stdout:

curl URL1 URL2 URL3 URL4

Doing many at once is beneficial

Of course you could also just invoke curl two, three or four times serially to accomplish the same thing. But if you do, you’d lose curl’s ability to use its DNS cache, its connection cache and TLS resumed sessions – as they’re all cached in memory. And if you wanted to use cookies from one transfer to the next, you’d have to store them on disk since curl can’t magically keep them in memory between separate command line invokes.

Sometimes you want to use several URLs to make curl perform better.

But what if you want to send different data in each of those three POSTs? Or what if you want to do both GET and POST requests in the same command line? --next is here for you!

--next is a separator!

Basically, --next resets the command line parser but lets curl keep all transfer related states and caches. I think it is easiest to explain with a set of examples.

Send a POST followed by a GET:

curl -d data URL1 --next URL2

Send one string to URL1 and another string to URL2, both as POST:

curl -d one URL1 --next -d another URL2

Send a GET, activate cookies and do a HEAD that then sends back matching cookies:

curl -b empty URL1 --next -I URL1

First upload a file with PUT, then download it again:

curl -T file URL1 --next URL2

If you’re doing parallel transfers (with -Z), curl will do URLs in parallel only until the --next separator.

There’s no limit

As with most things curl, there’s no limit to the number of URLs you can specify on the command line and there’s no limit to the number of --next separators.

If your command line isn’t long enough to hold them all, write them in a file and point them out to curl with -K, --config.

Imagining a thread-safe curl_global_init

libcurl is thread-safe

That’s the primary message that we push and that’s important to remember. You can write a multi-threaded application that does concurrent Internet transfers with libcurl in as many threads as you like and they fly just fine.

But

But there are nuances and details of course, and the devil is always in those. The main obstacle that time and again causes problems for users is the curl_global_init() function. But how come?

curl_global_init

Back in the day, libcurl developers realized that many third party libraries, TLS libraries in particular, feature init functions that need to be called first, before any other function in those libraries is called. They are typically all marked as not thread-safe; we have to call those functions knowing that no other thread calls them. This was the case for GnuTLS (before version 3.3.0) and it was the case for OpenSSL (until they shipped version 1.1.1) etc.

In order for libcurl to adhere to those restrictions that weren’t our own inventions, we added a function to libcurl called curl_global_init() that then in itself inherited those non-thread safe characteristics. We documented the function as not thread-safe.

Time passed, and as we now had a global initialization function that was marked not thread-safe, it became an attractive place to add more and other functionality for the library. Other global initializations that then weren’t made thread-safe either – as there was no point in doing so when the entire thing wasn’t thread-safe to begin with.

The problems

Having the global init function not be thread-safe has caused problems for users, mostly in plugin-like use cases where you can’t know if you’re the only libcurl user in the process.

We’ve then mostly been longing for better days and blamed the third party libraries that forced us into this corner.

Third parties shaped up, we didn’t

One day in recent times, when we looked at what third party libraries a typical libcurl user uses on a modern system, we saw that they’ve all fixed their init functions! OpenSSL and GnuTLS, which once were part of the original reasons for this function, have fixed their issues. They no longer have thread-unsafe init functions.

But libcurl still does! 🙁

While we were initially pushed into this unfortunate corner because of limitations in third party libraries, we had added our own non-thread-safe init functionality into that function. Now that the third party libraries have done the right thing over time, we find ourselves no longer able to put the blame on others. We need to clean up our own backyard!

Fix it!

In libcurl 7.69.0 we’ve started this journey with two distinct changes. The goal is to make the function thread-safe under the condition that libcurl is built with only thread-safe dependencies, and we should make configure etc check if that’s the case.

1: EINTR handling

Since libcurl 7.30.0, we’ve provided a flag in the curl_global_init() function to let libcurl users ask for EINTR to actually abort internal loops. Starting now, that flag has no meaning since this is the default behavior. No need to store this state globally anywhere.
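
A sketch of what this means for application code (the flag in question is CURL_GLOBAL_ACK_EINTR, present since 7.30.0):

#include <curl/curl.h>

int main(void)
{
  /* before 7.69.0, applications had to ask for EINTR handling explicitly */
  curl_global_init(CURL_GLOBAL_DEFAULT | CURL_GLOBAL_ACK_EINTR);

  /* from 7.69.0, the flag is implied, so this behaves the same: */
  /* curl_global_init(CURL_GLOBAL_DEFAULT); */

  /* ... do transfers ... */

  curl_global_cleanup();
  return 0;
}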

2: Working IPv6

At least in the past, it has been common with systems that are IPv6-capable at build time but that can’t actually create IPv6 sockets and therefore can’t actually use IPv6. This was previously checked for, once, in the global init, and IPv6 was then disabled for everyone. Without a global state, we’ve been forced to move this check and it is now instead done for every created multi handle. A minuscule performance hit for thread safety.

Left to do until completely thread-safe

The transition isn’t completed. The low hanging fruit has been picked, here are some remaining issues to solve:

When is it thread-safe?

Since curl can be built with a number of different third party libraries, including really old versions of them, we need to make the configure script know which versions of which libraries are safe, so that it can tell. But how are libcurl application authors supposed to know? Can we figure out a way to tell them?

curl_version*

Both curl_version() and curl_version_info() store information in static buffers and return pointers into that memory. They’re currently set up in the global init so they work safely from multiple threads today, but we probably need to create new, alternative versions of them that instead allocate heap memory to return the info in. Or possibly store the info in memory associated with a handle.

Update: Patrick Monnerat made me realize that a possibly even better way to fix them is to make sure they generate the same output in a way that repeated or concurrent invokes are fine.
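
For reference, this is how the current functions are typically used; the static buffers they return pointers into are what makes them a thread-safety concern:

#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
  /* both of these return pointers into static storage */
  printf("%s\n", curl_version());

  curl_version_info_data *info = curl_version_info(CURLVERSION_NOW);
  printf("libcurl %s, TLS backend: %s\n",
         info->version, info->ssl_version ? info->ssl_version : "none");
  return 0;
}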

Reference counter

There’s a counter counting calls to curl_global_init() so that the corresponding number of calls to curl_global_cleanup() is required before things are actually cleaned up.

This is a hard nut to crack without a global context and no mutex locks. I haven’t yet figured out how to solve this. If you have ideas, I’m listening!

When?

There’s no fixed time schedule for when these remaining nits are supposed to be fixed, but I hope to work on them going forward and I will appreciate all the help I can get. If things just progress, I would imagine we can end 2020 with a libcurl that has these flaws fixed!

Oh, and we also really need to make sure that we don’t simultaneously come up with new thread-unsafe functionality for the init function…

Credits

Top image by Andreas Lischka from Pixabay

Expect: tweaks in curl

One of the persistent myths about HTTP is that it is “a simple protocol”.

Expect: – not always expected

One of the dusty spec corners of HTTP/1.1 (Section 5.1.1 of RFC 7231) explains how the Expect: header works. This is still today in 2020 one of the HTTP request headers that is very commonly ignored by servers and intermediaries.

Background

HTTP/1.1 is designed for being sent over TCP (and possibly also TLS) in a serial manner. Setting up a new connection is costly, both in terms of CPU and especially in time, requiring a number of round-trips. (I’ll detail further down how HTTP/2 fixes all these issues in a much better way.)

HTTP/1.1 provides a number of ways to allow it to perform all its duties without having to shut down the connection. One such example is the ability to tell a client early on that it needs to provide authentication credentials before the client sends off a large payload. In order to maintain the TCP connection, a client can’t stop sending an HTTP payload prematurely! Once the request body has started to get transmitted, the only way to stop it before the end of data is to cut off the connection and create a new one – wasting time and CPU…

“We want a 100 to continue”

A client can include a header in its outgoing request to ask the server to first acknowledge that everything is fine and that it can continue to send the “payload” – or the server can return a status code that informs the client that there are other things it needs to fulfill in order to have the request succeed. Most such cases typically involve authentication.

This “first tell me it’s OK to send it before I send it” request header looks like this:

Expect: 100-continue

Servers

Since this mandatory header is widely not understood or simply ignored by HTTP/1.1 servers, clients that issue it use a rather short timeout, and if no response has been received within that period they proceed and send the data even without a 100.

The timeout thing happens so often that removing the Expect: header from curl requests is a very common answer to questions on how to improve POST or PUT requests with curl when working against such non-compliant servers.
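
The trick is to provide the header with an empty value, which makes curl leave it out entirely. A typical workaround (with a made-up URL and data file) looks like:

curl -H "Expect:" -d @bigfile https://example.com/upload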

Popular browsers

Browsers are widely popular HTTP clients but none of the popular ones ever use this. In fact, very few clients do. This is of course a chicken and egg problem, because servers don’t support it very well since clients don’t use it, and clients don’t use it because servers don’t support it very well…

curl sends Expect:

When we implemented support for HTTP/1.1 in curl back in 2001, we wanted it done properly. We made it have a short timeout, 1000 milliseconds, waiting for the 100 response. We also made curl automatically include the Expect: header in outgoing requests if we know that the body is larger than NNN bytes or if we don’t know the size at all beforehand (and obviously, we don’t do it if we send the request chunked-encoded).

The logic being that if the data amount is smaller than NNN, then the waste is not very big and we might just as well send it (even if we risk sending it twice), as waiting for a response is just going to be more time consuming.

That NNN margin value (known in the curl sources as the EXPECT_100_THRESHOLD) in curl was set to 1024 bytes already then back in the early 2000s.
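
Applications can also tune how long libcurl waits for the 100 response with the CURLOPT_EXPECT_100_TIMEOUT_MS option (available since 7.36.0). A sketch, assuming an already created easy handle called curl:

/* wait up to three seconds for a 100-continue response instead of the
   default one second */
curl_easy_setopt(curl, CURLOPT_EXPECT_100_TIMEOUT_MS, 3000L);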

Bumping EXPECT_100_THRESHOLD

Starting in curl 7.69.0 (due to ship on March 4, 2020) we catch up a little with the world around us and now libcurl will only automatically set the Expect: header if the amount of data to send in the body is larger than 1 megabyte. Yes, we raise the limit by 1024 times.

The reasoning is of course that for most Internet users these days, data amounts below a certain size aren’t even noticeable when transferred, and so we adapt. We thereby also reduce the number of problems for the very small data amounts, where waiting for the 100-continue response is a waste of time anyway.

Credits: landed in this commit. (sorry but my WordPress stupidly can’t show the proper Asian name of the author!)

417 Expectation Failed

One of the ways that a HTTP/1.1 server can deal with an Expect: 100-continue header in a request, is to respond with a 417 code, which should tell the client to retry the same request again, only without the Expect: header.

While this response is fairly uncommon among servers, curl users who ran into 417 responses have previously had to resort to manually removing the Expect: header from such requests. This was of course often far from intuitive or easy to figure out for users. A source of grief and agony.

Until this commit, also targeted for inclusion in curl 7.69.0 (March 4, 2020). Starting now, curl will automatically act on a 417 response if it arrives as a consequence of using Expect:, and then retry the request again without using the header!

Credits: I wrote the patch.

HTTP/2 avoids this altogether

With HTTP/2 (and HTTP/3) this entire thing is a non-issue because with these modern protocol versions we can abort a request (stop a stream) prematurely without having to sacrifice the connection. There’s therefore no need for this crazy dance anymore!

curl ootw: --ftp-pasv

(previous options of the week)

--ftp-pasv has no short version of the option.

This option was added in curl 7.11.0, January 2004.

FTP uses dual connections

FTP is a really old protocol with traces back to the early 1970s, actually from before we even had TCP. It truly comes from a different Internet era.

When a client speaks FTP, it first sets up a control connection to the server and issues commands over it that instruct the server what to do. Then, when it wants to transfer data in either direction, a separate data connection is required: a second TCP connection in addition to the initial one.

This second connection can be set up in two different ways. The client instructs the server which way to use:

  1. the client connects again to the server (passive)
  2. the server connects back to the client (active)

Passive please

--ftp-pasv is the default behavior of curl and asks to do the FTP transfer in passive mode. As this is the default, you will rarely see this option used. You could for example use it in cases where you set active mode in your .curlrc file but want to override that decision for a single specific invocation.

The opposite behavior, active connection, is called for with the --ftp-port option. An active connection asks the server to do a TCP connection back to the client, an action that these days is rarely functional due to firewall and network setups etc.

FTP passive behavior detailed

FTP was created decades before IPv6 was even thought of, so the original FTP spec assumes and uses IPv4 addresses in several important places, including how to ask the server for setting up a passive data transfer.

EPSV is the “new” FTP command that was introduced to enable FTP over IPv6. I call it new because it wasn’t present in the original RFC 959 that otherwise dictates most of how FTP works. EPSV is an extension.

EPSV was for a long time only supported by a small fraction of FTP servers, and such servers probably still exist, so in case invoking this command is a problem, curl needs to fall back to the original command for this purpose: PASV.

curl first tries EPSV and if that fails, it sends PASV. The first of those commands that is accepted by the server will make the server wait for the client to connect to it (again) on a fresh new TCP port number.
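
libcurl-using applications can control this fallback through the CURLOPT_FTP_USE_EPSV option; setting it to zero makes libcurl skip the EPSV attempt and go straight to PASV (this mirrors the tool’s --disable-epsv option mentioned further down). A sketch, assuming an easy handle called curl:

/* skip the EPSV attempt and send PASV directly, for servers known to
   mishandle EPSV */
curl_easy_setopt(curl, CURLOPT_FTP_USE_EPSV, 0L);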

Problems with the dual connections

Your firewall needs to allow your client to connect to the new port number the server opened up for the data connection. This is particularly complicated if you enable FTPS (encrypted FTP) as then the new port number is invisible to middle-boxes such as firewalls.

Dual connections also often cause problems because the control connection will be idle during the transfer, and that idle connection sometimes gets killed by NATs or other middle-boxes due to the inactivity.

Enabling active mode is usually even worse than passive mode in modern network setups, as then the firewall needs to allow an outside TCP connection to come in and connect to the client on a given port number.

Example

Ask for a file using passive FTP mode:

curl --ftp-pasv ftp://ftp.example.com/cake-recipe.txt

Related options

--disable-epsv will prevent curl from using the EPSV command – which then also makes this transfer not work for IPv6. --ftp-port switches curl to an active connection.

--ftp-skip-pasv-ip tells curl that the IP address in the server’s PASV response should be ignored and the IP address of the control connection should instead be used when the second connection is set up.

The command line options we deserve

A short while ago curl‘s 230th command line option was added (it was --mail-rcpt-allowfails). Two hundred and thirty command line options!

A look at curl history shows that on average we’ve added more than ten new command line options per year for a very long time. As we don’t see any particular slowdown, I think we can expect the number of options to eventually hit and surpass 300.

Is this manageable? Can we do something about it? Let’s take a look.

Why so many options?

There are four primary explanations why there are so many:

  1. curl supports 24 different transfer protocols, many of which have protocol-specific options for tweaking their operations.
  2. curl’s general design is to allow users to toggle and set every capability and feature individually. Every little knob can be on or off, independently of the other knobs.
  3. curl is feature-packed. Users can customize and adapt curl’s behavior to a very large extent to truly get it to do exactly the transfer they want for an extremely wide variety of use cases and setups.
  4. As we work hard at never breaking old scripts and use cases, we don’t remove old options but we can add new ones.

An entire world already knows our options

curl is used and known by a large number of humans. These humans have learned the way of the curl command line options, for better and for worse. Some think they are strange and hard to use, others see the logic behind them, and the rest simply accept that this is the way they work. Changing options would be perceived as hostile towards all these users.

curl is used by a very large number of existing scripts and programs that have been written and deployed and sit there and do their work day in and day out. Changing options, even ever so slightly, risks breaking many of these scripts and would make a lot of developers out there upset, giving them everything from a nuisance to a hard time.

curl is the documented way to use APIs, REST calls and web services in numerous places. Lots of examples and tutorials spell out how to use curl to work with services all over the world for all sorts of interesting things. Examples and tutorials written ages ago that showcase curl still work because curl doesn’t break behavior.

curl command lines are even to some extent now used as a translation language between applications:

All the four major web browsers let you export HTTP requests to curl command lines that you can then execute from your shell prompts or scripts. Other web tools and proxies can also do this.

There are now also tools that can import said curl command lines so that they can figure out what kind of transfer was exported from those other tools. The applications that import these command lines then don’t want to actually run curl; they want to figure out the details of the request that the curl command line would have executed (and possibly run it themselves). The curl command line has become a web request interchange language!

There are also a lot of web services provided that can convert a curl command line into a source code snippet in a variety of languages for doing the same request (using the language’s native preferred method). A few examples of this are: curl as DSL, curl to Python Requests, curl to Go, curl to PHP and curl to perl.

Can the options be enhanced?

I hope I’ve made it clear why we need to maintain the options we already support. However, that doesn’t limit what we can add, nor does it stop us from adding new ways of doing things. It’s just code; of course it can be improved.

We could add alternative options for existing ones that make sense, if there are particular ones that are considered complicated or messy.

We could add a new “mode” that would have a totally new set of options or new way of approaching what we think of options today.

Heck, we could even consider doing a separate tool next to curl that would similarly use libcurl for the transfers but offer a totally new command line option approach.

None of these options are ruled out as too crazy or far out. But they all of course require that someone thinks of them as a good idea and is prepared to work on making it happen.

Are the options hard to use?

Oh what a subjective call.

Packing this many features into a single tool and having every single option and superpower intuitive and easy to use is perhaps not impossible, but at least a very, very hard task. Also, the curl command line options have been added organically over a period of more than twenty years, so some of them could of course have been made a little smarter if we could’ve foreseen what would come or how the protocols would later change or improve.

I don’t think curl is hard to use (what, you think I’m biased?).

Typical curl use cases often only need a very small set of options. Most users never learn or ever need to learn most curl options – but they are there to help out when the day comes and the user wants that particular extra quirk in their transfer.

With any other tool, even those that pound their chests and call themselves user-friendly: if they grow a feature set close to the amount of abilities curl has, their command lines also grow substantially and will no longer always be intuitive and self-explanatory. I don’t think a very advanced tool can remain easy to use in all circumstances. I think the aim should be to keep the common use cases easy. I think we’ve managed this in curl, in spite of having 230 different command line options.

Should we have another (G)UI?

Would a text-based or even graphical UI help or improve curl? Would you use one if it existed? What would it do and how would it work? Feel most welcome to tell me!

Related

See the refreshed cheat sheet, and why not the command line option of the week series.

curl ootw: --mail-from

(older options of the week)

--mail-from has no short version. This option was added to curl 7.20.0 in February 2010.

I know this surprises many curl users: yes curl can both send and receive emails.

SMTP

curl can do “uploads” to an SMTP server, which usually has the effect that an email is delivered to somewhere onward from that server. That is, curl can send an email.

When communicating with an SMTP server to send an email, curl needs to provide a certain set of data, and one of the mandatory pieces is the email address of the sender.

It should be noted that this is not necessarily the same address as the one that is displayed in the From: field that the recipient’s email client will show. But it can be the same.

Example

curl --mail-from from@example.com --mail-rcpt to@example.com smtp://mail.example.com -T body.txt

Receiving emails

You can use curl to receive emails over POP3 or IMAP. SMTP is only for delivering email.

Related options

--mail-rcpt for the recipient(s), and --mail-auth for the authentication address.

curl is 8000 days old

Another pointless number that happens to be round and look nice so I feel a need to highlight it.

When curl was born WiFi didn’t exist yet. Smartphones and tablets weren’t invented. Other things that didn’t exist include YouTube, Facebook, Twitter, Instagram, Firefox, Chrome, Spotify, Google search, Wikipedia, Windows 98 or emojis.

curl was born in a different time, but also in the beginning of the explosion of the web and Internet Protocols. Just before the big growth wave.

In 1996, when I started working on the precursor to curl, there were around 250,000 web sites (sources vary slightly).

In 1998, when curl shipped, the number of sites was already around 2,400,000. Ten times as many in just those two years.

In early 2020, the number of web sites is somewhere around 1,700,000,000 to 2,000,000,000 (depending on who provides the stats). The number of web sites has thus grown at least 70,000% over curl’s 8000 days of life, and to perhaps as much as 8000 times the amount from when I first started working with HTTP clients.

One of the oldest still available snapshots of the curl web site is from the end of 1998, when curl was just a little over 6 months old. On that page, we can read about curl’s “massive popularity”.

That “massive popularity” looks charming and possibly a bit naive today. The number of monthly curl downloads has also possibly grown by 8,000 times or so – by estimates only, as most users download curl from other places than our web site these days. Even more users get it installed as part of their OS or bundled with something else.

Thank you for flying curl.

(This day occurs only a little over a month before curl turns 22, there will be much more navel-gazing then, I promise.)

Image by Annie Spratt from Pixabay

curl ootw: --keepalive-time

(previously blogged about options are listed here.)

The --keepalive-time command line option was introduced in curl 7.18.0 back in early 2008. There’s no short version of it.

The option takes a numerical argument; number of seconds.

What’s implied in the option name and not spelled out is that the particular thing you ask to keep alive is a TCP connection. When the keepalive feature is not used, TCP connections typically don’t send anything at all if no data is transmitted.

Idle TCP connections

Silent TCP connections typically cause two primary issues:

  1. Middle-boxes that track connections, such as your typical NAT boxes (home WiFi routers most notoriously), will consider silent connections “dead” after a certain period of time and drop all knowledge about them, leading to the connection not functioning when the client (or server) later wants to resume operation of it.
  2. Neither side of the connection will notice when the network between them breaks, as it takes actual traffic to do so. This is of course also a feature, because there’s no need to be alarmed by a breakage if there’s no traffic, as things might be fine again by the time the connection eventually gets used again.

TCP stacks then typically implement a low-level feature where they can send a “ping” frame over the connection if it has been idle for a certain amount of time. This is the keepalive packet.

--keepalive-time <seconds> therefore sets the interval. After this many seconds of “silence” on the connection, there will be a keepalive packet sent. The packet is totally invisible to the applications on both sides but will maintain the connection through NATs better, and if the connection is broken, this packet will make curl detect it.

Keepalive is not always enough

To complicate issues even further, there are also devices out there that will still close down connections if they only see TCP keepalive packets and no data for a certain period. Several protocols on top of TCP have their own keepalive alternatives (sometimes called ping) for this and other reasons.

This aggressive style of closing connections without actual TCP traffic typically hurts long-running FTP transfers. This is because FTP sets up two connections for a transfer, but the first one is the “control connection” and while a transfer is being delivered on the “data connection”, nothing happens over the first one. This can then result in the control connection being “dead” by the time the data transfer completes!

Default

The default keepalive time is 60 seconds. You can also disable keepalive completely with the --no-keepalive option.

The default time has been selected to be fairly low because many NAT routers out there in the wild are fairly aggressive and close idle connections already after two minutes (120 seconds).

For what protocols

This works for all TCP-based protocols, which is what most protocols curl speaks use. The only exception right now is TFTP. (See also QUIC below.)

Example

Change the interval to 3 minutes:

curl --keepalive-time 180 https://example.com/
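
In libcurl-speaking applications, the same behavior is controlled with the TCP keepalive options (present since 7.25.0). A sketch assuming an easy handle called curl:

/* enable TCP keepalive probing on this handle's connections */
curl_easy_setopt(curl, CURLOPT_TCP_KEEPALIVE, 1L);
/* wait 180 seconds of idle time before sending the first probe */
curl_easy_setopt(curl, CURLOPT_TCP_KEEPIDLE, 180L);
/* and 180 seconds between subsequent probes */
curl_easy_setopt(curl, CURLOPT_TCP_KEEPINTVL, 180L);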

Related options

Related functionality is offered by the --speed-limit and --speed-time options, which will cancel a transfer if the transfer speed drops below a given speed for a certain time. Or just --max-time, which sets a global timeout for the entire operation.

QUIC?

Soon we will see QUIC getting used instead of TCP for some protocols: HTTP/3 being the first in line for that. We will have to see what exactly we do with this option when QUIC starts to get used and what the proper mapping and behavior shall be.

curl ootw: -U for proxy credentials

(older options of the week)

-U, --proxy-user

The short version of this option uses the uppercase letter ‘U’. It is important since the lower case letter ‘u’ is used for another option. The longer form of the option is spelled --proxy-user.

This command option existed already in the first ever curl release!

The man page‘s first paragraph describes this option as:

Specify the user name and password to use for proxy authentication.

Proxy

This option is for using a proxy. So let’s first briefly look at what a proxy is.

A proxy is a “middle man” in the communication between a client (curl) and a server (the one that holds the contents you want to download or will receive the content you want to upload).

The client communicates via this proxy to reach the server. When a proxy is used, the server communicates only with the proxy and the client also only communicates with the proxy:

curl <===> proxy <===> server

There exist several different types of proxies, and a proxy can require authentication before it allows clients to use it.

Proxy authentication

Sometimes the proxy you want or need to use requires authentication, meaning that you need to provide your credentials to the proxy in order to be allowed to use it. The -U option is used to set the name and password used when authenticating with the proxy (separated by a colon).

You need to know this name and password; curl can’t figure them out – unless you’re on Windows and your curl is built with SSPI support, as then it can magically use the current user’s credentials if you provide blank credentials in the option: -U :.

Security

Providing passwords on command lines is a bit icky. If you write them in a script, someone else might see the script and figure them out.

If the proxy communication is done in clear text (for example over HTTP) some authentication methods (for example Basic) will transmit the credentials in clear text across the network to the proxy, possibly readable by others.

Command line options may also appear in process listing so other users on the system can see them there – although curl will attempt to blank them out from ps outputs if the system supports it (Linux does).

Needs other options too

A typical command line that uses -U also sets which proxy to use, with the -x option and a URL that specifies the proxy type, the proxy host name and the port number the proxy runs on.

If the proxy is an HTTP or HTTPS proxy, you might also need to specify which type of authentication you want to use. For example, --proxy-anyauth lets curl figure it out by itself.

If you know what HTTP auth method the proxy uses, you can also explicitly enable that directly on the command line with the correct option. Like for example --proxy-basic or --proxy-digest.

SOCKS proxies

curl also supports SOCKS proxies, which are a different type than HTTP or HTTPS proxies. When you use a SOCKS proxy, you need to tell curl that, either with the correct scheme prefix in the -x argument or with one of the --socks* options.

Example command line

curl -x http://proxy.example.com:8080 -U user:password https://example.com
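
For completeness, a sketch of the libcurl equivalent of that command line (assuming an already created easy handle called curl):

curl_easy_setopt(curl, CURLOPT_URL, "https://example.com");
curl_easy_setopt(curl, CURLOPT_PROXY, "http://proxy.example.com:8080");
curl_easy_setopt(curl, CURLOPT_PROXYUSERPWD, "user:password");
/* let libcurl pick the proxy auth method, like --proxy-anyauth */
curl_easy_setopt(curl, CURLOPT_PROXYAUTH, CURLAUTH_ANY);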

See Also

The corresponding option for sending credentials to a server instead of proxy uses the lowercase version: -u / --user.