a filename when none exists

This is episode four in my mini-series about shiny new features in the upcoming curl 8.10.0 release.

One of the most commonly used curl command line options is the dash capital O (-O) which also is known as dash dash remote-name (--remote-name) in its long form.

This option tells curl to create a local file using the name from the filename part of the provided URL when downloading. I.e. when you tell curl

curl -O https://example.com/file.html

This command line conveniently creates a local file called file.html in which it saves the downloaded data.

The -O option has been supported with this functionality since curl first shipped, in March 1998. An important point here is that it picks the name from the URL so that a user can tell what filename it creates. No surprises. The remote server is not involved in naming it.

What about no filename scenarios?

URLs do not necessarily need to have filename parts. Like these examples:

http://example.com/
http://example.com/path/
http://example.com/one/two/?id=12345

Since there are no filename parts in these URLs, they used to cause curl to refuse to operate with -O and instead return error. curl could not create a local filename to use:

$ curl -O http://example.com/
curl: Remote filename has no length
curl: (23) Failed writing received data to disk/application

Trying harder

Starting in curl 8.10.0, curl works a little harder to come up with a filename to store the download in when -O is used. While there is no filename part in the URL, the user did ask curl to download the URL to a local file so it now tries a few extra steps:

  1. Use the filename part from the URL if there is one, like before.
  2. If there is no filename but there is a path provided in the URL, extract the right-most directory name from the URL and use as filename.
  3. If there is neither a filename nor a path in the URL, curl uses a default, fixed, filename as a final backup: curl_response. This name intentionally has no extension because curl has no idea what data that will come and using an extension could mislead users into believing it says something about the type of content.

Several people have insisted that index.html would be better and sensible default file name. I cannot agree with that, since it might just as well be an image or a tarball of your favorite open source project. I think naming such a file index.html would be more misleading than simply sticking to the neutral curl_response.

Let me give you a little table showing what filenames that will be used with curl -O and a given set of URLs:

URLlocal filename
http://example.com/one.htmlone.html
http://example.com/one.html?clues=noone.html (curl ignores the query part)
http://example.com/one/two/?id=42two (because it is the right-most directory piece)
http://example.com/path/path (because it is the right-most directory piece)
http://example.com/curl_response (because no filename nor directory to use)

Find out which name

You can use curl’s -w, –write-out option and its %{filename_effective} variable to learn exactly which name that was used.

Prefer another name?

There is always the -o (lowercase o) option that lets you specify whatever filename you like. You do not have to let curl pick the filename for you.

Clobber or not

curl will by default overwrite, clobber if you will, any previously existing file using the same name. If you rather curl took a more careful approach, consider using –no-clobber in your command lines. It makes curl pick an alternative filename if the chosen one already exists when curl is about to download data into a local file.

skip a curl transfer

This is episode three in my mini-series of posts describing news in the coming curl 8.10.0 release. Part one was more help, part two verbose, verbose and verbosest.

This new command line option in curl 8.10.0 is a simple one that has been requested by users repeatedly over the years so I figure it was about time we actually provide it.

If the target file already exists on disk, skip downloading it.

It is exactly as simple as that. No date check, no size check, no checking if the file is even what you want it to be. If the target file is present and exists that is a signal enough that the file should not be downloaded; to skip the transfer.

A real-world command line using this feature could then look like this:

curl --skip-existing --output local/dir/file https://example.com

Or if instead -O is used, it still works the same:

curl --skip-existing -O https://example.com/me.jpg

Easy, right? See the manpage.

Broken files can also be present

To avoid a previous broken download remainder to linger around and cause future transfers to get skipped, remember that curl also has a –remove-on-errror option.

Ships

In curl 8.10.0, on September 11, 2024.

Image

From a movie with a suitable if even perhaps subtle reference.

So the Department of Energy emailed me

I received an email today. What follows is a slightly edited version (for brevity).

From: DOE Attestation <doe.attestation@hq.doe.gov>
Subject: [ACTION REQUIRED] U.S. Department of Energy Secure Software Development Attestation Submission Request

OMB Control No. 1670-0052
Expires: 03/31/2027

Hello Haxx

** The following communication contains important DOE Secure Software Development Attestation Submission instructions. Please read this communication in its entirety. **

The U.S. Department of Energy (DOE) has identified your company's software as affected by this request. The list of impacted software products and versions can be found below.

DOE Request:

In support of the Office of Management and Budget (OMB) requirement to collect attestations per M-22-18, please complete the U.S. Department of Energy Secure Software Development Attestation Form (DOE Common Form). If you are unable to attest to all secure software development framework (SSDF) practices, please be sure to attach your Plan of Action and Milestones (POA&M). The software listed below has been identified as being associated with your company and requires DOE to collect an attestation for the software.

Product Name Version Number

libcurl 8.3

The U.S. Department of Energy Secure Software Development Attestation Form (DOE Common Form) can be found at DOE F 205.2 Secure Software Development Attestation Form. The DOE Common Form identifies the minimum secure software development requirements a Software Producer must meet, and attest to meeting, before software subject to the requirements of M-22-18 as updated by M-23-16, may be used by Federal agencies. This form is used by Software Producers to attest that the software they produce is developed in conformity with specified secure software development practices and standards.

Regards,

DOE OCIO C-SCRM Team

Don’t you just love the personal touch in the signature in the end?

I could add that I have never been in contact with them before. I did not know they use libcurl before this email. I do not know what they use it for.

I find it amusing they insist this is “required” .

My response

I am not impossible and I will not deny them this information. So I pressed reply and immediately sent an answer back.

Hello Department of Energy,

I cannot find that you are an existing customer of ours, so we cannot fulfill this request.

libcurl is a product we work on. It is open source and licensed under an MIT-like license in which the distribution and use conditions are clearly stated.

If you contact support@wolfssl.com we can remedy this oversight and can then arrange for all the paperwork and attestations you need.

Thanks,

/ Daniel

Related

Other emails I have received. NASA emailed me.

Discussion

On hacker news.

slow TCP connect on Windows

I have this tradition of mentioning occasional network related quirks on Windows on my blog so here we go again.

This round started with a bug report that said

curl is slow to connect to localhost on Windows

It is also demonstrably true. The person runs a web service on a local IPv4 port (and nothing on the IPv6 port), and it takes over 200 milliseconds to connect to it. You would expect it to take no less than a number of microseconds, as it does on just about all other systems out there.

The command

curl http://localhost:4567

Connecting

Buckle up, this is getting worse. But first, let’s take a look at this exact use case and what happens.

The hostname localhost first resolves to ::1 and 127.0.0.1 by curl. curl actually resolves this name “hardcoded”, so it does this extremely fast. Hardcoded meaning that it does not actually use DNS or any resolver functionality. It provides this result “fixed” for this hostname.

When curl has both IPv6 and IPv4 addresses to connect to and the host supports both IP families, it will first start the IPv6 attempt(s) and only if it did not succeed to connect over IPv6 after two hundred millisecond does it start the IPv4 attempts. This way of connecting is called Happy Eyeballs and is the de-facto and recommended way to connect to servers in a dual-stack since years back.

On all systems except Windows, where the IPv6 TCP connect attempt sends a SYN to a TCP port where nothing is listening, it gets a RST back immediately and returns failure. ECONNREFUSED is the most likely outcome of that.

On all systems except Windows, curl then immediately switches over to the IPv4 connect attempt instead that in modern systems succeeds within a small fraction of a millisecond. The failed IPv6 attempt is not noticeable to a user.

A TCP reminder

This is how a working TCP connect can be visualized like:

But when the TCP port in the server has no listener it actually performs like this

Connect failures on Windows

On Windows however, the story is different.

When the TCP SYN is sent to the port where nothing is listening and an RST is sent back to tell the client so, the client TCP stack does not return an error immediately.

Instead it decides that maybe the problem is transient and it will magically fix itself in the near future. It then waits a little and then keeps sending new SYN packets to see if the problem perhaps has fixed itself – until a maximum retry value is reached (set in the registry, this value defaults to 3 extra times).

Done on localhost, this retry-SYN process can easily take a few seconds and when done over a network, it can take even more seconds.

Since this behavior makes the leading IPv6 attempt not possible to fail within 200 milliseconds even when nothing is listening on that port, connecting to any service that is IPv4-only but has an IPv6 address by default delays curl’s connect success by two hundred milliseconds. On Windows.

Of course this does not only hurt curl. This is likely to delay connect attempts for countless applications and services for Windows users.

Non-responding is different

I want to emphasize that there is a big difference when trying to connect to a host where the SYN packet is simply not answered. When the server is not responding. Because then it could be a case of the packet gotten lost on the way so then the TCP stack has to resend the SYN again a couple of times (after a delay) to see if it maybe works this time.

IPv4 and IPv6 alike

I want to stress that this is not an issue tied to IPv6 or IPv4. The TCP stack seems to treat connect attempts done over either exactly the same. The reason I even mention the IP versions is because how this behavior works counter to Happy Eyeballs in the case where nothing listens to the IPv6 port.

Is resending SYN after RST “right” ?

According to this exhaustive resource I found on explaining this Windows TCP behavior, this is not in violation of RFC 793. One of the early TCP specifications from 1981.

It is surprising to users because no one else does it like this. I have not found any other systems or TCP stacks that behave this way.

Fixing?

There is no way for curl to figure out that this happens under the hood so we cannot just adjust the code to error out early on Windows as it does everywhere else.

Workarounds

There is a registry key in Windows called TcpMaxConnectRetransmissions that apparently can be tweaked to change this behavior. Of course it changes this for all applications so it is probably not wise to do this without a lot of extra verification that nothing breaks.

The two hundred millisecond Happy Eyeballs delay that curl exhibits can be mitigated by forcibly setting –happy-eyballs-timeout-ms to zero.

If you know the service is not using IPv6, you can tell curl to connect using IPv4 only, which then avoids trying and wasting time on the IPv6 sinkhole: –ipv4.

Without changing the registry and trying to connect to any random server out there where nothing is listening to the requested port, there is no decent workaround. You just have to accept that where other systems can return failure within a few milliseconds, Windows can waste multiple seconds.

Windows version

This behavior has been verified quite recently on modern Windows versions.

verbose, verboser, verbosest

A key feature for a tool like curl is its ability to help the user diagnose command lines and operations that do not work the way the user intended them to.

When I do XYZ, why does it not work?

The command line option -v and its longer version --verbose have been supported by curl since day one for this purpose. A boolean flag that when used shows what is going on by outputting extra information from the execution.

I need to emphasize the boolean part here. Up until curl 8.10.0, this option was a plain boolean. You either did not get verbose output or you got it. There was no levels or ways to increase or decrease the amount of information shown. Just a binary one or zero. On or off.

But starting in 8.10.0 the story is different.

The world

Meanwhile, there is a universe of additional command line tools out there. Many other tools also offers a -v option for outputting verbose tracing information. In many other tools, the -v is not a boolean but instead you might get additional output if you add more vs. -vv shows a little more, -vvv even more etc.

Users are fairly trained on this. To the extent that we often get to see users use -vvv etc on curl command lines in bug reports etc. The curl command line parser accepts more of them fine (any amount really), but repeating them just enables the boolean again and again to no extra effect.

When we asked users for their favorite command line option in the annual curl user survey in May 2024, a noticeable amount of respondents said -vv or -vvv. Even though they do nothing extra than -v. I think this shows to which extent people are trained to and are used to having these options for command line tools.

Make curl do what you think it did

Therefore.

In curl 8.10.0, coming on September 11, 2024, we introduce support for -vv, -vvv and -vvvv. One, two, three or four vs. (Maybe we add more in the future.)

If you write the v-letters consecutive next to each other like that, you increase the logging level; the amount of verbose output curl shows. It might then possibly do something in the style that many users already expected it to do.

The extra logging it can do in 8.10.0 is actually nothing new, what is new is that you can get to it by simply using -vv and friends. The old style of getting such extra verbose tracing is to instead use a selected combination of –trace-time, –trace-ascii and –trace-config.

Backwards compatibility

In curl we care deeply about backwards compatibility and not breaking users existing scripts and use cases when we ship new versions. This change is perhaps on the border of possibly doing this, so we have tried to tread as gently as we can to make sure that risk is slim.

This is why doing something like curl -v -v will not increase the level, because a user might have one of the switches in ~/.curlrc, another one on the command line and a third one in a custom file read with curl’s -K option etc. We want the extra output level to be explicitly asked for.

Using a single -v after a -vv or -vvv resets the level back to the original lowest-but-enabled for the same reason. The --no-verbose option also still works, even though the option is not strictly a boolean anymore and curl normally otherwise only supports the --no- prefix for boolean command line options.

Credits

Stefan Eissing wrote the pull-request that landed this change.

more curl help

With the ever-growing number of command line options for curl, the problem of how to provide documentation and help users understand how options work and should be used is a challenge that is worth revisiting regularly. To keep iterating on.

I personally often use the curl manpage to lookup descriptions for options. Often not only to refresh my memory for my own use, but also when I want to quote a piece from it in a response to a user asking questions.

Right now curl supports 265 separate command line options in the latest development version. The text-only version of the manpage is almost 7,000 lines long. Searching the manpage for an option is sometimes also tedious since there are a lot of mentions of options in descriptions for other options. Like in see also how option blabla can help you accomplish this. So you might need to hit the key for next-search a significant number of times before you find the place you want if the term you search for is in the bottom half of the manpage.

curl --manual exists as well, and is convenient since everyone who has the tool automatically also has the documentation for it. But it carries the same challenge.

Gimme docs for that option only

Starting in curl 8.10.0, planned to ship in September 11, 2024 we introduce this new nifty way to get documentation for only a specific single command line option:

curl --help [option]

or using the shorter version:

curl -h [option]

The [option] part above is a command line option. You can write it using the long format, the short format or even in its –no- format. In all ways an option is typed when accepted on a command line, you ask curl to help you with it.

Screenshots

This seems like a feature worthy of a few screenshots to fully demonstrate it.

I hope it will help everyone understand curl options a little better.

Technical details you did not ask for

The full text version of the manpage is normally stored gzip-compressed in memory to make the binary several hundred kilobytes smaller. To display this output, curl needs to decompress the whole thing from the start and in a streaming manner figure out when to start showing the help and when to stop. This is still blazingly fast even on a not so modern computer.

For a long time we wrote the manpage for curl in directly using nroff format. Back then we generated the text-only version of that (for using with –manual) using nroff at build time. That text output was not stable and reliable enough for us to be able to do this feature. It was not until we switched over the documentation format to text, and later curldown, from which we generate the manpage, that we started to see this possibility. In March 2024, we landed the build updates that finally removed nroff from the build procedure and instead replaced it with a custom written tool. With full control of the input as well as the output, we can now add this feature reliably.

Future

Like everything, this feature can certainly also be improved further. Possible improvements include:

  • Use of a pager so that when the output is more than a screen-full, it pauses and you can move up and down using keys. Right now you need to pipe it manually into less or something.
  • Dynamic reflowing of the text to better adjust itself to the existing terminal width. The current output is fixed-width designed for eighty column terminals.

curl welcomes wcurl to the team

The curl project welcomes its newest sibling into the family: wcurl.

I already wrote about wcurl. I will try to not repeat myself too much here, but starting now wcurl has its new home under the curl organization umbrella. It is now an official curl project.

www: https://curl.se/wcurl
GitHub: https://github.com/curl/wcurl

The initial developers and people behind wcurl are still around. This is just replanting the little thing in a different pot to hopefully allow it to grow better. The idea being that wcurl is a tool with a wider scope than “just” Debian and by hosting and working on it within the curl family, we signal this quite clearly. Ideally, we also make it easier for contributors and maintainers of curl to try out wcurl and help us take it further to whatever is coming next.

I do not intend to control wcurl or dictate how it should be run or developed. I hope and strive to become a contributor of it and I want to help making it the best tool it can be.

Samuel Henrique and Sergio Durigan Junior are the wcurl maintainers.

wcurl becomes the third command line tool produced in the curl project. trurl was the second after curl itself.

Communication

If you find problems, have improvements or just have ideas for how to take wcurl further. Head over to the GitHub repository and make your voice heard!

The previous repository was transferred on GitHub, which should mean that it will redirect correctly to the new URL in case you by accident would go to the old place.

Join us!

wcurl is released under the regular curl Open Source license.

wcurl has the same low bar and no-friction policies as the rest of curl to make it easy for everyone who wants to, to help out and contribute.

You are welcome and appreciated!

libcurl is 24 years old

On Monday August 7, 2000 at 14:49 UTC, we announced the release of the first libcurl version ever. Exactly twenty-four years ago today. We called it version 7.1. The simple reason we did a point one release as the first one was that we had shipped a whole range of 7.0 beta versions before that day and I wanted to make it clear to everyone that this was a bump up from those. To avoid confusions.

The last release we did before libcurl was born was called 6.5.2, which we did 139 days before 7.1 was uploaded (in March 2000). The longest release interval in the entire history of curl. Never before or after, have we spent that long time to prepare a single new curl release.

Howdy

I just put the curl 7.1 tarball on the curl site. It’ll take a while more for it to appear on the mirrors and for our friends to send in binary archives or report where there are updated archives for various platforms.

The original announcement. Not many frills there.

Contributors

Most of the work that went into the creation of libcurl was done by me alone. While we already had contributors and lots of people helping out with bug reports and features, this kind of heavy lifting and huge refactor was not really a team effort. I spent the summer splitting the monolithic command line tool into two components: a library and a command line tool that uses the library.

Since the last days of 1999 the curl source code was hosted on Sourceforge using CVS so at least all this was done in a public source code repository with version control.

Why libcurl

I had already years before this time understood the powers of using shared libraries and how they are good ways to offer features and functionality to applications. I got an article published in a technical Amiga magazine in the early 1990s about how to do shared libraries on AmigaOS.

As the command line tool curl started to get somewhere and was already two years old at that point (the curl tool was born on March 20 1998), I imagined that there could be an application or two that could benefit from getting easy access to doing Internet transfers. It was just a gut feeling and a guess. I did not know. I did not research this or anything, I just took the plunge and figured we will eventually realize if I was right or wrong.

With me having written some portable libraries in the past already, curl was already written with the mindset that it could at time point get converted to a library + command line tool. This eased the conversion.

Already on day one libcurl built and ran on numerous operating systems offering the same API.

Name

When I decided to make a library from the core of the curl tool I opened the discussion to see if someone had any ideas for what this library should be called. Naming is hard and I had no particularly bright ideas myself so we quite soon landed on the simplest take possible. Let’s just call it libcurl.

API

How do you design an internet transfer library API?

I had this idea to offer multiple different API “sets” but I figured I needed to start with something basic. It ended up getting called the “easy” interface because it was intended to be straight-forward to use and I decided I could always add something more advanced later. I wanted to provide a fairly low level API and if someone wanted something more high-level, such an API can get built on top of this.

I knew that I wanted the API to be somewhat extensible without having to change the API all the time, and I found inspiration in how the ioctl() and fcntl() and similar functions work. I made curl_easy_setopt() follow that pattern. For good and bad.

I wanted the API to be decently protocol agnostic, since curl could already handle several different ones and I figured it could be appreciated by users who might not always be Internet protocol experts.

Mostly due to luck we managed to ship an API that even though it has its quirks has survived remarkably well.

Using C

There was never any consideration to use another language than C for the library and there was really no decent options to select from either. In theory I suppose C++ could have been used, but I have never been a friend of C++ so I rather not. Not then, not now.

There was never even a discussion about what language to use.

Resilience

The API that gradually was created in that summer of 2000 turned out to be fairly good and has survived and done surprisingly good.

There are 17 API functions still in use today that were introduced in 7.1. In fact, much code written for the API from back then can still be compiled and run using the latest libcurl!

At the time of the first libcurl release there was about 17,000 lines of product code. Today, twenty-four years later, we recently surpassed 171,000 lines.

The API has over the years even proved itself to endure “revolutionary” protocol changes such as HTTP/2 introducing multiplexing multiple transfers over the same connection and HTTP/3, which switches from TCP to UDP. This, mostly because it offers a high enough abstraction level.

Users

The same month we shipped the first libcurl ever, the PHP project built the first libcurl binding. When PHP 4.0.2 was released on August 29 2000, curl was a bundled, blessed and official extension. PHP’s early and unconditional adoption of libcurl helped us get users, get testing, get bugreports and helped us mature.

Soon other users, projects, tools and devices adopted libcurl and built things using it. Gradually, it grew in popularity.

multi interface

In the spring of 2002, we expanded the libcurl API with the “multi” API. Using this API family, libcurl offers an unlimited number of concurrent parallel Internet transfers in the same thread.

In 2006, we later expanded the multi interface to offer the multi_action way, which expands the API to allow use of event-based mechanisms for the event loop, like epoll etc, thus allowing scaling up parallel transfers to the thousands or more without sacrificing performance.

SONAME

Put simply, SONAME is the version number of the binary interface. We bumped it several times in the early years as we adjusted API details that we did not get right from the beginning.

Years later (in 2006), we did the last major SONAME bump and by then libcurl had enough usage to make a lot of people squeal and complain about the agony that bump caused. That event and its reactions was a strong contributing factor to me making the decision: To as far as we possibly can never again break the ABI. So far we have managed to stick to this promise and goal – barring bugs of course.

Every libcurl program since 2006 can be compiled against the latest libcurl. Of course, the underlying protocols (especially TLS) and URLs etc have changed and moved significantly since then which makes most URLs from that time to not work anymore.

Higher level API

I think in many ways my original hopes of others doing higher level APIs to build on top of what libcurl provides has been realized primarily through the means of language bindings.

We have records of at least 65 different language bindings created for libcurl, for almost as may different languages and environments. This expands the reach of libcurl powers and makes it virtually independent of programming language of choice. It also lowers the bar, as many of those languages and bindings might be easier and perhaps less intimidating to use than the C API, and yet they still offer the unparalleled super powers of the libcurl engine under the hood.

Success

I think no matter how you count or view it, libcurl can be considered a success.

This success was possible only because of the people involved. The thousands of contributors who reported bugs, debugged problems, tested patches, ran crazy experiments in production, voiced their opinions, provided ideas and added features. I may be the captain of the ship, but I was never alone and it would never have been even close to possible if I had been.

libcurl has been made to run on at least 103 operating systems and 28 CPU architectures. Everyone can always just opt to use libcurl in whatever thing they need Internet transfers in.

Timing

I attribute a lot of our success on the lucky timing. We offered something that was working and decent at the right time, and we could following along and grow with the general Internet use growth. We had no idea where the Internet evolution was going, its explosion growth and we had no idea that just about every consumer device was going to get Internet connected twenty years layer. Devices that all could run libcurl.

Competition

Internet development and evolution never stop. When libcurl first shipped, people were using libwww and for several years I considered that to be our primary competitor. Today, few people even know it existed.

The time is now and if we do not keep up with development, fix bugs and offer the features that allow the world to do Internet transfers the way every wants, then libcurl could soon also be added to the pile of forgotten software projects.

These days, there are plenty of libcurl alternatives. Perhaps the most significant ones being native (HTTP) libraries included in or for programming languages like Python, Java, Rust or .Net.

I like to say that libcurl’s biggest selling point is that we have at least 18 years of proven API, ABI and general stability. Nothing you may consider to use instead of libcurl comes even close.

Future

Even more devices will get Internet access going forward. Where Internet goes, libcurl can be of use. I do not think we have reached peak libcurl yet.

The curl project needs to keep the product rock solid and never break the API. We need to continue support doing Internet transfers they way “the world” wants them done: the protocols, the versions and the methods deemed the right ones. Keep listening to our users. Keep making it easy for contributors to improve it.

I have never been good at forecasting or even guessing what the future holds. Will there ever be a moment or event in the future that suddenly makes libcurl irrelevant? Maybe. It’s not totally unlikely. More likely however, libcurl will continue to transfer Internet bytes for the world for many years to come. At some point it will be superseded by something else, but I cannot say when or by what.