Closing the NASA loop

curl is actually something which is critical, especially to our data management system. It is being used very widely across NASA.

Dr Steve Crawford

2020

Back in 2020 I started getting emails from NASA asking for details and specifics about curl’s origins and in particular about where contributors to the project work. I first replied eagerly, trying to be helpful, but over time I kept receiving very similar emails from other NASA departments.

2021

It puzzled me, and out of frustration I posted this tweet in April 2021. A tweet that received a lot of attention and more than 3,000 likes.

2023

In a closing keynote at the FOSDEM 2023 conference, Dr. Steve Crawford did a talk titled NASA and Open Source Software (video, slides).

Some 24 minutes in, on slide 28, Dr Crawford shows a screenshot of the above tweet and talks about NASA’s use of curl, and says that piece I quoted at the top.

I was at the FOSDEM 2023 conference, but unfortunately I had to skip the last hour of presentations so I had just left the campus when this talk was held.

It would have been a blast to have been present in the room at that time. Now I instead got an avalanche of messages from friends and acquaintances who notified me about this talk and mention of me, which was of course also fun.

Takeaway

I’m glad NASA is aware of some of their problems and that they listen. It is a comfort that my text was taken with the right attitude. It also feels good that I used the correct tone in that Tweet: I figure it is rarely someone’s actual desire to appear clumsy or bureaucratic, but organizations and companies can easily get trapped in processes that still make them act that way.

Credit

As a celebration of NASA, the top image is taken by NASA’s James Webb Space Telescope’s Near-Infrared Camera (NIRCam) and features the central region of the Chamaeleon I dark molecular cloud, which resides 630 light years away.

Of course, I use a lame downscaled version.

A badger badge for bagder

At FOSDEM 2023, fossasia sold a fun badge with 44×11 bright red LEDs on it. Looking like this:

Lots of people walked around the conference this year with sarcastic or otherwise amusing messages scrolling or blinking on them.

GitHub Social

When a few people at the GitHub Social event on the Saturday evening showed up wearing those badges, this triggered Martin Woodward (VP of DevRel for GitHub) who hosted the event, and he brought out his e-ink programmable badge from GitHub Universe and with a big smile on his face he showed off how cool it was and some of the functionality it holds. Combined with the story behind how he made this happen. Good stuff!

The red led badge immediately felt boring in comparison and in particular my friend Linus expressed his desire to get one. Martin explained how they had a few of them as prizes for the lottery of the evening. You would get one chance to win for every “hash collision” you could find – we all had gotten a hash code each on a sticker when we entered the place. Linus took off and searched for collisions with a frenzy, trying to maximize his chances.

Hash collisions

My fancy t-shirt front of the evening, showing my hash in the top left.

It was a fun and sufficiently nerdy social game that made us talk and mingle around a bit among the other fellow open source maintainers in the room and soon enough we found occasional collisions. I think I ended up finding three, Linus found six. Every time we wrote our names on a piece of paper and put it in a big container.

Socializing, beers, pizza and then eventually the time had come. No more collisions, time to see who had won.

Prizes

GitHub handed out many different prizes apart from the badges: GitHub pro subscriptions, hats and a huge GitHub LED logo.

I got the honors of picking the first five prizes: the pro subscriptions. Out of the five folded papers, one had my name on it! Since I am already a GitHub Star (which comes with that subscription, and Martin knew this), my ticket was replaced by another one. Still, I had apparently managed to pull my own name out of the hundreds of lottery tickets!

The badges

When the time came to pick the winners for the five badges, someone else fished up five tickets from the box and they were handed over to Martin, who read them one by one.

Daniel Stenberg – I had won again! What are the odds? It was most hilarious, but then Martin continued. He read another name, and then when he unfolded the next ticket he looked a little surprised and said, a little wondering, “another Stenberg?” Yes, my brother Björn, who was also present, then also won one of those fun badges. What were the odds of that?!

Linus did not win one.

Linus, Björn and I have been friends for over thirty years (we are haxx.se). Since two of us had now won badges and the person who wanted one the most did not, we of course could not resist teasing and taunting Linus plenty about it. What are friends for after all?

Leveling up

I knew how much Linus wanted one of these, so while I considered giving him mine, I first did a careful check with Martin. Did he possibly have an extra spare one that I could perhaps get my hands on for Linus?

He probably did. Over at his hotel he might have an extra. He even very graciously offered me that one like the champ he is.

Knowing that this possibility was in the pipeline, we could level up the taunting even more and really rub it in on Linus. Björn and I thought it was mightily fun. You can probably say we were badgering him.

A midnight image

The GitHub event ended, we all walked out into the Brussels night, aiming to get some sleep to endure another intense FOSDEM full day the following Sunday. At 00:55 Martin messaged me this picture:

Sunday morning

Of course we reminded Linus already at breakfast how some of us actually have these cool badges and then maybe a few more times the following hours. We also shared the secret with other friends so the surrounding had a better understanding of what was going on.

In the meantime, I secretly coordinated a badge drop-off. My personal hero Martin delivered a box for Linus with this badge, I could pick it up and while pausing our badge-teasing, I could hand over the cardboard box to him with his GitHub handle on a sticker.

Here’s one for you

That expression. It took a few seconds for what just happened to sink in, before Linus realized exactly what had been going on since last night.

Priceless.

An e-ink programmable badge

The badge tech itself is a 296 x 128 pixel e-ink display powered by an RP2040 (Raspberry Pi) dual Arm Cortex-M0+ running at up to 133 MHz. Easily programmable too.

If you want one, pimoroni sells these beauties. (I have no relation with them.)

They call this the badger 2040, which is particularly fun since I have used the nickname bagder for a long time, on GitHub and elsewhere. A badger badge for bagder.


Front

Back

curl’s use of many CI services

In the beginning and for many years, the curl project used no CI services at all. It instead used a distributed build and test system where volunteers ran machines that repeatedly pulled the latest code, built curl, ran the tests and reported the results back to a central server.

One

In 2013, the year curl turned 15, we created our first CI jobs on Travis CI. With only a single CI service life was easy for a few years.

Two, three

This single service had a limited feature set and in particular a limited set of supported platforms. To also do automatic testing on FreeBSD and Windows we had to use two additional services, Cirrus CI and AppVeyor, because Travis did not support those platforms. That made three services in early 2019.

Four

When we use free services, we need to live with the limitations of what the good providers offer for free or at low cost. In the case of CI services, they tend to reduce CPU time and parallelisms for users of the free tier and so did Travis.

When the number of CI jobs on Travis surpassed 30, and we had already gotten a small performance boost just because of their good will, we created the next few new CI jobs on GitHub Actions instead to increase the parallelism for no extra money. If I recall things correctly, the macOS support was also much better on GitHub since it was rather limited on Travis.

GitHub later graciously bumped our service level for even more power and parallelism. Increased parallelism, not the least thanks to the use of several independent CI services, made sure that the complete set of CI jobs would still complete within a reasonable time.

Five

When working on extending and improving our Windows CI testing in late 2019, our previous Windows CI provider AppVeyor was not good enough so we opted to add jobs on Azure Pipelines. This was also because GitHub Actions could not run the images we have and wanted to use for this purpose.

Redundancy

When we entered the year 2020 we were at 60 CI jobs, and having them run on several different CI services often turned out useful when one of them acted up: at least a lot of other jobs would still work and help us assess and verify proposed changes. No all-eggs-in-the-same-basket problem.

Services come and go

Redundancy also helps soften the blow when a service goes away. If you are in the race long enough, all services will go away or go sour eventually. This includes CI services.

In 2021, Travis CI changed their policies and suddenly we could not keep using them unless we paid up a few K USD per year and we would rather avoid that.

We had to move the 30+ CI jobs from Travis to something else. Thanks to a generous offer, volunteers showed up and helped transition the Travis jobs over to a new service: Zuul CI. It softened the repercussions from the “jump” and the CI jobs kept helping us ship quality code.

Five, Six

To manage the Travis CI eviction, Zuul took over most of the curl CI jobs and a few of them were added on Circle CI, which then appeared as CI service number six. Primarily because of their at the time early and convenient support for arm.

Zuul CI

We were grateful for the help we got to move over to Zuul from Travis, but soon it became apparent to us that Zuul CI is more “crude” than some of the other services and it left us wanting more. Its UI is way less sophisticated, to the level that it is almost difficult for a casual PR submitter to read and understand build errors. Also, it was slightly buggy, which could result in Zuul jobs not showing up in the GitHub UI at all or simply failing to trigger the new jobs. When the responses from the Zuul side to our problems were somewhere between slow and non-existent, I felt we had no other choice but to transition away from this service as well.

The change took its time. At the end of 2021 we had 30 CI jobs on Zuul, and just days ago in late January 2023, we removed the final curl jobs from it.

Five

We use five services now and we could possibly consolidate down to four if we really wanted to, but I see no reason to do that now when things are working and huffing along.

GitHub Actions has really taken off as our primary CI service and now runs almost half of the entire set, thanks to it being convenient, well integrated, well documented and us having good parallelism on it.

We do what we need

Whatever is good for the project we will consider doing. We have gotten to this point with this set of CI services because they help the project. If someone proposes a change that improves things and that change reduces the number of CI services, then we might go that way next. Or maybe we add one? We have not planned what comes next.

What we run in CI

  • We build curl and run tests with numerous different build configurations on several architectures on different operating systems. With and without debug enabled. With and without using valgrind. Most builds also run checksrc, which verifies source code style.
  • We run dedicated jobs that do “deeper” testing, such as building with address and undefined behavior-analyzers and running the complete curl test suite in “torture mode”.
  • We run markdown and man page spell checkers
  • We run English prose checking (using proselint) of markdown files
  • We run static code analyzers and fuzzers
  • We confirm the copyright and license situation of all files in git
  • We verify links within markdowns
  • We have a few “bot services” that can set the “hacktoberfest-accepted” label, and a labeler service that tries to automatically set proper categories for pull requests.
  • We verify that the release tarball looks right and works when generated from the current set of files in git
  • … and probably a few other things I have forgotten now

Of course we have graphs

These graphs were screenshotted from the dashboard on February 1st, 2023.

The total number of CI jobs done for each PR and commit, over time
Number of CI jobs running on which CI service, over time
CI job distribution over platforms

Future

Whatever helps the project and whatever someone offers to help us make that happen, we might do. That may mean using more services, it might mean using fewer.

The important part is that these services are used to improve and strengthen the curl project and the products we ship.

Selecting HTTP version (three)

The latest HTTP version is called HTTP/3 and is being transferred over QUIC instead of the old classic TCP+TLS duo.

An attempt of an architectural drawing could look like this:

HTTP network stacks

Remember: experimental

HTTP/3 support in curl is still experimental so we do reserve the right to change names, behavior and functionality during development.

We aim to remove this experimental label from HTTP/3 support during the spring of 2023.

HTTP/3 is for HTTPS only

Before we dig into the details of this, remember that HTTP/3 can and will only be used for HTTPS:// URLs. It is always encrypted and there is no way to do HTTP/3 over clear text. Asking to do HTTP/3 with an HTTP:// URL is therefore a non-starter. An error.

Cannot upgrade a connection

When HTTP/2 was introduced to the world, there was a companion TLS extension created called ALPN that allows a client to ask for HTTP/2 to be used. This is a very convenient and slick way for a client to mostly transparently upgrade from HTTP/1.1 to HTTP/2 over the same connection. No penalty or extra time wasted even.

With HTTP/3, the procedure cannot be done in the same easy way. As my fancy picture above shows, HTTP/3 requires a separate QUIC connection done to the host. A connection that uses HTTP/1 or HTTP/2 cannot be upgraded to HTTP/3. The client needs to make a separate, dedicated connection for HTTP/3.

(Since QUIC is done over UDP, HTTP/3 also uses a different port number space than the earlier HTTP versions.)

If the QUIC connection fails, it means there can be no HTTP/3 and then a client might instead select to try an older HTTP version over another connection.

Alt-Svc

The original and official (according to the HTTP/3 RFC) way of bootstrapping a transfer into HTTP/3 is done like this:

A client makes a request using HTTP/1 or HTTP/2 to a server and in its response headers, the server indicates that it supports HTTP/3 by including an Alt-Svc: header with details on where and how to connect to the HTTP/3 server – and also for how long into the future this information is valid.

The client can then make its next HTTP operation against that server with HTTP/3 to the above mentioned host name and port number. So unless this info was already cached, a client needs an initial “upgrade” round-trip before it can use HTTP/3. Also, many clients/browsers prefer to reuse the existing initial (HTTP/2 or HTTP/1-using) connection for subsequent requests rather than creating a new one, since that might be faster.

Thus, the upgrade to HTTP/3 might not happen until enough time has passed to allow the initial connection to close.

curl supports alt-svc and can upgrade to HTTP/3 using it.
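To make the header format a little more concrete, here is a minimal sketch that pulls the HTTP/3 alternative out of an Alt-Svc header value. The function name altsvc_h3 and the header value are made up for illustration, and the parsing is deliberately simplistic; it is not how curl parses the header.

```shell
# Print the "host:port" of each h3 alternative found in an Alt-Svc header
# value. An empty host (as in ":443") means "same host as the origin".
# Returns non-zero when there is no h3 entry.
altsvc_h3() {
  # $1: the header value, entries separated by commas
  printf '%s\n' "$1" | tr ',' '\n' |
    sed -n 's/^[[:space:]]*h3="\([^"]*\)".*/\1/p' | grep .
}

# Made-up header value; "ma" is the max age in seconds for the entry.
altsvc_h3 'h3=":443"; ma=86400, h2="alt.example.com:443"; ma=3600'
```

With the curl tool itself you do not need any of this: the --alt-svc option takes a file name to use as cache for Alt-Svc information between invocations.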

Alt-Svc replacement

There are some inherent problems with this header and server operators do not like it. A working group set out to fix its problems has instead suggested a new header, alt-svcb, to replace it. This header looks simpler, partly because it is made to lean on another newcomer in the game: the HTTPS DNS records.

HTTPS records

This is a proposed new DNS record, called HTTPS, which can contain information about a server’s support for (among other things) HTTP/3. It is a DNS RR, where RR is short for Resource Record: a field of information stored in DNS.

Yes, the name of this DNS record makes discussions a little confusing as HTTPS is otherwise generally a URL scheme or perhaps even a “protocol”.

A client can use DNS to figure out if and where it should try HTTP/3 or an older HTTP version when speaking to a particular host by using this HTTPS record. This is not yet an official standard and the RFC is not finalized, but there are servers out there deploying it already and there are clients/browsers taking advantage of it.
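As a rough illustration of what a client looks for in such a record, this sketch checks whether the record data advertises h3 in its alpn parameter. The function name https_rr_has_h3 and the record contents are assumptions made up for this example, and since the specification is not final the format may still change. This is not something curl does.

```shell
# Succeed if the data of an HTTPS record lists "h3" in its alpn parameter.
# $1: record data in text form, for example: 1 . alpn="h3,h2"
https_rr_has_h3() {
  printf '%s\n' "$1" |
    grep -Eq '(^|[[:space:]])alpn="?([^"[:space:]]*,)?h3(,|"|[[:space:]]|$)'
}

# Made-up record data for illustration:
if https_rr_has_h3 '1 . alpn="h3,h2"'; then
  echo "server claims HTTP/3 support"
fi
```

Note that the check only handles h3 in first or second position of the alpn list, which is enough for a sketch but not for a real client.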

curl does not support HTTPS records yet, but we have a rough plan for how to do it.

Just try it

During curl’s several years of having offered experimental HTTP/3 support, we have provided an option for the user to ask it to use HTTP/3 directly against the host mentioned in the URL. Known as --http3 for the command line tool.

Going forward, this option is going to remain an option to ask curl to speak HTTP/3 with the server in the URL but it will also allow curl to fall back to an earlier HTTP version in case of QUIC problems. See below for details on exactly how.

Starting now, we also introduce a new separate option to ask for exactly and only HTTP/3 without any fallback. We expect this to be less commonly used by users. This option for the command line is currently called --http3-only.
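A small sketch of how a script could choose between the two options. The helper name h3_option and its "fallback"/"only" policy words are invented for this example; the two curl options are the ones described above, and as noted they are experimental and may be renamed.

```shell
# Print the curl option matching a requested HTTP/3 policy.
# $1: "fallback" (an older HTTP version is acceptable) or "only"
# $2: the URL, which has to be an https:// URL for HTTP/3
h3_option() {
  case "$2" in
    [Hh][Tt][Tt][Pp][Ss]://*) ;;
    *) echo "HTTP/3 requires an https:// URL: $2" >&2; return 1 ;;
  esac
  case "$1" in
    fallback) echo "--http3" ;;
    only)     echo "--http3-only" ;;
    *) echo "unknown policy: $1" >&2; return 1 ;;
  esac
}

# Usage: show which HTTP version the transfer ended up using.
# opt=$(h3_option fallback https://example.com/) &&
#   curl "$opt" -sS -o /dev/null -w '%{http_version}\n' https://example.com/
```

The %{http_version} variable for -w prints the HTTP version that was effectively used, which makes it easy to see whether a fallback happened.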

Happy eyeballs everything!

We want users to be able to ask for HTTP/3 with a fallback to an earlier HTTP version if needed. The option should start as an opt-in but with the expectation that maybe in the future it can become a default.

Challenges involve:

  1. A not insignificant share of QUIC attempts are blocked when the company/organization from which the attempt is made does not allow them.
  2. Sometimes UDP is just slowed down (a lot)
  3. HTTP/3 is still only deployed in a fraction of all servers

This is how we envision doing it:

  1. Start an HTTP/3 attempt
  2. If it has not connected successfully within N milliseconds, start an HTTP/2 attempt in parallel. (That can become an HTTP/1 transfer depending what the server supports.)
  3. The first successful connect wins and the other one is discarded.

For each of these separate attempts, IPv6/IPv4 is also selected in the same kind of race against each other to pick the one that connects first. Potentially making up to four parallel connect attempts going on at the same time: QUIC-IPv6, QUIC-IPv4, TCP-IPv6 and TCP-IPv4!
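The three steps can be modeled with a bit of arithmetic. This is a simplified sketch of the plan and not curl code: the function name race_winner is made up, connect times are given as input instead of being measured, the IPv6/IPv4 layer is ignored, and a failed QUIC attempt is assumed not to start the fallback any earlier than N.

```shell
# Decide which attempt wins under the staggered start.
# $1: ms until the QUIC attempt connects, or "fail" if it never does
# $2: ms the TCP attempt needs once it has been started, or "fail"
# $3: N, the head start in ms given to the HTTP/3 attempt
race_winner() {
  quic=$1 tcp=$2 delay=$3
  # QUIC connected before the second attempt was even started.
  if [ "$quic" != fail ] && [ "$quic" -le "$delay" ]; then
    echo h3; return
  fi
  # From here on the TCP attempt starts at $delay ms.
  if [ "$tcp" = fail ]; then
    [ "$quic" = fail ] && echo none || echo h3
    return
  fi
  # The TCP attempt completes at delay + tcp; the earliest connect wins.
  if [ "$quic" != fail ] && [ "$quic" -le $((delay + tcp)) ]; then
    echo h3
  else
    echo h2-or-h1
  fi
}

race_winner 50 80 100    # QUIC is done before N: prints h3
race_winner 150 30 100   # TCP is done at 130 ms, QUIC at 150 ms: prints h2-or-h1
```

Picking N is the interesting design choice: a small N wastes a TCP connect attempt more often, a large N makes users wait longer when QUIC is blocked.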

I made a little drawing to visualize how the different connect attempts then might get initiated:

Multi-layered happy eyeballs

This is planned

I just want to be clear: this is what we plan to make work going forward. The code does not actually work like this just yet.

Update

In a slightly longer plan, before this feature is removed from its experimental state, we will probably remove both --http3 and --http3-only from the command line tool and instead create a more generic --http-versions option to maybe replace a lot of HTTP selection options. The exact functionality and syntax for this is yet to be worked out.

The underlying libcurl options might still remain as described in this blog post though.

My weekly report on email

Starting this week, you can subscribe to my weekly report and receive it as an email. This is the brief weekly summary of my past week that I have been writing and making available for over a year already. It sums up what I have been doing recently and what I plan to do next.

Topics in the reports typically involve a lot of curl, libcurl, HTTP, protocols, standards, networking and related open source stuff.

By subscribing to this by email, you will receive a ping and get it in your inbox as soon as it exists. This saves you from reloading the weekly report web page or risking missing my updates on social media.

Follow what happens in the projects I run and participate in. Keep up with the latest developments in all the open source and network related stuff that occupy my every day life.

Why email?

I was already sending this report over email to some receivers, so I figured I could just invite everyone who wants to receive it the same way. Depending on how people take this, I might decide to rather only do this over email going forward.

Your feedback will help me decide on how this plays out.

The weekly report emails are archived, so you can go back and check them after the fact as well.

Copyright without years

Like so many other software projects the curl project has copyright mentions at the top of almost every file in the source code repository. Like

Copyright (C) 1998 - 2022, Daniel Stenberg ...

Over the years we have used a combination of scripts and manual edits to update the ending year in that copyright line to match the year of the latest update of that file.

As soon as we started a new year and someone updated a file, the copyright range needed update. Scripts and tools made it less uncomfortable, but it was always somewhat of a pain to remember and fix.

In 2023 this changed

When the year was bumped again and the first changes of the year were made to curl, we would consequently have had to start updating years again to make the ranges end with 2023.

Only this time someone asked me why? and it made me decide that what the heck, let’s completely rip them out instead! Doing it at the beginning of the year is also a very good moment.

Do we need the years?

The Berne Convention states that copyright “must be automatic; it is prohibited to require formal registration”.

The often-used copyright lines are not necessary to protect our rights. According to the Wikipedia page mentioned above, the Berne Convention has been ratified by 181 states out of 195 countries in the world.

They can still serve a purpose as they are informational and make the ownership question quite clear. The year ranges add questionable value though.

I have tried to find resources that argue for the importance of the copyright years to be stated and present, but I have not found any credible sources. Possibly because I haven’t figured out where to look.

Not alone

It turns out quite a few projects run by many different organizations or even huge companies have already dropped the years from their source code header copyright statements. Presumably at least some of those giant corporations have had their legal departments give a green light to the idea before they went ahead and published source code that way to the world.

Low risk

We own the copyrights no matter if the years are stated or not. The exact years the files were created or edited can still easily be figured out since we use version control, should anyone ever actually care about it. And we give away curl for free, under an extremely liberal license.

I don’t think we risk much by doing this move.

January 3, 2023

On this day I merged commit 2bc1d775f510, which updated 1856 files and removed copyright years from almost everywhere in the source code repository.
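For a feel of what such a mass edit boils down to, here is a minimal sed sketch that drops the year or year range from a line shaped like the example earlier in this post. The function name strip_copyright_years is made up, and this is an illustration only, not the script used for the commit mentioned above.

```shell
# Remove a year ("2021,") or year range ("1998 - 2022,") that directly
# follows "Copyright (C)". Lines without that pattern pass through untouched.
strip_copyright_years() {
  sed -E 's/(Copyright \(C\)) [0-9]{4}( - [0-9]{4})?,/\1/'
}

echo 'Copyright (C) 1998 - 2022, Daniel Stenberg ...' | strip_copyright_years
# prints: Copyright (C) Daniel Stenberg ...
```

Anchoring the pattern on the "Copyright (C)" prefix keeps the expression from touching other four-digit numbers in the source code.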

I decided to leave them in the main license file. Partly because this is a file that lots of companies include in their products and I have had some use of seeing the year ranges in there in the past!

Bliss

Now we can forget about copyright years in the project. It’s a relief!

An m1 for curl

A generous member of the wider curl community stepped up and donated an unused Mac mini m1 model to me to be used for curl development. Today it arrived at my home. An 8C CPU/16GB/1TB/8C GPU/1GbE model as per the sticker on the box.

The m1 mac mini, still wrapped in plastic.

Apple is not helping

Apple has shipped and used curl in their products for twenty years but they never assist, help or otherwise contribute to the development. They also don’t sponsor us in any way, like with hardware.

Yet, there are many curl users on the different Apple platforms and sometimes these users run into issues that are unique to those platforms and are challenging to address without direct access to such.

For curl

I decided to accept this gift as I believe it might help the project, but this is not a guarantee or promise that I will run around and become the mac support guy in the project. It will just allow me to sometimes get a better grip and ability to help out.

I will also offer other curl committers access to the machine in case of need. For development and debugging and whatnot. Talk to me about it.

A tiny speed comparison

My Intel-based development machine runs Linux, is ten years old and is equipped with an i7-3770K CPU at 3.5GHz. The source code is stored on an OCZ-VERTEX4 SSD on the Intel, the mac has SSD storage only.

Here’s a rough and not very scientific test of some of my most common build activities on the m1+macOS vs the old Intel+Linux machines. This is using the bleeding edge curl source code with roughly the same build config. Both used clang for compiling, a debug build.

Test                   m1       Intel
configure              19.8 s   18.5 s
make -sj               12.8 s   14.2 s
autoreconf -fi         7.9 s    12.8 s
make -sj (in tests/)   19.1 s   33.9 s

I expected the differences to be bigger.
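To put numbers on that, this little sketch computes how many times longer each step took on the Intel machine, using the timings from the table above. The function name speed_ratios is made up for the example.

```shell
# Print the Intel/m1 time ratio per step. A value below 1 means the old
# Intel machine was the faster one for that step.
# Columns in the data: step|m1 seconds|Intel seconds
speed_ratios() {
  awk -F'|' '{ printf "%s: %.2f\n", $1, $3 / $2 }' <<'EOF'
configure|19.8|18.5
make -sj|12.8|14.2
autoreconf -fi|7.9|12.8
make -sj (in tests/)|19.1|33.9
EOF
}

speed_ratios
```

The ratios range from just below 1 for configure up to almost 1.8 for building the tests.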

The first line of curl -V for the two builds:

curl 7.87.1-DEV (aarch64-apple-darwin22.2.0) libcurl/7.87.1-DEV OpenSSL/3.0.7 zlib/1.2.11 brotli/1.0.9 zstd/1.5.2 c-ares/1.18.1 libidn2/2.3.4 libpsl/0.21.2 (+libicu/71.1) libssh2/1.10.0 nghttp2/1.51.0 libgsasl/2.2.0
curl 7.87.1-DEV (x86_64-pc-linux-gnu) libcurl/7.87.1-DEV OpenSSL/3.0.7 zlib/1.2.13 brotli/1.0.9 zstd/1.5.2 c-ares/1.17.0 libidn2/2.3.3 libpsl/0.21.0 (+libidn2/2.3.0) libssh2/1.10.1_DEV nghttp2/1.50.0-DEV librtmp/2.3 libgsasl/2.2.0

Interestingly, there is no mention anywhere that I can find in the OS settings/config or in the box etc as to what CPU speed the m1 runs at.

Credits

This device was donated “to the cause” by “a member and supporter of the Network Time Foundation at nwtime.org” (real name withheld on request).

Discussed

Hacker news.

Short follow-up

People mention that the Intel CPU uses much more power, runs at higher temperature and that the m1 is “just first generation” and all sorts of other excuses for the results presented above. Others insist that the Makefiles must be bad or that I’m not using the mac to its best advantage etc.

None of those excuses change the fact that my ten year old machine builds curl and related code at roughly the same speed as this m1 box while I expected it to be a more noticeable speed difference in the m1’s favor. Yes, it was probably bad expectations.

curl -w certs

When a client connects to a TLS server it gets sent one or more certificates during the handshake.

Those certificates are verified by the client, to make sure that the server is indeed the right one: the server the client expects it to be; no impostor and no man in the middle etc.

When such a server certificate is signed by a Certificate Authority (CA), that CA’s certificate is normally not sent by the server but the client is expected to have it already in its CA store.

What certs?

Ever since the day SSL and TLS first showed up in the 1990s, users have occasionally wanted to be able to save the certificates provided by the server in a TLS handshake.

The openssl tool has offered this ability for a long time, and how to do it is actually the topic of one of my higher ranked stackoverflow answers.

Export the certificates with the tool first, and then in subsequent transfers you can tell curl to use those certificates as a CA store:

$ echo quit | openssl s_client -showcerts -connect curl.se:443 > cacert.pem
$ curl --cacert cacert.pem https://curl.se/

This is of course most convenient when that server is using a self-signed certificate or something otherwise unusual.

(WARNING: The above shown example is an insecure way of reaching the host, as it does not detect if the host is already MITMed at the time when the first command runs. Trust On First Use.)

OpenSSL

A downside with the approach above is that it requires the openssl tool. Albeit, not a big downside for most people.

There are also alternative tools provided by wolfSSL and GnuTLS etc that offer the same functionality.

QUIC

Over the last few years we have seen a huge increase in number of servers that run QUIC and HTTP/3, and tools like curl and all the popular browsers can communicate using this modern set of protocols.

OpenSSL cannot. They decided to act against what everyone wanted, and as a result the openssl tool also does not support QUIC and therefore it cannot show the certificates used for a HTTP/3 site!

This is an inconvenience to users, including many curl users. I decided I could do something about it.

CURLOPT_CERTINFO

Already back in 2016 we added a feature to libcurl that enables it to return a list of certificate information back to the application, including the certificates themselves in PEM format. We call the option CURLOPT_CERTINFO.

We never exposed this feature in the command line tool and we did not really see the need as everyone could use the openssl tool etc fine already.

Until now.

curl -w is your friend

curl supports QUIC and HTTP/3 since a few years back, even if still marked as experimental. Because of this, the above mentioned CURLOPT_CERTINFO option works fine for that protocol version as well.

Using the --write-out (-w) option and the new variables %{certs} and %{num_certs} curl can now do what you want. Get the certificates from a server in PEM format:

$ curl https://curl.se -w "%{certs}" -o /dev/null > cacert.pem
$ curl --cacert cacert.pem https://curl.se/

You can of course also add --http3 to the command line if you want, and if you like to get the certificates from a server with a self-signed one you may want to use --insecure. You might consider adding --head to avoid the response body. This command line uses -o to write the content to /dev/null because it does not care about that data.

The %{num_certs} variable shows the number of certificates returned in the handshake. Typically one or two but can be more.

%{certs} outputs the certificates in PEM format together with a number of other details and meta data about the certificates in a “name: value” format.
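Since the %{certs} output mixes PEM blocks with other details, a quick sanity check on a saved file can be to count the PEM blocks in it. A minimal sketch; the function name count_pem_certs is made up and the file name is the one used in the example above.

```shell
# Count the PEM certificate blocks in a file, for example one saved with
# -w "%{certs}". Prints the number of blocks found.
count_pem_certs() {
  grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Usage, assuming cacert.pem was written by the command line above:
# count_pem_certs cacert.pem
#
# The number can be compared with what curl itself reports:
# curl https://curl.se -sS -o /dev/null -w "%{num_certs}\n"
```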

Availability

These new -w variables are only supported if curl is built with a supported TLS backend: OpenSSL/libressl/BoringSSL/quictls, GnuTLS, Schannel, NSS, GSKit and Secure Transport.

Support for these new -w variables has been merged into curl’s master branch and is scheduled to be part of the coming release of curl version 7.88.0 on February 15th, 2023.

At 17000 curl commits

Today, another 1,000 commits have been recorded as done by me in the curl source code git repository since November 2021. Out of a total of 29,608 commits to the curl source code repository, I have made 17,001. 57.42%.

The most recent one was PR #10019.

In 2022, I have done 56% of all the commits in the curl source repository. I am also the only developer who works full time on curl all the time.

In 2022, 179 individuals authored commits that were merged into curl. 115 of them did that for the first time this year. Over curl’s life time, a total of 1104 persons have authored code merged into curl.

Do I ever get bored? Not yet. I will let you know if I do.

The curl fragment trick

curl supports globbing in the sense that you can provide ranges or lists in the URL that will make curl iterate, loop, over all the different variations and do a separate transfer for each.

For example, get ten images in a numeric range:

curl "https://example.com/image[1-10].jpg" -O

Or get them when named after some weekdays:

curl "https://example.com/{Monday,Tuesday,Friday}.jpg" -O

Naming the output

The examples above use -O which makes curl use the same name for the destination file as is used in the effective URL. Convenient, but not always what you want.

curl also allows you to refer to the number or name from the range or list and use that when naming your output files, which helps you do better globbing.

For example, maybe the file name part of the URL is actually the same and you iterate over another difference in the URL. Like this:

curl "https://example.com/{Monday,Tuesday,Friday}/image" -o "#1.jpg"

The #1 part in the example is a reference back to the first list/range. You can use several lists and ranges in the same URL, even of mixed types, and then use multiple #-references in the same command line. To illustrate, here is a simple example using two iterators to download three hundred images:

curl "https://{red,blue,green}.example.com/image[1-100].jpg" -o "#2-#1-stored.jpg"

There is actually no upper limit to how many transfers you can do like this with curl, other than that the numeric ranges only deal with up to 64 bit numbers.
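To see how the #-references map to the globs without doing any transfers, this sketch lists the output file names that the two-iterator example above ends up with. It is a plain shell loop that mimics the naming, not curl functionality, and the function name glob_output_names is made up.

```shell
# List the names that -o "#2-#1-stored.jpg" gives for the URL glob
# https://{red,blue,green}.example.com/image[1-100].jpg
# #1 refers to the first glob (the color), #2 to the second (the number).
glob_output_names() {
  for color in red blue green; do
    n=1
    while [ "$n" -le 100 ]; do
      echo "$n-$color-stored.jpg"
      n=$((n + 1))
    done
  done
}

glob_output_names | wc -l   # three colors times one hundred numbers
```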

Hundreds? Maybe go parallel

If you actually do come up with a command line that needs to transfer several hundred or more resources, then maybe consider adding -Z, --parallel to the mix so that curl performs many transfers simultaneously, in parallel. This can drastically reduce the total time needed for completing the task.

curl runs up to 50 transfers in parallel by default when this option is used, but you can also tweak this amount with --parallel-max.

A fragment trick

Okay, so now we finally arrive at the fragment and the trick mentioned in the title.

If you want to do several repeated transfers but not actually change the URL then the examples above do not satisfy you as they change the URL for every new transfer.

A neat trick is then to add a fragment part to the URL you use, and then do the globbing there. The fragment is the rightmost part of a URL that starts with a #-character and continues to the end of the URL.

A fragment can always be added to a URL, but the fragment is never actually transmitted over the network so the remote server is not aware of it.

Get the same URL ten times, saved in different target files:

curl "https://example.com/index.html#[1-10]" -o "#1.html"

If you would rather name the outputs according to some scheme, you can of course just list them in the glob:

curl "https://example.com/index.html#{mercury,venus,earth,mars}" -o "#1.html"

Maybe slower

In cases where you transfer the same URL many times, chances are you want to do this because the content changes at some interval. Perhaps you then do not want them all to be done as fast as possible as then the contents may not have updated.

To help you pace the transfers to get the same thing over and over in a more controlled manner, curl offers --rate. With this you can tell curl to not do it faster than N transfers per given period.

If the URL contents update every 5 minutes, then doing the transfer 12 times per hour seems suitable. Let’s do it 2016 times to have the operation run non-stop for a week:

curl "https://example.com/index.html#[1-2016]" --rate 12/h -o "#1.html"
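The 2016 in that range is just the rate multiplied by the number of hours in a week. A tiny sketch of the arithmetic, with a made-up helper name transfers_for:

```shell
# Number of transfers needed to keep going for a number of days at a
# given hourly rate.
# $1: transfers per hour, $2: number of days
transfers_for() {
  echo $(( $1 * 24 * $2 ))
}

transfers_for 12 7   # 12 per hour for 7 days: prints 2016
```

The result can be used directly as the upper end of the range in the fragment glob.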
