23 | December | 2022 | daniel.haxx.se

curl supports globbing in the sense that you can provide ranges or lists in the URL that will make curl iterate, loop, over all the different variations and do a separate transfer for each.

For example, get ten images in a numeric range:

curl "https://example.com/image[1-10].jpg" -O

Or get them when named after some weekdays:

curl "https://example.com/{Monday,Tuesday,Friday}.jpg" -O

Naming the output

The examples above use -O which makes curl use the same name for the destination file as is used the effective URL. Convenient, but not always what you want.

curl also allows you to refer to the number or name from the range or list and use that when naming your output files, which helps you do better globbing.

For example, maybe the file name part of the URL is actually the same and you iterate over another difference in the URL. Like this:

curl "https://example.com/{Monday,Tuesday,Friday}/image" -o #1.jpg

The #1 part in the example is a reference back to the first list/range, as you can do multiple ones and even using mixed types and you can then use multiple #-references in the same command line. To illustrate, here is a simple example using two iterators to download three hundred images:

curl "https://{red,blue,green}.example.com/image[1-100].jpg" -o "#2-#1-stored.jpg"

There is actually no upper limit to how many transfers you can do like this with curl, other than that the numeric ranges only deal with up to 64 bit numbers.

Hundreds? Maybe go parallel

If you actually do come up with a command line that needs to transfer several hundred or more resources, then maybe consider adding -Z, --parallel to the mix so that curl performs many transfers simultaneously, in parallel. This can drastically reduce the total time needed for completing the task.

curl runs up to 50 transfers in parallel by default when this option is used, but you can also tweak this amount with --parallel-max.

A fragment trick

Okay, so now we finally arrive at the fragment and the trick mentioned in the title.

If you want to do several repeated transfers but not actually change the URL then the examples above do not satisfy you as they change the URL for every new transfer.

A neat trick is then to add a fragment part to the URL you use, and then do the globbing there. The fragment is the rightmost part of a URL that starts with a #-character and continues to the end of the URL.

A fragment can always be added to a URL, but the fragment is never actually transmitted over the network so the remote server is not aware of it.

Get the same URL ten times, saved in different target files:

curl "https://example.com/index.html#[1-10]" -o #1.html

If you rather name the outputs according some scheme, you can of course just list them in the glob:

curl "https://example.com/index.html#{mercury,venus,earth,mars}" -o #1.html

Maybe slower

In cases where you transfer the same URL many times, chances are you want to do this because the content changes at some interval. Perhaps you then do not want them all to be done as fast as possible as then the contents may not have updated.

To help you pace the transfers to get the same thing over and over in a more controlled manner, curl offers --rate. With this you can tell curl to not do it faster than N transfers per given period.

If the URL contents update every 5 minutes, then doing the transfer 12 times per hour seems suitable. Let’s do it 2016 times to have the operation run non-stop for a week:

curl "https://example.com/index.html#[1-2016]" --rate 12/h -o "#1.html"

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

daniel.haxx.se

Daily Archives: December 23, 2022

The curl fragment trick

Naming the output

Hundreds? Maybe go parallel

A fragment trick

Maybe slower

curl, open source and networking