Mars 2020 Helicopter Contributor

Friends of mine know that I’ve tried for a long time to get confirmation that curl is used in space. We’ve believed it to be likely but I’ve wanted to get a clear confirmation that this is indeed the fact.

Today GitHub posted their article about open source in the Mars mission, and they now provide a badge on their site for contributors of projects that are used in that mission.

I have one of those badges now. Only a few other of the current 879 recorded curl authors got it. Which seems to be due to them using a very old curl release (curl 7.19, released in September 2008) and they couldn’t match all contributors with emails or the authors didn’t have their emails verified on GitHub etc.

According to that GitHub blog post, we are “almost 12,000” developers who got it.

While this strictly speaking doesn’t say that curl is actually used in space, I think it can probably be assumed to be.

Here’s the interplanetary curl development displayed in a single graph:

See also: screenshotted curl credits and curl supports NASA.

Credits

Image by Aynur Zakirov from Pixabay

curl those funny IPv4 addresses

Everyone knows that on most systems you can specify IPv4 addresses just 4 decimal numbers separated with periods (dots). Example:

192.168.0.1

Useful when you for example want to ping your local wifi router and similar. “ping 192.168.0.1”

Other bases

The IPv4 string is usually parsed by the inet_addr() function or at times it is passed straight into the name resolver function like getaddrinfo().

This address parser supports more ways to specify the address. You can for example specify each number using either octal or hexadecimal.

Write the numbers with zero-prefixes to have them interpreted as octal numbers:

0300.0250.0.01

Write them with 0x-prefixes to specify them in hexadecimal:

0xc0.0xa8.0x00.0x01

You will find that ping can deal with all of these.

As a 32 bit number

An IPv4 address is a 32 bit number that when written as 4 separate numbers are split in 4 parts with 8 bits represented in each number. Each separate number in “a.b.c.d” is 8 bits that combined make up the whole 32 bits. Sometimes the four parts are called quads.

The typical IPv4 address parser however handles more ways than just the 4-way split. It can also deal with the address when specified as one, two or three numbers (separated with dots unless its just one).

If given as a single number, it treats it as a single unsigned 32 bit number. The top-most eight bits stores what we “normally” with write as the first number and so on. The address shown above, if we keep it as hexadecimal would then become:

0xc0a80001

And you can of course write it in octal as well:

030052000001

and plain old decimal:

3232235521

As two numbers

If you instead write the IP address as two numbers with a dot in between, the first number is assumed to be 8 bits and the next one a 24 bit one. And you can keep on mixing the bases as you see like. The same address again, now in a hexadecimal + octal combo:

0xc0.052000001

This allows for some fun shortcuts when the 24 bit number contains a lot of zeroes. Like you can shorten “127.0.0.1” to just “127.1” and it still works and is perfectly legal.

As three numbers

Now the parts are supposed to be split up in bits like this: 8.8.16. Here’s the example address again in octal, hex and decimal:

0xc0.0250.1

Bypassing filters

All of these versions shown above work with most tools that accept IPv4 addresses and sometimes you can bypass filters and protection systems by switching to another format so that you don’t match the filters. It has previously caused problems in node and perl packages and I’m guessing numerous others. It’s a feature that is often forgotten, ignored or just not known.

It begs the question why this very liberal support was once added and allowed but I’ve not been able to figure that out – maybe because of how it matches class A/B/C networks. The support for this syntax seems to have been introduced with the inet_aton() function in the 4.2BSD release in 1983.

IPv4 in URLs

URLs have a host name in them and it can be specified as an IPv4 address.

RFC 3986

The RFC 3986 URL specification’s section 3.2.2 says an IPv4 address must be specified as:

dec-octet "." dec-octet "." dec-octet "." dec-octet

… but in reality very few clients that accept such URLs actually restrict the addresses to that format. I believe mostly because many programs will pass on the host name to a name resolving function that itself will handle the other formats.

The WHATWG URL Spec

The Host Parsing section of this spec allows the many variations of IPv4 addresses. (If you’re anything like me, you might need to read that spec section about three times or so before that’s clear).

Since the browsers all obey to this spec there’s no surprise that browsers thus all allow this kind of IP numbers in URLs they handle.

curl before

curl has traditionally been in the camp that mostly accidentally somewhat supported the “flexible” IPv4 address formats. It did this because if you built curl to use the system resolver functions (which it does by default) those system functions will handle these formats for curl. If curl was built to use c-ares (which is one of curl’s optional name resolver backends), using such address formats just made the transfer fail.

The drawback with allowing the system resolver functions to deal with the formats is that curl itself then works with the original formatted host name so things like HTTPS server certificate verification and sending Host: headers in HTTP don’t really work the way you’d want.

curl now

Starting in curl 7.77.0 (since this commit ) curl will “natively” understand these IPv4 formats and normalize them itself.

There are several benefits of doing this ourselves:

  1. Applications using the URL API will get the normalized host name out.
  2. curl will work the same independently of selected name resolver backend
  3. HTTPS works fine even when the address is using other formats
  4. HTTP virtual hosts headers get the “correct” formatted host name

Fun example command line to see if it works:

curl -L 16843009

16843009 gets normalized to 1.1.1.1 which then gets used as http://1.1.1.1 (because curl will assume HTTP for this URL when no scheme is used) which returns a 301 redirect over to https://1.1.1.1/ which -L makes curl follow…

Credits

Image by Thank you for your support Donations welcome to support from Pixabay