When you use the name
localhost in a URL, what does it mean? Where does the network traffic go when you ask curl to download
Is “localhost” just a name like any other or do you think it infers speaking to your local host on a loopback address?
The name was “resolved” using the standard resolver mechanism into one or more IP addresses and then curl connected to the first one that works and gets the data from there.
The (default) resolving phase there involves asking the getaddrinfo() function about the name. In many systems, it will return the IP address(es) specified in
/etc/hosts for the name. In some systems things are a bit more unusually setup and causes a DNS query get sent out over the network to answer the question.
In other words: localhost was not really special and using this name in a URL worked just like any other name in curl. In most cases in most systems it would resolve to 127.0.0.1 and ::1 just fine, but in some cases it would mean something completely different. Often as a complete surprise to the user…
Starting in commit 1a0ebf6632f8, to be released in curl 7.78.0, curl now treats the host name “localhost” specially and will use an internal “hard-coded” set of addresses for it – the ones we typically use for the loopback device: 127.0.0.1 and ::1. It cannot be modified by
/etc/hosts and it cannot be accidentally or deliberately tricked by DNS resolves. localhost will now always resolve to a local address!
Does that kind of mistakes or modifications really happen? Yes they do. We’ve seen it and you can find other projects report it as well.
Who knows, it might even be a few microseconds faster than doing the “full” resolve call.
(You can still build curl without IPv6 support at will and on systems without support, for which the ::1 address of course will not be provided for localhost.)
Specs say we can
Users are free to use localhost names as they would any other domain names. Users may assume that IPv4 and IPv6 address queries for localhost names will always resolve to the respective IP loopback address.
Name resolution APIs and libraries SHOULD recognize localhost names as special and SHOULD always return the IP loopback address for address queries and negative responses for all other query types. Name resolution APIs SHOULD NOT send queries for localhost names to their configured caching DNS server(s).
Mike West at Google also once filed an I-D with even stronger wording suggesting we should always let localhost be local. That wasn’t ever turned into an RFC though but shows a mindset.
(Some) Browsers do it
Chrome has been special-casing localhost this way since 2017, as can be seen in this commit and I think we can safely assume that the other browsers built on their foundation also do this.
Firefox landed their corresponding change during the fall of 2020, as recorded in this bugzilla entry.
Safari (on macOS at least) does however not do this. It rather follows what
/etc/hosts says (and presumably DNS of not present in there). I’ve not found any official position on the matter, but I found this source code comment indicating that localhost resolving might change at some point:
Windows (kind of) does it
Since some time back, Windows already resolves “localhost” internally and it is not present in their /etc/hosts alternative. I believe it is more of a hybrid solution though as I believe you can put localhost into that file and then have that custom address get used for the name.
Secure over http://localhost
When we know for sure that http://localhost is indeed a secure context (that’s a browser term I’m borrowing, sorry), we can follow the example of the browsers and for example curl should be able to start considering cookies with the “secure” property to be dealt with over this host even when done over plain HTTP. Previously, secure in that regard has always just meant HTTPS.
This change in cookie handling has not happened in curl yet, but with localhost being truly local, it seems like an improvement we can proceed with.
Can you still trick curl?
When I mentioned this change proposal on twitter two of the most common questions in response were
- can’t you still trick curl by routing 127.0.0.1 somewhere else
- can you still use
--resolveto “move” localhost?
The answers to both questions are yes.
You can of course commit the most hideous hacks to your system and reroute traffic to 127.0.0.1 somewhere else if you really wanted to. But I’ve never seen or heard of anyone doing it, and it certainly will not be done by mistake. But then you can also just rebuild your curl/libcurl and insert another address than the default as “hardcoded” and it’ll behave even weirder. It’s all just software, we can make it do anything.
--resolve option is this magic thing to redirect curl operations from the given host to another custom address. It also works for localhost, since curl will check the cache before the internal resolve and
--resolve populates the DNS cache with the given entries. (Provided to applications via the CURLOPT_RESOLVE option.)
What will break?
With enough number of users, every single little modification or even improvement is likely to trigger something unexpected and undesired on at least one system somewhere. I don’t think this change is an exception. I fully expect this to cause someone to shake their fist in the sky.
However, I believe there are fairly good ways to make to restore even the most complicated use cases even after this change, even if it might take some hands on to update the script or application. I still believe this change is a general improvement for the vast majority of use cases and users. That’s also why I haven’t provided any knob or option to toggle off this behavior.
The top photo was taken by me (the symbolism being that there’s a path to take somewhere but we don’t really know where it leads or which one is the right to take…). This curl change was written by me. Mike West provided me the Chrome localhost change URL. Valentin Gosu gave me the Firefox bugzilla link.