Tag Archives: cookies

PSL in curl

PSL in this context stands for Public Suffix List. It is a list of known public suffixes in domain names. A public suffix is a name under which Internet users can (or historically could) directly register names – other than the top-level domains. One of the most commonly known and used such suffixes is “co.uk”, but the complete list is rather long and exhaustive.

Public suffixes are important when working with HTTP cookies, because the only way an HTTP client can accurately deny setting super cookies is to know about every existing PSL domain and deny servers to set cookies for those domains.

The use of PSL is recommend in RFC 6265: “If feasible, user agents SHOULD use an up-to-date public suffix list” (and yes, there mere fact that it speaks of a list and not the list of course highlights how problematic this is)

A super cookie is a cookie that is set for a “too wide” domain, making it apply across multiple sites in ways cookies are not supposed to function. If we allow a super cookie set for “co.uk”, that cookie could get sent to a huge amount of different domains.

If you ask me, this is one of the ugliest parts of cookie functionality. And there are a few such parts to select from.

The exact list of names present in the PSL varies over time. New names are added, old names are removed. This of course makes aging clients over time increasingly likely to unwittingly accept cookies for domain names that have since been added to the list. Or block cookies for domains that are no longer in the list!

curl, cookies and PSL

curl has supported cookies since 1998, but it has only supported PSL since 2015 (7.46.0). PSL support in curl is powered by the libpsl library.

Support for this external library is entirely optional in the build and lots of users deliberately decide to build curl without it. Leaving it out reduces footprint, avoids an extra dependency etc. Lots of users who build curl won’t use cookies or do not care about the risk for super cookies.

When you invoke curl -V, to get to see the versions and features of your currently installed curl tool, look for the PSL keyword among the features to learn whether your curl build supports Public Suffix List handling.

cookies are cross-client

Since cookies are sent to clients / browsers typically in a completely client-agnostic way, setting cookies with “illegal” domains make them get discarded and not used. This makes sites tend to not try super cookies for normal scenarios as those cookies will just be rejected by most clients. This fact sometimes save the non-PSL compliant clients.

Of course, servers can also just send a flood of cookies to the client trying different domain attributes – as clients will just silent discard the illegal ones and keep the legal ones and then use them later according to their own best abilities.

Force a decision

Based on the realization that people have been building and shipping curl without PSL support perhaps without fully understanding what they have ducked for, I have just merged a change to curl’s configure script. This change will ship in curl 8.6.0.

The configure script checks for libpsl by default (and has done so for a long time) and will use the library if detected, but starting now it will exit with an error if the detection fails to highlight this fact in a much more noticeable way. The users who want to keep building without it can just add --without-libpsl to hush up the message. Previously, a failed detection only worked as a hint to configure that it should build curl without PSL support – completely fine, but maybe not always done so with the user being totally knowledgeable of what that means.

This change still does not necessarily mean that the person getting the configure error understands PSL in depth, but it forces them to make a decision: Willingly build without it, or fix the build to use it. Your choice!

Not a security issue (in curl)

This (now past) lenient take, to allow users to build curl easily without PSL support, was reported as a curl security problem.

I disagree with that take, as I believe it is the responsibility of everyone building curl to make sure that the sum of the components is what they think it should be – the curl build procedure is not designed for nor attempting to make sure that the final product is hardened security wise (whatever that could be). PSL support is optionally built-in – by design.

This said: users who mistakenly use curl without PSL support might of course end up in what could be seen as a potential security problem. I do however not think that it is a security problem in the curl project itself.

There is a tab in my cookie

An HTTP cookie, is just a name + value pair sent from the server to the client. That pair is stored and is sent back to the server in subsequent requests when conditions match.

Cookies were first invented and used in the 1990s. Sources seem to agree that the first browser to support them, was Netscape 0.9beta released in September 1994. Internet Explorer added support in October 1995.

After many years and several failed specification attempts, they were eventually documented in RFC 6265 in 2011. They have been debated, criticized and misunderstood since virtually forever. Mostly because of the abuse/tracking they (used to?) allow in browsers. Less so because of how they actually work over the wire.

curl

curl has supported cookies since the 1990s as well (October 1998 to be specific), and it is a frequently used feature among curl users everywhere. Not the least because the login pattern of the web has become HTTP POST with credentials with a session cookie returned on success – and curl is often used to mimic or reproduce such operations to allow for automated processes and more.

For curl, it is important to remain interoperable and compatible with cookies the same way browsers do them so that users can keep doing these things.

What to accept

Not very long ago, I blogged about a cookie change we had to do because curl’s former liberal attitude towards what a cookie might contain turned out to be a possible vector for mischief. That flaw was basically a direct result of curl never totally adapting to the language in RFC 6265 because we typically do not change what seems to work and has not been reported to be wrong.

In that curl change, we started rejecting incoming cookies that contain “control codes” but we let ASCII code 9 through. ASCII code 9 is the tab character. Generally considered white space. We let it through because we checked the source code for two major browsers and they since they do, curl does as well.

Tab where?

But accepting tabs in the cookie line is one thing. The next question then comes where exactly in the cookie line should it be acceptable?

In the currently ongoing security audit for curl, our friends at Trail of Bits figured out that if a cookie is sent to curl with a tab in the name (literally inside the name and not before or after, like foo<tab>bar) curl would save that wrongly if saved to a file. This, because curl saves cookies in a tab-separated file format, the so called Netscape cookie file format, and it has no escape mechanism or anything. A tab in the cookie name causes the cookie to later get treated wrongly when the file is later loaded again.

Interop status

The file format curl uses for saving cookies is the same as the original format Netscape and then early Firefox used back in the old days. Since this format does not support tabs, I believe it is reasonable to assume the early browsers did not accept tabs in names or content.

I checked how (current) Chrome and Firefox handle cookies like this by creating a test page that sends cookies to the browser.

Chrome rejects cookies with tabs in name or content. They are simply not accepted or stored.

Firefox rejects cookies with tabs in name, but strangely enough it strips tabs from the content and otherwise let them through.

(Safari doesn’t work on Linux so I ignored that)

This, even if my reading of the RFC 6265 seems to say that they should be fine. They should be kept when “internal” to a name or content. I believe curl followed the spec here better than the browsers.

But clearly tabs in cookie names or content is not an interoperable concept on the web.

Adding or removing

With all this in my luggage, I decided to bring this question to the team working on the cookie spec update. The rfc6265bis effort.

I figured that this non-interoperable state and support situation could be worth highlighting or perhaps make a bit stricter in the spec update.

In that issue on GitHub, I was instead informed that recently changed language in their RFC draft rather made browser implementers keen on adding support for this kind of tabs in cookies.

Instead of admitting defeat and documenting that tabs in cookie names and values do not work correctly, we would rather continue limping along pretending this will work.

The HTTP community have supported cookies for almost thirty years without tabs working correctly inside cookies.

Adding (clearer) support for them in a spec would be good in the sense that more defined behavior is a good thing, but since we have decades of this non or perhaps spotty support the already deployed software and long tail of clients will not adapt to any such new wording rapidly. Even if accepted in the new spec, it will take ages until cookies could be done interoperable with tabs inside.

I believe we are better off just documenting that tabs SHOULD NOT be used in cookie name or content as they will not interop. Because that is the truth and will be for a long time no matter what we do today or tomorrow.

My comments in that issue at least seem to have brought reason for maybe reconsider the draft wording. Maybe.

Why!

Someone might ask the excellent question “why would anyone want to use a tab in the name or content?”, but quite frankly, I cannot think of any good or legitimate reason other than maybe laziness or a lack of proper filtering.

There is no sound technical reason why this needs to be done.

An executive decision

To fix the breakage curl does when saving cookies like this, and to align better with existing browsers, we decided to rather make curl reject incoming cookies that have tabs in names or content. Starting in the next release: 7.86.0.

We needed to do something. I believe going with the more strict approach is the better one here and now.

If the rfc6265bis draft ends up ultimately keeping the language “encouraging” tab support and browser authors decide to follow, then I presume we will have reason to revisit this decision later on and perhaps take curl in the other direction instead. I think we can handle that, even if I believe it would be the wrong thing for the ecosystem.

Credits

Cookie image by StockSnap from Pixabay

A bug that was 23 years old or not

This is a tale of cookies, Internet code and a CVE. It goes back a long time so please take a seat, lean back and follow along.

The scene is of course curl, the internet transfer tool and library I work on.

1998

In October 1998 we shipped curl 4.9. In 1998. Few people had heard of curl or used it back then. This was a few months before the curl website would announce that curl achieved 300 downloads of a new release. curl was still small in every meaning of the word at that time.

curl 4.9 was the first release that shipped with the “cookie engine”. curl could then receive HTTP cookies, parse them, understand them and send back cookies properly in subsequent requests. Like the browsers did. I wrote the bigger part of the curl code for managing cookies.

In 1998, the only specification that existed and described how cookies worked was a very brief document that Netscape used to host called cookie_spec. I keep a copy of that document around for curious readers. It really does not document things very well and it leaves out enormous amounts information that you had to figure out by inspecting other clients.

The cookie code I implemented than was based on that documentation and what the browsers seemed to do at the time. It seemed to work with numerous server implementations. People found good use for the feature.

2000s

This decade passed with a few separate efforts in the IETF to create cookie specifications but they all failed. The authors of these early cookie specs probably thought they could create standards and the world would magically adapt to them, but this did not work. Cookies are somewhat special in the regard that they are implemented by so many different authors, code bases and websites that fundamentally changing the way they work in a “decree from above” like that is difficult if not downright impossible.

RFC 6265

Finally, in 2011 there was a cookie rfc published! This time with the reversed approach: it primarily documented and clarified how cookies were actually already being used.

I was there and I helped it get made by proving my views and opinions. I did not agree to everything that the spec includes (you can find blog posts about some of those details), but finally having a proper spec was still a huge improvement to the previous state of the world.

Double syntax

What did not bother me much at the time, but has been giving me a bad rash ever since, is the peculiar way the spec is written: it provides one field syntax for how servers should send cookies, and a different one for what syntax clients should accept for cookies.

Two syntax for the same cookies.

This has at least two immediate downsides:

  1. It is hard to read the spec as it is very easy to to fall over one of those and assume that syntax is valid for your use case and accidentally get the wrong role’s description.
  2. The syntax defining how to send cookie is not really relevant as the clients are the ones that decide if they should receive and handle the cookies. The existing large cookie parsers (== browsers) are all fairly liberal in what they accept so nobody notices nor cares about if the servers don’t follow the stricter syntax in the spec.

RFC 6265bis

Since a few years back, there is ongoing work in IETF on revising and updating the cookie spec of 2011. Things have evolved and some extensions to cookies have been put into use in the world and deserves to be included in the spec. If you would to implement code today that manage cookies, the old RFC is certainly not enough anymore. This cookie spec update work is called 6265bis.

curl is up to date and compliant with what the draft versions of RFC 6265bis say.

The issue about the double syntax from above is still to be resolved in the document, but I faced unexpectedly tough resistance when I recently shared my options and thoughts about that spec peculiarity.

It can be noted that fundamentally, cookies still work the same way as they did back in 1998. There are added nuances and knobs sure, but the basic principles have remained. And will so even in the cookie spec update.

One of oddities of cookies is that they don’t work on origins like most other web features do.

HTTP Request tunneling

While cookies have evolved slowly over time, the HTTP specs have also been updated and refreshed a few times over the decades, but perhaps even more importantly the HTTP server implementations have implemented stricter parsing policies as they have (together with the rest of the world) that being liberal in what you accept (Postel’s law) easily lead to disasters. Like the dreaded and repeated HTTP request tunneling/smuggling attacks have showed us.

To combat this kind of attack, and probably to reduce the risk of other issues as well, HTTP servers started to reject incoming HTTP requests early if they appear “illegal” or malformed. Block them already at the door and not letting obvious crap in. In particular this goes for control codes in requests. If you try to send a request to a reasonably new HTTP server today that contains a control code, chances are very high that the server will reject the request and just return a 400 response code.

With control code I mean a byte value between 1 and 31 (excluding 9 which is TAB)

The well known HTTP server Apache httpd has this behavior enabled by default since 2.4.25, shipped in December 2016. Modern nginx versions seem to do this as well, but I have not investigated since exactly when.

Cookies for other hosts

If cookies were designed today for the first time, they certainly would be made different.

A website that sets cookies sends cookies to the client. For each cookie it sends, it sets a number of properties for the cookie. In particular it sets matching parameters for when the cookie should be sent back again by the client.

One of these cookie parameters set for a cookie is the domain that need to match for the client to send it. A server that is called www.example.com can set a cookie for the entire example.com domain, meaning that the cookie will then be sent by the client also when visiting second.example.com. Servers can set cookies for “sibling sites!

Eventually the two paths merged

The cookie code added to curl in 1998 was quite liberal in what content it accepted and while it was of course adjusted and polished over the years, it was working and it was compatible with real world websites.

The main driver for changes in that area of the code has always been to make sure that curl works like and interoperates with other cookie-using agents out in the wild.

CVE-2022-35252

In the end of June 2022 we received a report of a suspected security problem in curl, that would later result in our publication of CVE-2022-35252.

As it turned out, the old cookie code from 1998 accepted cookies that contained control codes. The control codes could be part of the name or the the content just fine, and if the user enabled the “cookie engine” curl would store those cookies and send them back in subsequent requests.

Example of a cookie curl would happily accept:

Set-Cookie: name^a=content^b; domain=.example.com

The ^a and ^b represent control codes, byte code one and two. Since the domain can mark the cookie for another host, as mentioned above, this cookie would get included for requests to all hosts within that domain.

When curl sends a cookie like that to a HTTP server, it would include a header field like this in its outgoing request:

Cookie: name^a=content^b

400

… to which a default configure Apache httpd and other servers will respond 400. For a script or an application that received theses cookies, further requests will be denied for as long as the cookies keep getting sent. A denial of service.

What does the spec say?

The client side part of RFC 6265, section 5.2 is not easy to decipher and figuring out that a client should discard cookies with control cookies requires deep studies of the document. There is in fact no mention of “control codes” or this byte range in the spec. I suppose I am just a bad spec reader.

Browsers

It is actually easier to spot what the popular browsers do since their source codes are easily available, and it turns out of course that both Chrome and Firefox already ignore incoming cookies that contain any of the bytes

%01-%08 / %0b-%0c / %0e-%1f / %7f

The range does not include %09, which is TAB and %0a / %0d which are line endings.

The fix

The curl fix was not too surprisingly and quite simply to refuse cookie fields that contain one or more of those banned byte values. As they are not accepted by the browser’s already, the risk that any legitimate site are using them for any benign purpose is very slim and I deem this change to be nearly risk-free.

The age of the bug

The vulnerable code has been in curl versions since version 4.9 which makes it exactly 8,729 days (23.9 years) until the shipped version 7.85.0 that fixed it. It also means that we introduced the bug on project day 201 and fixed it on day 8,930.

The code was not problematic when it shipped and it was not problematic during a huge portion of the time it has been used by a large amount of users.

It become problematic when HTTP servers started to refuse HTTP requests they suspected could be malicious. The way this code turned into a denial of service was therefore more or less just collateral damage. An unfortunate side effect.

Maybe the bug was born first when RFC 6265 was published. Maybe it was born when the first widely used HTTP server started to reject these requests.

Project record

8,729 days is a new project record age for a CVE to have been present in the code until found. It is still the forth CVE that were lingering around for over 8,000 days until found.

Credits

Thanks to Stefan Eissing for digging up historic Apache details.

Axel Chong submitted the CVE-2022-35252 report.

Campfire image by Martin Winkler from Pixabay

A tale of a trailing dot

Trailing dots on host names in URLs is the gift that keeps on giving.

Let me take you through a dwindling story of how the dot is handled differently in different places through the stack of an Internet client. The evil trailing dot.

DNS

When a given host name is to be resolved to an IP address on a networked computer, there are dedicated functions to use. The host name example.com resolves to a number of IP addresses.

If you add a dot to the end of that host name, it does not change what is resolved. “example.com.” resolves to the same set of addresses as “example.com” does. (But putting two dots at the end will make it fail.)

The reason why it works like this is based on how DNS is built up with different “labels” (that when written in text are separated with dots) and then having a trailing dot is just an empty final label, just as with no dot. So, in the DNS protocol there are no trailing dots so to speak. When trying two dots at the end, it makes a zero-length label between them and that is not allowed.

People accustomed to fiddle with DNS are used to ending Fully Qualified Domain Names (FQDN) with a trailing dot.

Resolving names

In addition to the name actually being resolved (sent to a DNS resolver), native resolver functions usually puts a meaning and a semantic difference between resolving “hello” and “hello.” (with or without the trailing dot). The trailing dot then means the name is to be used actually exactly only like that, it is specified in full, while the name without a trailing dot can be tried with a domain name appended to it. Or even a list of domain names, until one resolves. This makes people want to use a trailing dot at times, to avoid that domain test.

HTTP names

HTTP clients that want to work with a given URL needs to extract the name part from the URL and use that name to:

  1. resolve the host name to a list of IP addresses to connect to
  2. pass that name in the Host: or :authority: request headers, so that the HTTP server knows which specific server the clients speaks to – as it may run multiple servers on the same IP address

The HTTP spec says the name in the Host header should be used verbatim from the URL; the trailing dot should be included if it was present in the URL. This allows a server to host different content for “example.com” and “example.com.”, even if many servers will by default treat them as the same. Some hosts will just redirect the dot version to the non-dot. Some hosts will return error.

The HTTP client certainly connects to the same set of addresses for both.

For a lot of HTTP traffic, having the trailing dot there or not makes no difference. But they can be made to make a difference. And boy, they can certainly make a difference internally…

Cookies

Cookies are passed back and forth over HTTP using dedicated request and response headers. When a server wants to pass a cookie to the client, it can specify for which particular domain it is valid for and the client will send back cookies to the server only when there is a match of the domain it speaks to and for which domain cookies are set to etc.

The cookie spec RFC 6265 section 5.1.2 defines the host name in a way that makes it ignore trailing dots. Cookies set for a domain with a dot are valid for the same domain without one and vice versa.

SNI

When speaking to a HTTPS server, a client passes on the name of the remote server during the TLS handshake, in the SNI (Server Name Indication) field, so that the server knows which instance the client wants to speak to. What about the trailing dot you think?

The hostname is represented as a byte string using ASCII encoding without a trailing dot.

Meaning, that a HTTPS server cannot – in the TLS layer – make a distinction between a server for “example.com.” and “example.com”. Different hosts for HTTP, the same host for HTTPS.

curl’s dotted past

In the curl project, we – as everyone else – have struggled with the trailing dot over time.

As-is

We started out being mostly oblivious about the effects of the trailing dot and most of the code just treated it as part of the host name and it would be in the host name everywhere. Until one day someone pointed out that the SNI field does not approve of it. We fixed that.

Strip it

In 2014, curl started to always just cut off trailing dots internally if one was provided in the URL. The dot rarely makes a difference, it made the host name work fine with SNI and for HTTPS it is practically difficult to make a difference between them.

Keep it

In 2022, someone found a web site that actually requires a trailing dot in the Host: header to respond correctly and reported it to the curl project.

Sigh. We back-pedaled on the eight years old decision and decided to internally keep the dot in the name, but strip it for the purpose of the SNI field. This seems to be how the browsers are doing it. We released curl 7.82.0 with this change. That site that needed the trailing dot kept in the Host: header could now be retrieved with curl. Yay.

As a bonus curl also lowercases the SNI name field now, because that is what the browsers do even if the spec says the field is supposed to be used case insensitively. That habit has made sure there are servers on the Internet that won’t work properly if the SNI name is not lowercase…

In your face

That “back-peddle” for 7.82.0 when we brought back the dot into the host name field, turned out to be incomplete, but it was not totally obvious nor immediately apparent.

When we brought back the trailing dot into the name field, we accidentally broke several internal name checks.

The checks broke in the cookie handling of domains even though cookies, as mentioned above, are supposed to not care about trailing dots.

To understand this, we have to back up a little bit and talk about how cookies and cookie domains work.

Public Suffixes

Cookies are strange beasts and because the server can tell the client for which domain the cookie applies to, a client needs to check so that the server does not try to set the cookies too broadly or for other domains. It does not stop there, but there is also the concept of something called “Public Suffix List” (PSL), which are known domains for which setting a cookie is not accepted. (This list is also used for limiting other things in browsers but they are out of scope here.) One widely known such domain to mention as an example is “co.uk”. A server should not be allowed to set a cookie for “co.uk” as then it would basically be sent back for every web site that exist in the UK.

The PSL is a maintained list with a huge number of domains in it. To manage those and to make sure tools like curl can check for them in a convenient way, a dedicated library was made for this several years ago: libpsl. curl has optionally used this since 2015.

I said optional

That public suffix list is huge, which is a primary reason why many users still opt to build curl without support for it. This means that curl needs to provide backup functionality for the builds where libpsl is not present. Typically in a lot of embedded systems.

Without knowledge of the PSL curl will not reject cookies for “co.uk” but it should reject cookies for “.uk” or “.com” as even without PSL knowledge it still knows that setting cookies for top-level domains is not okay.

How did the curl check used without PSL verify if the given domain is a TLD only?

It checked – if there is a dot present in the name, then it is not a TLD.

CVE-2022-27779

Axel Chong figured out that for a curl build without PSL knowledge, the server could set a cookie for a TLD if you just made sure to end the name with a dot.

With the 7.82.0 change in place, where curl keeps the trailing dot for the host name, combined with that cookie set for TLD domain with a trailing dot, they have matching tail ends. This means that curl would send cookies to servers that match the criteria. The broken TLD check was benign all those years until we let the trailing dot in. This is security vulnerability CVE-2022-27779.

CVE-2022-30115

It did not stop there. Axel did not stop there. Since curl now keeps the trailing dot in the name and did not do it before, there was a second important string comparison that broke in unexpected ways that Axel figured out and reported. A second vulnerability introduced by the same change.

HSTS is the concept that allows curl to store a “cache” of host names and keep it around, so that if you want to do a subsequent transfer to one of those host names again before they expire, curl will go directly to HTTPS even if HTTP is used in the URL. As a way to avoid the clear-text insecure redirect step some URLs use.

The new treatment of trailing dots, that basically allows users to provide the same host name in two different ways and yet resolve to the exact same addresses exposed that the HSTS code did take care of (ignored) the trailing dot properly. If you let curl store HSTS info for the host name without a trailing dot, you can then later bypass the HSTS by using the same host name with a trailing dot. Or vice versa. This is security vulnerability CVE-2022-30115.

alt-svc

The code for alt-svc also needed adjustment for the dot, but fortunately that was “just” a bug and had not security impact.

All these three separate areas in which trailing dots caused problems have been fixed in curl 7.83.1 and all of them are now tested and verified with an extended set of tests to make sure they keep handle the dots correctly.

Someone called it a dot release.

Is this the end of dot problems?

I don’t know but it seems unlikely. The trailing dots have kept on haunting us since a long time by now so I would say the chances are big that there are both some more flaws lingering and some future changes pending. That then can make the cycle take another loop or two.

I suppose we will find out. Stay tuned!

curl localhost as a local host

When you use the name localhost in a URL, what does it mean? Where does the network traffic go when you ask curl to download http://localhost ?

Is “localhost” just a name like any other or do you think it infers speaking to your local host on a loopback address?

Previously

curl http://localhost

The name was “resolved” using the standard resolver mechanism into one or more IP addresses and then curl connected to the first one that works and gets the data from there.

The (default) resolving phase there involves asking the getaddrinfo() function about the name. In many systems, it will return the IP address(es) specified in /etc/hosts for the name. In some systems things are a bit more unusually setup and causes a DNS query get sent out over the network to answer the question.

In other words: localhost was not really special and using this name in a URL worked just like any other name in curl. In most cases in most systems it would resolve to 127.0.0.1 and ::1 just fine, but in some cases it would mean something completely different. Often as a complete surprise to the user…

Starting now

curl http://localhost

Starting in commit 1a0ebf6632f8, to be released in curl 7.78.0, curl now treats the host name “localhost” specially and will use an internal “hard-coded” set of addresses for it – the ones we typically use for the loopback device: 127.0.0.1 and ::1. It cannot be modified by /etc/hosts and it cannot be accidentally or deliberately tricked by DNS resolves. localhost will now always resolve to a local address!

Does that kind of mistakes or modifications really happen? Yes they do. We’ve seen it and you can find other projects report it as well.

Who knows, it might even be a few microseconds faster than doing the “full” resolve call.

(You can still build curl without IPv6 support at will and on systems without support, for which the ::1 address of course will not be provided for localhost.)

Specs say we can

The RFC 6761 is titled Special-Use Domain Names and in its section 6.3 it especially allows or even encourages this:

Users are free to use localhost names as they would any other domain names.  Users may assume that IPv4 and IPv6 address queries for localhost names will always resolve to the respective IP loopback address.

Followed by

Name resolution APIs and libraries SHOULD recognize localhost names as special and SHOULD always return the IP loopback address for address queries and negative responses for all other query types. Name resolution APIs SHOULD NOT send queries for localhost names to their configured caching DNS server(s).

Mike West at Google also once filed an I-D with even stronger wording suggesting we should always let localhost be local. That wasn’t ever turned into an RFC though but shows a mindset.

(Some) Browsers do it

Chrome has been special-casing localhost this way since 2017, as can be seen in this commit and I think we can safely assume that the other browsers built on their foundation also do this.

Firefox landed their corresponding change during the fall of 2020, as recorded in this bugzilla entry.

Safari (on macOS at least) does however not do this. It rather follows what /etc/hosts says (and presumably DNS of not present in there). I’ve not found any official position on the matter, but I found this source code comment indicating that localhost resolving might change at some point:

// FIXME: Ensure that localhost resolves to the loopback address.

Windows (kind of) does it

Since some time back, Windows already resolves “localhost” internally and it is not present in their /etc/hosts alternative. I believe it is more of a hybrid solution though as I believe you can put localhost into that file and then have that custom address get used for the name.

Secure over http://localhost

When we know for sure that http://localhost is indeed a secure context (that’s a browser term I’m borrowing, sorry), we can follow the example of the browsers and for example curl should be able to start considering cookies with the “secure” property to be dealt with over this host even when done over plain HTTP. Previously, secure in that regard has always just meant HTTPS.

This change in cookie handling has not happened in curl yet, but with localhost being truly local, it seems like an improvement we can proceed with.

Can you still trick curl?

When I mentioned this change proposal on twitter two of the most common questions in response were

  1. can’t you still trick curl by routing 127.0.0.1 somewhere else
  2. can you still use --resolve to “move” localhost?

The answers to both questions are yes.

You can of course commit the most hideous hacks to your system and reroute traffic to 127.0.0.1 somewhere else if you really wanted to. But I’ve never seen or heard of anyone doing it, and it certainly will not be done by mistake. But then you can also just rebuild your curl/libcurl and insert another address than the default as “hardcoded” and it’ll behave even weirder. It’s all just software, we can make it do anything.

The --resolve option is this magic thing to redirect curl operations from the given host to another custom address. It also works for localhost, since curl will check the cache before the internal resolve and --resolve populates the DNS cache with the given entries. (Provided to applications via the CURLOPT_RESOLVE option.)

What will break?

With enough number of users, every single little modification or even improvement is likely to trigger something unexpected and undesired on at least one system somewhere. I don’t think this change is an exception. I fully expect this to cause someone to shake their fist in the sky.

However, I believe there are fairly good ways to make to restore even the most complicated use cases even after this change, even if it might take some hands on to update the script or application. I still believe this change is a general improvement for the vast majority of use cases and users. That’s also why I haven’t provided any knob or option to toggle off this behavior.

Credits

The top photo was taken by me (the symbolism being that there’s a path to take somewhere but we don’t really know where it leads or which one is the right to take…). This curl change was written by me. Mike West provided me the Chrome localhost change URL. Valentin Gosu gave me the Firefox bugzilla link.

Workshop Season 4 Finale

The 2019 HTTP Workshop ended today. In total over the years, we have now done 12 workshop days up to now. This day was not a full day and we spent it on only two major topics that both triggered long discussions involving large parts of the room.

Cookies

Mike West kicked off the morning with his cookies are bad presentation.

One out of every thousand cookie header values is 10K or larger in size and even at the 50% percentile, the size is 480 bytes. They’re a disaster on so many levels. The additional features that have been added during the last decade are still mostly unused. Mike suggests that maybe the only way forward is to introduce a replacement that avoids the issues, and over longer remove cookies from the web: HTTP state tokens.

A lot of people in the room had opinions and thoughts on this. I don’t think people in general have a strong love for cookies and the way they currently work, but the how-to-replace-them question still triggered lots of concerns about issues from routing performance on the server side to the changed nature of the mechanisms that won’t encourage web developers to move over. Just adding a new mechanism without seeing the old one actually getting removed might not be a win.

We should possibly “worsen” the cookie experience over time to encourage switch over. To cap allowed sizes, limit use to only over HTTPS, reduce lifetimes etc, but even just that will take effort and require that the primary cookie consumers (browsers) have a strong will to hurt some amount of existing users/sites.

(Related: Mike is also one of the authors of the RFC6265bis draft in progress – a future refreshed cookie spec.)

HTTP/3

Mike Bishop did an excellent presentation of HTTP/3 for HTTP people that possibly haven’t kept up fully with the developments in the QUIC working group. From a plain HTTP view, HTTP/3 is very similar feature-wise to HTTP/2 but of course sent over a completely different transport layer. (The HTTP/3 draft.)

Most of the questions and discussions that followed were rather related to the transport, to QUIC. Its encryption, it being UDP, DOS prevention, it being “CPU hungry” etc. Deploying HTTP/3 might be a challenge for successful client side implementation, but that’s just nothing compared the totally new thing that will be necessary server-side. Web developers should largely not even have to care…

One tidbit that was mentioned is that in current Firefox telemetry, it shows about 0.84% of all requests negotiates TLS 1.3 early data (with about 12.9% using TLS 1.3)

Thought-worthy quote of the day comes from Willy: “everything is a buffer”

Future Workshops

There’s no next workshop planned but there might still very well be another one arranged in the future. The most suitable interval for this series isn’t really determined and there might be reasons to try tweaking the format to maybe change who will attend etc.

The fact that almost half the attendees this time were newcomers was certainly good for the community but that not a single attendee traveled here from Asia was less good.

Thanks

Thanks to the organizers, the program committee who set this up so nicely and the awesome sponsors!

curl 7.61.1 comes with only bug-fixes

Already at the time when we shipped the previous release, 7.61.0, I had decided I wanted to do a patch release next. We had some pretty serious HTTP/2 bugs in the pipe to get fixed and there were a bunch of other unresolved issues also awaiting their treatments. Then I took off on vacation and and the HTTP/2 fixes took a longer time than expected to get on top of, so I subsequently decided that this would become a bug-fix-only release cycle. No features and no changes would be merged into master. So this is what eight weeks of only bug-fixes can look like.

Numbers

the 176th release
0 changes
56 days (total: 7,419)

102 bug fixes (total: 4,640)
151 commits (total: 23,439)
0 new curl_easy_setopt() options (total: 258)

0 new curl command line option (total: 218)
46 contributors, 21 new (total: 1,787)
27 authors, 14 new (total: 612)
  1 security fix (total: 81)

Notable bug-fixes this cycle

Among the many small fixes that went in, I feel the following ones deserve a little extra highlighting…

NTLM password overflow via integer overflow

This latest security fix (CVE-2018-14618) is almost identical to an earlier one we fixed back in 2017 called CVE-2017-8816, and is just as silly…

The internal function Curl_ntlm_core_mk_nt_hash() takes a password argument, the same password that is passed to libcurl from an application. It then gets the length of that password and allocates a memory area that is twice the length, since it needs to expand the password. Due to a lack of checks, this calculation will overflow and wrap on a 32 bit machine if a password that is longer than 2 gigabytes is passed to this function. It will then lead to a very small memory allocation, followed by an attempt to write a very long password to that small memory buffer. A heap memory overflow.

Some mitigating details: most architectures support 64 bit size_t these days. Most applications won’t allow passing in passwords that are two gigabytes.

This bug has been around since libcurl 7.15.4, released back in 2006!

Oh, and on the curl web site we now use the CVE number in the actual URL for all the security vulnerabilities to make them easier to find and refer to.

HTTP/2 issues

This was actually a whole set of small problems that together made the new crawler example not work very well – until fixed. I think it is safe to say that HTTP/2 users of libcurl have previously used it in a pretty “tidy” fashion, because I believe I corrected four or five separate issues that made it misbehave.  It was rather pure luck that has made it still work as well as it has for past users!

Another HTTP/2 bug we ran into recently involved us discovering a little quirk in the underlying nghttp2 library, which in some very special circumstances would refuse to blank out the stream id to struct pointer mapping which would lead to it delivering a pointer to a stale (already freed) struct at a later point. This is fixed in nghttp2 now, shipped in its recent 1.33.0 release.

Windows send-buffer tuning

Making uploads on Windows from between two to seven times faster than before is certainly almost like a dream come true. This is what 7.61.1 offers!

Upload buffer size increased

In tests triggered by the fix above, it was noticed that curl did not meet our performance expectations when doing uploads on really high speed networks, notably on localhost or when using SFTP. We could easily double the speed by just increasing the upload buffer size. Starting now, curl allocates the upload buffer on demand (since many transfers don’t need it), and now allocates a 64KB buffer instead of the previous 16KB. It has been using 16KB since the 2001, and with the on-demand setup and the fact that computer memories have grown a bit during 17 years I think it is well motivated.

A future curl version will surely allow the application to set this upload buffer size. The receive buffer size can already be set.

Darwinssl goes ALPN

While perhaps in the grey area of what a bugfix can be, this fix  allows curl to negotiate ALPN using the darwinssl backend, which by extension means that curl built to use darwinssl can now – finally – do HTTP/2 over HTTPS! Darwinssl is also known under the name Secure Transport, the native TLS library on macOS.

Note however that macOS’ own curl builds that Apple ships are no longer built to use Secure Transport, they use libressl these days.

The Auth Bearer fix

When we added support for Auth Bearer tokens in 7.61.0, we accidentally caused a regression that now is history. This bug seems to in particular have hit git users for some reason.

-OJ regression

The introduction of bold headers in 7.61.0 caused a regression which made a command line like “curl -O -J http://example.com/” to fail, even if a Content-Disposition: header with a correct file name was passed on.

Cookie order

Old readers of this blog may remember my ramblings on cookie sort order from back in the days when we worked on what eventually became RFC 6265.

Anyway, we never did take all aspects of that spec into account when we sort cookies on the HTTP headers sent off to servers, and it has very rarely caused users any grief. Still, now Daniel Gustafsson did a glorious job and tweaked the code to also take creation order into account, exactly like the spec says we should! There’s still some gotchas in this, but at least it should be much closer to what the spec says and what some sites might assume a cookie-using client should do…

Unbold properly

Yet another regression. Remember how curl 7.61.0 introduced the cool bold headers in the terminal? Turns out I of course had my escape sequences done wrong, so in a large number of terminal programs the end-of-bold sequence (“CSI 21 m”) that curl sent didn’t actually switch off the bold style. This would lead to the terminal either getting all bold all the time or on some terminals getting funny colors etc.

In 7.61.1, curl sends the “switch off all styles” code (“CSI 0 m”) that hopefully should work better for people!

Next release!

We’ve held up a whole bunch of pull requests to ship this patch-only release. Once this is out the door, we’ll open the flood gates and accept the nearly 10 changes that are eagerly waiting merge. Expect my next release blog post to mention several new things in curl!

Reducing the Public Suffix pain?

Let me introduce you to what I consider one of the worst hacks we have in current and modern internet protocols: the Public Suffix List (PSL). This is a list (maintained by Mozilla) with domains that have some kind administrative setup or arrangement that makes sub-domains independent. For example, you can’t be allowed to set cookies for “*.com” because .com is a TLD that has independent domains. But the same thing goes for “*.co.uk” and there’s no hint anywhere about this – except for the Public Suffix List. Then, take that simple little example and extrapolate to a domain system that grows with several new TLDs every month and more. The PSL is now several thousands of entries long.

And cookies isn’t the only thing this is used for. Another really common and perhaps even more important use case is for wildcard matches in TLS server certificates. You should not be allowed to buy and use a cert for “*.co.uk” but you can for “*.yourcompany.co.uk”…

Not really official but still…

If you read the cookie RFC or the spec for how to do TLS wildcard certificate matching you won’t read any line putting it crystal clear that the Suffix List is what you must use and I’m sure different browser solve this slightly differently but in practice and most unfortunately (if you ask me) you must either use the list or make your own to be fully compliant with how the web works 2014.

curl, wget and the PSL

In curl and libcurl, we have so far not taken the PSL into account which is by choice since I’ve not had any decent way to handle it and there are lots of embedded and other use cases that simply won’t be able to cope with that large PSL chunk.

Wget hasn’t had any PSL awareness either, but the recent weeks this has been brought up on the wget list and more attention has been given to this. Work has been initiated to do something about it, which has lead to…

libpsl

Tim Rühsen took the baton and started the libpsl project and its associated mailing list, as a foundation for something for Wget to use to get PSL awareness.

I’ve mostly cheered the effort so far and said that I wouldn’t mind building on this to enhance curl in the future if it just gets a suitable (liberal enough) license and it seems to go in that direction. For curl’s sake, I would like to get a conditional dependency on this so that people without particular size restrictions can use this, and people on more embedded and special-purpose situations can continue to build without PSL support.

If you’re interested in helping out in curl and libcurl in this area, feel most welcome!

dbound

Meanwhile, the IETF has set up a new mailing list called dbound for discussions around PSL and similar issues and it seems very timely!

tailmatching them cookies

A brand new libcurl security advisory was announced on April 12th, which details how libcurl can leak cookies to domains with tailmatch. Let me explain the details.

(Did I mention that security is hard?)

cookiecurl first implemented cookie support way back in the early days in the late 90s. I participated in the IETF work that much later documented how cookies work in real life. I know how cookies work, and yet this flaw still existed in the curl cookie implementation for over 13 years. Until someone spotted it. And once again that sense of gaaaah, how come we never saw this before!! came over me.

A quick cookie 101

When cookies are used over HTTP, it is (if we simplify things a little) only a name = value pair that is set to be valid a certain domain and a path. But the path is only specifying the prefix, and the domain only specifies the tail part. This means that a site can set a cookie that is for the entire site that is under the path /members so that it will be sent by the brower even for /members/names/ as well as for /members/profile/me etc. The cookie will then not be sent to the same domain for pages under a different path, such as /logout or similar.

A domain for a cookie can set to be valid for example.org and then it will be sent by the browser also for www.example.org and www.sub.example.org but not at all for example.com or badexample.org.

Unless of course you have a bug in the cookie tailmatching function. The bug libcurl had until 7.30.0 was released made it send cookies for the domain example.org also to sites that would have the same tail but a different prefix. Like badexample.org.

Let me try a story on you

It might not be obvious at first glance how terrible this can become to users. Let me take you through an imaginary story, backed up by some facts:

Imagine that there’s a known web site out there on the internet that provides an email service to users. Users login on a form and they read email. Or perhaps it is a social site. Preferably for our story, the site is using HTTP only but this trick can be done for most HTTPS sites as well with only a mildly bigger effort.

This known and popular site runs its services on ‘site.com’. When you’re logged in to site.com, your session is a cookie that keeps getting sent to the server and the server sometimes updates the contents and sends it back to the browser. This is the way millions of sites work.

As an evil person, you now register a domain and setup an attack server. You register a domain that has the same ending as the legitimate site. You call your domain ‘fun-cat-and-food-pics-from-site.com’ (FCAFPFS among friends).

anattackMr evil person also knows that there are several web browsers, typically special purpose ones for different kinds of devices, that use libcurl as its base. (But it doesn’t have to be a browser, it could be other tools as well but for this story a browser fits fine.) Lets say you know a person or two who use one of those browsers on site.com.

You send a phishing email to these persons. Or post a funny picture on the social site. The idea is to have them click your link to follow through to your funny FCAFPFS site. A little social engineering, who on the internet can truly resist funny cats?

The visitor’s browser (which uses a vulnerable libcurl) does the wrong “tailmatch” on the domain for the session cookie and gladly hands it off to the attacker site. The attacker site could then use that cookie to access site.com and hijack the user’s session. Quite likely the attacker would immediately change password or something and logout/login so that the innocent user who’s off looking at cats will get a “you are logged-out” message when he/she returns to site.com…

The attacker could then use “password reminder” features on other sites to get emails sent to site.com to allow him to continue attacking the user’s other accounts on other services. Or if site.com was a social site, the attacker would post more cat links and harvest more accounts etc…

End of story.

Any process improvements?

For every security vulnerability a project gets, it should be a reason for scrutinizing what went wrong. I don’t mean in the actual code necessarily, but more what processes we lack that made the bug sneak in and remain in there for so long without being detected.

What didn’t we do that made this bug survive this long?

Obviously we didn’t review the code properly. But this is a tricky beast that was added a very long time ago, back in the days when the project was young and not that many developers were involved. Before we even had a test suite. I do believe that we have slightly better reviews these days, but I will also claim that it is far from sure that we would detect this flaw by a sheer code review.

Test cases! We clearly lacked the necessary test case setup that tested the limitations of how cookies are supposed to work and get sent back and forth. We’ve added a few new ones now that detect this particular flaw fine, but I think we have reasons to continue to search for various kinds of negative tests we should do. Involving cookies of course, but also generally in other areas of the curl project.

Of course, we’re all just working voluntarily here on spare time so we can’t expect miracles.

(an attack, picture by Andy Gardner)

curl’s new HTTP cookies docs

Whenever libcurl saves cookies to a file, it saves them in the old “Netscape cookie format”.cookie

Since ages ago libcurl has written a comment at the top of that cookie file (the file we refer to as the cookie jar in curl lingo) that explains that it is a cookie netscape cookie file format and then a URL pointing to the original Netscape document describing what cookies are and how they work. To be perfectly honest, we pointed to a local copy of that document since the original is since long removed and gone and now only available through archive.org.

The web page pointed to were never a perfect match since the page never explained the file format or its use, only the cookie protocol – as it was originally described to work and not even like cookies works today.

Starting with curl 7.27.0, it will write the header to point to another URL: http://curl.haxx.se/docs/http-cookies.html

cURL