Tag Archives: Chrome

curl is not removing FTP

FTP is going out of style.

The Chrome team has previously announced that they are deprecating and removing support for FTP.

Mozilla also announced their plan for the deprecation of FTP in Firefox.

Both browsers have paused or conditioned their efforts to not take the final steps during the Covid-19 outbreak, but they will continue and the outcome is given: FTP support in browsers is going away. Soon.

curl

curl supported both uploads and downloads with FTP already in its first release in March 1998. Which of course was many years before either of those browsers mentioned above even existed!

In the curl project, we work super hard and tirelessly to maintain backwards compatibility and not break existing scripts and behaviors.

For these reasons, curl will not drop FTP support. If you have legacy systems running FTP, curl will continue to have your back and perform as snappy and as reliably as ever.

FTP the protocol

FTP is a protocol that is quirky to use over the modern Internet mostly due to its use of two separate TCP connections. It is unencrypted in its default version and the secured version, FTPS, was never supported by browsers. Not to mention that the encrypted version has its own slew of issues when used through NATs etc.

To put it short: FTP has its issues and quirks.

FTP use in general is decreasing and that is also why the browsers feel that they can take this move: it will only negatively affect a very minuscule portion of their users.

Legacy

FTP is however still used in places. In the 2019 curl user survey, more than 29% of the users said they’d use curl to transfer FTP within the last two years. There’s clearly a long tail of legacy FTP systems out there. Maybe not so much on the public Internet anymore – but in use nevertheless.

Alternative protocols?

SFTP could have become a viable replacement for FTP in these cases, but in practice we’ve moved into a world where HTTPS replaces everything where browsers are used.

Credits

Train image by D Thory from Pixabay

Daily web traffic

By late 2019, there’s an estimated amount of ten billion curl installations in the world. Of course this is a rough estimate and depends on how you count etc.

There are several billion mobile phones and tablets and a large share of those have multiple installations of curl. Then there all the Windows 10 machines, web sites, all macs, hundreds of millions of cars, possibly a billion or so games, maybe half a billion TVs, games consoles and more.

How much data are they transferring?

In the high end of volume users, we have at least two that I know of are doing around one million requests/sec on average (and I’m not even sure they are the top users, they just happen to be users I know do high volumes) but in the low end there will certainly be a huge amount of installations that barely ever do any requests at all.

If there are two users that I know are doing one million requests/sec, chances are there are more and there might be a few doing more than a million and certainly many that do less but still many.

Among many of the named and sometimes high profiled apps and users I know use curl, I very rarely know exactly for what purpose they use curl. Also, some use curl to do very many small requests and some will use it to do a few but very large transfers.

Additionally, and this really complicates the ability to do any good estimates, I suppose a number of curl users are doing transfers that aren’t typically considered to be part of “the Internet”. Like when curl is used for doing HTTP requests for every single subway passenger passing ticket gates in the London underground, I don’t think they can be counted as Internet transfers even though they use internet protocols.

How much data are browsers driving?

According to some data, there is today around 4.388 billion “Internet users” (page 39) and the world wide average time spent “on the Internet” is said to be 6 hours 42 minutes (page 50). I think these numbers seem credible and reasonable.

According to broadbandchoices, an average hour of “web browsing” spends about 25MB. According to databox.com, an average visit to a web site is 2-3 minutes. httparchive.org says the median page needs 74 HTTP requests to render.

So what do users do with their 6 hours and 42 minutes “online time” and how much of it is spent in a browser? I’ve tried to find statistics for this but failed.

@chuttenc (of Mozilla) stepped up and helped me out with getting stats from Firefox users. Based on stats from users that used Firefox on the day of October 1, 2019 and actually used their browser that day, they did 2847 requests per client as median with the median download amount 18808 kilobytes. Of that single day of use.

I don’t have any particular reason to think that other browsers, other days or users of other browsers are very different than Firefox users of that single day. Let’s count with 3,000 requests and 20MB per day. Interestingly, that makes the average data size per request a mere 6.7 kilobytes.

A median desktop web page total size is 1939KB right now according to httparchive.org (and the mobile ones are just slightly smaller so the difference isn’t too important here).

Based on the median weight per site from httparchive, this would imply that a median browser user visits the equivalent of 15 typical sites per day (30MB/1.939MB).

If each user spends 3 minutes per site, that’s still just 45 minutes of browsing per day. Out of the 6 hours 42 minutes. 11% of Internet time is browser time.

3000 requests x 4388000000 internet users, makes 13,164,000,000,000 requests per day. That’s 13.1 trillion HTTP requests per day.

The world’s web users make about 152.4 million HTTP requests per second.

(I think this is counting too high because I find it unlikely that all internet users on the globe use their browsers this much every day.)

The equivalent math to figure out today’s daily data amounts transferred by browsers makes it 4388000000 x 30MB = 131,640,000,000 megabytes/day. 1,523,611 megabytes per second. 1.5 TB/sec.

30MB/day equals a little under one GB/month per person. Feels about right.

Back to curl usage

The curl users with the highest request frequencies known to me (*) are racing away at one million requests/second on average, but how many requests do the others actually do? It’s really impossible to say. Let’s play the guessing game!

First, it feels reasonable to assume that these two users that I know of are not alone in doing high frequency transfers with curl. Purely based on probability, it seems reasonable to assume that the top-20 something users together will issue at least 10 million requests/second.

Looking at the users that aren’t in that very top. Is it reasonable to assume that each such installed curl instance makes a request every 10 minutes on average? Maybe it’s one per every 100 minutes? Or is it 10 per minute? There are some extremely high volume and high frequency users but there’s definitely a very long tail of installations basically never doing anything… The grim truth is that we simply cannot know and there’s no way to even get a ballpark figure. We need to guess.

Let’s toy with the idea that every single curl instance on average makes a transfer, a request, every tenth minute. That makes 10 x 10^9 / 600 = 16.7 million transfers per second in addition to the top users’ ten million. Let’s say 26 million requests per second. The browsers of the world do 152 million per second.

If each of those curl requests transfer 50Kb of data (arbitrarily picked out of thin air because again we can’t reasonably find or calculate this number), they make up (26,000,000 x 50 ) 1.3 TB/sec. That’s 85% of the data volume all the browsers in the world transfer.

The world wide browser market share distribution according to statcounter.com is currently: Chrome at 64%, Safari at 16.3% and Firefox at 4.5%.

This simple-minded estimate would imply that maybe, perhaps, possibly, curl transfers more data an average day than any single individual browser flavor does. Combined, the browsers transfer more.

Guesses, really?

Sure, or call them estimates. I’m doing them to the best of my ability. If you have data, reasoning or evidence to back up modifications my numbers or calculations that you can provide, nobody would be happier than me! I will of course update this post if that happens!

(*) = I don’t name these users since I’ve been given glimpses of their usage statistics informally and I’ve been asked to not make their numbers public. I hold my promise by not revealing who they are.

Thanks

Thanks to chuttenc for the Firefox numbers, as mentioned above, and thanks also to Jan Wildeboer for helping me dig up stats links used in this post.

Google to reimplement curl in libcrurl

Not the entire thing, just “a subset”. It’s not stated very clearly exactly what that subset is but the easy interface is mentioned in the Chrome bug about this project.

What?

The Chromium bug states that they will create a library of their own (named libcrurl) that will offer (parts of) the libcurl API and be implemented using Cronet.

Cronet is the networking stack of Chromium put into a library for use on mobile. The same networking stack that is used in the Chrome browser.

There’s also a mentioned possibility that “if this works”, they might also create “crurl” tool which is then their own version of the curl tool but using their own library. In itself is a pretty strong indication that their API will not be fully compatible, as if it was they could just use the existing curl tool…

Why?

“Implementing libcurl using Cronet would allow developers to take advantage of the utility of the Chrome Network Stack, without having to learn a new interface and its corresponding workflow. This would ideally increase ease of accessibility of Cronet, and overall improve adoption of Cronet by first-party or third-party applications.”

Logically, I suppose they then also hope that 3rd party applications can switch to this library (without having to change to another API or adapt much) and gain something and that new applications can use this library without having to learn a new API. Stick to the old established libcurl API.

How?

By throwing a lot of man power on it. As the primary author and developer of the libcurl API and the libcurl code, I assume that Cronet works quite differently than libcurl so there’s going to be quite a lot of wrestling of data and code flow to make this API work on that code.

The libcurl API is also very versatile and is an API that has developed over a period of almost 20 years so there’s a lot of functionality, a lot of options and a lot of subtle behavior that may or may not be easy or straight forward to mimic.

The initial commit imported the headers and examples from the curl 7.65.1 release.

Will it work?

Getting basic functionality for a small set of use cases should be simple and straight forward. But even if they limit the subset to number of functions and libcurl options, making them work exactly as we have them documented will be hard and time consuming.

I don’t think applications will be able to arbitrarily use either library for a very long time, if ever. libcurl has 80 public functions and curl_easy_setopt alone takes 268 different options!

Given enough time and effort they can certainly make this work to some degree.

Releases?

There’s no word on API/ABI stability or how they intend to ship or version their library. It is all very early still. I suppose we will learn more details as and if this progresses.

Flattered?

I think this move underscores that libcurl has succeeded in becoming an almost defacto standard for network transfers.

A Google office building in New York.

There’s this saying about imitation and flattery but getting competition from a giant like Google is a little intimidating. If they just put two paid engineers on their project they already have more dedicated man power than the original libcurl project does…

How will it affect curl?

First off: this doesn’t seem to actually exist for real yet so it is still very early.

Ideally the team working on this from Google’s end finds and fixes issues in our code and API so curl improves. Ideally this move makes more users aware of libcurl and its API and we make it even easier for users and applications in the world to do safe and solid Internet transfers. If the engineers are magically good, they offer a library that can do things better than libcurl can, using the same API so application authors can just pick the library they find work the best. Let the best library win!

Unfortunately I think introducing half-baked implementations of the API will cause users grief since it will be hard for users to understand what API it is and how they differ.

Since I don’t think “libcrurl” will be able to offer a compatible API without a considerable effort, I think applications will need to be aware of which of the APIs they work with and then we have a “split world” to deal with for the foreseeable future and that will cause problems, documentation problems and users misunderstanding or just getting things wrong.

Their naming will possibly also be reason for confusion since “libcrurl” and “crurl” look so much like typos of the original names.

We are determined to keep libcurl the transfer library for the internet. We support the full API and we offer full backwards compatibility while working the same way on a vast amount of different platforms and architectures. Why use a copy when the original is free, proven and battle-tested since years?

Rights?

Just to put things in perspective: yes they’re perfectly allowed and permitted to do this. Both morally and legally. curl is free and open source and licensed under the MIT license.

Good luck!

I wish the team working on this the best of luck!

Updates after initial post

Discussions: the hacker news discussion, the reddit thread, the lobsters talk.

Rename? it seems the google library might change name to libcurl_on_cronet.

Update in April 2020:

According to an update to the bug entry dated February 28th 2020:

Remove libcurl_on_cronet and dependencies.

This project was never finished, and we have no current plans to
continue development.

The HTTP Workshop 2019 begins

The forth season of my favorite HTTP series is back! The HTTP Workshop skipped over last year but is back now with a three day event organized by the very best: Mark, Martin, Julian and Roy. This time we’re in Amsterdam, the Netherlands.

35 persons from all over the world walked in the room and sat down around the O-shaped table setup. Lots of known faces and representatives from a large variety of HTTP implementations, client-side or server-side – but happily enough also a few new friends that attend their first HTTP Workshop here. The companies with the most employees present in the room include Apple, Facebook, Mozilla, Fastly, Cloudflare and Google – having three or four each in the room.

Patrick Mcmanus started off the morning with his presentation on HTTP conventional wisdoms trying to identify what have turned out as successes or not in HTTP land in recent times. It triggered a few discussions on the specific points and how to judge them. I believe the general consensus ended up mostly agreeing with the slides. The topic of unshipping HTTP/0.9 support came up but is said to not be possible due to its existing use. As a bonus, Anne van Kesteren posted a new bug on Firefox to remove it.

Mark Nottingham continued and did a brief presentation about the recent discussions in HTTPbis sessions during the IETF meetings in Prague last week.

Martin Thomson did a presentation about HTTP authority. Basically how a client decides where and who to ask for a resource identified by a URI. This triggered an intense discussion that involved a lot of UI and UX but also trust, certificates and subjectAltNames, DNS and various secure DNS efforts, connection coalescing, DNSSEC, DANE, ORIGIN frame, alternative certificates and more.

Mike West explained for the room about the concept for Signed Exchanges that Chrome now supports. A way for server A to host contents for server B and yet have the client able to verify that it is fine.

Tommy Pauly then talked to his slides with the title of Website Fingerprinting. He covered different areas of a browser’s activities that are current possible to monitor and use for fingerprinting and what counter-measures that exist to work against furthering that development. By looking at the full activity, including TCP flows and IP addresses even lots of our encrypted connections still allow for pretty accurate and extensive “Page Load Fingerprinting”. We need to be aware and the discussion went on discussing what can or should be done to help out.

The meeting is going on somewhere behind that red door.

Lucas Pardue discussed and showed how we can do TLS interception with Wireshark (since the release of version 3) of Firefox, Chrome or curl and in the end make sure that the resulting PCAP file can get the necessary key bundled in the same file. This is really convenient when you want to send that PCAP over to your protocol debugging friends.

Roberto Peon presented his new idea for “Generic overlay networks”, a suggested way for clients to get resources from one out of several alternatives. A neighboring idea to Signed Exchanges, but still different. There was an interested to further and deepen this discussion and Roberto ended up saying he’d at write up a draft for it.

Max Hils talked about Intercepting QUIC and how the ability to do this kind of thing is very useful in many situations. During development, for debugging and for checking what potentially bad stuff applications are actually doing on your own devices. Intercepting QUIC and HTTP/3 can thus also be valuable but at least for now presents some challenges. (Max also happened to mention that the project he works on, mitmproxy, has more stars on github than curl, but I’ll just let it slide…)

Poul-Henning Kamp showed us vtest – a tool and framework for testing HTTP implementations that both Varnish and HAproxy are now using. Massaged the right way, this could develop into a generic HTTP test/conformance tool that could be valuable for and appreciated by even more users going forward.

Asbjørn Ulsberg showed us several current frameworks that are doing GET, POST or SEARCH with request bodies and discussed how this works with caching and proposed that SEARCH should be defined as cacheable. The room mostly acknowledged the problem – that has been discussed before and that probably the time is ripe to finally do something about it. Lots of users are already doing similar things and cached POST contents is in use, just not defined generically. SEARCH is a already registered method but could get polished to work for this. It was also suggested that possibly POST could be modified to also allow for caching in an opt-in way and Mark volunteered to author a first draft elaborating how it could work.

Indonesian and Tibetan food for dinner rounded off a fully packed day.

Thanks Cory Benfield for sharing your notes from the day, helping me get the details straight!

Diversity

We’re a very homogeneous group of humans. Most of us are old white men, basically all clones and practically indistinguishable from each other. This is not diverse enough!

A big thank you to the HTTP Workshop 2019 sponsors!


DNS-over-HTTPS is RFC 8484

The protocol we fondly know as DoH, DNS-over-HTTPS, is now  officially RFC 8484 with the official title “DNS Queries over HTTPS (DoH)”. It documents the protocol that is already in production and used by several client-side implementations, including Firefox, Chrome and curl. Put simply, DoH sends a regular RFC 1035 DNS packet over HTTPS instead of over plain UDP.

I’m happy to have contributed my little bits to this standard effort and I’m credited in the Acknowledgements section. I’ve also implemented DoH client-side several times now.

Firefox has done studies and tests in cooperation with a CDN provider (which has sometimes made people conflate Firefox’s DoH support with those studies and that operator). These studies have shown and proven that DoH is a working way for many users to do secure name resolves at a reasonable penalty cost. At least when using a fallback to the native resolver for the tricky situations. In general DoH resolves are slower than the native ones but in the tail end, the absolutely slowest name resolves got a lot better with the DoH option.

To me, DoH is partly necessary because the “DNS world” has failed to ship and deploy secure and safe name lookups to the masses and this is the one way applications “one layer up” can still secure our users.

Inspect curl’s TLS traffic

Since a long time back, the venerable network analyzer tool Wireshark (screenshot above) has provided a way to decrypt and inspect TLS traffic when sent and received by Firefox and Chrome.

You do this by making the browser tell Wireshark the SSL secrets:

  1. set the environment variable named SSLKEYLOGFILE to a file name of your choice before you start the browser
  2. Setting the same file name path in the Master-secret field in Wireshark. Go to Preferences->Protocols->SSL and edit the path as shown in the screenshot below.

Having done this simple operation, you can now inspect your browser’s HTTPS traffic in Wireshark. Just super handy and awesome.

Just remember that if you record TLS traffic and want to save it for analyzing later, you need to also save the file with the secrets so that you can decrypt that traffic capture at a later time as well.

curl

Adding curl to the mix. curl can be built using a dozen different TLS libraries and not just a single one as the browsers do. It complicates matters a bit.

In the NSS library for example, which is the TLS library curl is typically built with on Redhat and Centos, handles the SSLKEYLOGFILE magic all by itself so by extension you have been able to do this trick with curl for a long time – as long as you use curl built with NSS. A pretty good argument to use that build really.

Since curl version 7.57.0 the SSLKEYLOGFILE feature can also be enabled when built with GnuTLS, BoringSSL or OpenSSL. In the latter two libs, the feature is powered by new APIs in those libraries and in GnuTLS the library’s own logic similar to how NSS does it. Since OpenSSL is the by far most popular TLS backend for curl, this feature is now brought to users more widely.

In curl 7.58.0 (due to ship on Janurary 24, 2018), this feature is built by default also for curl with OpenSSL and in 7.57.0 you need to define ENABLE_SSLKEYLOGFILE to enable it for OpenSSL and BoringSSL.

And what’s even cooler? This feature is at the same time also brought to every single application out there that is built against this or later versions of libcurl. In one single blow. now suddenly a whole world opens to make it easier for you to debug, diagnose and analyze your applications’ TLS traffic when powered by libcurl!

Like the description above for browsers, you

  1. set the environment variable SSLKEYLOGFILE to a file name to store the secrets in
  2. tell Wireshark to use that same file to find the TLS secrets (Preferences->Protocols->SSL), as the screenshot showed above
  3. run the libcurl-using application (such as curl) and Wireshark will be able to inspect TLS-based protocols just fine!

trace options

Of course, as a light weight alternative: you may opt to use the –trace or –trace-ascii options with the curl tool and be fully satisfied with that. Using those command line options, curl will log everything sent and received in the protocol layer without the TLS applied. With HTTPS you’ll see all the HTTP traffic for example.

Credits

Most of the curl work to enable this feature was done by Peter Wu and Ray Satiro.

One URL standard please

Following up on the problem with our current lack of a universal URL standard that I blogged about in May 2016: My URL isn’t your URL. I want a single, unified URL standard that we would all stand behind, support and adhere to.

What triggers me this time, is yet another issue. A friendly curl user sent me this URL:

http://user@example.com:80@daniel.haxx.se

… and pasting this URL into different tools and browsers show that there’s not a wide agreement on how this should work. Is the URL legal in the first place and if so, which host should a client contact?

  • curl treats the ‘@’-character as a separator between userinfo and host name so ‘example.com’ becomes the host name, the port number is 80 followed by rubbish that curl ignores. (wget2, the next-gen wget that’s in development works identically)
  • wget extracts the example.com host name but rejects the port number due to the rubbish after the zero.
  • Edge and Safari say the URL is invalid and don’t go anywhere
  • Firefox and Chrome allow ‘@’ as part of the userinfo, take the ’80’ as a password and the host name then becomes ‘daniel.haxx.se’

The only somewhat modern “spec” for URLs is the WHATWG URL specification. The other major, but now somewhat aged, URL spec is RFC 3986, made by the IETF and published in 2005.

In 2015, URL problem statement and directions was published as an Internet-draft by Masinter and Ruby and it brings up most of the current URL spec problems. Some of them are also discussed in Ruby’s WHATWG URL vs IETF URI post from 2014.

What I would like to see happen…

Which group? A group!

Friends I know in the WHATWG suggest that I should dig in there and help them improve their spec. That would be a good idea if fixing the WHATWG spec would be the ultimate goal. I don’t think it is enough.

The WHATWG is highly browser focused and my interactions with members of that group that I have had in the past, have shown that there is little sympathy there for non-browsers who want to deal with URLs and there is even less sympathy or interest for URL schemes that the popular browsers don’t even support or care about. URLs cover much more than HTTP(S).

I have the feeling that WHATWG people would not like this work to be done within the IETF and vice versa. Since I’d like buy-in from both camps, and any other camps that might have an interest in URLs, this would need to be handled somehow.

It would also be great to get other major URL “consumers” on board, like authors of popular URL parsing libraries, tools and components.

Such a URL group would of course have to agree on the goal and how to get there, but I’ll still provide some additional things I want to see.

Update: I want to emphasize that I do not consider the WHATWG’s job bad, wrong or lost. I think they’ve done a great job at unifying browsers’ treatment of URLs. I don’t mean to belittle that. I just know that this group is only a small subset of the people who probably should be involved in a unified URL standard.

A single fixed spec

I can’t see any compelling reasons why a URL specification couldn’t reach a stable state and get published as *the* URL standard. The “living standard” approach may be fine for certain things (and in particular browsers that update every six weeks), but URLs are supposed to be long-lived and inter-operate far into the future so they really really should not change. Therefore, I think the IETF documentation model could work well for this.

The WHATWG spec documents what browsers do, and browsers do what is documented. At least that’s the theory I’ve been told, and it causes a spinning and never-ending loop that goes against my wish.

Document the format

The WHATWG specification is written in a pseudo code style, describing how a parser would “walk” over the string with a state machine and all. I know some people like that, I find it utterly annoying and really hard to figure out what’s allowed or not. I much more prefer the regular RFC style of describing protocol syntax.

IDNA

Can we please just say that host names in URLs should be handled according to IDNA2008 (RFC 5895)? WHATWG URL doesn’t state any IDNA spec number at all.

Move out irrelevant sections

“Irrelevant” when it comes to documenting the URL format that is. The WHATWG details several things that are related to URL for browsers but are mostly irrelevant to other URL consumers or producers. Like section “5. application/x-www-form-urlencoded” and “6. API”.

They would be better placed in a “URL considerations for browsers” companion document.

Working doesn’t imply sensible

So browsers accept URLs written with thousands of forward slashes instead of two. That is not a good reason for the spec to say that a URL may legitimately contain a thousand slashes. I’m totally convinced there’s no critical content anywhere using such formatted URLs and no soul will be sad if we’d restricted the number to a single-digit. So we should. And yeah, then browsers should reject URLs using more.

The slashes are only an example. The browsers have used a “liberal in what you accept” policy for a lot of things since forever, but we must resist to use that as a basis when nailing down a standard.

The odds of this happening soon?

I know there are individuals interested in seeing the URL situation getting worked on. We’ve seen articles and internet-drafts posted on the issue several times the last few years. Any year now I think we will see some movement for real trying to fix this. I hope I will manage to participate and contribute a little from my end.

Lesser HTTPS for non-browsers

An HTTPS client needs to do a whole lot of checks to make sure that the remote host is fine to communicate with to maintain the proper high security levels.

In this blog post, I will explain why and how the entire HTTPS ecosystem relies on the browsers to be good and strict and thanks to that, the rest of the HTTPS clients can get away with being much more lenient. And in fact that is good, because the browsers don’t help the rest of the ecosystem very much to do good verification at that same level.

Let me me illustrate with some examples.

CA certs

The server’s certificate must have been signed by a trusted CA (Certificate Authority). A client then needs the certificates from all the CAs that are trusted. Who’s a trusted CA and how would a client get their certs to use for verification?

You can say that you trust the same set of CAs that your operating system vendor trusts (which I’ve always thought is a bit of a stretch but hey, I can very well understand the convenience in this). If you want to do this as an HTTPS client you need to use native APIs in Windows or macOS, or you need to figure out where the cert bundle is stored if you’re using Linux.

If you’re not using the native libraries on windows and macOS or if you can’t find the bundle in your Linux distribution, or you’re in one of a large amount of other setups where you can’t use someone else’s bundle, then you need to gather this list by yourself.

How on earth would you gather a list of hundreds of CA certs that are used for the popular web sites on the net of today? Stand on someone else’s shoulders and use what they’ve done? Yeps, and conveniently enough Mozilla has such a bundle that is licensed to allow others to use it…

Mozilla doesn’t offer the set of CA certs in a format that anyone else can use really, which is the primary reason why we offer Mozilla’s cert bundle converted to PEM format on the curl web site. The other parties that collect CA certs at scale (Microsoft for Windows, Apple for macOS, etc) do even less.

Before you ask, Google doesn’t maintain their own list for Chrome. They piggyback the CA store provided on the operating system it runs on. (Update: Google maintains its own list for Android/Chrome OS.)

Further constraints

But the browsers, including Firefox, Chrome, Edge and Safari all add additional constraints beyond that CA cert store, on what server certificates they consider to be fine and okay. They blacklist specific fingerprints, they set a last allowed date for certain CA providers to offer certificates for servers and more.

These additional constraints, or additional rules if you want, are never exported nor exposed to the world in ways that are easy for anyone to mimic (in other ways than that everyone of course can implement the same code logic in their ends). They’re done in code and they’re really hard for anyone not a browser to implement and keep up with.

This makes every non-browser HTTPS client susceptible to okaying certificates that have already been deemed not OK by security experts at the browser vendors. And in comparison, not many HTTPS clients can compare or stack up the amount of client-side TLS and security expertise that the browser developers can.

HSTS preload

HTTP Strict Transfer Security is a way for sites to tell clients that they are to be accessed over HTTPS only for a specified time into the future, and plain HTTP should then not be used for the duration of this rule. This setup removes the Man-In-The-Middle (MITM) risk on subsequent accesses for sites that may still get linked to via HTTP:// URLs or by users entering the web site names directly into the address bars and so on.

The browsers have a “HSTS preload list” which is a list of sites that people have submitted and they are HSTS sites that basically never time out and always will be accessed over HTTPS only. Forever. No risk for MITM even in the first access to these sites.

There are no such HSTS preload lists being provided for non-browser HTTPS clients so there’s no easy way for non-browsers to avoid the first access MITM even for these class of forever-on-HTTPS sites.

Update: The Chromium HSTS preload list is available in a JSON format.

SHA-1

I’m sure you’ve heard about the deprecation of SHA-1 as a certificate hashing algorithm and how the browsers won’t accept server certificates using this starting at some cut off date.

I’m not aware of any non-browser HTTPS client that makes this check. For services, API providers and others don’t serve “normal browsers” they can all continue to play SHA-1 certificates well into 2017 without tears or pain. Another ecosystem detail we rely on the browsers to fix for us since most of these providers want to work with browsers as well…

This isn’t really something that is magic or would be terribly hard for non-browsers to do, its just that it will make some users suddenly get errors for their otherwise working setups and that takes a firm attitude from the software provider that is hard to maintain. And you’d have to introduce your own cut-off date that you’d have to fight with your users about! 😉

TLS is hard to get right

TLS and HTTPS are full of tricky areas and dusty corners that are hard to get right. The more we can share tricks and rules the better it is for everyone.

I think the browser vendors could do much better to help the rest of the ecosystem. By making their meta data available to us in sensible formats mostly. For the good of the Internet.

Disclaimer

Yes I work for Mozilla which makes Firefox. A vendor and a browser that I write about above. I’ve been communicating internally about some of these issues already, but I’m otherwise not involved in those parts of Firefox.

HTTPS proxy with curl

Starting in version 7.52.0 (due to ship December 21, 2016), curl will support HTTPS proxies when doing network transfers, and by doing this it joins the small exclusive club of HTTP user-agents consisting of Firefox, Chrome and not too many others.

Yes you read this correctly. This is different than the good old HTTP proxy.

HTTPS proxy means that the client establishes a TLS connection to the proxy and then communicates over that, which is different to the normal and traditional HTTP proxy approach where the clients speak plain HTTP to the proxy.

Talking HTTPS to your proxy is a privacy improvement as it prevents people from snooping on your proxy communication. Even when using HTTPS over a standard HTTP proxy, there’s typically a setting up phase first that leaks information about where the connection is being made, user credentials and more. Not to mention that an HTTPS proxy makes HTTP traffic “safe” to and from the proxy. HTTPS to the proxy also enables clients to speak HTTP/2 more easily with proxies. (Even though HTTP/2 to the proxy is not yet supported in curl.)

In the case where a client wants to talk HTTPS to a remote server, when using a HTTPS proxy, it sends HTTPS through HTTPS.

Illustrating this concept with images. When using a traditional HTTP proxy, we connect initially to the proxy with HTTP in the clear, and then from then on the HTTPS makes it safe:

HTTP proxyto compare with the HTTPS proxy case where the connection is safe already in the first step:

HTTPS proxyThe access to the proxy is made over network A. That network has traditionally been a corporate network or within a LAN or something but we’re seeing more and more use cases where the proxy is somewhere on the Internet and then “Network A” is really huge. That includes use cases where the proxy for example compresses images or otherwise reduces bandwidth requirements.

Actual HTTPS connections from clients to servers are still done end to end encrypted even in the HTTP proxy case. HTTP traffic to and from the user to the web site however, will still be HTTPS protected to the proxy when a HTTPS proxy is used.

A complicated pull request

This awesome work was provided by Dmitry Kurochkin, Vasy Okhin, and Alex Rousskov. It was merged into master on November 24 in this commit.

Doing this sort of major change in the TLS area in curl code is a massive undertaking, much so because of the fact that curl supports getting built with one out of 11 or 12 different TLS libraries. Several of those are also system-specific so hardly any single developer can even build all these backends on his or hers own machines.

In addition to the TLS backend maze, curl and library also offers a huge amount of different options to control the TLS connection and handling. You can switch on and off features, provide certificates, CA bundles and more. Adding another layer of TLS pretty much doubles the amount of options since now you can tweak everything both in the TLS connection to the proxy as well as the one to the remote peer.

This new feature is supported with the OpenSSL, GnuTLS and NSS backends to start with.

Consider it experimental for now

By all means, go ahead and use it and torture the code and file issues for everything bad you see, but I think we make ourselves a service by considering this new feature set to be a bit experimental in this release.

New options

There’s a whole forest of new command line and libcurl options to control all the various aspects of the new TLS connection this introduces. Since it is a totally separate connection it gets a whole set of options that are basically identical to the server connection but with a –proxy prefix instead. Here’s a list:

  --proxy-cacert 
  --proxy-capath
  --proxy-cert
  --proxy-cert-type
  --proxy-ciphers
  --proxy-crlfile
  --proxy-insecure
  --proxy-key
  --proxy-key-type
  --proxy-pass
  --proxy-ssl-allow-beast
  --proxy-sslv2
  --proxy-sslv3
  --proxy-tlsv1
  --proxy-tlsuser
  --proxy-tlspassword
  --proxy-tlsauthtype

curl and TLS 1.3

Draft 18 of the TLS version 1.3 spec was publiSSL padlockshed at the end of October 2016.

Already now, both Firefox and Chrome have test versions out with TLS 1.3 enabled. Firefox 52 will have it by default, and while Chrome will ship it, I couldn’t figure out exactly when we can expect it to be there by default.

Over the last few days we’ve merged TLS 1.3 support to curl, primarily in this commit by Kamil Dudka. Both the command line tool and libcurl will negotiate TLS 1.3 in the next version (7.52.0 – planned release date at the end of December 2016) if built with a TLS library that supports it and told to do it by the user.

The two current TLS libraries that will speak TLS 1.3 when built with curl right now is NSS and BoringSSL. The plan is to gradually adjust curl over time as the other libraries start to support 1.3 as well. As always we will appreciate your help in making this happen!

Right now, there’s also the minor flux in that servers and clients may end up running implementations of different draft versions of the TLS spec which contributes to a layer of extra fun!

Three TLS current 1.3 test servers to play with: https://enabled.tls13.com/ , https://www.allizom.org/ and https://tls13.crypto.mozilla.org/. I doubt any of these will give you any guarantees of functionality.

TLS 1.3 offers a few new features that allow clients such as curl to do subsequent TLS connections much faster, with only 1 or even 0 RTTs, but curl has no code for any of those features yet.