When Mac OS X first launched they did so without an existing poll function. They later added poll() in Mac OS X 10.3, but we quickly discovered that it was broken (it returned a non-zero value when asked to wait for nothing) so in the curl project we added a check in configure for that and subsequently avoided using poll() in all OS X versions to and including Mac OS 10.8 (Darwin 12). The code would instead switch to the alternative solution based on select() for these platforms.
With the release of Mac OS X 10.9 “Mavericks” in October 2013, Apple had fixed their poll() implementation and we’ve built libcurl to use it since with no issues at all. The configure script picks the correct underlying function to use.
Enter macOS 10.12 (yeah, its not called OS X anymore) “Sierra”, released in September 2016. Quickly we discovered that poll() once against did not act like it should and we are back to disabling the use of it in preference to the backup solution using select().
The new error looks similar to the old problem: when there’s nothing to wait for and we ask poll() to wait N milliseconds, the 10.12 version of poll() returns immediately without waiting. Causing busy-loops. The problem has been reported to Apple and its Radar number is 28372390. (There has been no news from them on how they plan to act on this.)
If none of the defined events have occurred on any selected file descriptor, poll() waits at least timeout milliseconds for an event to occur on any of the selected file descriptors.
We pushed a configure check for this in curl, to be part of the upcoming 7.51.0 release. I’ll also show you a small snippet you can use stand-alone below.
Apple is hardly alone in the broken-poll department. Remember how Windows’ WSApoll is broken?
Here’s a little code snippet that can detect the 10.12 breakage:
This poll bug has been confirmed fixed in the macOS 10.12.2 update (released on December 13, 2016), but I’ve found no official mention or statement about this fact.
Chevrolet Traverse 2018 uses curl according to its owners manual on page 403. It is mentioned almost identically in other Chevrolet model manuals such as for the Corvette, the 2018 Camaro, the 2018 TRAX, the 2013 VOLT, the 2014 Express and the 2017 BOLT.
The curl license is also in owner manuals for other brands and models such as in the GMC Savana, Cadillac CT6 2016, Opel Zafira, Opel Insignia, Opel Astra, Opel Karl, Opel Cascada, Opel Mokka, Opel Ampera, Vauxhall Astra … (See 100 million cars run curl).
The Onkyo TX-NR609 AV-Receiver uses libcurl as shown by the license in its manual. (Thanks to Marc Hörsken)
Over time, I’ve reluctantly come to terms with the fact that a lot of questions and answers about curl is not done on the mailing lists we have setup in the project itself.
A primary such external site with curl related questions is of course stackoverflow – hardly news to programmers of today. The questions tagged with curl is of course only a very tiny fraction of the vast amount of questions and answers that accumulate on that busy site.
The pile of questions tagged with curl on stackoverflow has just surpassed the staggering number of 25,000. Of course, these questions involve persons who ask about particular curl behaviors (and a large portion is about PHP/CURL) but there’s also a significant amount of tags for questions where curl is only used to do something and that other something is actually what the question is about. And ‘libcurl’ is used as a separate tag and is often used independently of the ‘curl’ one. libcurl is tagged on almost 2,000 questions.
But still. 25,000 questions. Wow.
I visit that site every so often and answer to some questions but I often end up feeling a great “distance” between me and questions there, and I have a hard time to bridge that gap. Also, stackoverflow the site and the format isn’t really suitable for debugging or solving problems within curl so I often end up trying to get the user move over to file an issue on curl’s github page or discuss the curl problem on a mailing list instead. Forums more suitable for plenty of back-and-forth before the solution or fix is figured out.
Now, any bets for how long it takes until we reach 100K questions?
… out of the top ten million sites that is. So there’s at least that many, quite likely a few more.
This is according to w3techs who runs checks daily. Over the last few months, there have been about 50,000 new sites per month switching it on.
It also shows that the HTTP/2 ratio has increased from a little over 1% deployment a year ago to the 10% today.
HTTP/2 gets more used the more popular site it is. Among the top 1,000 sites on the web, more than 20% of them use HTTP/2. HTTP/2 also just recently (September 9) overcame SPDY among the top-1000 most popular sites.
On September 7, Amazon announced their CloudFront service having enabled HTTP/2, which could explain an adoption boost over the last few days. New CloudFront users get it enabled by default but existing users actually need to go in and click a checkbox to make it happen.
As the web traffic of the world is severely skewed toward the top ones, we can be sure that a significantly larger share than 10% of the world’s HTTPS traffic is using version 2.
Recent usage stats in Firefox shows that HTTP/2 is used in half of all its HTTPS requests!
To spread the word, to show off the logo, to share the love, to boost the brand, to allow us to fill up our own and our friend’s laptop covers I ordered a set of curl stickers to hand out to friends and fans whenever I meet any going forward. They arrived today, and I thought I’d give you a look. (You can still purchase your own set of curl stickers from unixstickers.com)
During the autumn 1996 I took my first swim in the ocean known as HTTP. Twenty years ago now.
I had previously worked with writing an IRC bot in C, and IRC is a pretty simple text based protocol over TCP so I could use some experiences from that when I started to look into HTTP. That IRC bot was my first real application distributed to the world that was using TCP/IP. It was portable to most unixes and Amiga and it was open source.
I decided I should spice up the bot and make it offer a currency exchange rate service so that people who were chatting could ask the bot what 200 SEK is when converted to USD or what 50 AUD might be in DEM. – Right, there was no Euro currency yet back then!
I simply had to fetch the currency rates at a regular interval and keep them in the same server that ran the bot. I just needed a little tool to download the rates over HTTP. How hard can that be? I googled around (this was before Google existed so that was not the search engine I could use!) and found a tool named ‘httpget’ that made pretty much what I wanted. It truly was tiny – a few hundred lines of code.
I don’t have an exact date saved or recorded for when this happened, only the general time frame. You know, we had no smart phones, no Google calendar and no digital cameras. I sported my first mobile phone back then, the sexy Nokia 1610 – viewed in the picture on the right here.
The HTTP/1.0 RFC had just recently came out – which was the first ever real spec published for HTTP. RFC 1945 was published in May 1996, but I was blissfully unaware of the youth of the standard and I plunged into my little project. This was the first published HTTP spec and it says:
HTTP has been in use by the World-Wide Web global information initiative since 1990. This specification reflects common usage of the protocol referred too as "HTTP/1.0". This specification describes the features that seem to be consistently implemented in most HTTP/1.0 clients and servers.
Many years after that point in time, I have learned that already at this time when I first searched for a HTTP tool to use, wget already existed. I can’t recall that I found that in my searches, and if I had found it maybe history would’ve made a different turn for me. Or maybe I found it and discarded for a reason I can’t remember now.
I wasn’t the original author of httpget; Rafael Sagula was. But I started contributing fixes and changes and soon I was the maintainer of it. Unfortunately I’ve lost my emails and source code history from those earliest years so I cannot easily show my first steps. Even the oldest changelogs show that we very soon got help and contributions from users.
The earliest saved code archive I have from those days, is from after we had added support for Gopher and FTP and renamed the tool ‘urlget’. urlget-3.5.zip was released on January 20 1998 which thus was more than a year later my involvement in httpget started.
The original httpget/urlget/curl code was stored in CVS and it was licensed under the GPL. I did most of the early development on SunOS and Solaris machines as my first experiments with Linux didn’t start until 97/98 something.
The first web page I know we have saved on archive.org is from December 1998 and by then the project had been renamed to curl already. Roughly two years after the start of the journey.
RFC 2068 was the first HTTP/1.1 spec. It was released already in January 1997, so not that long after the 1.0 spec shipped. In our project however we stuck with doing HTTP 1.0 for a few years longer and it wasn’t until February 2001 we first started doing HTTP/1.1 requests. First shipped in curl 7.7. By then the follow-up spec to HTTP/1.1, RFC 2616, had already been published as well.
The IETF working group called HTTPbis was started in 2007 to once again refresh the HTTP/1.1 spec, but it took me a while until someone pointed out this to me and I realized that I too could join in there and do my part. Up until this point, I had not really considered that little me could actually participate in the protocol doings and bring my views and ideas to the table. At this point, I learned about IETF and how it works.
I posted my first emails on that list in the spring 2008. The 75th IETF meeting in the summer of 2009 was held in Stockholm, so for me still working on HTTP only as a spare time project it was very fortunate and good timing. I could meet a lot of my HTTP heroes and HTTPbis participants in real life for the first time.
I have participated in the HTTPbis group ever since then, trying to uphold the views and standpoints of a command line tool and HTTP library – which often is not the same as the web browsers representatives’ way of looking at things. Since I was employed by Mozilla in 2014, I am of course now also in the “web browser camp” to some extent, but I remain a protocol puritan as curl remains my first “child”.
I’m employed by Mozilla. The same Mozilla that recently has announced that it is looking around for feedback on how to revamp its logo and graphical image.
It was with amusement I saw one of the existing suggestions for a new logo by using “://” (colon slash slash) the name:
… compared with the recently announced new curl logo:
Me being in both teams and being a general Internet protocol enthusiast I couldn’t be more happy if Mozilla would end up using a design so clearly based on the same underlying thoughts. After all, Imitation is the sincerest of flattery as Charles Caleb Colton once so eloquently expressed it.
PowerShell is a spiced up command line shell made by Microsoft. According to some people, it is a really useful and good shell alternative.
Already a long time ago, we got bug reports from confused users who couldn’t use curl from their PowerShell prompts and it didn’t take long until we figured out that Microsoft had added aliases for both curl and wget. The alias had the shell instead invoke its own command called “Invoke-WebRequest” whenever curl or wget was entered. Invoke-WebRequest being PowerShell’s own version of a command line tool for fiddling with URLs.
Invoke-WebRequest is of course not anywhere near similar to neither curl nor wget and it doesn’t support any of the command line options or anything. The aliases really don’t help users. No user who would want the actual curl or wget is helped by these aliases, and users who don’t know about the real curl and wget won’t use the aliases. They were and remain pointless. But they’ve remained a thorn in my side ever since. Me knowing that they are there and confusing users every now and then – not me personally, since I’m not really a Windows guy.
It took 34 minutes for them to close the pull request:
“Those aliases have existed for multiple releases, so removing them would be a breaking change.”
To be honest, I didn’t expect them to merge it easily. I figure they added those aliases for a reason back in the day and it seems unlikely that I as an outsider would just make them change that decision just like this out of the blue.
But the story didn’t end there. Obviously more Microsoft people gave the PR some attention and more comments were added. Like this:
“You bring up a great point. We added a number of aliases for Unix commands but if someone has installed those commands on WIndows, those aliases screw them up.
We need to fix this.”
So, maybe it will trigger a change anyway? The story is ongoing…
Section 9.1.1 in RFC7540 explains how HTTP/2 clients can reuse connections. This is my lengthy way of explaining how this works in reality.
Many connections in HTTP/1
With HTTP/1.1, browsers are typically using 6 connections per origin (host name + port). They do this to overcome the problems in HTTP/1 and how it uses TCP – as each connection will do a fair amount of waiting. Plus each connection is slow at start and therefore limited to how much data you can get and send quickly, you multiply that data amount with each additional connection. This makes the browser get more data faster (than just using one connection).
Add sharding
Web sites with many objects also regularly invent new host names to trigger browsers to use even more connections. A practice known as “sharding”. 6 connections for each name. So if you instead make your site use 4 host names you suddenly get 4 x 6 = 24 connections instead. Mostly all those host names resolve to the same IP address in the end anyway, or the same set of IP addresses. In reality, some sites use many more than just 4 host names.
The sad reality is that a very large percentage of connections used for HTTP/1.1 are only ever used for a single HTTP request, and a very large share of the connections made for HTTP/1 are so short-lived they actually never leave the slow start period before they’re killed off again. Not really ideal.
One connection in HTTP/2
With the introduction of HTTP/2, the HTTP clients of the world are going toward using a single TCP connection for each origin. The idea being that one connection is better in packet loss scenarios, it makes priorities/dependencies work and reusing that single connections for many more requests will be a net gain. And as you remember, HTTP/2 allows many logical streams in parallel over that single connection so the single connection doesn’t limit what the browsers can ask for.
Unsharding
The sites that created all those additional host names to make the HTTP/1 browsers use many connections now work against the HTTP/2 browsers’ desire to decrease the number of connections to a single one. Sites don’t want to switch back to using a single host name because that would be a significant architectural change and there are still a fair number of HTTP/1-only browsers still in use.
Enter “connection coalescing”, or “unsharding” as we sometimes like to call it. You won’t find either term used in RFC7540, as it merely describes this concept in terms of connection reuse.
Connection coalescing means that the browser tries to determine which of the remote hosts that it can reach over the same TCP connection. The different browsers have slightly different heuristics here and some don’t do it at all, but let me try to explain how they work – as far as I know and at this point in time.
Coalescing by example
Let’s say that this cool imaginary site “example.com” has two name entries in DNS: A.example.com and B.example.com. When resolving those names over DNS, the client gets a list of IP address back for each name. A list that very well may contain a mix of IPv4 and IPv6 addresses. One list for each name.
You must also remember that HTTP/2 is also only ever used over HTTPS by browsers, so for each origin speaking HTTP/2 there’s also a corresponding server certificate with a list of names or a wildcard pattern for which that server is authorized to respond for.
In our example we start out by connecting the browser to A. Let’s say resolving A returns the IPs 192.168.0.1 and 192.168.0.2 from DNS, so the browser goes on and connects to the first of those addresses, the one ending with “1”. The browser gets the server cert back in the TLS handshake and as a result of that, it also gets a list of host names the server can deal with: A.example.com and B.example.com. (it could also be a wildcard like “*.example.com”)
If the browser then wants to connect to B, it’ll resolve that host name too to a list of IPs. Let’s say 192.168.0.2 and 192.168.0.3 here.
Host A: 192.168.0.1 and 192.168.0.2
Host B: 192.168.0.2 and 192.168.0.3
Now hold it. Here it comes.
The Firefox way
Host A has two addresses, host B has two addresses. The lists of addresses are not the same, but there is an overlap – both lists contain 192.168.0.2. And the host A has already stated that it is authoritative for B as well. In this situation, Firefox will not make a second connect to host B. It will reuse the connection to host A and ask for host B’s content over that single shared connection. This is the most aggressive coalescing method in use.
The Chrome way
Chrome features a slightly less aggressive coalescing. In the example above, when the browser has connected to 192.168.0.1 for the first host name, Chrome will require that the IPs for host B contains that specific IP for it to reuse that connection. If the returned IPs for host B really are 192.168.0.2 and 192.168.0.3, it clearly doesn’t contain 192.168.0.1 and so Chrome will create a new connection to host B.
Chrome will reuse the connection to host A if resolving host B returns a list that contains the specific IP of the connection host A is already using.
The Edge and Safari ways
They don’t do coalescing at all, so each host name will get its own single connection. Better than the 6 connections from HTTP/1 but for very sharded sites that means a lot of connections even in the HTTP/2 case.
curl also doesn’t coalesce anything (yet).
Surprises and a way to mitigate them
Given some comments in the Firefox bugzilla, the aggressive coalescing sometimes causes some surprises. Especially when you have for example one IPv6-only host A and a second host B with both IPv4 and IPv6 addresses. Asking for data on host A can then still use IPv4 when it reuses a connection to B (assuming that host A covers host B in its cert).
In the rare case where a server gets a resource request for an authority (or scheme) it can’t serve, there’s a dedicated error code 421 in HTTP/2 that it can respond with and the browser can then go back and retry that request on another connection.
Starts out with 6 anyway
Before the browser knows that the server speaks HTTP/2, it may fire up 6 connection attempts so that it is prepared to get the remote site at full speed. Once it figures out that it doesn’t need all those connections, it will kill off the unnecessary unused ones and over time trickle down to one. Of course, on subsequent connections to the same origin the client may have the version information cached so that it doesn’t have to start off presuming HTTP/1.
curl has been shipped by default on Mac OS X since many years – I actually couldn’t even manage to figure out exactly how many. It is built and bundled with the operating system by Apple itself and on Apple’s own terms and even though I’m the main curl developer I’ve never discussed this with them or even been asked or told about their plans. I’m not complaining, our license allows this and I’m nothing but happy with them shipping curl to millions of Mac users.
Leaving OpenSSL
Originally, curl on Mac was built against OpenSSL for the TLS and SSL support, but over time our friends at Apple have switched more and more of their software over to use their own TLS and crypto library Secure Transport instead of OpenSSL. A while ago Apple started bundling curl built to use the native mac TLS library instead of OpenSSL.
As you may know, when you build curl you can select from eleven different TLS libraries and one of them of course is Secure Transport. Support for this TLS back-end in curl was written by curl hackers, but it apparently got to a quality level good enough for Apple to decide to build curl with this back-end and ship it like that.
The Secure Transport back-end is rather capable and generally doesn’t cause many reasons for concern. There’s however one notable little glitch that people keep asking me about…
curl doesn’t support HTTP/2 on mac!
There are two obvious reasons why not, and they are:
1. No ALPN with Secure Transport
Secure Transport doesn’t offer any public API to enable HTTP/2 with ALPN when speaking HTTPS. Sure, we know Apple supports HTTP/2 already in several other aspects in their ecosystem and we can check their open code so we know there’s support for HTTP/2 and ALPN. There’s just no official APIs for us to use to switch it on!
So, if you insist on building curl to use Secure Transport instead of one of the many alternatives that actually support ALPN just fine, then you can’t negotiate HTTP/2 over TLS!
2. No nghttp2 with Mac OS
Even without ALPN support, you could actually still negotiate HTTP/2 over plain text TCP connections if you have a server that supports it. But even then curl depends on the awesome nghttp2 library to provide the frame level protocol encoding/decoding and more. If Apple would decide to enable HTTP/2 support for curl on Mac OS, they need to build it against nghttp2. I really think they should.
Homebrew and friends to the rescue!
Correct. You can still install your own separate curl binary (and libcurl library) from other sources, like for example Homebrew or Macports and they do offer versions built against other TLS back-ends and nghttp2 and then of course HTTP/2 works just fine with curl on mac.
Did I file a bug with Apple?
No, but I know for certain that there has been a bug report filed by someone else. Unfortunately it isn’t public so I can’t link nor browse it.