curl_multi_wakeup() is a new friend in the libcurl API family. It will show up for the first time in the upcoming 7.68.0 release, scheduled to happen on January 8th 2020.
Sleeping is important
One of the core functionalities in libcurl is the ability to do multiple parallel transfers in the same thread. You then create and add a number of transfers to a multi handle. Anyway, I won’t explain the entire API here but the gist of where I’m going with this is that you’ll most likely sooner or later end up calling the curl_multi_poll() function which asks libcurl to wait for activity on any of the involved transfers – or sleep and don’t return for the next N milliseconds.
Calling this waiting function (or using the older curl_multi_wait() or even doing a select() or poll() call “manually”) is crucial for a well-behaving program. It is important to let the code go to sleep like this when there’s nothing to do and have the system wake up it up again when it needs to do work. Failing to do this correctly, risk having libcurl instead busy-loop somewhere and that can make your application use 100% CPU during periods. That’s terribly unnecessary and bad for multiple reasons.
Wakey wakey
When your application calls libcurl to say “sleep for a second or until something happens on these N transfers” and something happens and the application for example needs to shut down immediately, users have been asking for a way to do a wake up call.
– Hey libcurl, wake up and return early from the poll function!
You could achieve this previously as well, but then it required you to write quite a lot of extra code, plus it would have to be done carefully if you wanted it to work cross-platform etc. Now, libcurl will provide this utility function for you out of the box!
curl_multi_wakeup()
This function explicitly makes a curl_multi_poll() function return immediately. It is designed to be possible to use from a different thread. You will love it!
curl_multi_poll()
This is the only call that can be woken up like this. Old timers may recognize that this is also a fairly new function call. We introduced it in 7.66.0 back in September 2019. This function is very similar to the older curl_multi_wait() function but has some minor behavior differences that also allow us to introduce this new wakeup ability.
Credits
This function was brought to us by the awesome Gergely Nagy.
The ETag HTTP response header is an identifier for a specific version of a resource. It lets caches be more efficient and save bandwidth, as a web server does not need to resend a full response if the content has not changed. Additionally, etags help prevent simultaneous updates of a resource from overwriting each other (“mid-air collisions”).
In short, a server can include this header when it responds with a resource, and in subsequent requests when a client wants to get an updated version of that document it sends back the same ETag and says “please give me a new version if it doesn’t match this ETag anymore”. The server will then respond with a 304 if there’s nothing new to return.
It is a better way than modification time stamp to identify a specific resource version on the server.
ETag options
Starting in version 7.68.0 (due to ship on January 8th, 2020), curl will feature two new command line options that makes it easier for users to take advantage of these features. You can of course try it out already now if you build from git or get a daily snapshot.
The ETag options are perfect for situations like when you run a curl command in a cron job to update a file if it has indeed changed on the server.
--etag-save <filename>
Issue the curl command and if there’s an ETag coming back in the response, it gets saved in the given file.
--etag-compare <filename>
Load a previously stored ETag from the given file and include that in the outgoing request (the file should only consist of the specific ETag “value” and nothing else). If the server deems that the resource hasn’t changed, this will result in a 304 response code. If it has changed, the new content will be returned.
Update the file if newer than previously stored ETag:
The other method to accomplish something similar is the -z (or --time-cond) option. It has been supported by curl since the early days. Using this, you give curl a specific time that will be used in a conditional request. “only respond with content if the document is newer than this”:
curl -z "5 dec 2019" https:/example.com
You can also do the inversion of the condition and ask for the file to get delivered only if it is older than a certain time (by prefixing the date with a dash):
curl -z "-1 jan 2003" https://example.com
Perhaps this is most conveniently used when you let curl get the time from a file. Typically you’d use the same file that you’ve saved from a previous invocation and now you want to update if the file is newer on the remote site:
It could be noted that these new features are built entirely in the curl tool by using libcurl correctly with the already provided API, so this change is done entirely outside of the library.
Credits
The idea for the ETag goodness came from Paul Hoffman. The implementation was brought by Maros Priputen – as his first contribution to curl! Thanks!
By late 2019, there’s an estimated amount of ten billion curl installations in the world. Of course this is a rough estimate and depends on how you count etc.
There are several billion mobile phones and tablets and a large share of those have multiple installations of curl. Then there all the Windows 10 machines, web sites, all macs, hundreds of millions of cars, possibly a billion or so games, maybe half a billion TVs, games consoles and more.
How much data are they transferring?
In the high end of volume users, we have at least two that I know of are doing around one million requests/sec on average (and I’m not even sure they are the top users, they just happen to be users I know do high volumes) but in the low end there will certainly be a huge amount of installations that barely ever do any requests at all.
If there are two users that I know are doing one million requests/sec, chances are there are more and there might be a few doing more than a million and certainly many that do less but still many.
Among many of the named and sometimes high profiled apps and users I know use curl, I very rarely know exactly for what purpose they use curl. Also, some use curl to do very many small requests and some will use it to do a few but very large transfers.
Additionally, and this really complicates the ability to do any good estimates, I suppose a number of curl users are doing transfers that aren’t typically considered to be part of “the Internet”. Like when curl is used for doing HTTP requests for every single subway passenger passing ticket gates in the London underground, I don’t think they can be counted as Internet transfers even though they use internet protocols.
How much data are browsers driving?
According to some data, there is today around 4.388 billion “Internet users” (page 39) and the world wide average time spent “on the Internet” is said to be 6 hours 42 minutes (page 50). I think these numbers seem credible and reasonable.
According to broadbandchoices, an average hour of “web browsing” spends about 25MB. According to databox.com, an average visit to a web site is 2-3 minutes. httparchive.org says the median page needs 74 HTTP requests to render.
So what do users do with their 6 hours and 42 minutes “online time” and how much of it is spent in a browser? I’ve tried to find statistics for this but failed.
@chuttenc (of Mozilla) stepped up and helped me out with getting stats from Firefox users. Based on stats from users that used Firefox on the day of October 1, 2019 and actually used their browser that day, they did 2847 requests per client as median with the median download amount 18808 kilobytes. Of that single day of use.
I don’t have any particular reason to think that other browsers, other days or users of other browsers are very different than Firefox users of that single day. Let’s count with 3,000 requests and 20MB per day. Interestingly, that makes the average data size per request a mere 6.7 kilobytes.
A median desktop web page total size is 1939KB right now according to httparchive.org (and the mobile ones are just slightly smaller so the difference isn’t too important here).
Based on the median weight per site from httparchive, this would imply that a median browser user visits the equivalent of 15 typical sites per day (30MB/1.939MB).
If each user spends 3 minutes per site, that’s still just 45 minutes of browsing per day. Out of the 6 hours 42 minutes. 11% of Internet time is browser time.
3000 requests x 4388000000 internet users, makes 13,164,000,000,000 requests per day. That’s 13.1 trillion HTTP requests per day.
The world’s web users make about 152.4 million HTTP requests per second.
(I think this is counting too high because I find it unlikely that all internet users on the globe use their browsers this much every day.)
The equivalent math to figure out today’s daily data amounts transferred by browsers makes it 4388000000 x 30MB = 131,640,000,000 megabytes/day. 1,523,611 megabytes per second. 1.5 TB/sec.
30MB/day equals a little under one GB/month per person. Feels about right.
Back to curl usage
The curl users with the highest request frequencies known to me (*) are racing away at one million requests/second on average, but how many requests do the others actually do? It’s really impossible to say. Let’s play the guessing game!
First, it feels reasonable to assume that these two users that I know of are not alone in doing high frequency transfers with curl. Purely based on probability, it seems reasonable to assume that the top-20 something users together will issue at least 10 million requests/second.
Looking at the users that aren’t in that very top. Is it reasonable to assume that each such installed curl instance makes a request every 10 minutes on average? Maybe it’s one per every 100 minutes? Or is it 10 per minute? There are some extremely high volume and high frequency users but there’s definitely a very long tail of installations basically never doing anything… The grim truth is that we simply cannot know and there’s no way to even get a ballpark figure. We need to guess.
Let’s toy with the idea that every single curl instance on average makes a transfer, a request, every tenth minute. That makes 10 x 10^9 / 600 = 16.7 million transfers per second in addition to the top users’ ten million. Let’s say 26 million requests per second. The browsers of the world do 152 million per second.
If each of those curl requests transfer 50Kb of data (arbitrarily picked out of thin air because again we can’t reasonably find or calculate this number), they make up (26,000,000 x 50 ) 1.3 TB/sec. That’s 85% of the data volume all the browsers in the world transfer.
The world wide browser market share distribution according to statcounter.com is currently: Chrome at 64%, Safari at 16.3% and Firefox at 4.5%.
This simple-minded estimate would imply that maybe, perhaps, possibly, curl transfers more data an average day than any single individual browser flavor does. Combined, the browsers transfer more.
Guesses, really?
Sure, or call them estimates. I’m doing them to the best of my ability. If you have data, reasoning or evidence to back up modifications my numbers or calculations that you can provide, nobody would be happier than me! I will of course update this post if that happens!
(*) = I don’t name these users since I’ve been given glimpses of their usage statistics informally and I’ve been asked to not make their numbers public. I hold my promise by not revealing who they are.
Thanks
Thanks to chuttenc for the Firefox numbers, as mentioned above, and thanks also to Jan Wildeboer for helping me dig up stats links used in this post.
I’ve watched how my thirteen year old son goes about to acquire information about things online. I am astonished how he time and time again deliberately chooses to get it from a video on YouTube rather than trying to find the best written documentation for whatever he’s looking for. I just have to accept that some people, even some descendants in my own family tree, prefer video as a source of information. And I realize he’s not alone.
My intent is to record a series of short and fairly independent episodes, each detailing a specific libcurl area. A particular “thing”, feature, option or area of the APIs. Each episode is also thoroughly documented and all the source code seen on the video is available on the site so that viewers can either follow along while viewing, or go back to the code afterward as a reference. Or both!
I’ve done the four first episodes so far, and they range from five minutes to nineteen minutes a piece. I expect that it might take me a while to just complete the list of episodes I could come up with myself. I also hope and expect that readers and viewers will think of other areas that I could cover so the list of video episodes could easily expand over time.
Feedback
If you have comments on the episodes. If you have suggestion of what to improve or subjects to cover, head over to the libcurl-video-tutorials github page and file an issue or two!
Video setup
I use a Debian Linux installation to develop on. I figure it should be similar enough to many other systems.
Video wise, in each episode I show you my text editor for code, a terminal window for building the code, running what we build in the episode and also for looking up man page information etc. And a small image of myself. Behind those three squares, there’s a photo of a forest (taken by me).
I plan to make each episode use the same basic visuals.
In the initial “setup” episode I create a generic Makefile, which we can reuse in several (all?) other episodes to build the example code easily and swiftly. I’ve previously learned that people consider Makefiles difficult, or sometimes even magic, to work with so I wanted to get that out of the way from the start and then focus more directly on actual C code that uses libcurl.
Receive data episode
Here’s the “receive data” episode as an example of how this can look.
The first ever public release of curl was uploaded on March 20, 1998. 7924 days ago.
3.15 commits per day on average since inception.
These 25000 commits have been authored by 751 different persons.
Through the years, 47 of these 751 authors have ever authored 10 commits or more within a single year. In fact, the largest number of people that did 10 commits or more within a single year is 13 that happened in both 2014 and 2017.
19 of the 751 authors did ten or more changes in more than one calendar year. 5 of the authors have done ten or more changes during ten or more years.
I wrote a total of 14273 of the 25000 commits. 57%.
Hooray for all of us!
(Yes of course 25000 commits is a totally meaningless number in itself, it just happened to be round and nice and serves as a good opportunity to celebrate and reflect over things!)
There has been 56 days since curl 7.66.0 was released. Here comes 7.67.0!
This might not be a release with any significant bells or whistles that will make us recall this date in the future when looking back, but it is still another steady step along the way and thanks to the new things introduced, we still bump the minor version number. Enjoy!
If you need excellent commercial support for whatever you do with curl. Contact us at wolfSSL.
Numbers
the 186th release 3 changes 56 days (total: 7,901) 125 bug fixes (total: 5,472) 212 commits (total: 24,931) 1 new public libcurl function (total: 81) 0 new curl_easy_setopt() option (total: 269) 1 new curl command line option (total: 226) 68 contributors, 42 new (total: 2,056) 42 authors, 26 new (total: 744) 0 security fixes (total: 92) 0 USD paid in Bug Bounties
The 3 changes
Disable progress meter
Since virtually forever you’ve been able to tell curl to “shut up” with -s. The long version of that is --silent. Silent makes the curl tool disable the progress meter and all other verbose output.
Starting now, you can use --no-progress-meter, which in a more granular way only disables the progress meter and lets the other verbose outputs remain.
CURLMOPT_MAX_CONCURRENT_STREAMS
When doing HTTP/2 using curl and multiple streams over a single connection, you can now also set the number of parallel streams you’d like to use which will be communicated to the server. The idea is that this option should be possible to use for HTTP/3 as well going forward, but due to the early days there it doesn’t yet.
CURLU_NO_AUTHORITY
This is a new flag that the URL parser API supports. It informs the parser that even if it doesn’t recognize the URL scheme it should still allow it to not have an authority part (like host name).
Bug-fixes
Here are some interesting bug-fixes done for this release. Check out the changelog for the full list.
Winbuild build error
The winbuild setup to build with MSVC with nmake shipped in 7.66.0 with a flaw that made it fail. We had added the vssh directory but not adjusted these builds scripts for that. The fix was of course very simple.
We have since added several winbuild builds to the CI to make sure we catch these kinds of mistakes earlier and better in the future.
FTP: optimized CWD handling
At least two landed bug-fixes make curl avoid issuing superfluous CWD commands (FTP lingo for “cd” or change directory) thereby reducing latency.
HTTP/3
Several fixes improved HTTP/3 handling. It builds on Windows better, the ngtcp2 backend now also behaves correctly on macOS, the build instructions are clearer.
Mimics socketpair on Windows
Thanks to the new socketpair look-alike function, libcurl now provides a socket for the application to wait for even when doing name resolves in the dedicated resolver thread. This makes the Windows code work catch up with the similar change that landed in 7.66.0. This makes it easier for applications to behave correctly during the short time gaps when libcurl resolves a host name and nothing else is happening.
curl with lots of URLs
With the introduction of parallel transfers in 7.66.0, we changed how curl allocated handles and setup transfers ahead of time. This made command lines that for example would use [1-1000000] ranges create a million CURL handles and thus use a lot of memory.
It did in fact break a few existing use cases where people did very large ranges with curl. Starting now, curl will just create enough curl handles ahead of time to allow the maximum amount of parallelism requested and users should yet again be able to specify ranges with many million iterations.
curl -d@ was slow
It was discovered that if you ask curl to post data with -d @filename, that operation was unnecessary slow for large files and was sped up significantly.
DoH fixes
Several corrections were made after some initial fuzzing of the DoH code. A benign buffer overflow, a memory leak and more.
HTTP/2 fixes
We relaxed the :authority push promise checks, fixed two cases where libcurl could “forget” a stream after it had delivered all data and dup’ed HTTP/2 handles could issue dummy PRIORITY frames!
connect with ETIMEDOUT now makes CURLE_OPERATION_TIMEDOUT
When libcurl’s connect attempt fails and errno says ETIMEDOUT it means that the underlying TCP connect attempt timed out. This will now be reflected back in the libcurl API as the timed out error code instead of the previously used CURLE_COULDNT_CONNECT.
One of the use cases for this is curl’s --retry option which now considers this situation to be a timeout and thus will consider it fine to retry…
Parsing URL with fragment and question mark
There was a regression in the URL parser that made it mistreat URLs without a query part but with a question mark in the fragment.
In the afternoon of October 1st 2019, I had the pleasure of welcoming Linus Larsson and Jonas Lindkvist into my home in Huddinge, south of Stockholm, Sweden. My home is also my office as I work full-time from home. These two fine gentlemen work for Sweden’s largest morning newspaper, Dagens Nyheter, which boasts 850,000 daily readers.
Jonas took what felt like a hundred photos of me, most of them when I sit in my office chair at my regular desk where my primary development computers and environment are. As you can see in the two photos on this blog post. I will admit that I did minimize most of my regular Windows from the screens to that I wouldn’t accidentally reveal something personal or sensitive, but on the plus side is that if you pay close attention you can see my Simon Stålenhag desktop backgrounds better!
Me and Linus then sat down and talked. We talked about my background, how curl was created and how it has “taken off” to an extent I of course could never even dream about. Today, I estimate that curl runs in perhaps ten billion installations. A truly mind boggling – and humbling – number.
The interview/chat lasted for about an hour or so. I figured we had touched most relevant areas and Linus seemed content with the material and input he’d gotten from me. As this topic and article wasn’t really time sensitive or something that would have to be timed with something particular Linus explained that he didn’t know exactly when it would get published and it didn’t bother me. I figured it would be cool whenever!
On the morning of October 14 I collected the paper from my mailbox (because yes, I still do have a paper version newspaper arriving at my home every morning) and boom, I spotted an interesting little note in the lower right hand corner.
You can see the (Swedish-speaking) front-page blurb on the photo on the right.
The interesting timing this morning made it out so that this was the same morning I delivered a keynote at Castor Software Days at KTH in Stockholm titled “curl, a hobby project that conquered the world” (slides) – which by the way was received very well and I got a lot of positive comments and interesting conversations afterwards. And lots of people of course noticed the interestingly timed coincidence with the DN article!
The DN article reaches out to “ordinary” people in ways I’m not used to, so of course this made more of my non-techie friends suddenly realize a little more of what I do. I think it captures my “journey” and my approach to life and curl fairly well.
I’ll probably extend this blog post with links/photos of the actual DN articles at a later point once I feel I don’t risk undermining DN’s business by doing so.
(photos by Jonas Lindkvist, Dagens Nyheter, used in the online article about me)
I personally have not done this many commits to curl in a single month (August 2019) for over three years. This increased activity is of course primarily due to the merge of and work with the HTTP/3 code. And yet, that is still only in its infancy…
the 185th release 6 changes 54 days (total: 7,845) 81 bug fixes (total: 5,347) 214 commits (total: 24,719) 1 new public libcurl function (total: 81) 1 new curl_easy_setopt() option (total: 269) 4 new curl command line option (total: 225) 46 contributors, 23 new (total: 2,014) 29 authors, 14 new (total: 718) 2 security fixes (total: 92) 450 USD paid in Bug Bounties
Two security advisories
TFTP small blocksize heap buffer overflow
(CVE-2019-5482) If you told curl to do TFTP transfers using a smaller than default “blocksize” (default being 512), curl could overflow a heap buffer used for the protocol exchange. Rewarded 250 USD from the curl bug bounty.
FTP-KRB double-free
(CVE-2019-5481) If you used FTP-kerberos with curl and the server maliciously or mistakenly responded with a overly large encrypted block, curl could end up doing a double-free in that exit path. This would happen on applications where allocating a large 32 bit max value (up to 4GB) is a problem. Rewarded 200 USD from the curl bug bounty.
Changes
The new features in 7.66.0 are…
HTTP/3
This experimental feature is disabled by default but can be enabled and works (by some definition of “works”). Daniel went through “HTTP/3 in curl” in this video from a few weeks ago:
There’s a standard HTTP header that some servers return when they can’t or won’t respond right now, which indicates after how many seconds or at what point in the future the request might be fulfilled. libcurl can now return that number easily and curl’s –retry option makes use of it (if present).
curl_multi_poll
curl_multi_poll is a new function offered that is very similar to curl_multi_wait, but with one major benefit: it solves the problem for applications of what to do for the occasions when libcurl has no file descriptor at all to wait for. That has been a long-standing and perhaps far too little known issue.
SASL authzid
When using SASL authentication, curl and libcurl now can provide the authzid field as well!
Bug-fixes
Some interesting bug-fixes included in this release..
.netrc and .curlrc on Windows
Starting now, curl and libcurl will check for and use the dot-prefixed versions of these files even on Windows and only fall back and check for and use the underscore-prefixed versions for compatibility if the dotted one doesn’t exist. This unifies curl’s behavior across platforms.
asyn-thread: create a socketpair to wait on
With this perhaps innocuous-sounding change, libcurl on Linux and other Unix systems will now provide a file descriptor for the application to wait on while name resolving in a background thread. This lets applications know better when to call libcurl again and avoids having to just blindly wait and retry. A performance gain.
Credentials in URL when using HTTP proxy
We found and fixed a regression that made curl not use credentials properly from the URL when doing multi stage authentication (like HTTP Digest) with a proxy.
Move code into vssh for SSH backends
A mostly janitor-style fix that also now abstracted away more SSH-using code to not know what particular SSH backend that is being used while at the same time making it easier to write and provide new SSH backends in the future. I’m personally working a little slowly on one, to be talked about at a later point.
Disable HTTP/0.9 by default
If you want libcurl to accept and deliver HTTP/0.9 responses to your application, you need to tell it to do that. Starting in this version, curl will consider those invalid HTTP responses by default.
alt-svc improvements
We introduced alt-svc support a while ago but as it is marked experimental and nobody felt a strong need to use it, it clearly hasn’t been used or tested much in real life. When we’ve worked on using alt-svc to bootstrap into HTTP/3 we found and fixed a whole range of little issues with the alt-svc support and it is now in a much better shape. However, it is still marked experimental.
IPv6 addresses in URLs
It was reported that the URL parser would accept malformatted IPv6 addresses that subsequently and counter-intuitively would get resolved as a host name internally! An example URL would be “https://[ab.de]/’ – where all the letters and symbols within the brackets are individually allowed components of a IPv6 numerical address but it still isn’t a valid IPv6 syntax and instead is a legitimate and valid host name.
Going forward!
We recently ran a poll among users of what we feel are the more important things to work on, and with that the rough roadmap has been updated. Those are things I want to work on next but of course I won’t guarantee anything and I will greatly appreciate all help and assistance that I can get. And sure, we can and will work on other things too!
$ cd $HOME/src
$ unzip wolfssl-4.1.0-gplv3-fips-ready.zip
$ cd wolfssl-4.1.0-gplv3-fips-ready
Build the fips-ready wolfSSL and install it somewhere suitable
$ ./configure --prefix=$HOME/wolfssl-fips --enable-harden --enable-all
$ make -sj
$ make install
Download curl, the normal curl package. (in my case I got curl 7.65.3)
Unzip the source code somewhere suitable
$ cd $HOME/src
$ unzip curl-7.65.3.zip
$ cd curl-7.65.3
Build curl with the just recently built and installed fips ready wolfSSL version.
$ LD_LIBRARY_PATH=$HOME/wolfssl-fips/lib ./configure --with-wolfssl=$HOME/wolfssl-fips --without-ssl
$ make -sj
Now, verify that your new build matches your expectations by:
$ ./src/curl -V
It should show that it uses wolfSSL and that all the protocols and features you want are enabled and present. If not, iterate until it does!
“FIPS Ready means that you have included the FIPS code into your build and that you are operating according to the FIPS enforced best practices of default entry point, and Power On Self Test (POST).”