10,000 stars

On github, you can ‘star’ a project. It’s a fairly meaningless way to mark your appreciation of a project hosted on that site and of course, the number doesn’t really mean anything and it certainly doesn’t reflect how popular or widely used or unused that particular software project is. But here I am, highlighting the fact that today I snapped the screenshot shown above when the curl project just reached this milestone: 10,000 stars.

In the great scheme of things, the most popular and starred projects on github of course have orders of magnitude more stars. Right now, curl ranks as roughly the 885th most starred project on github. According to github themselves, they host an amazing 25 million public repositories, which thus puts curl in the top 0.004% star-wise.

There was appropriate celebration going on in the Stenberg casa tonight and here’s a photo to prove it:

I took a photo when we celebrated 1,000 stars. It doesn’t feel so long ago but was a little over 1500 days ago.

August 12 2014

Onwards and upwards!

The Polhem prize, one year later

On September 25th 2017, I received the email that first explained to me that I had been awarded the Polhem Prize.

Du har genom ett omfattande arbete vaskats fram som en värdig mottagare av årets Polhemspris. Det har skett genom en nomineringskommitté och slutligen ett råd med bred sammansättning. Priset delas ut av Kungen den 19 oktober på Tekniska muséet.

My attempt at an English translation:

You have been selected as a worthy recipient of this year's Polhem prize through extensive work. It has been through a nomination committee and finally a council of broad composition. The prize is awarded by the King on October 19th at the Technical Museum.

A gold medal

At the award ceremony in October 2017 I received the gold medal at the most fancy ceremony I could ever wish for, where I was given the most prestigious award I couldn’t have imagined myself even being qualified for, handed over by no other than the Swedish King.

An entire evening with me in focus, where I was the final grand finale act and where my life’s work was the primary reason for all those people being dressed up in fancy clothes!

Things have settled down since. The gold medal has started to get a little dust on it where it lies here next to me on my work desk. I still glance at it every once in a while. It still feels surreal. It’s a fricking medal in pure gold with my name on it!

I almost forget the money part of the prize. I got a lot of money as well, but in retrospect it is really the honors, that evening and the gold medal that stick best in my memory. Money is just… well, money.

So did the award and prize make my life any different? Yes sure, a little, and I’ll tell you how.

What’s all that time spent on?

My closest surrounding of friends and family got a better understanding of what I’ve actually been doing all these long hours, all these years and more than one phrase in the style of “oh, so you actually did something useful?!” have been uttered.

Certainly I’ve tried to explain to them before, but nothing works as well as a gold medal from an award committee to say that what I do is actually appreciated “out there” and that it has made a serious impact on the world.

I think I’m considered a little less weird now when I keep spending night hours in front of my computer when the house is otherwise dark and silent. Well, maybe still weird, but at least my weirdness has proven to result in something useful for mankind and that’s more than many other sorts of weird do… We all have hobbies.

What is curl?

Family and friends have gotten a rudimentary level of understanding of what curl is and what it does. I’m not suggesting they fully grasp it or know what an “internet protocol” is now, but at least a lot of people understand that it works with “internet transfers”. It’s not like people were totally uninterested before, but when I was given this prize – by a jury of engineers no less – that says this is a significant invention and accomplishment with a value that “can not be overestimated“, it made them more interested. The little video that was produced helped:

Some mysteries remain

People in general still have a hard time grasping the reach of the project, how much time I’ve spent on it so far, how I can find the motivation to keep up the work and not least how this is all given away for free to everyone.

The simple fact that these are all questions that I’ve been asked I think is a small reward in itself. I think the fact that I was awarded this prize for my work on Open Source is awesome and I feel honored to be a person who introduces this way of thinking to some of the people who previously would think that you have to sell proprietary things or earn a lot of money for your products in order to impact and change society as a whole.

Not widely known

The Polhem prize is not widely known in Sweden among the general populace and thus neither is the fact that I won it. Only a very special subset of people know about this. Of course it is even less known outside of Sweden and in fact the information about the prize given in English is very sparse.

Next year’s winner

The other day I received my invitation to participate in this year’s award ceremony on November 14. Of course I’ll happily accept that and I will be there and celebrate the winner this year!

The curl project

How did the prize affect the project itself, the project that I was awarded for having cared for this long?

It hasn’t affected it much at all (as far as I can tell). The project has moved along like before and we’ve worked on fixing bugs and adding features and cool things over time after my award just as we did before it. That’s how it has felt. Business as usual.

If anything, I think I might have gotten some renewed energy and interest in the project and the commit author statistics actually show that my commit frequency has gone up since around the time I got the award. Our gitstats show that I’ve done more than half of the commits every single month the last year, most of this time even more than 70% of the commits.

I may have served twenty years here, but I’m not done yet!

More curl bug bounty

Together with Bountygraph, the curl project now offers money to security researchers for reporting security vulnerabilities to us.

https://bountygraph.com/programs/curl

The idea is that sponsors donate money to the bounty fund, and we will use that fund to hand out rewards for reported issues. It is a way for the curl project to help compensate researchers for the time and effort they spend helping us improve our security.

Right now the bounty fund is very small as we just started this project, but hopefully we can get a few sponsors interested and soon offer “proper” rewards at decent levels in case serious flaws are detected and reported here.

If you’re a company using curl or libcurl and value security, you know what you can do…

Already before, people who reported security problems could ask for money from Hackerone’s IBB program, and this new program is in addition to that – even though you won’t be able to receive money from both bounties for the same issue.

After I announced this program on twitter yesterday, I did an interview with Arif Khan for latesthackingnews.com. Here’s what I had to say:

A few questions

Q: You have launched a self-managed bug bounty program for the first time. Earlier, IBB used to pay out for most security issues in libcurl. How do you think the idea of self-management of a bug bounty program, which has some obvious problems such as active funding might eventually succeed?

First, this bounty program is run on bountygraph.com so I wouldn’t call it “self-managed” since we’re standing on a lot of infra set up and handled by others.

To me, this is an attempt to make a bounty program that is more visible as clearly a curl bounty program. I love Hackerone and the IBB program for what they offer, but it is A) very generic, so the fact that you can get money for curl flaws there is not easy to figure out and there’s no obvious way for companies to sponsor curl security research and B) they are very picky about which flaws they pay money for (“only critical flaws”) and I hope this program can be a little more accommodating – assuming we get sponsors of course.

Will it work and make any differences compared to IBB? I don’t know. We will just have to see how it plays out.

Q: How do you think the crowdsourcing model is going to help this bug bounty program?

It’s crucial. If nobody sponsors this program, there will be no money to do payouts with and without payouts there are no bounties. Then I’d call the curl bounty program a failure. But we’re also not in a hurry. We can give this some time to see how it works out.

My hope is though that because curl is such a widely used component, we will get sponsors interested in helping out.

Q: What would be the maximum reward for most critical a.k.a. P0 security vulnerabilities for this program?

Right now we have a total of 500 USD to hand out. If you report a p0 bug now, I suppose you’ll get that. If we just get sponsors, I’m hoping we should be able to raise that reward level significantly. I might be very naive, but I think we won’t have to pay for very many critical flaws.

It goes back to the previous question: this model will only work if we get sponsors.

Q: Do you feel there’s a risk that bounty hunters could turn malicious?

I don’t think this bounty program particularly increases or reduces that risk to any significant degree. Malicious hunters probably already exist and I would assume that blackhat researchers might be able to extract more money on the less righteous markets if they’re so inclined. I don’t think we can “outbid” such buyers with this program.

Q: How will this new program mutually benefit security researchers as well as the open source community around curl as a whole?

Again, assuming that this works out…

Researchers can get compensated for the time and efforts they spend helping the curl project to produce and provide a more secure product to the world.

curl is used by virtually every connected device in the world in one way or another, affecting every human in the connected world on a daily basis. By making sure curl is secure we keep users safe; users of countless devices, applications and networked infrastructure.

Update: just hours after this blog post, Dropbox chipped in 32,768 USD to the curl bounty fund…

The world’s biggest curl installations

curl is quite literally used everywhere. It is used by a huge number of applications and devices. But which applications, devices and users are the ones with the largest number of curl installations? I’ve tried to come up with a list…

I truly believe curl is one of the world’s most widely used open source projects.

If you have comments, other suggestions or insights to help me polish this table or the numbers I present, please let me know!

Some that didn’t make the top-10

10 million Nintendo Switch game consoles all use curl, more than 20 million Chromebooks have been sold and they have curl as part of their bundled OS and there’s an estimated 40 million printers (primarily by Epson and HP) that aren’t on the top-10. To reach this top-list, we’re looking at 50 million instances minimum…

10. Internet servers: 50 million

There are many (Linux mainly) servers on the Internet. curl and libcurl come pre-installed on some Linux distributions and for those where they don’t, most users and sysadmins install them. My estimate says there are few such servers out there without curl on them.

This source says there were 75 million servers “hosting the Internet” back in 2013.

curl is a default HTTP provider for PHP and a huge percentage of the world’s web sites run at least parts with PHP.

9. Sony Playstation 4: 75 million

Bundled with the Operating system on this game console comes curl. Or rather libcurl I would expect. Sony says 75 million units have been sold.

curl is given credit on the screen listing Open Source software used in the Playstation 4.

8. Netflix devices: 90 million

I’ve been informed by “people with knowledge” that libcurl runs on all Netflix’s devices that aren’t browsers. Some stats listed on the Internet say 70% of the people watching Netflix do this on their TVs, which I’ve interpreted as possible non-browser use. 70% of the total 130 million Netflix customers makes about 90 million.

libcurl is not used for the actual streaming of the movie, but for the UI and things.

7. Grand Theft Auto V: 100 million

The very long and rarely watched ending sequence to this game does indeed credit libcurl. It has also been recorded as having been sold in 100 million copies.

There’s an uncertainty here if libcurl is indeed used in this game for all platforms GTA V runs on, which then could possibly reduce this number if it is not.

6. macOS machines: 100 million

curl has shipped as a bundled component of macOS since August 2001. In April 2017, Apple’s CEO Tim Cook said that there were 100 million active macOS installations.

Now, that statement was made a while ago but I don’t have any reason to suspect that the number has gone down notably so I’m using it here. No macs ship without curl!

5. cars: 100 million

I wrote about this in a separate blog post. Eight of the top-10 most popular car brands in the world use curl in their products. All in all I’ve found curl used in over twenty car brands.

Based on that, rough estimates say that there are over 100 million cars in the world with curl in them today. And more are coming.

4. Fortnite: 120 million

This game is made by Epic Games and credits curl in their Third Party Software screen.

In June 2018, they claimed 125 million players. Now, I suppose a bunch of these players might not actually have their own separate device but I still believe that this is the regular setup for people. You play it on your own console, phone or computer.

3. Television sets: 380 million

We know curl is used in television sets made by Sony, Philips, Toshiba, LG, Bang & Olufsen, JVC, Panasonic, Samsung and Sharp – at least.

The world market was around 229 million television sets sold in 2017 and about 760 million TVs are connected to the Internet. Counting on curl running in 50% of the connected TVs (which I think is a fair estimate) makes 380 million devices.

2. Windows 10: 500 million

Since a while back, Windows 10 ships curl bundled by default. I presume most Windows 10 installations actually stay fairly updated so over time most of the install base will run a version that bundles curl.

In May 2017, one number said 500 million Windows 10 machines.

1. Smart phones: 3000 million

I posit that there are almost no smart phones or tablets in the world that don’t run curl.

curl is bundled with the iOS operating system so all iPhones and iPads have it. That alone is about 1.3 billion active devices.

curl is bundled with the Android version that Samsung, Xiaomi and OPPO ship (and possibly a few other flavors too). According to some sources, Samsung has something like 30% market share, and Apple around 20% – for mobile phones. Another one billion devices seems like a fair estimate.

Further, curl is used by some of the most used apps on phones: Youtube, Instagram, Skype, Spotify etc. The three first all boast more than one billion users each, and in Youtube’s case it also claims more than one billion app downloads on Android. I think it’s a safe bet that these together cover another 700 million devices. Possibly more.

Same users, many devices

Of course we can’t just sum up all these numbers and reach a total number of “curl users”. The fact is that a lot of these curl instances are used by the same users. With a phone, a game console, a TV and some more an ordinary netizen runs numerous different curl instances in their daily lives.

Summary

Did I ever expect this level of success? No.

libcurl gets a URL API

libcurl has done internet transfers specified as URLs for a long time, but the URLs you’d tell libcurl to use would always just get parsed and used internally.

Applications that pass in URLs to libcurl would of course still very often need to parse URLs, create URLs or otherwise handle them, but libcurl has not been helping with that.

At the same time, the under-specification of URLs has led to a situation where there’s really no stable document anywhere describing how URLs are supposed to work and basically every implementer is left to handle the WHATWG URL spec, RFC 3986 and the world in between all by themselves. Understanding how their URL parsing libraries, libcurl, other tools and their favorite browsers differ is complicated.

By offering applications access to libcurl’s own URL parser, we hope to tighten a problematic vulnerable area for applications where the URL parser library would believe one thing and libcurl another. This can, and sometimes has, led to security problems. (See for example Exploiting URL Parser in Trending Programming Languages! by Orange Tsai)

Additionally, since libcurl deals with URLs and virtually every application using libcurl already does some amount of URL fiddling, it makes sense to offer it in the “same package”. In the curl user survey 2018, more than 40% of the users said they’d use a URL API in libcurl if it had one.

Handle based

Create a handle, operate on the handle and then clean up the handle when you’re done with it. A pattern that is familiar to existing users of libcurl.

So first you just make the handle.

/* create a handle */
CURLU *h = curl_url();

Parse a URL

Give the handle a full URL.

/* "set" a URL in the handle */
curl_url_set(h, CURLUPART_URL,
"https://example.com/path?q=name", 0);

If the parser finds a problem with the given URL it returns an error code detailing the error.  The flags argument (the zero in the function call above) allows the user to tweak some parsing behaviors. It is a bitmask and all the bits are explained in the curl_url_set() man page.

A parsed URL gets split into its components, parts, and each such part can be individually retrieved or updated.

Get a URL part

Get a separate part from the URL by asking for it. This example gets the host name:

/* extract host from the URL */
char *host;
curl_url_get(h, CURLUPART_HOST, &host, 0);

/* use it, then free it */
curl_free(host);

As the example here shows, extracted parts must be specifically freed with curl_free() once the application is done with them.

The curl_url_get() can extract all the parts from the handle, by specifying the correct id in the second argument. scheme, user, password, port number and more. One of the “parts” it can extract is a bit special: CURLUPART_URL. It returns the full URL back (normalized and using proper syntax).

curl_url_get() also has a flags option to allow the application to specify certain behavior.

Set a URL part

/* set a URL part */
curl_url_set(h, CURLUPART_PATH, "/index.html", 0);

curl_url_set() lets the user set or update all and any of the individual parts of the URL.

curl_url_set() can also update the full URL, which also accepts a relative URL in case an existing one was already set. It will then apply the relative URL onto the former one and “transition” to the new absolute URL. Like this:

/* first an absolute URL */
curl_url_set(h, CURLUPART_URL,
     "https://example.org:88/path/html", 0);

/* .. then we set a relative URL "on top" */
curl_url_set(h, CURLUPART_URL,
     "../new/place", 0);

Duplicate a handle

It might be convenient to set up a handle once and then make copies of that…

CURLU *n = curl_url_dup(h);

Cleanup the handle

When you’re done working with this URL handle, free it and all its related resources.

curl_url_cleanup(h);

Ship?

This API is marked as experimental for now and ships for the first time in libcurl 7.62.0 (October 31, 2018). I will happily read your feedback and comments on how it works for you, what’s missing and what we should fix to make it even more usable for you and your applications!

We call it experimental to reserve the right to modify it slightly  going forward if necessary, and as soon as we remove that label the API will then be fixed and stay like that for the foreseeable future.

See also

The URL API section in Everything curl.

DoH in curl

DNS-over-HTTPS (DoH) is being designed (it is not an RFC quite yet but very soon!) to allow internet clients to get increased privacy and security for their name resolves. I’ve previously explained the DNS-over-HTTPS functionality within Firefox that ships in Firefox 62 and I did a presentation about DoH and its future in curl at curl up 2018.

We are now introducing DoH support in curl. I hope this will not only allow users to start getting better privacy and security for their curl based internet transfers, but ideally this will also provide an additional debugging tool for DoH in other clients and servers.

Let’s take a look at how we plan to let applications enable this when using libcurl and how libcurl has to work with this internally to glue things together.

How do I make my libcurl transfer use DoH?

There’s a primary new option added, which is the “DoH URL”. An application sets the CURLOPT_DOH_URL for a transfer, and then libcurl will use that service for resolving host names. Easy peasy. There should be nothing else in the transfer that changes or appears differently. It’ll just resolve the host names over DoH instead of using the default resolver!

What about bootstrap, how does libcurl find the DoH server’s host name?

Since the DoH URL itself typically is given using a host name, that first host name will be resolved using the normal resolver – or if you so desire, you can provide the IP address for that host name with the CURLOPT_RESOLVE option just like you can for any host name.

If done using the resolver, the resolved address will then be kept in libcurl’s DNS cache for a short while and the DoH connection will be kept in the regular connection pool with the other connections, making subsequent DoH resolves on the same handle much faster.

How do I use this from the command line?

Tell curl which DoH URL to use with the new --doh-url command line option:

$ curl --doh-url https://dns-server.example.com https://www.example.com

How do I make my libcurl code use this?

curl = curl_easy_init();
curl_easy_setopt(curl, CURLOPT_URL,
                 "https://curl.haxx.se/");
curl_easy_setopt(curl, CURLOPT_DOH_URL,
                 "https://doh.example.com/");
res = curl_easy_perform(curl);

Internals

Internally, libcurl itself creates two new easy handles that it adds to the existing multi handles and they are then performing two HTTP requests while the original transfer sits in the “waiting for name resolve” state. Once the DoH requests are completed, the original transfer’s state can progress and continue on.

libcurl handles parallel transfers perfectly well already and by leveraging the already existing support for this, it was easy to add this new functionality and still work non-blocking and even event-based correctly, depending on which libcurl API is being used.

We had to add a new little special thing that makes libcurl handle the end of a transfer in a new way since there are now easy handles that are created and added to the multi handle entirely without the user’s knowledge, so the code also needs to remove and delete those handles when they’re done serving their purposes.

Was this hard to add to a 20 year old code base?

Actually, no. It was surprisingly easy, but then I’ve also worked on a few different client-side DoH implementations already so I had gotten myself a clear view of how I wanted the functionality to work plus the fact that I’m very familiar with the libcurl internals.

Plus, everything inside libcurl is already using non-blocking code and the multi interface paradigms so the foundation for adding parallel transfers like this was already in place.

The entire DoH patch for curl, including documentation and test cases, was a mere 1500 lines.

Ship?

This is merged into the master branch in git and is planned to ship as part of the next release: 7.62.0 at the end of October 2018.

curl 7.61.1 comes with only bug-fixes

Already at the time when we shipped the previous release, 7.61.0, I had decided I wanted to do a patch release next. We had some pretty serious HTTP/2 bugs in the pipe to get fixed and there were a bunch of other unresolved issues also awaiting their treatments. Then I took off on vacation and the HTTP/2 fixes took a longer time than expected to get on top of, so I subsequently decided that this would become a bug-fix-only release cycle. No features and no changes would be merged into master. So this is what eight weeks of only bug-fixes can look like.

Numbers

the 176th release
0 changes
56 days (total: 7,419)
102 bug fixes (total: 4,640)
151 commits (total: 23,439)
0 new curl_easy_setopt() options (total: 258)
0 new curl command line options (total: 218)
46 contributors, 21 new (total: 1,787)
27 authors, 14 new (total: 612)
1 security fix (total: 81)

Notable bug-fixes this cycle

Among the many small fixes that went in, I feel the following ones deserve a little extra highlighting…

NTLM password overflow via integer overflow

This latest security fix (CVE-2018-14618) is almost identical to an earlier one we fixed back in 2017 called CVE-2017-8816, and is just as silly…

The internal function Curl_ntlm_core_mk_nt_hash() takes a password argument, the same password that is passed to libcurl from an application. It then gets the length of that password and allocates a memory area that is twice the length, since it needs to expand the password. Due to a lack of checks, this calculation will overflow and wrap on a 32 bit machine if a password that is longer than 2 gigabytes is passed to this function. It will then lead to a very small memory allocation, followed by an attempt to write a very long password to that small memory buffer. A heap memory overflow.
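The wrap-around is easy to reproduce in isolation. The sketch below is an illustration only and not curl's actual code or its actual fix: it shows what doubling a length does in a 32 bit unsigned type, plus one common way to refuse such lengths before allocating. All function names here are made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* What the unchecked math does with a 32 bit size type: unsigned
   arithmetic wraps modulo 2^32 */
static uint32_t wrapped_double32(uint32_t len)
{
  return (uint32_t)(len * 2u);
}

/* Returns true when len * 2 fits in a size type whose largest value
   is 'max' */
static bool doubling_is_safe(uint64_t len, uint64_t max)
{
  return len <= max / 2;
}

/* Allocate room for a password expanded to twice its length, refusing
   lengths where the multiplication would wrap */
static void *alloc_expanded(size_t len)
{
  if(!doubling_is_safe(len, SIZE_MAX))
    return NULL;
  return malloc(len * 2);
}
```

A password of 0x80000001 bytes, just over two gigabytes, doubles to 2 in 32 bit math, which is exactly the kind of tiny allocation described above.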

Some mitigating details: most architectures support 64 bit size_t these days. Most applications won’t allow passing in passwords that are two gigabytes.

This bug has been around since libcurl 7.15.4, released back in 2006!

Oh, and on the curl web site we now use the CVE number in the actual URL for all the security vulnerabilities to make them easier to find and refer to.

HTTP/2 issues

This was actually a whole set of small problems that together made the new crawler example not work very well – until fixed. I think it is safe to say that HTTP/2 users of libcurl have previously used it in a pretty “tidy” fashion, because I believe I corrected four or five separate issues that made it misbehave.  It was rather pure luck that has made it still work as well as it has for past users!

Another HTTP/2 bug we ran into recently involved us discovering a little quirk in the underlying nghttp2 library, which in some very special circumstances would refuse to blank out the stream id to struct pointer mapping which would lead to it delivering a pointer to a stale (already freed) struct at a later point. This is fixed in nghttp2 now, shipped in its recent 1.33.0 release.

Windows send-buffer tuning

Making uploads on Windows between two and seven times faster than before is certainly almost like a dream come true. This is what 7.61.1 offers!

Upload buffer size increased

In tests triggered by the fix above, it was noticed that curl did not meet our performance expectations when doing uploads on really high speed networks, notably on localhost or when using SFTP. We could easily double the speed by just increasing the upload buffer size. Starting now, curl allocates the upload buffer on demand (since many transfers don’t need it), and now allocates a 64KB buffer instead of the previous 16KB. It has been using 16KB since 2001, and with the on-demand setup and the fact that computer memories have grown a bit during 17 years I think it is well motivated.
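The on-demand idea can be sketched in a few lines. This is a generic illustration of lazy allocation under made-up names; the struct and the functions are hypothetical and do not mirror libcurl's internals.

```c
#include <stdlib.h>

#define UPLOAD_BUFSIZE (64 * 1024) /* the new size, up from 16KB */

/* Hypothetical stand-in for a transfer, not libcurl's real struct */
struct transfer {
  char *upload_buf; /* NULL until an upload actually needs it */
};

/* Return the upload buffer, allocating it on first use.
   Returns NULL if the allocation fails. */
static char *get_upload_buf(struct transfer *t)
{
  if(!t->upload_buf)
    t->upload_buf = malloc(UPLOAD_BUFSIZE);
  return t->upload_buf;
}

/* Release the buffer, if one was ever allocated */
static void transfer_cleanup(struct transfer *t)
{
  free(t->upload_buf);
  t->upload_buf = NULL;
}
```

A transfer that only downloads never calls get_upload_buf() and so never pays for the 64KB, which is what makes the larger size cheap to adopt.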

A future curl version will surely allow the application to set this upload buffer size. The receive buffer size can already be set.

Darwinssl goes ALPN

While perhaps in the grey area of what a bugfix can be, this fix  allows curl to negotiate ALPN using the darwinssl backend, which by extension means that curl built to use darwinssl can now – finally – do HTTP/2 over HTTPS! Darwinssl is also known under the name Secure Transport, the native TLS library on macOS.

Note however that macOS’ own curl builds that Apple ships are no longer built to use Secure Transport, they use libressl these days.

The Auth Bearer fix

When we added support for Auth Bearer tokens in 7.61.0, we accidentally caused a regression that now is history. This bug seems to in particular have hit git users for some reason.

-OJ regression

The introduction of bold headers in 7.61.0 caused a regression which made a command line like “curl -O -J http://example.com/” fail, even if a Content-Disposition: header with a correct file name was passed on.

Cookie order

Old readers of this blog may remember my ramblings on cookie sort order from back in the days when we worked on what eventually became RFC 6265.

Anyway, we never did take all aspects of that spec into account when we sort cookies on the HTTP headers sent off to servers, and it has very rarely caused users any grief. Still, now Daniel Gustafsson did a glorious job and tweaked the code to also take creation order into account, exactly like the spec says we should! There are still some gotchas in this, but at least it should be much closer to what the spec says and what some sites might assume a cookie-using client should do…
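The ordering RFC 6265 describes is: cookies with longer paths go before cookies with shorter paths, and among cookies with equally long paths, the ones created earlier go first. A sketch of that rule as a qsort() comparator follows. The struct and the function names are hypothetical and simplified, and libcurl's own cookie code looks different.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical cookie record for this example */
struct cookie {
  const char *name;
  const char *path;
  long creation; /* a counter that grows for every cookie created */
};

/* qsort() comparator: longer paths first, then earlier creation first */
static int cookie_cmp(const void *a, const void *b)
{
  const struct cookie *c1 = a;
  const struct cookie *c2 = b;
  size_t l1 = strlen(c1->path);
  size_t l2 = strlen(c2->path);

  if(l1 != l2)
    return (l2 > l1) ? 1 : -1;
  if(c1->creation != c2->creation)
    return (c1->creation > c2->creation) ? 1 : -1;
  return 0;
}

/* Sort a list of cookies into the order they would be sent in */
static void sort_cookies(struct cookie *list, size_t n)
{
  qsort(list, n, sizeof(list[0]), cookie_cmp);
}
```

Using a creation counter as the tie-breaker makes the order well defined even though qsort() itself is not guaranteed to be stable.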

Unbold properly

Yet another regression. Remember how curl 7.61.0 introduced the cool bold headers in the terminal? Turns out I of course had my escape sequences done wrong, so in a large number of terminal programs the end-of-bold sequence (“CSI 21 m”) that curl sent didn’t actually switch off the bold style. This would lead to the terminal either getting all bold all the time or on some terminals getting funny colors etc.

In 7.61.1, curl sends the “switch off all styles” code (“CSI 0 m”) that hopefully should work better for people!
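In C string terms, CSI is the escape character followed by a left bracket, so the two sequences look like the defines below. format_header() is a made-up helper that shows the pattern of switching bold on for the header name and then switching all styles off again. It is a sketch and not the curl tool's own code.

```c
#include <stdio.h>

#define BOLD     "\x1b[1m" /* CSI 1 m: switch on bold */
#define BOLD_OFF "\x1b[0m" /* CSI 0 m: switch off all styles */

/* format_header() is a hypothetical helper: writes "name: value" into
   'buf' with the name in bold. Returns what snprintf() returns. */
static int format_header(char *buf, size_t len,
                         const char *name, const char *value)
{
  return snprintf(buf, len, BOLD "%s" BOLD_OFF ": %s", name, value);
}
```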

Next release!

We’ve held up a whole bunch of pull requests to ship this patch-only release. Once this is out the door, we’ll open the flood gates and accept the nearly 10 changes that are eagerly awaiting merge. Expect my next release blog post to mention several new things in curl!