curl survey 2017 – analysis

The results are in. The curl user survey 2017 was up for a little over two weeks and attracted answers from a total of 513 individuals. This was a much better turnout than last year’s disappointment – thank you everyone!

The 2017 survey analysis as a 40 page PDF

This year we learned that the distribution curve for the number of protocols people use curl for looks like this:

The interest in getting even more protocols supported is still high, if not very high, and I think the most requested protocol is a bit surprising:

The outcome of the survey is the analysis document, in which I’ve summarized my thoughts and added a bunch of graphs and other diagrams that illustrate the numbers, in particular compared to previous years’ results. It became a 40 page thing as I’ve tried to be detailed and also somewhat elaborate, commenting and reacting to a lot of the write-in suggestions and comments!

If you want to draw your own conclusions or just verify mine, I also offer you the following source material:

  1. The pristine 2017 CSV file as downloaded from Google, with all the results from the survey.
  2. To compare with last year, I also offer you the 2016 CSV file.
  3. During the work of producing the analysis document, I imported the 2017 CSV file into libreoffice and fiddled with a lot of numbers and graphs. Most of that didn’t end up in the document, but you can find the raw 2017 survey libreoffice calc file and verify the outcomes or the formulas used.

Number of semicolon-separated words in libreoffice calc

=LEN(TRIM(A1))-LEN(SUBSTITUTE(TRIM(A1),";",""))+1

The length of the entire A1 field (white space trimmed) minus the length of the A1 field with all the semicolons removed, plus one.

A string in A1 that looks like “A;B;C” will give the number 3.

I had to figure this out and I thought I’d store it here for later retrieval. Maybe someone else will enjoy it as well. The same formula works in Excel too I think.

curl: 5000 stars

The curl project has been hosted on github since March 2010, and we just now surpassed 5000 stars. A star on github of course being a completely meaningless measurement for anything. But it does show that at least 5000 individuals have visited the page and shown appreciation this low impact way.

5000 is not a lot compared to the really popular projects.

On August 12, 2014 I celebrated us passing 1000 stars with a beer. Hopefully it won’t be another seven years until I can get my 10000 stars beer!

The curl user survey 2017

The annual survey for curl and libcurl users is open. The 2017 edition has some minor edits since last year but is mostly the same set of questions as before, to help us detect changes and trends over time.

If you use curl or libcurl, in any way, shape or form, please consider spending a few minutes of your precious time on this. Your input helps us understand where we are and in which direction we should go next.

Fill in the form!

The poll is open for fourteen days, from Friday May 12th until midnight (CEST) May 27th 2017. All data we collect is non-personal and anonymous.

To get some idea of what sort of information we extract and collect from the results, have a look at the analysis of last year’s survey.

The subsequent analysis of the 2017 user survey.

Improving timers in libcurl

A few years ago I explained the timer and timeout concepts of the libcurl internals. A decent prerequisite for this post really.

Today, I introduced “timer IDs” internally in libcurl so that every expiring timer we use has to specify which timer it is with an ID, and we only have a set number of IDs to select from. Turns out there are only about 10 of them. 10 different ones. Per easy handle.
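
To give a rough idea of the concept, here is a C sketch of what such a fixed set of timer IDs could look like. The names are invented for this example and are not necessarily the identifiers libcurl actually uses internally.

/* Illustrative sketch only: these names are made up, not libcurl's own. */
typedef enum {
  TIMER_TIMEOUT,        /* the overall operation timeout */
  TIMER_CONNECTTIMEOUT, /* the connect phase timeout */
  TIMER_100_CONTINUE,   /* waiting for a 100-continue response */
  TIMER_DNS_CACHE,      /* DNS cache pruning */
  TIMER_HAPPY_EYEBALLS, /* starting the second-family connect attempt */
  TIMER_SPEEDCHECK,     /* low speed limit checking */
  TIMER_RUN_NOW,        /* "act as soon as possible" */
  TIMER_LAST            /* the number of different timer IDs */
} timer_id;

Every call that sets a timer then has to pass one of these IDs along, which is what makes the bookkeeping below possible.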

With this change, we now only allow one running timer for each ID, which then makes it possible for us to change timers during execution so that they never “fire” in vain like they used to do (since we couldn’t stop or remove them before they expired). This change makes event loops slightly more efficient, since they will now end up getting far fewer “spurious” timeouts that happened only because we had no infrastructure to prevent them.

Another benefit of only keeping one single timer for each ID in the list of timers is that the dynamic list of running timers immediately becomes much shorter. Previously, the same timer ID would often be used again and we would then add a new node to the list, so a timer that had one purpose would expire twice (or more). Now it no longer works like that. In some typical uses I’ve tested, I’ve seen the list shrink from a maximum of 7-8 nodes down to a maximum of 1 or 2.

Finally, since we now have a finite number of timers that can be set at any given time, and we know the maximum amount is fairly small, I could make the timer code completely skip using dynamic memory. Allocating 10 tiny structs as part of the main handle allocation is more efficient than doing tiny malloc() calls for them one by one. In a basic comparison test I’ve run, this reduced the total number of allocations from 82 to 72 for “curl localhost”.
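
A minimal, self-contained sketch of the approach, again with invented names rather than libcurl’s actual internals: each easy handle embeds a fixed array with one slot per timer ID, setting a timer simply overwrites that slot, and no dynamic memory is ever needed for the timers.

#include <stdio.h>

/* Invented names for illustration only, not libcurl's actual internals. */

typedef enum {
  TIMER_TIMEOUT,        /* the overall operation timeout */
  TIMER_CONNECTTIMEOUT, /* the connect phase timeout */
  TIMER_100_CONTINUE,   /* waiting for a 100-continue response */
  TIMER_SPEEDCHECK,     /* low speed limit checking */
  TIMER_LAST            /* the number of different timer IDs */
} timer_id;

struct timer {
  long expires_ms; /* absolute expiry time in milliseconds, 0 means "not set" */
};

struct easy_handle {
  /* one slot per timer ID, allocated together with the handle itself,
     so no separate malloc() is needed for the timers */
  struct timer timers[TIMER_LAST];
};

/* set (or move) the single timer kept for this ID */
static void expire_timer(struct easy_handle *h, timer_id id, long expires_ms)
{
  h->timers[id].expires_ms = expires_ms;
}

/* return the expiry time nearest in the future, or 0 if none is set */
static long next_expiry(const struct easy_handle *h)
{
  long nearest = 0;
  int i;
  for(i = 0; i < TIMER_LAST; i++) {
    long t = h->timers[i].expires_ms;
    if(t && (!nearest || t < nearest))
      nearest = t;
  }
  return nearest;
}

int main(void)
{
  struct easy_handle h = {{{0}}};
  expire_timer(&h, TIMER_TIMEOUT, 5000);
  expire_timer(&h, TIMER_SPEEDCHECK, 1000);
  expire_timer(&h, TIMER_SPEEDCHECK, 2000); /* replaces, does not add */
  printf("next expiry at %ld ms\n", next_expiry(&h)); /* prints 2000 */
  return 0;
}

Note how setting the speed-check timer a second time replaces the earlier one instead of queuing up a second expiry, which is exactly what avoids the spurious timeouts described above.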

This change will be included in the pending curl release targeted to ship on June 14th 2017, possibly called version 7.54.1.

Those are all in a tree

As explained previously, the above description of timers covers the set of timers kept for each individual easy handle. With libcurl you can add an unlimited number of easy handles to a multi handle (to perform lots of transfers in parallel), and the multi handle then keeps a self-balancing splay tree with the nearest-in-time timer of each individual easy handle as its nodes, so that it can quickly and easily figure out which handle needs attention next and when in time that is.

The illustration below shows a captured imaginary moment in time with five easy handles in different colors, all doing their own separate transfers. Each easy handle has three private timers set. The tree contains five nodes and the root of the tree is the node representing the easy handle that needs to be taken care of next (in time). It also means we immediately know exactly how much time is left until libcurl needs to act next.

Expiry

As soon as “time N” occurs or expires, libcurl takes care of what the yellow handle needs to do and then removes that timer node from the tree. The yellow handle’s next timer then moves up to first in line and the tree gets re-adjusted accordingly, so that the new yellow first-node gets re-inserted at the right place in the tree.

In our imaginary case here, the new yellow time N (formerly known as N + 1) is now later in time than L, but before M. L is now nearest in time and the tree has now adjusted to look something like this:

Since the tree only really cares about the first timer of each handle, you can also see how adding a new timeout to a single easy handle that isn’t the next in time is a really quick operation: it just adds a node to a linked list for that specific handle, a list whose maximum length is now capped to the total number of different timer IDs: 10.

Straight-forward!
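
To make the two levels a bit more concrete, here is a rough C sketch with invented names (libcurl’s real internals use its own splay tree implementation and different identifiers): each easy handle keeps its own short, sorted timer list, and only the head of that list is represented in the multi handle’s tree.

#include <stdio.h>

/* Invented names for illustration only; libcurl's internals differ. */

struct timer_node {
  long expires_ms;         /* absolute expiry time */
  struct timer_node *next; /* next timer for this handle, sorted by time */
};

struct easy_handle {
  /* this handle's pending timers, sorted, at most ~10 nodes long */
  struct timer_node *timers;
};

struct multi_handle {
  /* in reality a self-balancing splay tree keyed on each easy handle's
     nearest-in-time timer; only that first timer per handle is a tree node */
  void *tree;
};

/* stand-ins for the real tree operations */
static void tree_remove(struct multi_handle *m, struct easy_handle *h)
{
  (void)m; (void)h;
}
static void tree_insert(struct multi_handle *m, struct easy_handle *h, long key)
{
  (void)m; (void)h; (void)key;
}

/* add one timer for one easy handle */
static void easy_add_timer(struct multi_handle *m, struct easy_handle *h,
                           struct timer_node *node, long expires_ms)
{
  struct timer_node **pp = &h->timers;

  node->expires_ms = expires_ms;

  /* find the sorted position in the handle's own short list */
  while(*pp && (*pp)->expires_ms <= expires_ms)
    pp = &(*pp)->next;
  node->next = *pp;
  *pp = node;

  /* only when this timer became the handle's new nearest-in-time timer
     does the multi handle's tree need updating; otherwise this was just
     a cheap linked-list insert for that specific handle */
  if(h->timers == node) {
    tree_remove(m, h);
    tree_insert(m, h, expires_ms);
  }
}

int main(void)
{
  struct multi_handle multi = { 0 };
  struct easy_handle easy = { 0 };
  struct timer_node a, b;

  easy_add_timer(&multi, &easy, &a, 3000); /* new head, touches the tree */
  easy_add_timer(&multi, &easy, &b, 5000); /* later in time: list insert only */

  printf("nearest timer for this handle: %ld ms\n", easy.timers->expires_ms);
  return 0;
}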

A curl delivery network

I’ve run my own public web sites on hardware I’ve administered myself for over twenty years now. I’ve hosted the curl web site myself since its inception.

The curl web site at curl.haxx.se has recently been delivering roughly 1.5 terabytes of data to the world per month. The CA bundle we convert to PEM from the Mozilla source code is alone downloaded more than 100,000 times per day. Occasional blog entries I’ve posted here have climbed very fast on popular sites such as Hacker News and Reddit, and have resulted in intense visitor storms hitting this same server – sometimes reaching visitor counts above 200,000 “uniques” – most of them within the first few hours of publication. At times, those visitor spikes have effectively brought the server to its knees.

Yes, my personal web site and the curl web site are both sharing the same physical server. It also hosts more than a dozen other sites and numerous services for our own pleasures and fun, providing services for a handful of different open source projects. So when the server has to cease doing work because it runs out of memory or hits other resource restraints, that causes interruptions all over. Oh yes, and my email doesn’t reach me.

Inconvenient and annoying.

The server

Haxx owns and runs this co-located server that we have a busload of web servers on – for the good of the projects and people that run things on it. This machine’s worst bottleneck is available RAM, and perhaps I/O performance. Every time the server slows to a crawl due to network traffic overload, we discuss how we should upgrade it. Installing a new machine and transferring over all the sites and services is work. Work that none of us at Haxx are very happy to volunteer to do. So it hasn’t been done yet, and frankly the server handles the daily load just fine and without even a blink. Which is ninety nine point something percent of the time…

Haxx pays for a certain amount of network traffic so as long as we’re below some threshold we remain paying the same monthly fee. We don’t want to increase the traffic by magnitudes as that would cost more.

The specific machine, which sits deep inside a server room in Stockholm, Sweden, is a five(?) year old Dell PowerEdge E310 with an Intel Xeon X3440 at 2.53GHz and 8GB of RAM. This model is shown in the image at the top.

Alternatives that haven’t helped

Why not a mirror system? We had a fair number of curl site mirrors a few years ago, but it never worked well because they were always less reliable than the main site and often turned stale and out of sync with the master site, which eventually just hurt users.

They also tricked visitors into bookmarking or otherwise going back to the mirror site instead of the real one, and there were always the annoying people who couldn’t resist filling the mirror with ads and stuff. Plus, they didn’t help much with the storms to the main site.

Why not a cloud server? Because with the amount of services, servers and various things we do on our server, it would be inconvenient and expensive. But perhaps even more because we started out like this so we have invested time and energy into the infrastructure as it works right now. And I enjoy rowing my own boat!

The CDN

Fastly reached out and graciously offered to help us handle the load, both on account of the traffic amounts and to save our machine from struggling this hard the next time I write something that tickles people’s curiosity (or rage) to the level where several thousand visitors want to read the same article at the same time.

Starting now, the curl.haxx.se and the daniel.haxx.se web sites are fronted by Fastly. It should give web site visitors from all over the world faster response times and it will make the site more reliable and less likely to have problems due to traffic load going forward.

In case you’re not familiar with what a CDN is, a simplified explanation would say it is a globally distributed network of reverse proxy servers deployed in multiple data centers. These CDN servers front the Internet and will, to the largest extent possible, serve visitors the right content directly from their own caches instead of having them reach the actual lowly backend server I run that hosts the original content. Fastly has lots of servers across the globe for this purpose. Users who are a long way away from Sweden will probably be the ones who notice this change the most, as you may suddenly find haxx.se content much closer (network round-trip wise) than before.

Standards

These new servers will host the sites over HTTPS just like before, and they will require TLS 1.2 and SNI. They will work over IPv6 and support HTTP/2. Network standards-wise, there shouldn’t be any step down – and honestly, I haven’t exactly been on the cutting edge of these technologies myself for these sites in the past.
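
For the curious, a small libcurl program along these lines can check what a server negotiates. It assumes a libcurl recent enough to offer CURLINFO_HTTP_VERSION (7.50.0 or later) and built with HTTP/2 support; the URL is just used as an example.

#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
  CURL *curl;
  CURLcode res;

  curl_global_init(CURL_GLOBAL_DEFAULT);
  curl = curl_easy_init();
  if(curl) {
    curl_easy_setopt(curl, CURLOPT_URL, "https://curl.haxx.se/");
    /* do a HEAD-style request, we only care about the negotiation */
    curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);
    /* ask for HTTP/2 over TLS, falling back to HTTP/1.1 if unavailable */
    curl_easy_setopt(curl, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2TLS);

    res = curl_easy_perform(curl);
    if(res == CURLE_OK) {
      long ver = 0;
      curl_easy_getinfo(curl, CURLINFO_HTTP_VERSION, &ver);
      printf("negotiated %s\n",
             (ver == CURL_HTTP_VERSION_2_0) ? "HTTP/2" : "HTTP/1.x");
    }
    else
      fprintf(stderr, "transfer failed: %s\n", curl_easy_strerror(res));

    curl_easy_cleanup(curl);
  }
  curl_global_cleanup();
  return 0;
}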

Editing the site

We will keep editing and maintaining the site like before. It is made up of an old system with templates and include files that generate mostly static web pages. The site is mostly available on github and, using that, you can build a local version for development and for trying out changes before they land.

Hopefully, this move to Fastly will only make the site faster and more reliable. If you notice any glitches or experience any problems with the site, please let us know!