Category Archives: Open Source

Open Source, Free Software, and similar

The curl bus factor

bus factor: the minimum number of team members that have to suddenly disappear from a project before the project stalls due to lack of knowledgeable or competent personnel.

Projects should strive to survive

If a project is worth using and deploying today and if it is a project worth sending patches to right now, it is also a project that should position itself to survive a loss of key individuals. However unlikely or unfortunate such an event would be.

Tools to calculate bus factor

All the available tools that determine the bus factor for a given project only run on code and check for commits, code churn or check how many files each person has done a significant share of changes in etc.

This number is really impossible to figure out without tools and tools really cannot take “general knowledge” into account, or “this person answers a lot of email on the list”, or this person has 48k in reputation on stack overflow already for responding to questions about the project.

The bus factor as evaluated by a tool pretty much has to be about amount of code, size of code or number of code changes, which may or may not be a good indicator of who knows what about the code. Those who author and commit changes probably have a good idea but a real problem is that you can’t reverse that view and say that just because you didn’t commit or change something, you don’t know. Do you know more about the code if you did many commits? Do you know more about the code if you changed more lines of code?

We can’t prove or assume lack of knowledge or interest by an absence of commits, edits or changes. And yet we can’t calculate bus factor if there’s no tool or way to calculate it.

A look at curl

curl is soon 20 years old and boasts 22k something commits. I’m the author of about 57% of them, and the second-most committer (who’s not involved anymore) has about 12%. That makes two committers having done 15.3k commits out of the 22k. If we for simplicity calculate bus factor based on commit numbers, we’d need 8580 commits from others and I would stop completely, to reach bus factor >2 (when the 2 top committers have less than 50% of the commits), which at the current commit rate equals in about 5 years. And it would take about 3 years to just push the factor above 1. So even when new people joins the project, they have a really hard time to significantly change the bus factor…

The image above shows the relative share of commits done in the curl project’s git source code repository (as a share of the total amount) by the top 4 commiters from January 1 2010 to July 5 2017 (click for higher resolution). The top dotted line shows the combined share of all four (at 82% right now) and the dark blue line is my share. You can see how my commit share has shrunk from 72% down to 57% over these last 7.5 years. If this trend holds, I’ll have less than 50% of the total commits done in curl in 3-4 years.

At the same time, the thicker light blue line that climbs up into the right is the total number of authors in the git repository, which recently surpassed 500 as you can see. (The line uses the right Y-axes)

We’re approaching 1600 individually named contributors thanked in the project and every release we do (we ship one every 8 weeks) has around 40 contributors, out of which typically around half are newcomers. The long tail is very long and the amount of drive-by just-once contributors is high. Also note how the number 1600 is way higher than the 500 something that has authored commits. Lots of people contribute in other ways.

When we ask our users “why don’t you contribute (more) to the project?” (which we do annually) what do they answer? They say its because 1) everything works, 2) I don’t have time 3) things get fixed fast enough 4) I don’t know the programming language 5) I don’t have the energy.

First as the 6th answer (at 5% 2017) comes “other” where some people actually say they wouldn’t know where to start and so on.

All of this taken together: there are no visible signs of us suffering from having a low bus factor. Lots of signs that people can do things when they want to if I don’t do it. Lots of signs that the code and concepts are understood.

Lots of signs that a low bus factor is not a big problem here. Or perhaps rather that the bus factor isn’t really as low as any tool would calculate it.

What if I…

Do I know who would pick up the project and move on if I die today? No. We’re a 100% volunteer-driven project. We create one of the world’s most widely used software components (easily more than three billion instances and counting) but we don’t know who’ll be around tomorrow to work on it. I can’t know because that’s not how the project works.

Given the extremely wide use of our stuff, given the huge contributor base, given the vast amounts of documentation and tests I think it’ll work out.

Just because you have a large bus factor doesn’t necessarily make the project a better place to ask questions. We’ve seen projects in the past where N persons involved are all from the same company and when that company removes its support for that project those people all go away. High bus factor, no people to ask.

Finally, let me just add that I would of course love to have many more committers and contributors in the curl project, and I think we would be an even better project if we did. But that’s a separate issue.

c-ares 1.13.0

The c-ares project may not be very fancy or make a lot of noise, but it steadily moves forward and boasts an amazing 95% code coverage in the automated tests.

Today we release c-ares 1.13.0.

This time there’s basically three notable things to take home from this, apart from the 20-something bug-fixes.

CVE-2017-1000381

Due to an oversight there was an API function that we didn’t fuzz and yes, it was found out to have a security flaw. If you ask a server for a NAPTR DNS field and that response comes back crafted carefully, it could cause c-ares to access memory out of bounds.

All details for CVE-2017-1000381 on the c-ares site.

(Side-note: this is the first CVE I’ve received with a 7(!)-digit number to the right of the year.)

cmake

Now c-ares can optionally be built using cmake, in addition to the existing autotools setup.

Virtual socket IO

If you have a special setup or custom needs, c-ares now allows you to fully replace all the socket IO functions with your own custom set with ares_set_socket_functions.

“OPTIONS *” with curl

(Note: this blog post as been updated as the command line option changed after first publication, based on comments to this very post!)

curl is arguably a “Swiss army knife” of HTTP fiddling. It is one of the available tools in the toolbox with a large set of available switches and options to allow us to tweak and modify our HTTP requests to really test, debug and torture our HTTP servers and services.

That’s the way we like it.

In curl 7.55.0 it will take yet another step into this territory when we finally introduce a way for users to send “OPTION *” and similar requests to servers. It has been requested occasionally by users over the years but now the waiting is over. (brought by this commit)

“OPTIONS *” is special and peculiar just because it is one of the few specified requests you can do to a HTTP server where the path part doesn’t start with a slash. Thus you cannot really end up with this based on a URL and as you know curl is pretty much all about URLs.

The OPTIONS method was introduced in HTTP 1.1 already back in RFC 2068, published in January 1997 (even before curl was born) and with curl you’ve always been able to send an OPTIONS request with the -X option, you just were never able to send that single asterisk instead of a path.

In curl 7.55.0 and later versions, you can remove the initial slash from the path part that ends up in the request by using –request-target. So to send an OPTION * to example.com for http and https URLs, you could do it like:

$ curl --request-target "*" -X OPTIONS http://example.com
$ curl --request-target "*" -X OPTIONS https://example.com/

In classical curl-style this also opens up the opportunity for you to issue completely illegal or otherwise nonsensical paths to your server to see what it does on them, to send totally weird options to OPTIONS and similar games:

$ curl --request-target "*never*" -X OPTIONS http://example.com

$ curl --request-target "allpasswords" http://example.com

Enjoy!

curl doesn’t spew binary anymore

One of the least favorite habits of curl during all these years, I’ve been told, is when users forget to instruct the command line tool where to store the downloaded file and as a direct consequence, curl instead sends a lot of binary “gunk” to the terminal. The end result of that is at best just a busload of weird-looking characters on the screen, but with just a little bit of bad luck it can also lock up the terminal completely or change it in other ways.

Starting in curl 7.55.0 (from this commit), curl will inspect the beginning of each download that has been told to get sent to the terminal (tty!) and attempt to detect and prevent raw binary output to get sent there. The code is only simply looking for a binary zero in the data.

$ curl https://example.com/image.jpg
Warning: Binary output can mess up your terminal. Use "--output -" to tell curl to output it to your terminal anyway, or consider "--output <FILE>" to save to a file.

As the warning message says, there’s an option to use to switch off this emergency check for when you truly know what you’re doing and you don’t need curl to prevent you from doing this. Then you just tell curl explicitly that you want the output to stdout, with “–output -” (or “-o -” for a shorter version):

$ curl -o - https://example.com/binblob.img

We’re eager to get your input and feedback on how this works. We are aware of the risk of false positives for UTF-16 and UTF-32 outputs, but we think they are rare enough to not make this a huge problem.

This feature should be able to drastically reduce the risk for this:

Pipes

(Update, added after the initial posting.)

So many have remarked or otherwise asked how this affects when stdout is piped into something else. It doesn’t affect that! The whole point of this check is to only show the warning message if the binary output is sent to the terminal. If you instead pipe the output to another program or if you redirect the output with >, that will not trigger this warning but will instead continue just like before. Just like you’d expect it to.

curl: read headers from file

Starting in curl 7.55.0 (since this commit), you can tell curl to read custom headers from a file. A feature that has been asked for numerous times in the past, and the answer has always been to write a shell script to do it. Like this:

#!/bin/sh
while read line; do
  args="$args -H '$line'";
done
curl $args $URL

That’s now a response of the past (or for users stuck on old curl versions). We can now instead tell curl to read headers itself from a file using the curl standard @filename way:

$ curl -H @headers https://example.com

… and this also works if you want to just send custom headers to the proxy you do CONNECT to:

$ curl --proxy-headers @headers --proxy proxy:8080 https://example.com/

(this is a pure curl tool change that doesn’t affect libcurl, the library)

curling over HTTP proxy

Starting in curl 7.55.0 (this commit), curl will no longer try to ask HTTP proxies to perform non-HTTP transfers with GET, except for FTP. For all other protocols, curl now assumes you want to tunnel through the HTTP proxy when you use such a proxy and protocol combination.

Protocols and proxies

curl supports 23 different protocols right now, if we count the S-versions (the TLS based alternatives) as separate protocols.

curl also currently supports seven different proxy types that can be set independently of the protocol.

One type of proxy that curl supports is a so called “HTTP proxy”. The official HTTP standard includes a defined way how to speak to such a proxy and ask it to perform the request on the behalf of the client. curl supports using that over either HTTP/1.1 or HTTP/1.0, where you’d typically only use the latter version if you the first really doesn’t work with your ancient proxy.

HTTP proxy

All that is fine and good. But HTTP proxies were really only defined to handle HTTP, and to some extent HTTPS. When doing plain HTTP transfers over a proxy, the client will send its request to the proxy like this:

GET http://curl.haxx.se/ HTTP/1.1
Host: curl.haxx.se
Accept: */*
User-Agent: curl/7.55.0

… but for HTTPS, which should provide end to end encryption, a client needs to ask the proxy to instead tunnel through the proxy so that it can do TLS all the way, without any middle man, to the server:

CONNECT curl.haxx.se:443 HTTP/1.1
Host: curl.haxx.se:443
User-Agent: curl/7.55.0

When successful, the proxy responds with a “200” which means that the proxy has established a TCP connection to the remote server the client asked it to connect to, and the client can then proceed and do the TLS handshake with that server. When the TLS handshake is completed, a regular GET request is then sent over that established and secure TLS “tunnel” to the server. A GET request that then looks like one that is sent without proxy:

GET / HTTP/1.1
Host: curl.haxx.se
User-Agent: curl/7.55.0
Accept: */*

FTP over HTTP proxy

Things get more complicated when trying to perform transfers over the HTTP proxy using schemes that aren’t HTTP. As already described above, HTTP proxies are basically designed only for doing HTTP over them, but as they have this concept of tunneling through to the remote server it doesn’t have to be limited to just HTTP.

Also, historically, for decades people have deployed HTTP proxies that recognize FTP URLs, and transparently handle them for the client so the client can almost believe it is HTTP while the proxy has to speak FTP to the remote server in the other end and convert it back to HTTP to the client. On such proxies (Squid and Apache both support this mode for example), this sort of request is possible:

GET ftp://ftp.funet.fi/ HTTP/1.1
Host: ftp.funet.fi
User-Agent: curl/7.55.0
Accept: */*

curl knows this and if you ask curl for FTP over an HTTP proxy, it will assume you have one of these proxies. It should be noted that this method of course limits what you can do FTP-wise and for example FTP upload is usually not working and if you ask curl to do FTP upload over and HTTP proxy it will do that with a HTTP PUT.

HTTP proxy tunnel

curl features an option (–proxytunnel) that lets the user forcible tell the client to not assume that the proxy speaks this protocol and instead use the CONNECT method with establishing a tunnel through the proxy to the remote server.

It should of course be noted that very few deployed HTTP proxies in the wild allow clients to CONNECT to whatever port they like. HTTP proxies tend to only allow connecting to port 443 as that is the official HTTPS port, and if you ask for another port it will respond back with a 4xx response code refusing to comply.

Not HTTP not FTP over HTTP proxy

So HTTP, HTTPS and FTP are sent over the HTTP proxy fine. That leaves us with nineteen more protocols. What happens with them when you ask curl to perform them over a HTTP proxy?

Now we have finally reached the change that has just been merged in curl and changes what curl does.

Before 7.55.0

curl would send all protocols as a regular GET to the proxy if asked to use a HTTP proxy without seeing the explicit proxy-tunnel option. This came from how FTP was done and grew from there without many people questioning it. Of course it wouldn’t ever work, but also very few people would actually attempt it because of that.

From 7.55.0

All protocols that aren’t HTTP, HTTPS or FTP will enable the tunnel-through mode automatically when a HTTP proxy is used. No more sending funny GET requests to proxies when they won’t work anyway. Also, it will prevent users from accidentally leak credentials to proxies that were intended for the server, which previously could happen if you omitted the tunnel option with a few authentication setups.

HTTP/2 proxy

Sorry, curl doesn’t support that yet. Patches welcome!

target-independent libcurl headers

We write libcurl to be very portable. It can be built and run on virtually every operating system with an CPU architecture that is at least 32 bit, from some of the most legacy Unixes from the early 90s to the most recent updates to all the popular systems, including the widespread mobile platforms.

Type sizes on different architectures

In the early 2000s we added support to libcurl for “large files” (back in the days when that support wasn’t always present in your operating systems) and large variable types (beyond 32 bits) to work for applications and libcurl alike, and to work the same way for libcurl-using applications independently on which platform you’d compile the code on.

We started out using compiler/system defines to figure out for example the size of the native “off_t” type to know if it was 32 bit or 64 bit. That turned out to be problematic as users accidentally ended up in situations where the library considered a type to be one size and the application considered it to be another, leading to unexpected behaviors at best or downright crashes and misery.

Determine at lib build-time

The fix to that run-time size-of-variables confusion was to generate a fixed “outcome” at build-time that would then be used by both the library and applications so that they could never again disagree on this. The obvious downside here was that we had to generate this target specific information into public headers for the library (known as curl/curlbuild.h). We didn’t like doing it this way, but this approach was a better situation than before as it caused less headaches for users.

Now we instead created problems for system packagers who wanted to provide a set of curl headers and allow users to build for example either a 32 bit build or a 64 bit build of their application – so they had to generate two sets of curl headers. Or having the headers on a shared file system to be used by many different systems. Inconvenient. But as this solution didn’t hurt too many people, was a cumbersome problem to fix and yet possible to work around, it remained in the curl project since August 7, 2008 (commit 14240e9e109fe6af1).

Determine at app build-time

In March 2017 (commit 9506d01ee5) we introduced a new take on this problem. A new header that checks systems defines and determines all the necessary information at the time the application is compiled instead of at the time libcurl is compiled. We call it curl/system.h.

The goal was to replace the generated curlbuild.h header, but since it would cause serious problems if this new header would get any different results (like variable type sizes) than the old header, it was a risky move. We needed extra seat-belts for this.

We therefor added the new header next to the old header in parallel, and introduced a test case in the curl test suite that verifies the output from the two systems and make sure that they agree, and had them present in the curl source tree, coexisting. The curl/system.h file of course without being used for anything real, but tested by everyone who runs the test suite – to make sure it isn’t awful.

We think the new header file has now proven itself worthy. We have not gotten any recent reports on problems with test 1541. It is time to cut out the old header system and launch the new!

Starting in release curl 7.55.0, due to be released on August 9, 2017, the header files will finally again be truly platform agnostic. It took us nine years but we finally did it! The bulk of the change is made in this commit.

Just another detail in the machinery.

curl survey 2017 – analysis

The results are in. The curl user survey 2017 was up for a little over two weeks and attracted answers from a total of 513 individuals. This was a much better turnout that last year’s disappointment – thank you everyone!

The 2017 survey analysis as a 40 page PDF

This year we learned that the distribution curve for the amount of protocols people use curl for looks like this:

And the interest in getting even more protocols supported is still high, if not even very high and I think the top-most requested protocol is a bit surprising:

The outcome of the survey is the analysis document in which I’ve summarized by thoughts and added a bunch of graphs and other diagrams that illustrate the numbers. In particular compared to previous’ years results. It became a 40 page thing as I’ve tried to be detailed and also somewhat elaborate on commenting and reacting to a lot of the write-in suggestions and comments!

If you want to draw your own conclusions or just verify mine, I also offer you the following source material:

  1. The pristine 2017 CSV file as downloaded from Google, with all the results from the survey.
  2. To compare with last year, I also offer you the 2016 CSV file.
  3. During my the work of producing the analysis document, I imported the 2017 CSV file into libreoffice and fiddled with a lot of numbers and graphs, most of that didn’t end up in the document but you can find the raw 2017 survey libreoffice calc file and verify the outcomes or the formulas used.

Number of semicolon separated words in libreoffice calc

=LEN(TRIM(A1))-LEN(SUBSTITUTE(TRIM(A1),";",""))+1

The length of the entire A1 field (white space trimmed) minus the length of the A1 field with all the semicolons removed, plus one.

A string in A1 that looks like “A;B;C” will give the number 3.

I had to figure this out and I thought I’d store it here for later retrieval. Maybe someone else will enjoy it as well. The same formula works in Excel too I think.

curl: 5000 stars

The curl project has been hosted on github since March 2010, and we just now surpassed 5000 stars. A star on github of course being a completely meaningless measurement for anything. But it does show that at least 5000 individuals have visited the page and shown appreciation this low impact way.

5000 is not a lot compared to the really popular projects.

On August 12, 2014 I celebrated us passing 1000 stars with a beer. Hopefully it won’t be another seven years until I can get my 10000 stars beer!