Category Archives: cURL and libcurl

curl and/or libcurl related

curlup 2017: curl now

At curlup 2017 in Nuremberg, I did a keynote and talked a little about the road to what we are and where we are right now in the curl project. There will hopefully be a recording of this presentation made available soon, but I wanted to entertain you all by also presenting some of the graphs from that presentation in a blog format for easy access and to share the information.

Some stats and numbers from the curl project in early 2017. Unless otherwise mentioned, this is based on the data we have available. The git repository has data from December 1999 and we have detailed release information since version 6.0 (September 13, 1999).

Web traffic

First out, web site traffic to curl.haxx.se over the last seven full years that I have stats for. The switch to an HTTPS-only site happened in February 2016. The main explanation for the decrease in bandwidth spent in 2016 is that we removed the HTML and PDF versions of all documentation from the release tarballs (October 2016).

My log analysis software also tries to identify “human” traffic, so this graph should not include the very large amount of bots and automation that hit our site. In total we serve almost twice the amount of data to “bots” as to humans. A large share of those bots download the cacert.pem file we host.

Since our switch to HTTPS we have a 301 redirect from the HTTP site, and we still suffer from a large number of user-agents hitting us over and over, seemingly without following said redirect…

Number of lines in git

Since we also keep documentation and related things in the repository, this isn’t only lines of code. Plainly put: lines added to files that we have in git, and how that number has increased over time.

There’s one notable dip and one climb and I think they both are related to how we have rearranged documentation and documentation formatting.

Top-4 authors’ share

This could also be read as a chapter on how seriously we suffer from “the bus factor” in this project. Look at how large a share of all commits the top-4 committers have authored. Not committed; authored. Of course we didn’t have proper separation between authors and committers before git (March 2010).

Interesting to note here is also that the author listed second is Yang Tse, who hasn’t authored anything since August 2013. I personally seem to have plateaued at around 57% of all commits during the last year or two, and while the top-4 share is slowly decreasing it is still over 80% of all commits.

I hope we can get the top-4 share well below 80% if I rerun this script next year!

Number of authors over time

In comparison to the above graph, I made one that simply counts the total number of unique authors who have contributed a change to git, and looks at how that number changes over time.

The time before git is, again, somewhat of a lie since we didn’t keep proper track of authors vs committers back then, so we shouldn’t put too much value into the significant knee we can see on the graph.

To me, the main takeaway is that in spite of the top-4 graph above, this authors-over-time line is interestingly linear and shows that the vast majority of people who contribute patches only send in one or maybe a couple of changes and then never appear in the project again.

My hope is that this line will continue to climb over the coming years.

Commits per release

We started doing proper git tags for releases with curl 6.5. So how many commits have we done between releases ever since? It has gone up and down over time, and I added an average line to this graph, which sits at about 150 commits per release (and remember that for a few years now we have attempted to do releases every 8 weeks).

Towards the right we can see the last 20 releases or so showing a pattern of high bar, low bar, and I’ll get to that more in a coming graph.

Of course, counting commits is a rough measurement as they can be big or small, easy or hard, good or bad and this only counts them.

Commits per day

As the release frequency has varied a bit over time I figured I should just check and see how many commits we do in the project per day and see how that has changed (or not) over time. Remember, we are increasing the number of unique authors fairly fast but the top-4 share of “authorship” is fairly stable.

Turns out the number of commits per day has gone up and down a little bit through the git history, but I can’t spot any obvious trend here. In recent years we seem to keep up more than 2 commits per day, and during intense periods up to 8.

Days per release

Our general plan since a number of years back is to do releases every 8 weeks like clockwork. 8 weeks is 56 days.

When we run into serious problems, like bugs that are really annoying or tedious for users, or if we get a really serious security problem reported, we sometimes decide to ship something before the end of the regular 8-week cycle.

This graph shows that over the last, say, 20 releases we have repeatedly felt ourselves “forced” to do follow-up releases outside of the regular schedule. The right end of the graph shows a very clear saw-tooth pattern that proves this.

We’ve also discussed this slightly on the mailing list recently, and I’m certainly willing to go back and listen to people as to what we can do to improve this situation.

Bugfixes per release

We keep close track of all bugfixes done in git and mark them up and mention them in the RELEASE-NOTES document that we ship in every new release.

This makes it possible for us to go back and see how many bug fixes we’ve recorded for each release since curl 6.5, and it shows a clear growth over time. It’s interesting since we don’t see the same growth when we count commits, so it may just be attributed to us having gotten better at recording the bug fixes in the files, or to us now spending fewer commits per bug fix. Hard to tell exactly, but I do enjoy that we fix a lot of bugs…

Days spent per bugfix

Another way to see the data above is to count the number of bug fixes we do over time and just see how many days we need on average to fix bugs.

The last few years we have done more bug fixes than there are days, so if we keep up the trend shown for 2017 we might be able to get down to 0.5 days per bug fix on average. That’d be cool!

Coverity scans

We run Coverity scans on the curl code regularly and this service keeps a little graph for us showing the number of found defects over time. These days we have a policy of never allowing a defect detected by Coverity to linger around: we fix them all and we should have zero detected defects at all times.

The second graph here shows a comparison line with “other projects of comparable size”, indicating that we’re at least not doing badly here.

Vulnerability reports

So in spite of our grand intentions and the track record shown above, people keep finding security problems in curl at a higher frequency than ever before.

Out of the 24 vulnerabilities reported to the curl project in 2016, 7 were the result of the special security audit that we explicitly asked for. But even if we hadn’t asked for that audit and those 7 had remained unknown, the remaining 17 would still have stood out in this graph.

I do however think that finding – and reporting – security problems is generally more good than bad. The problems these reports have found have generally been around for many years already, so this is not a sign of us getting sloppier in recent years. I take it as a sign that people look for these problems better, and report them more often, than before. The industry as a whole views security problems and their importance differently now than it did years ago.

curl up 2017, the venue

The first ever physical curl meeting took place this last weekend, just before curl’s 19th birthday. Today curl turns nineteen years old.

After much work behind the scenes to set this up and arrange everything (and thanks to our awesome sponsors who contributed to this), over twenty eager curl hackers and friends from a handful of countries gathered in a somewhat rough-looking building at curl://up 2017 in Nuremberg, March 18-19 2017.

The venue was in this old factory-like facility but we put up some fancy signs so that people would find it:

Yes, continue around the corner and you’ll find the entrance door for us:

I know, who would have guessed that we’d splashed out on this fancy conference center, right? This is the entrance door. Enter and look for the next sign.

Yes, move in here through this door to the right.

And now, up these stairs…

When you’ve come that far, this is basically the view you could experience (before anyone entered the room):

And when Igor Chubin presented about wttr.in and using curl to build console based applications, it looked like this:

It may sound a bit lame to you, but I doubt this would’ve happened at all, and it certainly would’ve been less good, without our great sponsors who helped us by chipping in what we didn’t want to charge our visitors.

Thank you very much Kippdata, Ergon, Sevenval and Haxx for backing us!

19 years ago

19 years ago on this day I released the first ever version of a software project I decided to name curl. Just a little hobby you know. Nothing fancy.

19 years ago that was a few hundred lines of code. Today we’re at around 150,000 lines.

19 years ago that was mostly my thing and I sent it out hoping that *someone* would like it and find good use for it. Today virtually every modern internet-connected device in the world runs my code. Every car, every TV, every mobile phone.

19 years ago was a different age, not only for me – I had no kids nor a house back then – but the entire Internet and world have changed significantly since.

19 years ago we had a handful of persons sending back bug reports and a few patches. Today we have over 1500 persons who have helped out, and we’re adding people to that list at a rapid pace.

19 years ago I would not have imagined that someone could actually stick around in a project like this for this long and still find it so amazingly fun and interesting.

19 years ago I hadn’t exactly established my “daily routine” of spare time development yet, but I was close, and for the larger part of this period I have spent a few hours every day – all days, really – working on curl and related stuff. 19 years of a few hours every day equals a whole lot of time.

It took us 19 years minus two days to have our first ever physical curl meeting, or conference if you will.

Some curl numbers

We released the 163rd curl release ever today: curl 7.53.0 – approaching 19 years since the first curl release (6914 days to be exact).

It took 61 days since the previous release, during which 47 individuals helped us fix 95 separate bugs. 25 of these contributors were newcomers. In total, we now count more than 1500 individuals credited for their help in the project.

One of those bug fixes was a security vulnerability, upping our total number of vulnerabilities through the years to 62.

Since the previous release, 7.52.1, 155 commits were made to the source repository.

The next curl release, our 164th, is planned to ship in exactly 8 weeks.

One URL standard please

Following up on the problem with our current lack of a universal URL standard that I blogged about in May 2016: My URL isn’t your URL. I want a single, unified URL standard that we would all stand behind, support and adhere to.

What triggers me this time, is yet another issue. A friendly curl user sent me this URL:

http://user@example.com:80@daniel.haxx.se

… and pasting this URL into different tools and browsers show that there’s not a wide agreement on how this should work. Is the URL legal in the first place and if so, which host should a client contact?

  • curl treats the ‘@’-character as a separator between userinfo and host name, so ‘example.com’ becomes the host name and the port number is 80, followed by rubbish that curl ignores. (wget2, the next-gen wget that’s in development, works identically.)
  • wget extracts the example.com host name but rejects the port number due to the rubbish after the zero.
  • Edge and Safari say the URL is invalid and don’t go anywhere.
  • Firefox and Chrome allow ‘@’ as part of the userinfo, take the ’80’ as a password, and the host name then becomes ‘daniel.haxx.se’.

The only somewhat modern “spec” for URLs is the WHATWG URL specification. The other major, but now somewhat aged, URL spec is RFC 3986, made by the IETF and published in 2005.

In 2015, URL problem statement and directions was published as an Internet-draft by Masinter and Ruby and it brings up most of the current URL spec problems. Some of them are also discussed in Ruby’s WHATWG URL vs IETF URI post from 2014.

What I would like to see happen…

Which group? A group!

Friends I know in the WHATWG suggest that I should dig in there and help them improve their spec. That would be a good idea if fixing the WHATWG spec would be the ultimate goal. I don’t think it is enough.

The WHATWG is highly browser focused, and my past interactions with members of that group have shown that there is little sympathy there for non-browsers that want to deal with URLs, and even less sympathy or interest for URL schemes that the popular browsers don’t support or care about. URLs cover much more than HTTP(S).

I have the feeling that WHATWG people would not like this work to be done within the IETF and vice versa. Since I’d like buy-in from both camps, and any other camps that might have an interest in URLs, this would need to be handled somehow.

It would also be great to get other major URL “consumers” on board, like authors of popular URL parsing libraries, tools and components.

Such a URL group would of course have to agree on the goal and how to get there, but I’ll still provide some additional things I want to see.

Update: I want to emphasize that I do not consider the WHATWG’s job bad, wrong or lost. I think they’ve done a great job at unifying browsers’ treatment of URLs. I don’t mean to belittle that. I just know that this group is only a small subset of the people who probably should be involved in a unified URL standard.

A single fixed spec

I can’t see any compelling reasons why a URL specification couldn’t reach a stable state and get published as *the* URL standard. The “living standard” approach may be fine for certain things (and in particular browsers that update every six weeks), but URLs are supposed to be long-lived and inter-operate far into the future so they really really should not change. Therefore, I think the IETF documentation model could work well for this.

The WHATWG spec documents what browsers do, and browsers do what is documented. At least that’s the theory I’ve been told, and it causes a spinning and never-ending loop that goes against my wish.

Document the format

The WHATWG specification is written in a pseudo code style, describing how a parser would “walk” over the string, with a state machine and all. I know some people like that; I find it utterly annoying and really hard to use to figure out what’s allowed or not. I much prefer the regular RFC style of describing protocol syntax.

IDNA

Can we please just say that host names in URLs should be handled according to IDNA2008 (RFC 5895)? WHATWG URL doesn’t state any IDNA spec number at all.

Move out irrelevant sections

“Irrelevant” when it comes to documenting the URL format, that is. The WHATWG spec details several things that are related to URLs in browsers but are mostly irrelevant to other URL consumers or producers, like section “5. application/x-www-form-urlencoded” and “6. API”.

They would be better placed in a “URL considerations for browsers” companion document.

Working doesn’t imply sensible

So browsers accept URLs written with thousands of forward slashes instead of two. That is not a good reason for the spec to say that a URL may legitimately contain a thousand slashes. I’m totally convinced there’s no critical content anywhere using such formatted URLs and no soul will be sad if we restricted the number to a single digit. So we should. And yeah, then browsers should reject URLs using more.

The slashes are only an example. The browsers have applied a “liberal in what you accept” policy to a lot of things since forever, but we must resist using that as a basis when nailing down a standard.

The odds of this happening soon?

I know there are individuals interested in seeing the URL situation get worked on. We’ve seen articles and internet-drafts posted on the issue several times over the last few years. Any year now I think we will see some real movement toward fixing this. I hope I will manage to participate and contribute a little from my end.

My talks at FOSDEM 2017

I couldn’t even recall how many times I’ve done this already, but in 2017 I am once again showing up in the cold and grey city called Brussels for the lovely FOSDEM conference, to talk. (Yes, it is cold and grey every February, trust me.) So I had to go back and count, and it turns out 2017 will be my 8th straight visit to FOSDEM and I believe the 5th year I’ll present there.

First, a reminder about what I talked about at FOSDEM 2016: An HTTP/2 update. There’s also a (rather low quality) video recording of that talk available.

I’m scheduled for two presentations in 2017, and this year I’m breaking new ground for myself as I’m doing one of them on the “main track” which is the (according to me) most prestigious track held in one of the biggest rooms – seating more than 1,400 persons.

You know what’s cool? Running on billions of devices

Room: Janson, time: Saturday 14:00

Thousands of contributors help build the curl software, which runs on several billion devices and affects every human in the connected world daily. How this came to happen, who contributes, and how Daniel at the wheel keeps it all together. How a hacking ring is actually behind it all and who funds this entire operation.

So that was HTTP/2, what’s next?

Room: UD2.218A, time: Saturday 16:30

A shorter recap on what HTTP/2 brought that HTTP/1 couldn’t offer before we dig in and look at some numbers that show how HTTP/2 has improved (browser) networking and the web experience for people.

Still, there are scenarios where HTTP/1’s multiple connections win over HTTP/2 in performance tests. Why is that and what is being done about it? Wasn’t HTTP/2 supposed to be the silver bullet?

A closer look at QUIC, its promises to fix the areas where HTTP/2 didn’t deliver and a check on where it is today. Is QUIC perhaps actually HTTP/3 in everything but the name?

Depending on what exactly happens in this area over time until FOSDEM, I will spice it up with more details on how we work on these protocol things in Mozilla/Firefox.

This will become my 3rd year in a row that I talk in the Mozilla devroom to present the state of the HTTP protocol and web transport.

6 hours of bliss

I sent out the release announcement for curl 7.52.0 at exactly 07:59 in the morning of December 21, 2016. A Wednesday. We typically release curl on Wednesdays out of old habit. It is a good release day.

curl 7.52.0 was just like any other release. Perhaps with a slightly larger set of new features than is typical for us. We introduced TLS 1.3 support, we now provide HTTPS-proxy support, and the command line tool has this option called --fail-early that I think users will start to appreciate once they discover it. We also announced three fixed security vulnerabilities. And some other good things.

I pushed the code to git, signed and uploaded the tarballs, I updated the info on the web site and I sent off that release announcement email and I felt good. Release-time good. That short feeling of relief and starting over on a new slate that I often experience these release days. Release days make me happy.

Any bets?

It is not unusual for someone to find a bug really fast after a release has shipped. As I was feeling good, I had to joke in the #curl IRC channel (42 minutes after that email):

08:41 <bagder> any bets on when the first bug report on the new release shows up? =)

Hours passed and maybe, just maybe there was not going to be any quick bugs filed on this release?

But of course. I wouldn’t write this blog post if it all had been nice and dandy. At 14:03, I got the email. 6 hours and 4 minutes since I wrote the 7.52.0 announcement email.

The email was addressed to the curl project security email list and included a very short patch and an explanation of how the existing code is wrong and needs “this fix” to work correctly. And it was entirely correct!

Now I didn’t feel that sense of happiness anymore. For some reason it was now completely gone and instead I felt something that involved sensations like rage, embarrassment and general tiredness. How the [beep] could this slip through like this?

I’ve done releases in the past that were broken to various extents but this is a sort of a new record and an unprecedented event. Enough time had passed that I couldn’t just yank the package from the download page either. I had to take it through the correct procedures.

What happened?

As part of a general code cleanup during this last development round, I changed all the internals to use a proper internal API for getting random data, and if libcurl is built with a TLS library it uses that library’s API to get secure and safe random data. A move to improve our use of random data internally. We use this internal API for getting the nonce in authentication mechanisms such as Digest and NTLM, and also for generating the boundary string in HTTP multipart formposts and more. (It is not used for any TLS or SSH level protocol stuff though.)

I did the largest part of the random overhaul of this in commit f682156a4f, just a little over a month ago.

Of course I made sure that all test cases kept working, there were no valgrind reports or anything, and the code didn’t cause any compiler warnings. It did not generate any reports in the many clang-analyzer or Coverity static code analyzer runs we’ve done since. We run clang-analyzer daily and Coverity perhaps weekly.

But there’s a valgrind report just here!

Kamil Dudka, who sent the 14:03 email, got a valgrind error and that’s what set him off – but how come he got that and I didn’t?

The explanation consists of the following two conditions that together worked to hide the problem for us quite successfully:

  1. I (and I suppose several of the other curl hackers) usually build curl and libcurl “debug enabled”. This allows me to run more tests, do more diagnostics and debug more easily when I run into problems. It also provides a system with “fake random” so that we can verify that functions that otherwise use real random values generate the correct output when given a known random value… and yeah, this debug system prevented valgrind from detecting any problem!
  2. In the curl test suite we once had a problem with valgrind generating reports on third party libraries and the like, which ended up as false positives. We then introduced a “valgrind report parser” that would detect whether a report concerns curl or something else. It turns out this parser doesn’t detect the errors if curl is compiled without the compiler’s -g command line option. And of course… curl and libcurl both build without -g by default!

The patch?

The vulnerable function basically uses this simple prototype. It is meant to get an “int” worth of random value stored in the buffer ‘rnd’ points to. That’s 4 bytes.

randit(struct Curl_easy *data, unsigned int *rnd)

But due to circumstances I can’t explain with anything other than my sloppy programming, I managed to make the function store the random value in the actual pointer variable instead of the buffer it points to. So when the function returns, there’s nothing stored in the buffer. No 4 bytes of random. Just the uninitialized value of whatever happened to be there, on the stack.

The patch that fixes this problem looks like this (with some names shortened to simplify but keep the idea):

- res = random(data, (char *)&rnd, sizeof(rnd));
+ res = random(data, (char *)rnd, sizeof(*rnd));

So yeah. I introduced this security flaw in 7.52.0. We had it fixed in 7.52.1, released roughly 48 hours later.

(I really do not need comments on what other languages that wouldn’t have allowed this mistake or otherwise would’ve brought us world peace a long time ago.)

Make it not happen again

The primary way to make this same mistake not happen again easily, is that I’m removing the valgrind report parsing function from the test suite and we will now instead assume that valgrind reports will be legitimate and if not, work on suppressing the false positives in a better way.

References

This flaw is officially known as CVE-2016-9594

The real commit that fixed this problem is here, or as stand-alone patch.

The full security advisory for this flaw is here: https://curl.haxx.se/docs/adv_20161223.html

Facepalm photo by Alex E. Proimos.

xkcd: 221


curl man page disentangled

The nroff formatted source file for the man page of the curl command line tool was some 110K and consisted of more than 2500 lines by the time this overhaul, or disentanglement if you will, started. As I write this, the curl version in git supports 204 command line options.

Working with such a behemoth of a document has become a bit daunting, and the nroff formatting itself is quirky and esoteric. For some time I’ve also been interested in creating some sort of system that would allow us to generate a single web page for each individual command line option, and then possibly allow for expanded descriptions in those single-page versions.

To avoid duplicated info, I decided to create a new system in which we document each individual command line option in a separate file. From that collection of hundreds of files we can generate the big man page, the “curl --help” output, and all those separate pages suitable for rendering web pages. And we can automate some of the nroff syntax to make it less error-prone and cause fewer sore eyes for the document editors!

With this system we also get unified handling of things like options added in certain curl versions, options affecting only specific protocols, and references like “see also” mentions. It gives us a whole lot of meta-data for the command line options, if you will, and I’m sure this will allow us to do more fun things going forward.

You’ll find the new format documented, and you can check out the existing files to get a quick glimpse of how it works. As an example, look at the --resolve documentation source.

Today I generated the first full curl.1 replacement and pushed it to git, but eventually that file will be removed from git and instead generated at build time by the regular build system. No need to keep a generated file in git in the long term.

2nd best in Sweden

“Probably the only person in the whole of Sweden whose code is used by all people in the world using a computer / smartphone / ATM / etc … every day. His contribution to the world is so large that it is impossible to understand the breadth.”

(translated motivation from the Swedish original page)

Thank you everyone who nominated me. I’m truly grateful, honored and humbled. You, my community, are what makes me keep doing what I do. I love you all!

Compiling a list of “Sweden’s best developers” (the list and site are in Swedish) seems like a rather futile task, doesn’t it? Yet that’s something the Swedish IT and technology news site Techworld has been doing occasionally for the last several years, at two-to-three-year intervals since 2008.

Everyone reading this will of course immediately start to ponder which developers they speak of, how they define developers, and how on earth you judge who the best developers are. Or even who’s covered by “Sweden” – is that people living in Sweden, born in Sweden or working in Sweden?

I’m certainly not alone in having chuckled at these lists when they were published in the past, as I’ve never seen anyone on them be even close to my own niche or areas of interest. The lists have even worked a little as a long-standing joke in places.

It always felt as if the people on the lists were found on another planet than mine – mostly just Java and .NET people – and they very rarely appeared to be developers who actually spend their days surrounded by code and programming. I suppose I’ve now given away some clues to the characteristics I think “a developer” should possess…

This year, their fifth time doing this list, they changed the way they find candidates, opened up for external nominations and had a set of external advisors. This also resulted in me finding several friends on the list that were never on it in the past.

Tonight I got called up on stage during the little award ceremony and was handed this diploma and recognition for landing at second place on the best-developer-in-Sweden list.


And just to keep things safe for the future, this is how the listing looks on the Swedish list page:

Yes, I’m happy and proud and humbled. I don’t get this kind of recognition every day so I’ll take this opportunity and really enjoy it. And I’ll find a good spot for my diploma somewhere around the house.

I’ll keep a really big smile on my face for the rest of the day for sure!

(Photo from the award ceremony by Emmy Jonsson/IDG)