curlup 2017: curl now

At curlup 2017 in Nuremberg, I did a keynote and talked a little about the road to what we are and where we are right now in the curl project. There will hopefully be a recording of this presentation made available soon, but I wanted to entertain you all by also presenting some of the graphs from that presentation in a blog format for easy access and to share the information.

Some stats and numbers from the curl project early 2017. Unless otherwise mentioned, this is based on the availability of data that we have. The git repository has data from December 1999 and we have detailed release information since version 6.0 (September 13, 1999).

Web traffic

First out, web site traffic to curl.haxx.se over the seven last full years that I have stats for. The switch to a HTTPS-only site happened in February 2016. The main explanation to the decrease in spent bandwidth in 2016 is us removing the HTML and PDF versions of all documentation from the release tarballs (October 2016).

My log analyze software also tries to identify “human” traffic so this graph should not include the very large amount of bots and automation that hits our site. In total we serve almost twice the amount of data to “bots” than to human. A large share of those download the cacert.pem file we host.

Since our switch to HTTPS we have a 301 redirect from the HTTP site, and we still suffer from a large number of user-agents hitting us over and over without seemingly following said redirect…

Number of lines in git

Since we also have documentation and related things this isn’t only lines of code. Plain and simply: lines added to files that we have in git, and how the number have increased over time.

There’s one notable dip and one climb and I think they both are related to how we have rearranged documentation and documentation formatting.

Top-4 author’s share

This could also talk about how seriously we suffer from “the bus factor” in this project. Look at how large share of all commits that the top-4 commiters have authored. Not committed; authored. Of course we didn’t have proper separation between authors and committers before git (March 2010).

Interesting to note here is also that the author listed second here is Yang Tse, who hasn’t authored anything since August 2013. Me personally seem to have plateaued at around 57% of all commits during the recent year or two and the top-4 share is slowly decreasing but is still over 80% of the commits.

I hope we can get the top-4 share well below 80% if I rerun this script next year!

Number of authors over time

In comparison to the above graph, I did one that simply counted the total number of unique authors that have contributed a change to git and look at how that number changes over time.

The time before git is, again, somewhat of a lie since we didn’t keep track of authors vs committers properly then so we shouldn’t put too much value into that significant knee we can see on the graph.

To me, the main take away is that in spite of the top-4 graph above, this authors-over-time line is interestingly linear and shows that the vast majority of people who contribute patches only send in one or maybe a couple of changes and then never appear again in the project.

My hope is that this line will continue to climb over the coming years.

Commits per release

We started doing proper git tags for release for curl 6.5. So how many commits have we done between releases ever since? It seems to have gone up and down over time and I added an average number line in this graph which is at about 150 commits per release (and remember that we attempt to do them every 8 weeks since a few years back).

Towards the right we can see the last 20 releases or so showing a pattern of high bar, low bar, and I’ll get to that more in a coming graph.

Of course, counting commits is a rough measurement as they can be big or small, easy or hard, good or bad and this only counts them.

Commits per day

As the release frequency has varied a bit over time I figured I should just check and see how many commits we do in the project per day and see how that has changed (or not) over time. Remember, we are increasing the number of unique authors fairly fast but the top-4 share of “authorship” is fairly stable.

Turns our the number of commits per day has gone up and down a little bit through the git history but I can’t spot any obvious trend here. In recent years we seem to keep up more than 2 commits per day and during intense periods up to 8.

Days per release

Our general plan is since bunch of years back to do releases every 8 weeks like a clock work. 8 weeks is 56 days.

When we run into serious problems, like bugs that are really annoying or tedious to users or if we get a really serious security problem reported, we sometimes decide to go outside of the regular release schedule and ship something before the end of the 8-week cycle.

This graph clearly shows that over the last, say 20, releases we clearly have felt ourselves “forced” to do follow-up releases outside of the regular schedule. The right end of the graph shows a very clear saw-tooth look that proves this.

We’ve also discussed this slightly on the mailing list recently, and I’m certainly willing to go back and listen to people as to what we can do to improve this situation.

Bugfixes per release

We keep close track of all bugfixes done in git and mark them up and mention them in the RELEASE-NOTES document that we ship in every new release.

This makes it possible for us to go back and see how many bug fixes we’ve recorded for each release since curl 6.5. This shows a clear growth over time. It’s interesting since we don’t see this when we count commits, so it may just be attributed to having gotten better at recording the bugs in the files. Or that we now spend fewer commits per bug fix. Hard to tell exactly, but I do enjoy that we fix a lot of bugs…

Days spent per bugfix

Another way to see the data above is to count the number of bug fixes we do over time and just see how many days we need on average to fix bugs.

The last few years we do more bug fixes than there are days so if we keep up the trend this shows for 2017 we might be able to reach down to 0.5 days per bug fix on average. That’d be cool!

Coverity scans

We run coverity scans on the curl cover regularly and this service keeps a little graph for us showing the number of found defects over time. These days we have a policy of never allowing a defect detected by Coverity to linger around. We fix them all and we should have zero detected defects at all times.

The second graph here shows a comparison line with “other projects of comparable size”, indicating that we’re at least not doing badly here.

Vulnerability reports

So in spite of our grand intentions and track record shown above, people keep finding security problems in curl in a higher frequency than every before.

Out of the 24 vulnerabilities reported to the curl project in 2016, 7 was the result of the special security audit that we explicitly asked for, but even if we hadn’t asked for that and they would’ve remained unknown, 17 would still have stood out in this graph.

I do however think that finding – and reporting – security problem is generally more good than bad. The problems these reports have found have generally been around for many years already so this is not a sign of us getting more sloppy in recent years, I take it as a sign that people look for these problems better and report them more often, than before. The industry as a whole looks on security problems and the importance of them differently now than it did years ago.

curl up 2017, the venue

The fist ever physical curl meeting took place this last weekend before curl’s 19th birthday. Today curl turns nineteen years old.

After much work behind the scenes to set this up and arrange everything (and thanks to our awesome sponsors to contributed to this), over twenty eager curl hackers and friends from a handful of countries gathered in a somewhat rough-looking building at curl://up 2017 in Nuremberg, March 18-19 2017.

The venue was in this old factory-like facility but we put up some fancy signs so that people would find it:

Yes, continue around the corner and you’ll find the entrance door for us:

I know, who’d guessed that we would’ve splashed out on this fancy conference center, right? This is the entrance door. Enter and look for the next sign.

Yes, move in here through this door to the right.

And now, up these stairs…

When you’ve come that far, this is basically the view you could experience (before anyone entered the room):

And when Igor Chubin presents about wttr,in and using curl to do console based applications, it looked like this:

It may sound a bit lame to you, but I doubt this would’ve happened at all and it certainly would’ve been less good without our great sponsors who helped us by chipping in what we didn’t want to charge our visitors.

Thank you very much Kippdata, Ergon, Sevenval and Haxx for backing us!

19 years ago

19 years ago on this day I released the first ever version of a software project I decided to name curl. Just a little hobby you know. Nothing fancy.

19 years ago that was a few hundred lines of code. Today we’re at around 150.000 lines.

19 years ago that was mostly my thing and I sent it out hoping that *someone* would like it and find good use. Today virtually every modern internet-connected device in the world run my code. Every car, every TV, every mobile phone.

19 years ago was a different age not only to me as I had no kids nor house back then, but the entire Internet and world has changed significantly since.

19 years ago we’d had a handful of persons sending back bug reports and a few patches. Today we have over 1500 persons having helped out and we’re adding people to that list at a rapid pace.

19 years ago I would not have imagined that someone can actually stick around in a project like this for this long time and still find it so amazingly fun and interesting still.

19 years ago I hadn’t exactly established my “daily routine” of spare time development already but I was close and for the larger part of this period I have spent a few hours every day. All days really. Working on curl and related stuff. 19 years of a few hours every day equals a whole lot of time

I took us 19 years minus two days to have our first ever physical curl meeting, or conference if you will.

Some curl numbers

We released the 163rd curl release ever today. curl 7.53.0 – approaching 19 years since the first curl release (6914 days to be exact).

It took 61 days since the previous release, during which 47 individuals helped us fix 95 separate bugs. 25 of these contributors were newcomers. In total, we now count more than 1500 individuals credited for their help in the project.

One of those bug-fixes, one was a security vulnerability, upping our total number of vulnerabilities through the years to 62.

Since the previous release, 7.52.1, 155 commits were made to the source repository.

The next curl release, our 164th, is planned to ship in exactly 8 weeks.

New screen and new fuses

I got myself a new 27″ 4K screen to my work setup, a Dell P2715Q, and replaced one of my old trusty twenty-four inch friends with it.

I now work with the “Thinkpad 13″ on the left as my video conference machine (it does nothing else and it runs Windows!), the two mid screens are a 24″ and the new 27” and they are connected to my primary dev machine while the rightmost thing is my laptop for when I need to move.

Did everything run smoothly? Heck no.

When I first inserted the 4K screen without modifying anything else in the setup, it was immediately obvious that I really needed to upgrade my graphics card since it didn’t have muscles enough to drive the screen at 4K so the screen would then instead upscale a 1920×1200 image in a slightly blurry fashion. I couldn’t have that!

New graphics card

So when I was out and about later that day I more or less accidentally passed a Webhallen store, and I got myself a new card. I wanted to play it easy so I stayed with an AMD processor and went with ASUS Dual-Rx460-O2G. The key feature I wanted was to be able to drive one 4K screen and one at 1920×1200, and then I unfortunately had to give up on the ones with only passive cooling and I instead had to pick what sounds like a gaming card. (I hate shopping graphics cards.)As I was about to do surgery on the machine anyway. I checked and noticed that I could add more memory to the motherboard so I bought 16 more GB to a total of 32GB.

Blow some fuses

Later that night, when the house was quiet and dark I shut down my machine, inserted the new card, the new memory DIMMs and powered it back up again.

At least that was the plan. When I fired it back on, it said clock and my lamps around me all got dark and the machine didn’t light up at all. The fuse was blown! Man, wasn’t that totally unexpected?

I did some further research on what exactly caused the fuse to blow and blew a few more in the process, as I finally restored the former card and removed the memory DIMMs again and it still blew the fuse. Puzzled and slightly disappointed I went to bed when I had no more spare fuses.

I hate leaving the machine dead in parts on the floor with an uncertain future, but what could I do?

A new PSU

Tuesday morning I went to get myself a PSU replacement (Plexgear PS-600 Bronze), and once I had that installed no more fuses blew and I could start the machine again!

I put the new memory back in and I could get into the BIOS config with both screens working with the new card (and it detected 32GB ram just fine). But as soon as I tried to boot Linux, the boot process halted after just 3-4 seconds and seemingly just froze. Hm, I tested a few different kernels and safety mode etc but they all acted like that. Weird!

BIOS update

A little googling on the messages that appeared just before it froze gave me the idea that maybe I should see if there’s an update for my bios available. After all, I’ve never upgraded it and it was a while since I got my motherboard (more than 4 years).

I found a much updated bios image on ASUS support site, put it on a FAT-formatted USB-drive and I upgraded.

Now it booted. Of course the error messages I had googled for are still present, and I suppose they were there before too, I just hadn’t put any attention to them when everything was working dandy!

Displayport vs HDMI

I had the wrong idea that I should use the display port to get 4K working, but it just wouldn’t work. DP + DVI just showed up on one screen and I even went as far as trying to download some Ubuntu Linux driver package for Radeon RX460 that I found, but of course it failed miserably due to my Debian Unstable having a totally different kernel running and what not.

In a slightly desperate move (I had now wasted quite a few hours on this and my machine still wasn’t working), I put back the old graphics card – (with DVI + hdmi) only to note that it no longer works like it did (the DVI one didn’t find the correct resolution anymore). Presumably the BIOS upgrade or something shook the balance?

Back on the new card I booted with DVI + HDMI, leaving DP entirely, and now suddenly both screens worked!

HiDPI + LoDPI

Once I had logged in, I could configure the 4K screen to show at its full 3840×2160 resolution glory. I was back.

Now I only had to start fiddling with getting the two screens to somehow co-exist next to each other, which is a challenge in its own. The large difference in DPI makes it hard to have one config that works across both screens. Like I usually have terminals on both screens – which font size should I use? And I put browser windows on both screens…

So far I’ve settled with increasing the font DPI in KDE and I use two different terminal profiles depending on which screen I put the terminal on. Seems to work okayish. Some texts on the 4K screen are still terribly small, so I guess it is good that I still have good eye sight!

24 + 27

So is it comfortable to combine a 24″ with a 27″ ? Sure, the size difference really isn’t that notable. The 27 one is really just a few centimeters taller and the differences in width isn’t an inconvenience. The photo below shows how similar they look, size-wise:

Post FOSDEM 2017

I attended FOSDEM again in 2017 and it was as intense, chaotic and wonderful as ever. I met old friends, got new friends and I got to test a whole range of Belgian beers. Oh, and there was also a set of great open source related talks to enjoy!

On Saturday at 2pm I delivered my talk on curl in the main track in the almost frighteningly large room Janson. I estimate that it was almost half full, which would mean upwards 700 people in the audience. The talk itself went well. I got audible responses from the audience several times and I kept well within my given time with time over for questions. The trickiest problem was the audio from the people who asked questions because it wasn’t at all very easy to hear, while the audio is great for the audience and in the video recording. Slightly annoying because as everyone else heard, it made me appear half deaf. Oh well. I got great questions both then and from people approaching me after the talk. The questions and the feedback I get from a talk is really one of the things that makes me appreciate talking the most.

The video of the talk is available, and the slides can also be viewed.

So after I had spent some time discussing curl things and handing out many stickers after my talk, I managed to land in the cafeteria for a while until it was time for me to once again go and perform.

We’re usually a team of friends that hang out during FOSDEM and we all went over to the Mozilla room to be there perhaps 20 minutes before my talk was scheduled and wow, there was a huge crowd outside of that room already waiting by the time we arrived. When the doors then finally opened (about 10 minutes before my talk started), I had to zigzag my way through to get in, and there was a large amount of people who didn’t get in. None of my friends from the cafeteria made it in!

The Mozilla devroom had 363 seats, not a single one was unoccupied and there was people standing along the sides and the back wall. So, an estimated nearly 400 persons in that room saw me speak about HTTP/2 deployments numbers right now, how HTTP/2 doesn’t really work well under 2% packet loss situations and then a bit about how QUIC can solve some of that and what QUIC is and when we might see the first experiments coming with IETF-QUIC – which really isn’t the same as Google-QUIC was.

To be honest, it is hard to deliver a talk in twenty minutes and I  was only 30 seconds over my time. I got questions and after the talk I spent a long time talking with people about HTTP, HTTP/2, QUIC, curl and the future of Internet protocols and transports. Very interesting.

The video of my talk can be seen, and the slides are online too.

I’m not sure if I was just unusually unlucky in my choices, or if there really was more people this year, but I experienced that “FULL” sign more than usual this year.

I fully intend to return again next year. Who knows, maybe I’ll figure out something to talk about then too. See you there?

One URL standard please

Following up on the problem with our current lack of a universal URL standard that I blogged about in May 2016: My URL isn’t your URL. I want a single, unified URL standard that we would all stand behind, support and adhere to.

What triggers me this time, is yet another issue. A friendly curl user sent me this URL:

http://user@example.com:80@daniel.haxx.se

… and pasting this URL into different tools and browsers show that there’s not a wide agreement on how this should work. Is the URL legal in the first place and if so, which host should a client contact?

  • curl treats the ‘@’-character as a separator between userinfo and host name so ‘example.com’ becomes the host name, the port number is 80 followed by rubbish that curl ignores. (wget2, the next-gen wget that’s in development works identically)
  • wget extracts the example.com host name but rejects the port number due to the rubbish after the zero.
  • Edge and Safari say the URL is invalid and don’t go anywhere
  • Firefox and Chrome allow ‘@’ as part of the userinfo, take the ’80’ as a password and the host name then becomes ‘daniel.haxx.se’

The only somewhat modern “spec” for URLs is the WHATWG URL specification. The other major, but now somewhat aged, URL spec is RFC 3986, made by the IETF and published in 2005.

In 2015, URL problem statement and directions was published as an Internet-draft by Masinter and Ruby and it brings up most of the current URL spec problems. Some of them are also discussed in Ruby’s WHATWG URL vs IETF URI post from 2014.

What I would like to see happen…

Which group? A group!

Friends I know in the WHATWG suggest that I should dig in there and help them improve their spec. That would be a good idea if fixing the WHATWG spec would be the ultimate goal. I don’t think it is enough.

The WHATWG is highly browser focused and my interactions with members of that group that I have had in the past, have shown that there is little sympathy there for non-browsers who want to deal with URLs and there is even less sympathy or interest for URL schemes that the popular browsers don’t even support or care about. URLs cover much more than HTTP(S).

I have the feeling that WHATWG people would not like this work to be done within the IETF and vice versa. Since I’d like buy-in from both camps, and any other camps that might have an interest in URLs, this would need to be handled somehow.

It would also be great to get other major URL “consumers” on board, like authors of popular URL parsing libraries, tools and components.

Such a URL group would of course have to agree on the goal and how to get there, but I’ll still provide some additional things I want to see.

Update: I want to emphasize that I do not consider the WHATWG’s job bad, wrong or lost. I think they’ve done a great job at unifying browsers’ treatment of URLs. I don’t mean to belittle that. I just know that this group is only a small subset of the people who probably should be involved in a unified URL standard.

A single fixed spec

I can’t see any compelling reasons why a URL specification couldn’t reach a stable state and get published as *the* URL standard. The “living standard” approach may be fine for certain things (and in particular browsers that update every six weeks), but URLs are supposed to be long-lived and inter-operate far into the future so they really really should not change. Therefore, I think the IETF documentation model could work well for this.

The WHATWG spec documents what browsers do, and browsers do what is documented. At least that’s the theory I’ve been told, and it causes a spinning and never-ending loop that goes against my wish.

Document the format

The WHATWG specification is written in a pseudo code style, describing how a parser would “walk” over the string with a state machine and all. I know some people like that, I find it utterly annoying and really hard to figure out what’s allowed or not. I much more prefer the regular RFC style of describing protocol syntax.

IDNA

Can we please just say that host names in URLs should be handled according to IDNA2008 (RFC 5895)? WHATWG URL doesn’t state any IDNA spec number at all.

Move out irrelevant sections

“Irrelevant” when it comes to documenting the URL format that is. The WHATWG details several things that are related to URL for browsers but are mostly irrelevant to other URL consumers or producers. Like section “5. application/x-www-form-urlencoded” and “6. API”.

They would be better placed in a “URL considerations for browsers” companion document.

Working doesn’t imply sensible

So browsers accept URLs written with thousands of forward slashes instead of two. That is not a good reason for the spec to say that a URL may legitimately contain a thousand slashes. I’m totally convinced there’s no critical content anywhere using such formatted URLs and no soul will be sad if we’d restricted the number to a single-digit. So we should. And yeah, then browsers should reject URLs using more.

The slashes are only an example. The browsers have used a “liberal in what you accept” policy for a lot of things since forever, but we must resist to use that as a basis when nailing down a standard.

The odds of this happening soon?

I know there are individuals interested in seeing the URL situation getting worked on. We’ve seen articles and internet-drafts posted on the issue several times the last few years. Any year now I think we will see some movement for real trying to fix this. I hope I will manage to participate and contribute a little from my end.

QUIC is h2 over UDP

The third day of the QUIC interim passed and now that meeting has ended. It continued to work very well to attend from remote and the group manged to plow through an extensive set of issues. A lot of consensus was achieved and I personally now have a much better feel for the protocol and many of its details thanks to the many discussions.

The drafts are still a bit too early for us to start discussing inter-op for real. But there were mentions and hopes expressed that maybe maybe we might start to see some of that by mid 2017. When we did HTTP/2, we had about 10 different implementations by the time draft-04 was out. I suspect we will see a smaller set for QUIC simply because of it being much more complex.

The next interim is planned to occur in the beginning of June in Europe.

There is an official QUIC logo being designed, but it is not done yet so you still need to imagine one placed here.

QUIC needs HTTP/2 needs HTTP/1

QUIC is primarily designed to send and receive HTTP/2 frames and entire streams over UDP (not only, but this is where the bulk of the work has been put in so far). Sure, TLS encrypted and everything, but my point here is that it is being designed to transfer HTTP/2 frames. You remember how HTTP/2 is “just a new framing” layer that changes how HTTP is sent over the wire, but when “decoded” again in the receiving end it is in most important aspects still HTTP/1 there. You have to implement most of a HTTP/1 stack in order to support HTTP/2. Now QUIC adds another layer to that. QUIC is a new way to send HTTP/2 frames over the network.

A QUIC stack needs to handle most aspects of HTTP/2!

Of course, there are notable differences and changes to some underlying principles that makes QUIC a bit different. It isn’t exactly HTTP/2 over secure UDP. Let me give you a few examples…

Streams are more independent

Packets sent over the wire with UDP are independent from each other to a very large degree. In order to avoid Head-of-Line blocking (HoL), packets that are lost and re-transmitted will only block the particular streams to which the lost packets belong. The other streams can keep flowing, unaware and uncaring.

Thanks to the nature of the Internet and how packets are handled, it is not unusual for network packets to arrive in a slightly different order than they were sent, even when they aren’t exactly “lost”.

So, streams in HTTP/2 were entirely synced and the order the sender of frames use, will be the exact same order in which the frames arrive in the other end. Packet loss or not.

In QUIC, individual frames and entire streams may arrive in the receiver in a different order than what was used in the sender.

Stream ID gaps means open

When receiving a QUIC packet, there’s basically no way to know if there are packets missing that were intended to arrive but got lost and haven’t yet been re-transmitted.

If a frame is received that uses the new stream ID N (a stream not previously seen), the receiver is then forced to assume that all the other streams ID from our previously highest ID to N are all just missing and will arrive soon. They are then presumed to exist!

In HTTP/2, we could handle gaps in stream IDs much differently because of TCP. Then a gap is known to be deliberate.

Some h2 frames are done by QUIC

Since QUIC is designed with streams, flow control and more and is used to send HTTP/2 frames over them, some of the h2 frames aren’t needed but are instead handled by the transport layer within QUIC and won’t show up in the HTTP/2 layer.

HPACK goes QPACK?

HPACK is the header compression system used in HTTP/2. Among other things it features a dictionary that you manipulate with instructions and then subsequent header frames can refer to those dictionary indexes instead of sending the full header. Header frame one says “insert my user-agent string” and then header frame two can refer back to the index in the dictionary for where that identical user-agent string is stored.

Due to the out of order streams in QUIC, this dictionary treatment is harder. The second header frame could arrive before the first, so if it would refer to an index set in the first header frame, it would have to block the entire stream until that first header arrives.

HPACK also has a concept of just adding things to the dictionary without specifying the index, and since both sides are in perfect sync it works just fine. In QUIC, if we want to maintain the independence of streams and avoid blocking to the highest degree, we need to instead specify exact indexes to use and not assume perfect sync.

This (and more) are reasons why QPACK is being suggested as a replacement for HPACK when HTTP/2 header frames are sent over QUIC.

First QUIC interim – in Tokyo

The IETF working group QUIC has its first interim meeting in Tokyo Japan for three days. Day one is today, January 24th 2017.

As I’m not there physically, I attend the meeting from remote using the webex that’s been setup for this purpose, and I’ll drop in a little screenshot below from one of the discussions (click it for hires) to give you a feel for it. It shows the issue being discussed and the camera view of the room in Tokyo. I run the jabber client on a different computer which allows me to also chat with the other participants. It works really well, both audio and video are quite crisp and understandable.

Japan is eight hours ahead of me time zone wise, so this meeting  runs from 01:30 until 09:30 Central European Time. That’s less comfortable and it may cause me some troubles to attend the entire thing.

On QUIC

We started off at once with a lot of discussions on basic issues. Versioning and what a specific version actually means and entails. Error codes and how error codes should be used within QUIC and its different components. Should the transport level know about priorities or shouldn’t it? How is the security protocol decided?

Everyone who is following the QUIC issues on github knows that there are plenty of people with a lot of ideas and thoughts on these matters and this meeting shows this impression is real.

For a casual bystander, you might’ve been fooled into thinking that because Google already made and deployed QUIC, these issues should be if not already done and decided, at least fairly speedily gone over. But nope. I think there are plenty of indications already that the protocol outputs that will come in the end of this process, the IETF QUIC will differ from the Google QUIC in a fair number of places.

The plan is that the different QUIC drafts (there are at least 4 different planned RFCs as they’re divided right now) should all be “done” during 2018.

(At 4am, the room took lunch and I wrote this up.)

tech, open source and networking