A day in the curl project

September 29th, 2014 by Daniel Stenberg

cURLI maintain curl and lead the development there. This is how I spend my time an ordinary day in the project. Maybe I don’t do all of these things every single day, but sometimes I do and sometimes I just do a subset of them. I just want to give you a look into what I do and why I don’t add new stuff more often or faster… I spend about one to three hours on the project every day. Let me also stress that curl is a tiny little project in comparison with many other open source projects. I’m certainly not saying otherwise.

the new bug

Someone submits a new bug in the bug tracker or on one of the mailing lists. Most initial bug reports lack sufficient details so the first thing I do is ask for more info and possibly ask the submitter to try a recent version as very often we get bug reported on very old versions. Many bug reports take several demands for more info before the necessary details have been provided. I don’t really start to investigate a problem until I feel I have a sufficient amount of details. We’re a very small core team that acts on other people’s bugs.

the question by a newbie in the project

A new person shows up with a question. The question is usually similar to a FAQ entry or an example but not exactly. It deserves a proper response. This kind of question can often be answered by anyone, but also most people involved in the project don’t feel the need or “familiarity” to respond to such questions and therefore remain quiet.

the old mail I haven’t responded to yet

I want every serious email that reaches the mailing lists to get a response, so all mails that neither I nor anyone else responds to I keep around in my inbox and when I have idle time over I go back and catch up on old mails. Some of them can then of course result in a new bug or patch or whatever. Occasionally I have to resort to simply saving away the old mail without responding in order to catch up, just to cut the list of outstanding things to do a little.

the TODO list for my own sake, things I’d like to get working on

There are always things I really want to see done in the project, and I work on them far too little really. But every once in a while I ignore everything else in my life for a couple of hours and spend them on adding a new feature or fixing something I’ve been missing. Actual development of new features is a very small fraction of all time I spend on this project.

the list of open bug reports

I regularly revisit this list to see what I can do to push the open ones forward. Follow-up questions, deep dives into source code and specifications or just the sad realization that a particular issue won’t be fixed within the nearest time (year?) so that I close it as “future” and add the problem to our KNOWN_BUGS document. I strive to keep the bug list clean and only keep relevant bugs open. Those issues that are not reproducible, are left without the proper attention from the reporter or otherwise stall will get closed. In general I feel quite lonely as responder in the bug tracker…

the mailing list threads that are sort of dying but I do want some progress or feedback on

In my primary email inbox I usually keep ongoing threads around. Lots of discussions just silently stop getting more posts and thus slowly wither away further up the list to become forgotten and ignored. With some interval I go back to see if the posters are still around, if there’s any more feedback or whatever in order to figure out how to proceed with the subject. Very often this makes me get nothing at all back and instead I just save away the entire conversation thread, forget about it and move on.

the blog post I want to do about a recent change or fix I did I’d like to highlight

I try to explain some changes to the world in blog posts. Not all changes but the ones that are somehow noteworthy as they perhaps change the way things have been or introduce new fun features perhaps not that easily spotted. Of course all features are always documented etc, but sometimes I feel I need to put some extra attention on focus on things in a more free-form style. Or I just write about meta stuff, like this very posting.

the reviewing and merging of patches

One of the most important tasks I have is to review patches. I’m basically the only person in the project who volunteers to review patches against any angle or corner of the project. When people have spent time and effort and gallantly send the results of their labor our way in the best possible format (a patch!), the submitter deserves a good review and proper feedback. Also, paving the road for more patches is one of the best way to scale the project. Helping newcomers become productive is important.

Patches are preferably posted on the mailing lists but there’s also some coming in via pull requests on github and while I strongly discourage that (due to them not getting the same attention and possible scrutiny on the list like the others) I sometimes let them through anyway just to be smooth.

When the patch looks good (or sometimes good enough and I just edit some minor detail), I merge it.

the non-disclosed discussions about a potential security problem

We’re a small project with a wide reach and security problems can potentially have grave impact on users. We take security seriously, and we very often have at least one non-public discussion going on about a problem in curl that may have security implications. We then often work on phrasing security advisories, working down exactly which versions that are vulnerable, producing patches for at least the most recent ones of those affected versions and so on.

tame stackoverflow

stackoverflow.com has become almost like a wikipedia for source code and programming related issues (although it isn’t wiki), and that site is one of the primary referrers to curl’s web site these days. I tend to glance over the curl and libcurl related questions and offer my answers at times. If nothing else, it is good to help keeping the amount of disinformation at low levels.

I strongly disapprove of people filing bug reports on such places or even very detailed (lib)curl core questions that should’ve been asked on the curl-library list.

there are idle times too

Yeah. Not very often, but sometimes I actually just need a day off all this. Sometimes I just don’t find motivation or energy enough to dig into that terrible seldom-happening bug on a platform I’ve never seen personally. A project like this never ends. The same day we release a new release, we just reset our clocks and we’re back on improving curl, fixing bugs and cleaning up things for the next release. Forever and ever until the end of time.

keep-calm-and-improve-curl

Changing networks with Firefox running

September 26th, 2014 by Daniel Stenberg

Short recap: I work on network code for Mozilla. Bug 939318 is one of “mine” – yesterday I landed a fix (a patch series with 6 individual patches) for this and I wanted to explain what goodness that should (might?) come from this!

diffstat

diffstat reports this on the complete patch series:

29 files changed, 920 insertions(+), 162 deletions(-)

The change set can be seen in mozilla-central here. But I guess a proper description is easier for most…

The bouncy road to inclusion

This feature set and associated problems with it has been one of the most time consuming things I’ve developed in recent years, I mean in relation to the amount of actual code produced. I’ve had it “landed” in the mozilla-inbound tree five times and yanked out again before it landed correctly (within a few hours), every time of course reverted again because I had bugs remaining in there. The bugs in this have been really tricky with a whole bunch of timing-dependent and race-like problems and me being unfamiliar with a large part of the code base that I’m working on. It has been a highly frustrating journey during periods but I’d like to think that I’ve learned a lot about Firefox internals partly thanks to this resistance.

As I write this, it has not even been 24 hours since it got into m-c so there’s of course still a risk there’s an ugly bug or two left, but then I also hope to fix the pending problems without having to revert and re-apply the whole series…

Many ways to connect to networks

Firefox Nightly screenshotIn many network setups today, you get an environment and a network “experience” that is crafted for that particular place. For example you may connect to your work over a VPN where you get your company DNS and you can access sites and services you can’t even see when you connect from the wifi in your favorite coffee shop. The same thing goes for when you connect to that captive portal over wifi until you realize you used the wrong SSID and you switch over to the access point you were supposed to use.

For every one of these setups, you get different DHCP setups passed down and you get a new DNS server and so on.

These days laptop lids are getting closed (and the machine is put to sleep) at one place to be opened at a completely different location and rarely is the machine rebooted or the browser shut down.

Switching between networks

Switching from one of the networks to the next is of course something your operating system handles gracefully. You can even easily be connected to multiple ones simultaneously like if you have both an Ethernet card and wifi.

Enter browsers. Or in this case let’s be specific and talk about Firefox since this is what I work with and on. Firefox – like other browsers – will cache images, it will cache DNS responses, it maintains connections to sites a while even after use, it connects to some sites even before you “go there” and so on. All in the name of giving the users an as good and as fast experience as possible.

The combination of keeping things cached and alive, together with the fact that switching networks brings new perspectives and new “truths” offers challenges.

Realizing the situation is new

The changes are not at all mind-bending but are basically these three parts:

  1. Make sure that we detect network changes, even if just the set of available interfaces change. Send an event for this.
  2. Make sure the necessary parts of the code listens and understands this “network topology changed” event and acts on it accordingly
  3. Consider coming back from “sleep” to be a network changed event since we just cannot be sure of the network situation anymore.

The initial work has been made for Windows only but it allows us to smoothen out any rough edges before we continue and make more platforms support this.

The network changed event can be disabled by switching off the new “network.notify.changed” preference. If you do end up feeling a need for that, I really hope you file a bug explaining the details so that we can work on fixing it!

Act accordingly

So what is acting properly? What if the network changes in a way so that your active connections suddenly can’t be used anymore due to the new rules and routing and what not? We attack this problem like this: once we get a “network changed” event, we “allow” connections to prove that they are still alive and if not they’re torn down and re-setup when the user tries to reload or whatever. For plain old HTTP(S) this means just seeing if traffic arrives or can be sent off within N seconds, and for websockets, SPDY and HTTP2 connections it involves sending an actual ping frame and checking for a response.

The internal DNS cache was a bit tricky to handle. I initially just flushed all entries but that turned out nasty as I then also killed ongoing name resolves that caused errors to get returned. Now I instead added logic that flushes all the already resolved names and it makes names “in transit” to get resolved again so that they are done on the (potentially) new network that then can return different addresses for the same host name(s).

This should drastically reduce the situation that could happen before when Firefox would basically just freeze and not want to do any requests until you closed and restarted it. (Or waited long enough for other timeouts to trigger.)

The ‘N seconds’ waiting period above is actually 5 seconds by default and there’s a new preference called “network.http.network-changed.timeout” that can be altered at will to allow some experimentation regarding what the perfect interval truly is for you.

Firefox BallInitially on Windows only

My initial work has been limited to getting the changed event code done for the Windows back-end only (since the code that figures out if there’s news on the network setup is highly system specific), and now when this step has been taken the plan is to introduce the same back-end logic to the other platforms. The code that acts on the event is pretty much generic and is mostly in place already so it is now a matter of making sure the event can be generated everywhere.

My plan is to start on Firefox OS and then see if I can assist with the same thing in Firefox on Android. Then finally Linux and Mac.

I started on Windows since Windows is one of the platforms with the largest amount of Firefox users and thus one of the most prioritized ones.

More to do

There’s separate work going on for properly detecting captive portals. You know the annoying things hotels and airports for example tend to have to force you to do some login dance first before you are allowed to use the internet at that location. When such a captive portal is opened up, that should probably qualify as a network change – but it isn’t yet.

daniel.haxx.se week #3

September 22nd, 2014 by Daniel Stenberg

I won’t keep posting every video update here, but I mostly wanted to mention that I’ve kept posting a weekly video over at youtube basically explaining what’s going on right now within my dearest projects. Mostly curl and some Firefox stuff.

This week: libcurl server cert verification API got a bashing at SEC-T, is HTTP for UDP a good idea? How about adding HTTP cache support to libcurl? HTTP/2 is getting deployed as we speak. Interesting curl bug when used by XBMC. The patch series for Firefox bug 939318 is improving slowly – will it ever land?

Using APIs without reading docs

September 18th, 2014 by Daniel Stenberg

This morning, my debug session was interrupted for a brief moment when two friends independently of each other pinged me to inform me about a talk at the current SEC-T conference going on here in Stockholm right now. It was yet again time to bring up the good old fun called libcurl API bashing. Again from the angle that users who don’t read the API docs might end up using it wrong.

Updated: You can see Meredith Patterson’s talk here, and the libcurl parts start at 24:15.

The specific libcurl topic at hand once again mostly had the CURLOPT_VERIFYHOST option in focus, with basically is the same argument that was thrown at us two years ago when libcurl was said to be dangerous. It is not a boolean. It is an option that takes (or took) three different values, where 2 is the secure level and 0 is disabled.

SEC-T on curl API

(This picture is a screengrab from the live stream off youtube, I don’t have any link to a stored version of it yet. Click it for slightly higher resolution.)

Speaker Meredith L. Patterson actually spoke for quite a long time about curl and its options to verify server certificates. While I will agree that she has a few good points, it was still riddled with errors and I think she deliberately phrased things in a manner to make the talk good and snappy rather than to be factually correct and trying to understand why things are like they are.

The VERIFYHOST option apparently sounds as if it takes a boolean (accordingly), but it doesn’t. She says verifying a certificate has to be a Yes/No question so obviously it is a boolean. First, let’s be really technical: the libcurl options that take numerical values always accept a ‘long’ and all documentation specify which values you can pass in. None of them are boolean, not by actual type in the C language and not described like that in the man pages. There are however language bindings running on top of libcurl that may use booleans for the values that take 0 or 1, but there’s no guarantee we won’t add more values in a future to numerical options.

I wrote down a few quotes from her that I’d like to address.

“In order for it to do anything useful, the value actually has to be set to two”

I get it, she wants a fun presentation that makes the audience listen and grin cheerfully. But this is highly inaccurate. libcurl has it set to verify by default. An application doesn’t have to set it to anything. The only reason to set this value is if you’re not happy with checking the cert unconditionally, and then you’ve already wondered off the secure route.

“All it does when set to to two is to check that the common name in the cert matches the host name in the URL. That’s literally all it does.”

No, it’s not. It “only” verifies the host name curl connects to against the name hints in the server cert, yes, but that’s a lot more than just the common name field.

“there’s been 10 versions and they haven’t fixed this yet [...] the docs still say they’re gonna fix this eventually [...] I wanna know when eventually is”

Qualified BS and ignorance of details. Let’s see the actual code first: it ignores the 1 value and returns an error and thus leaves the internal default 2, Alas, code that sets 1 or 2 gets the same effect == verified certificate. Why is this a problem?

Then, she says she really wants to know when “eventually” is. (The docs say “Future versions will…”) So if she was so curious you’d think she would’ve tried to ask us? We’re an accessible bunch, on mailing lists, on IRC and on twitter. No she didn’t ask.

But perhaps most importantly: did she really consider why it returns an error for 1? Since libcurl silently accepted 1 as a value for something like 10 years, there are a lot of old installations “out there” in the wild, and by returning an error for 1 we try to make applications notice and adjust. By silently accepting 1 without errors, there would be no notice and people will keep using 1 in new applications as well and thus when running such an newly written application with an older libcurl – you’d be back to having the security problem again. So, we have the error there to improve the situation.

“a peer is someone like you [...] a host is a server”

I’m a networking guy since 20+ years and I’m not used to people having a hard time to understand these terms. While perhaps there are rookies out in the world who don’t immediately understand some terms in the curl option names, should we really be criticized for that? I find that a hilarious critique. Also, these names were picked 13 years ago and we have them around for compatibility and API stability.

“why would you ever want to …”

Welcome to the real world. Why would an application author ever want to have these options to something else than just full check and no check? Because people and software development is a large world with many different desires and use case scenarios and curl is more widely used and abused than what many people consider. Lots of people have wanted something else than just a Yes/No to server cert verification. In fact, I’ve had many users ask for even more switches and fine-grained ways to fiddle with verification. Yes/No is a lay mans simplified view of certificate verification.

SEC-T curl slide

(This picture is the slide from the above picture, just zoomed and straightened out a bit.)

API age, stability and organic growth

We started working on libcurl in spring 1999, we added the CURLOPT_SSL_VERIFYPEER option in October 2000 and we added CURLOPT_SSL_VERIFYHOST in August 2001. All that quite a long time ago.

Then add thousands of hours, hundreds of hackers, thousands of applications, a user count that probably surpasses one billion users by now. Then also add the fact that option names are sticky in the way we write docs, examples pop up all over the internet and everyone who’s close to the project learns them by name and spirit and we quite simply grow attached to them and the way they work. Changing the name of an option is really painful and cause of a lot of confusion.

I’ve instead tried to more and more emphasize the functionality in the docs, to stress what the options do and how to do server cert verifications with curl the safe way.

I can’t force users to read docs. I can’t forbid users to blindly assume something and I’m not in control of, nor do I want to affect, the large population of third party bindings that exist for using on top of libcurl to cater for every imaginable programming language – and some of them may of course themselves have documentation problems and what not.

Would I change some of the APIs and names for options we have in libcurl if I would redo them today? Yes I would.

So what do we do about it?

I think this is the only really interesting question to take from all this. Everyone wants stable APIs. Everyone wants sensible and easy to understand APIs and as we can see they should also basically be possible to figure out without reading any documentation. And yet the API has to be powerful and flexible enough to be really useful for all those different applications.

At this point where we have these options that we do, when you’ve done your mud slinging and the finger of blame is firmly pointed at us. How exactly do you suggest we move forward to fix these claimed problems?

Taking it personally

Before anyone tells me to not take it personally: curl is my biggest hobby and a project I’ve spent many years and thousands of hours on. Of course I take it personally, otherwise I would’ve stopped working in the project a long time ago. This is personal to me. I give it my loving care and personal energy and then someone comes here and throw ill-founded and badly researched criticisms at me. I think criticizers of open source projects should learn to discuss the matters with the projects as their primary way instead of using it to make their conference presentations become more feisty.

Snaxx delivers

September 17th, 2014 by Daniel Stenberg

A pint of guinnessLate in the year 1999 I quit my job. I handed over a signed paper where I wrote that I quit and then I started my new job first thing in the year 2000. I had a bunch of friends at the work I left and together with my closest friends (who coincidentally also switched jobs at roughly the same time) we decided we needed a way to keep in touch with friends that isn’t associated with our current employer.

The fix, the “employer independent” social thing to help us keep in touch with friends and colleagues in the industry, started on the last of February 2000. The 29th of February, since it was a leap year and that fact alone is a subject that itself must’ve been discussed at that meetup.

Snaxx was born.

Snaxx is getting a bunch of friends to a pub somewhere in Stockholm. Preferably a pub with lots of great beers and a sensible sound situation. That means as little music as possible and certainly no TVs or anything. We keep doing them at a pace of two or three per year or so.

Bishops Arms logo

Yesterday we had the 31st Snaxx and just under 30 guests showed up (that might actually have been the new all time high). We had many great beers, food and we argued over bug reporting, discussed source code formats, electric car charging, C64 nostalgia, mentioned Linux kernel debugging methods, how to transition from Erlang to javascript development and a whole load of other similarly very important topics. The Bishops Arms just happens to be a brand of pubs here that have a really sensible view on how to run pubs to be suitable for our events so yesterday we once again visited one of their places.

Thanks for a great time yesterday, friends! I’ll be setting up a date for number 32 soon. I figure it’ll be in the January 2015 time frame…If you want to get notified with an email, sign up yourself on the snaxx mailing list.

A few pictures from yesterday can be found on the Snaxx-31 G+ event page.

Daladevelop hackathon

September 15th, 2014 by Daniel Stenberg

On Saturday the 13th of September, I took part in a hackathon in Falun Sweden organized by Daladevelop.

20-something hacker enthusiasts gathered in a rather large and comfortable room in this place, an almost three hour drive from my home. A number of talks and lectures were held through the day and the difficulty level ranged from newbie to more advanced. My own contribution was a talk about curl followed by one about HTTP/2. Blabbermouth as I am, I exhausted the friendly audience by talking a good total of almost 90 minutes straight. I got a whole range of clever and educated questions and I think and hope we all had a good time as a result.

The organizers ran a quiz for two-person teams. I teamed up with Andreas Olsson in team Emacs, and after having identified x86 assembly, written binary, spotted perl, named Ada Lovelace, used the term lightfoot and provided about 15 more answers we managed to get first prize and the honor of having beaten the others. Great fun!

Video perhaps?

September 8th, 2014 by Daniel Stenberg

I decided to try to do a short video about my current work right now and make it available for you all. I try to keep it short (5-7 minutes) and I’m certainly no pro at it, but I will try to make a weekly one for a while and see if it gets any fun. I’m going to read your comments and responses to this very eagerly and that will help me decide how I will proceed on this experiment.

Enjoy.

HTTP/2 interop pains

September 2nd, 2014 by Daniel Stenberg

At around 06:49 CEST on the morning of August 27 2014, Google deployed an HTTP/2 draft-14 implementation on their front-end servers that handle logins to Google accounts (and possibly others). Those at least take care of all the various login stuff you do with Google, G+, gmail, etc.

The little problem with that was just that their implementation of HTTP2 is in disagreement with all existing client implementations of that same protocol at that draft level. Someone immediately noticed this problem and filed a bug against Firefox.

The Firefox Nightly and beta versions have HTTP2 enabled by default and so users quickly started to notice this and a range of duplicate bug reports have been filed. And keeps being filed as more users run into this problem. As far as I know, Chrome does not have this enabled by default so much fewer Chrome users get this ugly surprise.

The Google implementation has a broken cookie handling (remnants from the draft-13 it looks like by how they do it). As I write this, we’re on the 7th day with this brokenness. We advice bleeding-edge users of Firefox to switch off HTTP/2 support in the mean time until Google wakes up and acts.

You can actually switch http2 support back on once you’ve logged in and it then continues to work fine. Below you can see what a lovely (wildly misleading) error message you get if you try http2 against Google right now with Firefox:

google-http2-draft14-cookies

This post is being debated on hacker news.

Updated: 20:14 CEST: There’s a fix coming, that supposedly will fix this problem on Thursday September 4th.

Update 2: In the morning of September 4th (my time), Google has reverted their servers to instead negotiate SPDY 3.1 and Firefox is fine with this.

Firefox OS Flatfish Bluedroid fix

August 29th, 2014 by Daniel Stenberg

Hey, when I just built my own Firefox OS (b2g) image for my Firefox OS Tablet (flatfish) straight from the latest sources, I ran into this (known) problem:

Can't find necessary file(s) of Bluedroid in the backup-flatfish folder.
Please update the system image for supporting Bluedroid (Bug-986314),
so that the needed binary files can be extracted from your flatfish device.

So, as I struggled to figure out the exact instructions on how to proceed from this, I figured I should jot down what I did in the hopes that it perhaps will help a fellow hacker at some point:

  1. Download the 3 *.img files from the dropbox site that is referenced from bug 986314.
  2. Download the flash-flatfish.sh script from the same dropbox place
  3. Make sure you have ‘fastboot’ installed (I’m mentioning this here because it turned out I didn’t and yet I have already built and flashed my Flame phone successfully without having it). “apt-get install android-tools-fastboot” solved it for me. Note that if it isn’t installed, the flash-flatfish.sh script will claim that the device is not in fastboot mode and stop with an error message saying so.
  4. Finally: run the script “./flash-flatfish.sh [dir with the 3 .img files]“
  5. Once it has succeeded, the tablet reboots
  6. Remove the backup-flatfish directory in the build dir.
  7. Restart the flatfish build again and now it should get passed that Bluedroid nit

Enjoy!

Going to FOSDEM 2015

August 27th, 2014 by Daniel Stenberg

Yeps,

I’m going there and I know several friends are going too, so this is just my way of pointing this out to the ones of you who still haven’t made up your mind! There’s still a lot of time left as this event is taking place late January next year.

I intend to try to get a talk to present this time and I would love to meet up with more curl contributors and fans.

fosdem