Tag Archives: Open Source

A day in the curl project

cURLI maintain curl and lead the development there. This is how I spend my time an ordinary day in the project. Maybe I don’t do all of these things every single day, but sometimes I do and sometimes I just do a subset of them. I just want to give you a look into what I do and why I don’t add new stuff more often or faster… I spend about one to three hours on the project every day. Let me also stress that curl is a tiny little project in comparison with many other open source projects. I’m certainly not saying otherwise.

the new bug

Someone submits a new bug in the bug tracker or on one of the mailing lists. Most initial bug reports lack sufficient details so the first thing I do is ask for more info and possibly ask the submitter to try a recent version as very often we get bug reported on very old versions. Many bug reports take several demands for more info before the necessary details have been provided. I don’t really start to investigate a problem until I feel I have a sufficient amount of details. We’re a very small core team that acts on other people’s bugs.

the question by a newbie in the project

A new person shows up with a question. The question is usually similar to a FAQ entry or an example but not exactly. It deserves a proper response. This kind of question can often be answered by anyone, but also most people involved in the project don’t feel the need or “familiarity” to respond to such questions and therefore remain quiet.

the old mail I haven’t responded to yet

I want every serious email that reaches the mailing lists to get a response, so all mails that neither I nor anyone else responds to I keep around in my inbox and when I have idle time over I go back and catch up on old mails. Some of them can then of course result in a new bug or patch or whatever. Occasionally I have to resort to simply saving away the old mail without responding in order to catch up, just to cut the list of outstanding things to do a little.

the TODO list for my own sake, things I’d like to get working on

There are always things I really want to see done in the project, and I work on them far too little really. But every once in a while I ignore everything else in my life for a couple of hours and spend them on adding a new feature or fixing something I’ve been missing. Actual development of new features is a very small fraction of all time I spend on this project.

the list of open bug reports

I regularly revisit this list to see what I can do to push the open ones forward. Follow-up questions, deep dives into source code and specifications or just the sad realization that a particular issue won’t be fixed within the nearest time (year?) so that I close it as “future” and add the problem to our KNOWN_BUGS document. I strive to keep the bug list clean and only keep relevant bugs open. Those issues that are not reproducible, are left without the proper attention from the reporter or otherwise stall will get closed. In general I feel quite lonely as responder in the bug tracker…

the mailing list threads that are sort of dying but I do want some progress or feedback on

In my primary email inbox I usually keep ongoing threads around. Lots of discussions just silently stop getting more posts and thus slowly wither away further up the list to become forgotten and ignored. With some interval I go back to see if the posters are still around, if there’s any more feedback or whatever in order to figure out how to proceed with the subject. Very often this makes me get nothing at all back and instead I just save away the entire conversation thread, forget about it and move on.

the blog post I want to do about a recent change or fix I did I’d like to highlight

I try to explain some changes to the world in blog posts. Not all changes but the ones that are somehow noteworthy as they perhaps change the way things have been or introduce new fun features perhaps not that easily spotted. Of course all features are always documented etc, but sometimes I feel I need to put some extra attention on focus on things in a more free-form style. Or I just write about meta stuff, like this very posting.

the reviewing and merging of patches

One of the most important tasks I have is to review patches. I’m basically the only person in the project who volunteers to review patches against any angle or corner of the project. When people have spent time and effort and gallantly send the results of their labor our way in the best possible format (a patch!), the submitter deserves a good review and proper feedback. Also, paving the road for more patches is one of the best way to scale the project. Helping newcomers become productive is important.

Patches are preferably posted on the mailing lists but there’s also some coming in via pull requests on github and while I strongly discourage that (due to them not getting the same attention and possible scrutiny on the list like the others) I sometimes let them through anyway just to be smooth.

When the patch looks good (or sometimes good enough and I just edit some minor detail), I merge it.

the non-disclosed discussions about a potential security problem

We’re a small project with a wide reach and security problems can potentially have grave impact on users. We take security seriously, and we very often have at least one non-public discussion going on about a problem in curl that may have security implications. We then often work on phrasing security advisories, working down exactly which versions that are vulnerable, producing patches for at least the most recent ones of those affected versions and so on.

tame stackoverflow

stackoverflow.com has become almost like a wikipedia for source code and programming related issues (although it isn’t wiki), and that site is one of the primary referrers to curl’s web site these days. I tend to glance over the curl and libcurl related questions and offer my answers at times. If nothing else, it is good to help keeping the amount of disinformation at low levels.

I strongly disapprove of people filing bug reports on such places or even very detailed (lib)curl core questions that should’ve been asked on the curl-library list.

there are idle times too

Yeah. Not very often, but sometimes I actually just need a day off all this. Sometimes I just don’t find motivation or energy enough to dig into that terrible seldom-happening bug on a platform I’ve never seen personally. A project like this never ends. The same day we release a new release, we just reset our clocks and we’re back on improving curl, fixing bugs and cleaning up things for the next release. Forever and ever until the end of time.

keep-calm-and-improve-curl

Firefox OS Flatfish Bluedroid fix

Hey, when I just built my own Firefox OS (b2g) image for my Firefox OS Tablet (flatfish) straight from the latest sources, I ran into this (known) problem:

Can't find necessary file(s) of Bluedroid in the backup-flatfish folder.
Please update the system image for supporting Bluedroid (Bug-986314),
so that the needed binary files can be extracted from your flatfish device.

So, as I struggled to figure out the exact instructions on how to proceed from this, I figured I should jot down what I did in the hopes that it perhaps will help a fellow hacker at some point:

  1. Download the 3 *.img files from the dropbox site that is referenced from bug 986314.
  2. Download the flash-flatfish.sh script from the same dropbox place
  3. Make sure you have ‘fastboot’ installed (I’m mentioning this here because it turned out I didn’t and yet I have already built and flashed my Flame phone successfully without having it). “apt-get install android-tools-fastboot” solved it for me. Note that if it isn’t installed, the flash-flatfish.sh script will claim that the device is not in fastboot mode and stop with an error message saying so.
  4. Finally: run the script “./flash-flatfish.sh [dir with the 3 .img files]”
  5. Once it has succeeded, the tablet reboots
  6. Remove the backup-flatfish directory in the build dir.
  7. Restart the flatfish build again and now it should get passed that Bluedroid nit

Enjoy!

Credits in the curl project

Friends!

When we receive patches, improvements, suggestions, advice and whatever that lead to a change in curl or libcurl, I make an effort to log the contributor’s name in association with that change. Ideally, I add a line in the commit message. We use “Reported-by: <full name>” quite frequently but also other forms of “…-by: <full name>” too like when there was an original patch by someone or testing and similar. It shouldn’t matter what the nature of the contribution is, if it helped us it is a contribution and we say thanks!

curl-give-credits

I want all patch providers and all of us who have push rights to use this approach so that we give credit where credit is due. Giving credit is the only payment we can offer in this project and we should do it with generosity.

The green bars on the right show the results from the question how good we are at giving credit in the project from the 2014 curl survey, where 5 is really good and 1 is really bad. Not too shabby, but I’d say we can do even better! (59% checked the top score, 15% checked the 3′)

I have a script called contributors.sh that extracts all contributors since a tag (typically the previous release) and I use that to get a list of names to thank in the RELEASE-NOTES file for the pending curl release. Easy and convenient.

After every release (which means every 8th week) I then copy the list of names from RELEASE-NOTES into docs/THANKS. So all contributors get remembered and honored after having helped us in one way or another.

When there’s no name

When contributors don’t provide a real name but only a nick name like foobar123, user_5678 and so on I tend to consider that as request to not include the person’s name anywhere and hence I tend to not include it in the THANKS or RELEASE-NOTES. This also sometimes the result of me not always wanting to bother by asking people over and over again for their real name in case they want to be given proper and detailed credit for what they’ve provided to us.

Unfortunately, a notable share of all contributions we get to the project are provided by people “hiding” behind a made up handle. I’m fine with that as long as it truly is what the helpers’ actually want.

So please, if you help us out, we will happily credit you, but please tell us your name!

keep-calm-and-improve-curl

Me in numbers, today

Number of followers on twitter: 1,302

Number of commits during the last 365 days at github: 686

Number of publicly visible open source commits counted by openhub: 36,769

Number of questions I’ve answered on stackoverflow: 403

Number of connections on LinkedIn: 608

Number of days I’ve committed something in the curl project: 2,869

Number of commits by me, merged into Mozilla Firefox: 9

Number of blog posts on daniel.haxx.se, including this: 734

Number of friends on Facebook: 150

Number of open source projects I’ve contributed to, openhub again: 35

Number of followers on Google+: 557

Number of tweets: 5,491

Number of mails sent to curl mailing lists: 21,989

TOTAL life achievement: 71,602

The curl and libcurl 2014 survey

Reading through the answers to the curl project‘s survey “curl and libcurl 2014” is very interesting and educational.

After having lead and participated in this project for so long I have my own picture of what we’re good and bad at. That’s not exactly the same image I get when I read the survey responses. That’s of course the educating part and I really want to learn from this poll and see where to put in some efforts and attempt to improve. At the same time I’ve been working for a while to put together a roadmap for the project, and the survey will help guide us with that work as well.

The full generated summary of the answers can be found on the site, but I thought I do the extra effort here and try to extrapolate data, compare and try to get to the real story that lurks in the shadows.

Over the almost 10 days the poll was open, we received 194 responses. I was hoping for more participation, but on the other hand I don’t think more people would’ve given a much different view. My only concern would be that I’m not sure exactly how well we reached out.

Almost all curl users use it for HTTP and HTTPS. Sure, we also use a lot of other protocols and in fact all supported protocols did up having at least two users according to the survey, but only a single digit percentage did not mark HTTP and HTTPS as protocols they use. The least used supported protocol gopher, is used among 1.5% of the users who responded.

FTPS and SFTP are basically equally much used and they are the 4th and 5th most used protocols. HTTP, HTTPS and FTP are clearly our most popular protocols.

Only one in five users use curl on a single platform. All others use it on two or more, and one if four use it on four or more with an unexpectedly high 11% saying they use it on 5 or more platforms! That’s a pretty strong message to me that our multi-platform strategy is important.

Our users have been with us for a long time. Half of the users have been using curl for five years or more! A fifth has been with us for 8 years or more! And yet there seems to be a healthy amount of newcomers finding us as 14% is within their first year.

The above numbers combined, I’m not surprised but only happy to see that 4 out of 5 users are also involved in other open source projects. curl is just one piece in a large ecosystem and I think it is good that we all participate in several projects so that we learn and cross-pollinate where possible!

Less than half of the respondents are subscribed to a curl mailing list, and curl-library is the most popular one. This also reflects in subscriber numbers on the actual mailing lists where curl-library with its 1400+ members has almost twice as many subscribers as curl-users. One way to view this is that we are old enough, established enough and working enough so that users don’t have to subscribe to our lists to keep up. The less optimistic way to see it could be that this is because we haven’t reached out good enough or that our mailing list culture/setup isn’t welcoming enough.

Perhaps most surprising to me: that several persons got upset and reacted strongly to the question about how good we treat “female and other minorities” in the project. To me there’s no doubt that female contributors are a minority in the curl community and I want to learn if we’re doing our best to be inclusive and open to all possible contributors. Or at least how good/bad people think we are doing.

29% of the respondents have contributed patches, meaning 56 individuals. I think that tells more about the ones who took part of the survey than it measures participation level among “regular users”.

Documentation

A big revelation for me was the question where I asked people to identify the “worst parts” of the project. The image here below is the look of the summary.

It quite clearly identifies “documentation” as the area in most need of improvements.

I don’t think the amount of docs is the problem. After discussing with people I think the primary issues are:

  • Some collections of docs are just too big and hard to find in, like the curl man page and the curl_easy_setopt man pages. We need to split them up and/or rearrange somehow to help people find the info they need. Work has started on this. I’ll follow up with details later.
  • We get slightly bad “reviews” on this when people confuse the libcurl bindings’ lack of docs to be our problem. Lots of libcurl bindings are not very good documented – but they are separate projects not controlled or documented by us. I don’t know what we can do to help that situation. Suggestions are very welcome!
  • We don’t have much step-by-step tutorials on how to get started and how to knit things together. We mostly provide reference manuals. I will appreciate help with improving this!

cURL

I go Mozilla

Mozilla dinosaur head logo

In January 2014, I start working for Mozilla

I’ve worked in open source projects for some 20 years and I’ve maintained curl and libcurl for over 15 years. I’m an internet protocol geek at heart and Mozilla seems like a perfect place for me to continue to explore this interest of mine and combine it with real open source in its purest form.

I plan to use my experiences from all my years of protocol fiddling and making stuff work on different platforms against random server implementations into the networking team at Mozilla and work on improving Firefox and more.

I’m putting my current embedded Linux focus to the side and I plunge into a worldwide known company with worldwide known brands to do open source within the internet protocols I enjoy so much. I’ll be working out of my home, just outside Stockholm Sweden. Mozilla has no office in my country and I have no immediate plans of moving anywhere (with a family, kids and all established here).

I intend to bring my mindset on protocols and how to do things well into the Mozilla networking stack and world and I hope and expect that I will get inspiration and input from Mozilla and take that back and further improve curl over time. My agreement with Mozilla also gives me a perfect opportunity to increase my commitment to curl and curl development. I want to maintain and possibly increase my involvement in IETF and the httpbis work with http2 and related stuff. With one foot in Firefox and one in curl going forward, I think I may have a somewhat unique position and attitude toward especially HTTP.

I’ve not yet met another Swedish Mozillian but I know I’m not the only one located in Sweden. I guess I now have a reason to look them up and say hello when suitable.

Björn and Linus will continue to drive and run Haxx with me taking a step back into the shadows (Haxx-wise). I’ll still be part of the collective Haxx just as I was for many years before I started working full-time for Haxx in 2009. My email address, my sites etc will remain on haxx.se.

I’m looking forward to 2014!

Rpi night in GBG

pelagicore logo

Daniel talking So I flew down to and participated at yet another embedded Linux hacking event that was also co-organized by me, that took place yesterday (November 20th 2013) in Gothenburg Sweden.

The event was hosted by Pelagicore in their nice downtown facilities and it was fully signed up with some 28 attendees.

I held a talk about the current situation of real-time and low latency in the Linux kernel, a variation of a talk I’ve done before and even if I have modified it since before you can still get the gist of it on this old slideshare upload. As you can see on the photo I can do hand-wavy gestures while talking! When I finally shut up, we were fed tasty sandwiches and there was some time to socialize and actually hack on some stuff.

Embedded Linux hackers in GBG

I then continued my tradition and held a contest. This time I did raise the complexity level a bit as I decided I wanted a game with more challenges and something that feels less like a quiz and more like a game or a maze. See my separate post for full details and for your chance to test your skills.

This event was also nicely synced in time with the recent introduction of the foss-gbg mailing list, which is an effort to gather people in the area that have an interest in Free and Open Source Software. Much in the same way foss-sthlm was made a couple of years ago.

Pelagicore also handed out 9 Raspberry Pis at the event to lucky attendees.

Embedded and Raspberry Pis in GBG

Kjell Ericson's blinking leds

On November 20, we’ll gather a bunch of interested people in the same room and talk embedded Linux, open source and related matters. I’ll do a talk about real-time in Linux and I’ll run a contest in the same spirit as I’ve done before several times.

Sign-up here!

Pelagicore is hosting and sponsoring everything. I’ll mostly just show up and do what I always do: talk a lot.

So if you live in the area and are into open source and possibly embedded, do show up and I can promise you a good time.

(The photo is actually taken during one of our previous embedded hacking events.)

source code survival rate

The curl project has its roots in the late 1996, but we haven’t kept track of all of the early code history. We imported our code to Sourceforge late 1999 and that’s how far back we can see in our current git repository. The exact date is “Wed Dec 29 14:20:26 UTC 1999”. So, almost 14 years of development.

Warning: this blog post contains more useless info and graphs than many mortals can handle. Be aware!

How much old code remain in the current source tree? Or perhaps put differently: how is the refresh rate of the code? We fix bugs, we change things, we add features. Surely we’ll slowly over time rewrite the old code and replace it with new more shiny and better working code? I decided to check this. Here’s what I found!

The tools

We have all code in git. ‘git blame’ is the primary tool I used as it lists all lines of all source code and tells us when it was added. I did some additional perl scripting around it.

The code

I decided to check all code in the src/ and lib/ directories in the curl and libcurl source tree. The source code is used to create both the curl tool and the libcurl library and back in 1999 there was no libcurl like today so we do get a slightly better coverage of history this way.

In total this sums up to some 112000 lines in the current .c and .h files.

To count the total amount of commits done to those specific files through history I ran:

git log --oneline src/*.[ch] lib/*.[ch] | wc -l

6047 commits in total. (if I don’t specify the files and count all commits in the repo it ends up at 16954)

git stats

We run gitstats on the curl repo every day so you can go there for some more and current stats. Right now it tells us that average number of commits is 4.7 per active day (that means days when actually something was committed), or 3.4 per all days over the entire time. There was git activity 3576 days in total. By 224 authors.

Surviving commits

How much of the code would you think still remains that were present already that December day 1999?

How much of the code in the current code base would you think was written the last few years?

Commit vs Author vs Date

I wanted to see how much old code that exists, or perhaps how the age of the code is represented in the current code base. I decided to therefore base my logic on the author time that git tracks. It is basically the time when the author of a change commits it to his/her local tree as then the change can be applied later on by a committer that can be someone else, but the author time remains the same. Sometimes a committer commits multiple patches at once, possibly at a much later time etc so I figured the author time would be a better time stamp. I also decided to track the date instead of just the commit hash so that I can sort the changes properly and also make interesting graphs that are based on that time. I use the time with a second precision so changes done a second apart will be recorded as two separate changes while two commits done with the same author time stamp will be counted as the same time.

I had my script run ‘git blame –line-porcelain’ for all files and had my script sum up all changes done on the same time.

Some totals

The code base contains changes written at 4147 different times. Converted to UTC times, they happened on 2076 unique days. On 167 unique months. That’s every month since the beginning.

We’re talking about 312 files.

Number of lines changed over time

A graph with changes over time. The Y axis is number of lines that were changed on that particular time. (click for higher res)

Lines changed over time

Ok you object, that doesn’t look very appealing. So here’s the same data but with all the changes accumulated over time.

accumulated

Do you think the same as I do? Isn’t it strangely linear? It seems that the number of added lines that remain in the code today is virtually the same over time! But fair enough, the changes in the X axis are not distributed according to the time/date they represent so we shouldn’t be fooled by the time, but certainly we can see that changes in general only bring in a certain amount of surviving modified lines.

Another way to count the changes is then to check all the ~4000 change times of the present code, and see how many days between them there are:

delta

Ah, now finally we’re seeing something. Older code that is still present clearly was made with longer periods in between the changes that have lasted. It makes perfect sense to me, since the many years of development probably have later overwritten a lot of code that was written in between.

Also, it is clearly that among the more recent changes that have survived they were often done on the same day or just a few days away from another lasting change.

Grouped on date ranges

The number of modified lines split up on the individual year the change came in.

year

Interesting! The general trend is clear and not surprising. Two years stand out from the trend, 2004 and 2011. I have not yet investigated what particular larger changes that were made those years that have survived. The bump for 1999 is simply the original import and most of those lines are preprocessor lines like #ifdef and #include or just opening and closing braces { and }.

Splitting up the number of surviving lines on the specific year+month they were added:

month

This helps us analyze the previous chart. As we can see, the rather tall bars from 2004 and 2011 are actually several months wide and explains the bumps in the year-chart. Clearly we made some larger effort on those periods that were good enough to still remain in the code.

Correlate to added or removed lines?

So, can we perhaps see if some years’ more activity in number of added or removed source lines can be tracked back to explain the number of surviving source code lines? I ran “git diff [hash1]..[hash2] –stat — lib/*.[ch] src/*.[ch]” for all years to get a summary of number of added and removed source code lines that year. I added those number to the table with surviving lines and then I made another graph:

year-again

Funnily enough, we see almost an exact correlation there for the first eight years and then the pattern breaks. From the year 2009 the number of removed lines went down but still the amount of surving lines went up quite a bit and then the graphs jump around a bit.

My interpretation of this graph is this boring: the amount of surviving code in absolute numbers is clearly correlating to the amount of added code. And that we removed more code yearly in the 2000-2003 period than what has survived.

But notice how the blue line is closing the gap to the orange/red one over time, which means that percentage wise there’s more surviving code in more recent code! How much?

Here’s the amount of surviving lines/added lines and a second graph looking at surviving lines/(added + removed) to see if the mere source code activity would be a more suitable factor to compare against…

relation survival vs added and removed lines

Code committed within the last 5 years are basically 75% left but then it goes downhill down to the 18% survival rate of the 1999 code import.

If you can think of other good info to dig out, let me know!

1999,1699
2000,1115
2001,3061
2002,2432
2003,2578
2004,7644
2005,4016
2006,5101
2007,7665
2008,7292
2009,9460
2010,11762
2011,19642
2012,11842
2013,16844