
My first Mozilla week

Working from home

I get up in the morning, shave, eat breakfast and make sure all family members get off as they should. Most days I walk my son to school (some 800 meters) and then back again. When they’re all gone, the house is quiet and then me and my cup of coffee go upstairs and my work day begins.

Systems and accounts

I have spent time this week setting up accounts and signing up for various lists and services. I've created profiles, uploaded pictures and confirmed passwords. I've submitted stuff and I've signed things. There are quite a lot of systems in use.

My colleagues

I've met a few. The Necko team isn't very big but the entire company is huge and there are just so many people and names. I haven't yet had any pressing reason to meet a lot of people nor learn a lot of names. I feel like I'm starting this out really slowly and gradually.

Code base

Firefox is a large chunk of code. It takes some 20 minutes to rebuild on my 3.5GHz quad-core Core-i7 with SSD. I try to pull code and rebuild every morning now so that I can dogfood and live on the edge. I also have a bunch of local patches now, some of which I want to have stewing in my own browser for a while so that I know they at least don't have any major negative impact!

Figuring out the threading, XPCOM, the JavaScript stuff and everything is a massive task. I really cannot claim to have done more than just scratched the surface so far, but at least I am scratching and I've "etagged" the whole lot and I've spent some time reading and reviewing code. Attaching gdb to a running Firefox and checking out how it behaves has also helped.

Netwerk code size

"Netwerk" is the directory name of the source tree where most of the network code is located. It is actually not as ridiculously large as one could fear. Counting only C++ and header files, it sums up to about 220K lines of code. Of course not everything interesting is in this tree, but still. Not mindbogglingly large.

Video conferencing

I'll admit I had not participated in this sort of large-scale video conferencing before. With Vidyo and all the different people and offices signed up at once, it is quite an impressive setup actually. My only annoyance so far is that I didn't get the sound for Vidyo to work for me in Linux with my headphones. The other end could hear me but I couldn't hear them! I had to resort to using Vidyo on a Windows laptop instead.

Doing the video conferencing on a laptop instead of on my desktop machine has its advantages when I do calls in the evenings while the rest of the family is at home: I can move the machine and sit down somewhere where they won't disturb me and I won't disturb them.

Bugzilla

The bug tracker is really at the center of this project, or at least of how I view it and work with it right now. During my first week I've so far filed two bug reports and I've submitted a suggested patch for a third bug. One of my bugs (Bug 959100 – ParseChunkRemaining doesn't detect chunk size overflow) has passed review and is now hopefully about to be committed.

I’ve requested commit access (#961018) as a “level 1” and I’ve signed the committer’s agreement. Level 1 is entry level and only lets me push to the Try server but still, I fully accept that there’s a process to follow and I’m in no hurry. I’ll get to level 3 soon enough I’m sure.

Mercurial

What can I say. After having used it a bit this week without any particularly fancy operations, I prefer git so much more. Of course I'm also much more used to git, but I find that for a lot of the stuff where both have similar concepts I prefer the git way. Oh well, it's just a tool. I'll get by. Possibly I'll try out the git mirror soon and see if that provides a more convenient environment for me.

curl

What impact did all this new protocol and network code stuff during my work days have on my curl activities?

I got inspired to fix both the chunked encoding parser and the cookie parser’s handling of max-age in libcurl.
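To give a feel for the class of bug involved, here is a minimal sketch of how a chunked transfer-encoding size parser can guard against the hexadecimal value overflowing its integer type. This is my own illustration, not the actual libcurl or Firefox code:

#include <stdint.h>
#include <ctype.h>

/* Parse a hexadecimal chunk-size field and fail on overflow.
   A sketch of the idea only, not code from either project. */
static int parse_chunk_size(const char *p, uint64_t *out)
{
  uint64_t size = 0;
  int digits = 0;

  for(; isxdigit((unsigned char)*p); p++, digits++) {
    int v = isdigit((unsigned char)*p) ? *p - '0'
                                       : tolower((unsigned char)*p) - 'a' + 10;
    /* would shifting in another nibble overflow 64 bits? */
    if(size > (UINT64_MAX >> 4))
      return -1; /* chunk size overflow */
    size = (size << 4) | (uint64_t)v;
  }

  if(!digits)
    return -1; /* no hex digits at all */

  *out = size;
  return 0;
}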

What didn’t happen

I feel behind in the implementing-http2 department. I didn’t get my new work laptop yet.

Next week

Daniel's work place

More of the same, land more patches and figure out more code. Grab more smallish bugs others have filed and work on fixing them as more practice.

Also, there's an HTTPbis meeting in Zürich from Wednesday to Friday that I won't go to (I'll spare you the explanation why) but I'll try to participate remotely.

Parallel Spaghetti – decoded

Here’s the decoding procedure for the Parallel Spaghetti Decode challenge.

Step 1, the answers to all the questions. You will notice that I did have some fun in D6 and E2, but since they were boxes that weren’t on the right track anyway I thought you’d still enjoy them.

Step 2, let me illustrate how the above answers take you through the maze. The correct path is made up of yellow boxes and the correct answers are shown with red arrows leading forward. Click it for a full resolution version.

The parallel spaghetti challenge correct track shown

Step 3, those different colors in the “Word” column give you the words used for the two questions. If you rearrange them, the two questions become:

which tr command line option specifies delete characters

and

what curl command line option specifies POST requests

So, it took about 14 minutes at our event for Oscar Andersson to bring the correct answer to me:

-d

source code survival rate

The curl project has its roots in late 1996, but we haven't kept track of all of the early code history. We imported our code to Sourceforge in late 1999 and that's how far back we can see in our current git repository. The exact date is "Wed Dec 29 14:20:26 UTC 1999". So, almost 14 years of development.

Warning: this blog post contains more useless info and graphs than many mortals can handle. Be aware!

How much old code remains in the current source tree? Or perhaps put differently: what is the refresh rate of the code? We fix bugs, we change things, we add features. Surely we'll slowly over time rewrite the old code and replace it with new, more shiny and better working code? I decided to check this. Here's what I found!

The tools

We have all code in git. 'git blame' is the primary tool I used as it lists all lines of all source code and tells us when each line was added. I did some additional perl scripting around it.

The code

I decided to check all code in the src/ and lib/ directories in the curl and libcurl source tree. The source code is used to create both the curl tool and the libcurl library and back in 1999 there was no libcurl like today so we do get a slightly better coverage of history this way.

In total this sums up to some 112000 lines in the current .c and .h files.

To count the total number of commits done to those specific files through history I ran:

git log --oneline src/*.[ch] lib/*.[ch] | wc -l

6047 commits in total. (if I don’t specify the files and count all commits in the repo it ends up at 16954)

git stats

We run gitstats on the curl repo every day so you can go there for more and current stats. Right now it tells us that the average number of commits is 4.7 per active day (that is, days when something was actually committed), or 3.4 counting all days over the entire time. There has been git activity on 3576 days in total, by 224 authors.

Surviving commits

How much of the code that was already present on that December day in 1999 would you think still remains?

How much of the code in the current code base would you think was written in the last few years?

Commit vs Author vs Date

I wanted to see how much old code still exists, or perhaps how the age of the code is represented in the current code base. I decided to base my logic on the author time that git tracks. That is basically the time when the author of a change commits it to his/her local tree; the change can be applied later by a committer who may be someone else, but the author time remains the same. Sometimes a committer commits multiple patches at once, possibly at a much later time, so I figured the author time would be a better time stamp. I also decided to track the date instead of just the commit hash so that I can sort the changes properly and make interesting graphs based on that time. I use the time with second precision, so changes done a second apart are recorded as two separate changes, while two commits with the same author time stamp are counted as the same change.

I had my script run 'git blame --line-porcelain' for all files and sum up all changes done at the same time.
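For the curious, here is a rough C sketch of the same counting idea, although the real analysis was done with perl scripting as mentioned above. It pipes 'git blame --line-porcelain' through popen() and buckets the surviving lines by the year of their author time (the per-second bucketing works the same way, just with a finer key). The default file name is only a placeholder:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(int argc, char **argv)
{
  /* placeholder default; pass any tracked file as argv[1] */
  const char *file = (argc > 1) ? argv[1] : "lib/url.c";
  char cmd[512];
  char line[1024];
  long years[200] = {0}; /* histogram indexed by (year - 1970) */
  int y;
  FILE *p;

  snprintf(cmd, sizeof(cmd), "git blame --line-porcelain %s", file);
  p = popen(cmd, "r");
  if(!p) {
    perror("popen");
    return 1;
  }

  while(fgets(line, sizeof(line), p)) {
    /* --line-porcelain repeats "author-time <epoch>" for every source line */
    if(!strncmp(line, "author-time ", 12)) {
      time_t when = (time_t)strtol(line + 12, NULL, 10);
      struct tm *tm = gmtime(&when);
      if(tm && tm->tm_year >= 70 && tm->tm_year < 270)
        years[tm->tm_year - 70]++;
    }
  }
  pclose(p);

  for(y = 0; y < 200; y++)
    if(years[y])
      printf("%d,%ld\n", y + 1970, years[y]);

  return 0;
}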

Some totals

The code base contains changes written at 4147 different times. Converted to UTC times, they happened on 2076 unique days. On 167 unique months. That’s every month since the beginning.

We’re talking about 312 files.

Number of lines changed over time

A graph with changes over time. The Y axis is the number of lines that were changed at that particular time. (click for higher res)

Lines changed over time

OK, you object, that doesn't look very appealing. So here's the same data but with all the changes accumulated over time.

accumulated

Do you think the same as I do? Isn't it strangely linear? It seems that the number of added lines that remain in the code today is virtually the same over time! But fair enough, the changes on the X axis are not distributed according to the time/date they represent, so we shouldn't be fooled by the time; still, we can certainly see that changes in general only bring in a certain amount of surviving modified lines.

Another way to count the changes is to check all the ~4000 change times of the present code, and see how many days there are between them:

delta

Ah, now finally we're seeing something. Older code that is still present was clearly made with longer periods between the changes that have lasted. That makes perfect sense to me, since the many years of development probably have overwritten a lot of the code that was written in between.

Also, it is clear that the more recent changes that have survived were often done on the same day or just a few days away from another lasting change.

Grouped on date ranges

The number of modified lines split up by the year the change came in.

year

Interesting! The general trend is clear and not surprising. Two years stand out from the trend, 2004 and 2011. I have not yet investigated which particular larger changes made in those years have survived. The bump for 1999 is simply the original import, and most of those lines are preprocessor lines like #ifdef and #include or just opening and closing braces { and }.

Splitting up the number of surviving lines by the specific year+month they were added:

month

This helps us analyze the previous chart. As we can see, the rather tall bars from 2004 and 2011 are actually several months wide, which explains the bumps in the year chart. Clearly we made some larger efforts during those periods that were good enough to still remain in the code.

Correlate to added or removed lines?

So, can we perhaps see if some years' higher activity in the number of added or removed source lines can be tracked back to explain the number of surviving source code lines? I ran "git diff [hash1]..[hash2] --stat -- lib/*.[ch] src/*.[ch]" for all years to get a summary of the number of added and removed source code lines each year. I added those numbers to the table with surviving lines and then I made another graph:

year-again

Funnily enough, we see an almost exact correlation there for the first eight years and then the pattern breaks. From the year 2009 the number of removed lines went down, but the amount of surviving lines still went up quite a bit, and then the graphs jump around a bit.

My interpretation of this graph is boring: the amount of surviving code in absolute numbers clearly correlates to the amount of added code. And we removed more code yearly in the 2000-2003 period than what has survived.

But notice how the blue line is closing the gap to the orange/red one over time, which means that percentage-wise there's more surviving code in more recent code! How much?

Here’s the amount of surviving lines/added lines and a second graph looking at surviving lines/(added + removed) to see if the mere source code activity would be a more suitable factor to compare against…

relation survival vs added and removed lines

Code committed within the last 5 years is basically 75% still left, but then it goes downhill, down to the 18% survival rate of the 1999 code import.

If you can think of other good info to dig out, let me know!

The raw data, as year,number-of-surviving-lines pairs:

1999,1699
2000,1115
2001,3061
2002,2432
2003,2578
2004,7644
2005,4016
2006,5101
2007,7665
2008,7292
2009,9460
2010,11762
2011,19642
2012,11842
2013,16844

Another embedded hacking day

We started off this second embedded hacking day (the first one being the one we had in October) when I sent out the invitation email on April 22nd asking people to sign up. We limited the number of participants to 40, and within two hours all seats had been taken! Later on I handed out more tickets so we ended up with 49 people on the list, and interestingly enough only 13 of these had signed up for the previous event as well, so there were quite a lot of newcomers.

Daniel Stenberg, a penguin

Arrival

At 10 in the morning on Saturday June 1st, the first people had already arrived and more visitors were dropping in one by one. They would get a goodie-bag from our gracious host with a t-shirt (it is the black one you can see me wearing in the penguin picture), some information and a giveaway thing. This time we unfortunately did not have a single female among the attendees, but the all-male crowd would spread out in the room and find seating, power and switches to use. People brought their laptops and we soon could see a very wide range of different devices, development boards and early design ideas showing up on the tables. Blinking LEDs and cables everywhere. Exactly the way we like it!

A table full of hackers and equipment!

Giveaway

A USB wifi thing

We decided pretty early on in the planning for this event that we wouldn't give away a Raspberry Pi again like we did last time. Not that it was a bad thing to give away, it was actually just a perfect gift, but simply because we had already done that and wanted to do something else, and we reasoned that by now a lot of this audience already has a Raspberry Pi or similar device.

So, we then came up with a little device that could improve your Raspberry Pi or similar board: a USB wifi thing with Linux drivers so that you easily can add wifi capabilities to your toy projects!

And in order to provide something that you can actually hack on during the event, we decided to give away an Arduino Nano version. Unfortunately, the delivery gods were not with us or perhaps we had forgotten to sacrifice the correct animal or something, so this second piece didn't arrive in time. Instead we gathered people's postal addresses and once the package arrives in a couple of days we will send it out to all attendees. Sort of a little bonus present afterwards. Not the ideal situation, but hey, we did our best and I think this is at least a decent work-around.

So the fun began

In the big conference room next to the large common room, I said welcome to everyone at 11:00 before I handed over to Magnus from Xilinx to talk about Xilinx Zynq and combining ARM and FPGAs.

Magnus Lindblad, Xilinx

The crowd proved itself from the first minute and Magnus got a flood of questions immediately. Possibly it was also due to the lovely combo that Magnus is primarily a HW guy while the audience was perhaps mostly SW people, but with an interest in low-level stuff and HW and how to optimize embedded systems etc.

Audience listening to Magnus

After this initial talk, lunch was served.

Contest

I got lots of positive feedback on the contest I made last time, so I made one this time around as well, and it was fun again. See my separate post on the contest details.

Flying

After the dust had settled and everyone's pulses had started to go back to normal again after the contest, Björn Stenberg "took the stage" at 14:00 and educated us all in how you can use 7 Arduinos when flying an R/C plane.

Björn talks about open source flying

Björn Stenberg, a penguin

It seemed as if Björn’s talk really hit home among many people in the audience and there was much talking and extra interest in Björn’s large pile of electronics and “stuff” that he had brought with him to show off. The final video Björn showed during his talk can be found here.

Stuff to eat

Buns for the masses!

People actually want to get something done too during a day like this so we can't fill it all up with talks. Enea provided candy, drinks and buns. And of course coffee and water during the entire day.

Even with buns and several coffee refills, I think people were slowly getting soft in their brains when the afternoon struck, and to really wake people up we hit them with Erik Alapää's excellent talk…

Aliasing in C and C++

Or as Erik specified the full title: “Aliasing in C99/C++11 and data transfer between hard real-time systems on modern RISC processors”…

Erik helped shed light on some sides of the C programming language that perhaps aren't the most used or understood: how aliasing can be used and what pitfalls it can send us into!

Erik Alapää on C aliasing
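To give a flavor of the kind of pitfall the talk was about (my own illustration here, not taken from Erik's slides): type-punning a float through a pointer of a different type violates the C99/C11 aliasing rules, while copying the bytes does not.

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Undefined behavior: reading a float through a uint32_t pointer breaks
   the strict aliasing rules, so the compiler may reorder or cache accesses. */
static uint32_t bits_bad(float f)
{
  return *(uint32_t *)&f;
}

/* Well-defined: copy the bytes instead of punning pointers. */
static uint32_t bits_good(float f)
{
  uint32_t u;
  memcpy(&u, &f, sizeof(u));
  return u;
}

int main(void)
{
  printf("%08x\n", (unsigned int)bits_good(1.0f)); /* 3f800000 on IEEE 754 systems */
  printf("%08x\n", (unsigned int)bits_bad(1.0f));  /* may happen to work, not guaranteed */
  return 0;
}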

Kjell Ericson's blinking LEDs

Personally I didn't really have a lot of time or peace to get much done this day other than making sure everything ran smoothly, that everyone was happy and that the schedule was kept. My original hope was to get some time to do some debugging on a few of my projects during the day but I failed that ambition…

We made sure to film all the talks so we should hopefully be able to provide online versions of them later on.

Real-time Linux

I took the last speaker slot for the day. I think lots of brains were soft by then, and a few people had already started to drop off. I talked for a while generically about how the real-time (or perhaps rather low-latency) problem is being handled with Linux these days and explained a bit about PREEMPT_RT and full dynamic ticks and what the differences between the methods are.

Daniel Stenberg talks Real-time Linux

The end

At 20:00 we forced everyone out of the facilities. A small team of us grabbed a bite and a couple of beers to digest the day and to yap just a little bit more before we split up for the evening and took off home…

Thank you everyone who was there for making it another great event. Thank you all speakers for giving the event the extra brightness! Thank you Enea for sponsoring, hosting and providing all the goodies in such an elegant manner! It is indeed possible that we make a 3rd embedded hacking day in the future…

Embedded hacking contest #2, decoded

Okay, so here are the correct answers to the embedded hacking #2 contest (click for larger pictures):

The contests correct answers marked

The fact that you get the clues as hexadecimal uppercase ASCII was pretty quickly clear to everybody. I found it interesting to hear how people attacked the problem of decoding the hex into letters. Most people seem to have made a lookup-table fairly soon, and at least one contestant I talked to made a mistake in his table that turned W into X instead! This year’s winner did the conversion completely without a written down table…
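If you want to skip the manual table, a tiny C program does the same conversion. The input string here is just an illustrative example of mine; it decodes to "FORK()", the start of the final question revealed below:

#include <stdio.h>
#include <string.h>

/* Decode a string of uppercase hexadecimal ASCII codes, two characters per
   letter, into plain text. */
static void decode_hex_ascii(const char *hex, char *out, size_t outsize)
{
  size_t i, o = 0;
  size_t len = strlen(hex);

  for(i = 0; i + 1 < len && o + 1 < outsize; i += 2) {
    unsigned int byte;
    if(sscanf(&hex[i], "%2X", &byte) != 1)
      break;
    out[o++] = (char)byte;
  }
  out[o] = '\0';
}

int main(void)
{
  char text[64];
  decode_hex_ascii("464F524B2829", text, sizeof(text));
  puts(text); /* prints "FORK()" */
  return 0;
}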

So all the pieces are decoded like this:

The final question

Of course, now a pedant would argue that FORK() isn’t correct, but I decided to use all uppercase just to make the conversion slightly easier. At least I think converting only uppercase ASCII as hex is easier. So the question is “What does fork() return in the child process?”

The answer to the question is 0 (zero). Short and simple. See fork’s man page.
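For completeness, here is a minimal C program showing what the question is about; it is just an illustration of the documented fork() behavior:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
  pid_t pid = fork();

  if(pid == -1) {
    perror("fork"); /* fork failed, no child was created */
    return 1;
  }
  else if(pid == 0) {
    /* fork() returns 0 in the child process */
    printf("child: fork() returned 0, my pid is %d\n", (int)getpid());
  }
  else {
    /* in the parent, fork() returns the pid of the new child */
    printf("parent: fork() returned %d\n", (int)pid);
    waitpid(pid, NULL, 0);
  }
  return 0;
}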

Linus Nielsen Feltzing is the happy winner!

13 minutes and 20 seconds after I clicked start on the timer, Linus Nielsen Feltzing approached me with a little note with the correct answer and we had a winner!

The now very happy Linus was quite disappointed in the previous competition, when he was very close to winning but was beaten by just a few seconds by last time's winner.

Now, the Chromebook that Enea donated to the winner of the contest was handed over to Linus. (The Samsung Cortex-A15 version.)

Embedded hacking contest #2

I created another contest for the Embedded hacking event we just pulled off, organized together with foss-sthlm and Enea. Remember that I made one for our previous hacking day too?

The lesson from that time was that the puzzle ingredient was slightly too difficult, so people had to work a bit too long. It made many people give up, and the ones who didn't had to spend significant time solving it.

This time, I decided to use the same basic principle: ask N questions that all provide hints for the (N+1)th question, so that the first one to give me the answer to that final question is the winner. It makes it very easy for me to judge and it is a rather neat competition style game. I decided 10 questions should be enough.

To reduce some of the complexity from last time, I decided to provide the individual clues in the correct chronological order but instead add another twist: they aren't in plain text! But since they're chronological, the participants can go back and quite "easily" try other alternatives if strange words appear in the output. I made sure that all the answer alternatives are fine English words, so that if you pick the wrong answer it might still sound or look like English for a while…

I was very happy to see over 30 persons in the room that decided to accept the challenge. I suspect the prize did its part in attracting people to give it a go.

The rules in slightly longer terms as I put them (click it to see a higher resolution version):

the rules

And I clarified how the questions work:

the questions

I then started my timer, and I showed all the questions on the projector to everyone. I gave them around 40 seconds per question. It thus took almost seven minutes to go through them and then I left a final slide up showing all questions:

The 10 questions

To allow readers to give this contest a go before checking the answers, the full answer and explanation is in a separate post.

A room full of competitive hackers

Why no curl 8

In this little piece I'll explain why there won't be any version 8 of curl and libcurl for a long time. I won't rule out that it might happen at some point in the future, just that it won't happen anytime soon, and I'll explain the reasons why.

Seven point twenty nine, really?

We've done 29 minor releases and many more patch releases since version seven was born, on August 7 2000. We did in fact bump the ABI number a couple of times so we had the chance to bump the version number as well, but we didn't take that chance back then, and these days we have a much firmer commitment and determination to not break the ABI.

There's really no particular downside to having a minor version 29. Given our current speed and minor versioning rules, we'll bump it 4-6 times/year and we won't have any practical problems until we reach 256. (This particular detail is because we provide the version number info with the API using 8 bits per major, minor and patch field, and 8 bits can as you know only hold values up to 255.) Assuming we bump the minor number 6 times per year, we'll reach the problematic limit in about 37 years, in the fine year 2050. Possibly we'll find a reason to bump to version 8 before that.
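The packing can be illustrated like this. The macro below is a sketch written for this post that mirrors how libcurl lays out its LIBCURL_VERSION_NUM value, not code copied from curl.h:

#include <stdio.h>

/* 8 bits each for major, minor and patch, packed into one number */
#define PACK_VERSION(major, minor, patch) \
  (((major) << 16) | ((minor) << 8) | (patch))

int main(void)
{
  printf("7.29.0  packs to 0x%06x\n", PACK_VERSION(7, 29, 0));  /* 0x071d00 */
  printf("7.255.0 packs to 0x%06x\n", PACK_VERSION(7, 255, 0)); /* the last minor that fits */
  /* 7.256.0 would need a ninth bit in the minor field and no longer fits */
  return 0;
}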

Prepare yourself for seven point an-increasingly-higher-number for a number of years coming up!

Is bumping the ABI number that bad?

Yes!

We maintain compatibility within an ABI number so that a later version always works with a program built to use an older version. We have several hundred million users. That means an awful lot of programs are built to use this particular ABI number. Changing the number has a ripple effect so that at some point in time a new version has to replace all the old ones and applications need to be rebuilt, and at worst also possibly have to be rewritten in parts to handle the ABI/API changes. The amount of work done "out there" on hundreds or thousands of applications for a single little libcurl tweak can be enormous. The last time we bumped the ABI, we got a serious amount of harsh words and critical feedback, and since then we've gotten many more users!

Don’t sensible systems handle multiple library versions?

Yes in theory they do, but in practice they don’t.

When you build an application, it stores the ABI number of the lib to use, so if you just keep the different versions of the libraries installed in the file system you'll be fine. Then the older applications will keep using the old version and the ones you rebuild will be made to use the new version. Everything is fine and dandy and over time all rebuilt applications will use the latest ABI and you can delete the older version from the system.

In reality, libraries are provided by distributions or OS vendors and they ship applications that link to a specific version of the underlying libraries. These distributions only want one version of the lib, so when an ABI bump is made all the applications that use the lib will be rebuilt and have to be updated.

Most importantly, there’s no pressing need!

If we found ourselves cornered, unable to continue development without a bump, then of course we would take the pain it involves. But as things stand right now, we have a few things we don't really like with the current API and ABI, but in general it works fine and there are no major downsides or great pains involved. We simply do not have any particularly good reason to bump the version number or ABI version. Things work pretty well the current way.

The future is of course unknown and at some point we’ll face a true limitation in the API that we need to bridge over with a bump, but it can also take a long while until we hit that snag.

Update April 6th: this article has been read by many and I’ve read a lot of comments and some misunderstandings about it. Here’s some additional clarifications:

  1. This isn't stuff we've suddenly realized now. These are truths and facts we've learned over a long time and this post just makes them more widely available and easier to find. We already worked with this knowledge. I decided to blog about it since it struck me we didn't have it documented anywhere.
  2. Not doing version 8 (in a long time) does not mean we're done or that the pace of development slows down. We keep doing releases bimonthly and we keep doing an average of 30-something bugfixes in each release.

some missing github features

github-social-coding

I think github is a lovely resource for collaborating on source code with my friends all over the globe. Among other things, we host the primary curl repository there and we’ve been doing so for almost three years now. This experience has led me to discover a bunch of things I miss in the service…

github is clearly aimed at repositories run by one person or a small set of persons, while in the projects I run I try to involve as many as possible in wide collaboration, and I put effort into informing everyone to get the widest possible attention and feedback. I may have created the account and "own" the repository, but I want the work to be done by a large team and I want everything that happens to it to be seen by a large audience. This is not always easy to do with the existing github services.

To further this spirit and to widen cooperation more, I would like to see the following improvements:

  • pull requests can't be disabled, nor can I control which email address the notification is sent to. In our project I want all patches posted to the mailing list for review, archiving and discussions before I get a pull request, and I don't use GitHub's merge feature since it is hardly ever good enough (I want fast-forward and I usually feel a need to edit the commit message ever so slightly etc). I want the pull request to get translated into a patch review submission to the mailing list.
  • similarly, I cannot redirect where notifications are sent when someone comments on a commit or a source line, and this is highly annoying since we merge a lot of outsiders' patches etc and as they may still read the mailing list we want the discussion there! Many times the contributors don't have GitHub accounts and of course we don't want to require that.
  • after the death of the CIA.vc service, the current IRC notification service offered by GitHub is nothing but inferior. The stupid bot has to join, tell its message and leave again. That is not IRC-friendly behavior and I can't make it announce exactly what I'd like it to say…
  • I wish it had much better email notification on commits that would allow me to customize what it sends out without forcing me to write a full blown replacement. I want a unified diff included!

I realize GitHub lets me create an "organization" to host a repository instead of it being owned by me as a person, but I don't think that should be a requirement to get this functionality. And I don't know if GitHub truly offers better group functionality that way either.

the new bug tracker on sourceforge

A while ago Sourceforge offered to upgrade curl's bug tracker to "the new one" they provide. They do offer some arguments for why you would want to do this but they don't elaborate much on the transition for existing projects. Since I've been annoyed and disappointed with the old one for years I decided to dive right in. I decided to post this blog entry to possibly encourage others as well, or at least explain how the upgrade worked for us.

I’ll start by explaining a bit about what’s so bad about the old Sourceforge bug tracker. Anyone who has tried to use it for anything “real” most likely already know about these things and then I figure my list can be used for a comparison if we’ve gotten annoyed by the same things.

  1. They use a global bug id which makes all bug entries get very large numbers that aren’t in sequence and are fairly hard to remember.
  2. You can’t respond to bug reports by mail, so you are forced to use the heavy ad-filled web site.
  3. Ridiculous URLs to the bug tracker and each individual bug entry. I created a bounce CGI years ago on the curl web site to avoid having to use the overly long ones anywhere.
  4. When sending out email notifications, it prepends the new comments while having the older ones below, which basically is an odd-order top-posting style a lot of people and projects have a hard time getting accustomed to.

The new tracker addresses all of these issues, so I agreed to make curl use it. And this is the outcome:
  • All the existing bug tracker entries were converted. They all now get numbered sequentially in a private number series so no more bug #31234234 and instead the 1100 or so bug reports became bug #1 to bug #1169.
  • The new bug entries have a different set of meta-data but the ‘status’ and ‘owner’ etc seemed to get translated pretty well. The new ‘milestone’ got populated wrongly for me, but it didn't matter much because I simply cleared it.
  • There's no visible way to translate from old style bug numbers to the new bug numbers. When I go to the URL for an old number it redirects me to the new bug, so clearly Sourceforge has created a look-up table it can use.
  • There’s now a sensible public URL to point out the “home” for the curl bug tracking.

Annoying things with the new tracker:

  • It splits up the comments to a single report into several "pages" far too early and forces you to click through annoying "page 2" or "next page" links to see the latest comments.

Summary: the upgrade was totally worth it. A much better bug tracker with much more useful interfaces, both the web interface and the ability to respond to it by email etc. And still room for improvements!