Tag Archives: statistics

Keyboard key frequency

A while ago I wrote about my hunt for a new keyboard, and in the follow-up conversations with friends on that subject I quickly came to the conclusion that I should get better data and analysis on how I actually use a keyboard and the individual keys on it. And if you know me, you know I like (useless) statistics.

[Photo: Func KB-460 keyboard]

So, I tried out the popular and widely used Linux key-logger software ‘logkeys’ and immediately figured out that it doesn’t really support the precision and detail level I wanted, so I forked the project and modified the code to work the way I want it: keyfreq was born. Code on github. (I forked it because I couldn’t find any way to send my modifications back to the upstream project; I don’t really feel a need for another project.)

Then I fired up the logging process, and it has been running in the background for a while now, logging every keystroke with a timestamp.
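The counting itself is then simple. Here’s a minimal Python sketch of the idea, assuming a hypothetical log format of one “<unix timestamp> <key name>” entry per line; keyfreq’s real output format may well differ:

```python
# Minimal sketch: count key frequencies from a timestamped keystroke
# log. The "<unix timestamp> <key name>" line format and the file name
# are assumptions for illustration, not keyfreq's actual format.
from collections import Counter

counts = Counter()
with open("keyfreq.log") as log:
    for line in log:
        _timestamp, key = line.split(None, 1)
        counts[key.strip()] += 1

total = sum(counts.values())
for key, n in counts.most_common(10):
    print(f"{key:12} {n:8} {100 * n / total:6.2f}%")
```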

Counting key frequency and how it is distributed very quickly turns into basically seeing when I’m active in front of the computer, and it also got me thinking about what a high key frequency actually means in terms of activity and productivity. Does a really high key frequency mean that I was working intensely, or is it perhaps more a sign of mail-writing time? When I debug problems or research details, won’t those periods result in slower key activity?

In the end, I guess that over time the key frequency chart basically says that if I pressed a lot of keys during a period, I was working on something then. Hours or days with a very low average key frequency are probably times when I didn’t work as much.

The weekend key frequency is bound to be slightly wrong, since I sometimes do weekend hacking on other computers where I don’t log the keys; my results are recorded from a single specific keyboard only.

Conclusions

So what did I learn? Here are some conclusions and results from 1276614 keystrokes logged over the most recent 52 calendar days.

I have a 105-key keyboard, but during this period I only pressed 90 unique keys. Out of those 90 keys, 3 were each pressed more than 5% of the time. In fact, those 3 keys account for more than 20% of all keystrokes. They are: <Space>, <Backspace> and the letter ‘e’.

<Space> stands out from all the rest, as it alone accounts for more than 10% of all presses.

Only 29 keys were used in more than 1% of the presses each, giving the distribution a really long tail of keys that are hardly ever used.

Over this logged time, I registered keystrokes during 46% of all hours. Counting only the hours in which I actually used the keyboard, the average number of keystrokes was 2185/hour, or 36 keys/minute.

On an average week day (excluding weekend days), I registered 32486 key presses. In the most active single minute during this logging period, I hit 405 keys. In the most active single hour, I managed 7937 key presses. During weekends my activity is much lower: I then average 5778 keys/day (weekends account for 7.2% of all activity).

When counting the most active hours over the day, there are 14 hours with more than 1% of the activity, 5 with less than 1%, and 5 hours with no keyboard activity at all (02:00-06:59). Interestingly, the hour between 23:00 and 24:00 at night is my single busiest hour, with 12.5% of all key presses during the period.
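The hour-of-day numbers are just the same log bucketed differently. A minimal sketch, under the same hypothetical log format assumption as above:

```python
# Minimal sketch: bucket keystrokes by hour of day, assuming the same
# hypothetical "<unix timestamp> <key name>" log format as before.
from datetime import datetime

hours = [0] * 24
with open("keyfreq.log") as log:
    for line in log:
        timestamp, _key = line.split(None, 1)
        hours[datetime.fromtimestamp(int(timestamp)).hour] += 1

total = sum(hours)
for hour, n in enumerate(hours):
    print(f"{hour:02d}:00-{hour:02d}:59  {100 * n / total:5.1f}%")
```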

Random “anecdotes”

Longest contiguous time without keys: 26.4 hours

Longest key sequence without backspace: 946

There are 7 keys I only pressed once during this period; 4 of them are on the numerical keypad and the other three are F10, F3 and <Pause>.
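Both of the “longest” anecdote numbers fall out of a single pass over the log. A minimal sketch, again assuming the hypothetical timestamped log format from above (the “<Backspace>” key name is also an assumption):

```python
# Minimal sketch: the two anecdote numbers, scanned from the same
# hypothetical log. "<Backspace>" follows the post's notation and is
# an assumption about the actual logged key name.
longest_gap = 0        # longest contiguous time without keys, seconds
longest_run = run = 0  # longest key sequence without a backspace
prev = None

with open("keyfreq.log") as log:
    for line in log:
        timestamp, key = line.split(None, 1)
        t = int(timestamp)
        if prev is not None and t - prev > longest_gap:
            longest_gap = t - prev
        prev = t
        run = 0 if key.strip() == "<Backspace>" else run + 1
        longest_run = max(longest_run, run)

print(f"longest gap: {longest_gap / 3600:.1f} hours")
print(f"longest run without backspace: {longest_run}")
```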

More

I’ll try to keep the logging going, and see if things change over time or if new patterns show up in the data when viewed over a longer period.

curlers rest on Sundays and during July

We now run gitstats on the curl git repository daily, and it provides fun graphs.

We have almost 11 years of source code history covered, and I personally have done some 68% of all commits. Given this long history, it is fun to see some very clear trends. Like this first one: look at the distribution of commits per weekday over the entire period. The number of commits done during weekends is significantly lower than during the work week, and the Sunday count is clearly lower still than Saturday’s:

[Graph: commits per day of week]

Similarly, we can see how the activity is spread out over calendar months. This shows an obvious correlation to the slower periods in my life: July is vacation time, and the numbers show it:

[Graph: commits per month of year]
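gitstats generates these distributions for us, but the same two breakdowns can be pulled straight out of git log. A minimal sketch (run inside a clone of the curl repository; this is an illustration, not how gitstats itself computes them):

```python
# Minimal sketch: commits per weekday and per month from "git log".
import subprocess
from collections import Counter

log = subprocess.run(
    ["git", "log", "--pretty=format:%ad", "--date=format:%a %b"],
    capture_output=True, text=True, check=True,
).stdout

weekdays, months = Counter(), Counter()
for line in log.splitlines():
    day, month = line.split()
    weekdays[day] += 1
    months[month] += 1

print(weekdays)  # expect lower counts for Sat, lowest for Sun
print(months)    # expect a clear dip for Jul
```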

curl: ten years of more code and contributors

It feels like I’ve been doing curl forever, while in fact it is “only” in its early teens. I decided to dig up some numbers on how development within the project has evolved over the last decade: how things have changed during the 10 most recent years.

To spice up the numbers, I generated some graphs based on them, and to make the graphs nice and presentable I combined them all into a single image using my super gimp powers.

[Graph: bugs, lines of code and contributors over time in curl]

Click the image to get a full resolution version. But even the small one shows the data I wanted to illustrate: we gain contributors at roughly the same speed as we grow in lines of code. And at the same time we get roughly the same amount of bug reports over the years, apparently independently of the amount of code and contributors! Note that I separate the bugs-fixed bars from the bug-report bars, because bugs fixed is the number of bugfixes mentioned in release notes, while bug reports is the count in the web based bug tracker. As seen, we fix a lot more bugs than get submitted in the bug tracker.

I should add that the reason the green contributor line starts out a little slow and gets a speed bump after a while is that I changed my way of working at that point and got much better at tracking exactly all contributors. The general angle of the curve for the last 4-5 years is however what I think is the interesting part: it is basically the same angle as the source code increase.

The bug report counter is merely taken from our bug tracker at sourceforge, which is a very inexact count, as a very large number of bugs are reported on the mailing lists only.

Data from the curl release table tells us that during these 10 years we’ve done 77 releases, in which we fixed 1414 bugs. That’s 18.4 bug fixes per release, one release roughly every 47 days, and 141 bug fixes per year on average.

To see how this development has changed over time, I compared those numbers against the ones for the most recent 2.5 years. During this most recent 25% of the period we’ve done a release every 60 days on average, but counted 155 bug fixes per year. That puts the average number of bug fixes per release at 26: one bugfix every 2.3 days.
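For the record, the averages are straight divisions over the numbers quoted above:

```python
# The averages above, recomputed; all inputs are numbers from the post.
releases, bugfixes, years = 77, 1414, 10

print(bugfixes / releases)     # ~18.4 bug fixes per release
print(years * 365 / releases)  # ~47 days between releases
print(bugfixes / years)        # ~141 bug fixes per year

# ... and for the most recent 2.5 years:
days_per_release, fixes_per_year = 60, 155
print(fixes_per_year * days_per_release / 365)  # ~26 bug fixes per release
print(365 / fixes_per_year)                     # one bugfix every ~2.3 days
```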

A more negative interpretation of this could be that we’re only capable of a certain number of bug fixes per unit of time, so no matter how much code we get, we fix bugs at roughly the same rate. The fact that we don’t see an increasing amount of bug reports of course speaks against this theory.

A view of a popular post

So I post frequently on this blog, but I’m not a particularly interesting person myself, I’m not really a master at writing and phrasing articles to make them thrilling and irresistible, and I basically only deal with really geeky and technical subjects. That means I average perhaps 200 views per day.

The other day I wrote my multipath tcp post, and someone submitted it to reddit. It turned out to become my most read blog post ever. By far. I think the “views per day” graph looks pretty cool:

[Graph: views per day on daniel.haxx.se/blog]

Some stats on curl development

Counting from curl 6.0 up to curl 7.19.3, we’ve done 78 releases during the 9.4 years it took.

In this time, we’ve mentioned 1259 bugfixes and 389 notable changes.

That makes one bugfix every 2.7 days, and one release every 43 days with an average of 16 bugfixes in each. The longest interval ever between two curl releases was 139 days, back in 2000 when we worked to release the first version 7 release (known as 7.1).

To compare with how our work has been more recently, doing the same math limited to the 20 latest releases only (the 3.3 years since and including 7.15.0) shows that we’re still at 2.7 days per bugfix (even though we know the code base has grown steadily for years), but we’re now at 61 days between releases and 21 bugfixes per release…

All this info and more will be visible on a web page on the curl site soonish; I’m still working on polishing it up.

What other useful or useless but interesting numbers could be extracted from this?

4 ohloh improvements I’d like

I am a stats junkie, so I like my stats in large amounts. But I like the stats to be right and as accurate as possible. When I look at what ohloh produces, I like the concepts and ideas in general; I just think their implementation is lacking in a few vital areas that need improvement:

1. There are no dependencies or hierarchies between packages, so “I use this” counters become worthless, since people mark the end-user packages they use. Low-level support packages and libraries that are used indirectly don’t get many “use counts”.

2. Doing very few commits in a very widely used project with few authors gives you way, way more points than doing a bus-load of commits in something less used with many fellow contributors. This makes the top-list of people very skewed, as some of the top-64 people only ever did a few hundred commits. I doubt many mortals would consider someone who only ever did 300 commits a top community person. At the very moment I write this, the #1 ranked person has done 20 commits during 5 months…!

3. Too few version control systems are supported, leaving out huge chunks of the open source world. Bazaar, mercurial and a few more are a bit too popular to be ignored without the results getting skewed.

4. I’d like to see the “number of users” of products expressed as a percentage, since the total number of users they show includes all contributors to all projects. Out of the 140,000 users (which undoubtedly include a lot of duplicates), it would surprise me if more than 10,000 have actually registered what products they use. I’ve tried to find the exact number, but I failed. So 3,000 users doesn’t mean 3,000 out of 140,000, but 3,000 out of 10,000…

ohloh vs statcvs

I’ve played a bit with statcvs lately and generated reports for the curl repository. It turned out rather interesting (well, assuming you’re a statistics geek like me), especially in comparison to the data and stats ohloh.net presents for the same code:

Executive summary:

  • I’ve done 82% of all code changes.
  • We seem to grow at roughly the same pace (both number of code lines and number of files) over the last years.
  • The lines of code per file count seems rather fixed.

Oh, that initial big bump in late 1999/early 2000 was due to a lot of “wrong” files, such as configure, config.guess etc, being committed and subsequently removed. It is a bit annoying to have there, as it skews the data somewhat, but I’ve not managed to fool statcvs into ignoring that part…

Rockbox downloads April 2008

I counted the Rockbox downloads from build.rockbox.org during April 2008, and while the results weren’t very different from past results, I thought I’d still show them. This month, 99874 downloads were counted and 30 different packages were downloaded; back in January, we still only had 26 versions. The top-5 are identical to the last list. (A sketch of one way to make such a count follows after the list below.)

The most popular newcomer since my last count is the Olympus Mrobe 100, which has more than twice the number of downloads of the second newcomer, the iAudio m3.

The list shows model and number of downloads. The newcomers since the last count are shown in bold.

  1. sansae200 22038
  2. ipodvideo 18289
  3. ipodvideo64mb 12392
  4. ipodnano 12261
  5. sansac200 4176
  6. h300 3071
  7. ipodcolor 2932
  8. ipodmini2g 2875
  9. gigabeatf 2848
  10. ipod4gray 2651
  11. h120 2506
  12. iaudiox5 2498
  13. ipod3g 1717
  14. ipodmini1g 1496
  15. ipod1g2g 1411
  16. h10 1361
  17. h10_5gb 1268
  18. mrobe100 1116
  19. player 564
  20. iaudiom3 528
  21. recorder 500
  22. iaudiom5 284
  23. h100 275
  24. recorder8mb 233
  25. recorderv2 157
  26. cowond2 138
  27. fmrecorder 116
  28. ondiofm 108
  29. ondiosp 58
  30. mrobe500 7
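For the curious, here’s roughly how a count like this can be made from web server access logs. This is a minimal sketch only: the log file name and URL pattern are my guesses for illustration, not the actual build.rockbox.org setup.

```python
# Minimal sketch: tally downloads per build from Apache-style access
# logs. The "access.log" name and the URL pattern are hypothetical.
import re
from collections import Counter

pattern = re.compile(r'"GET /[^" ]*rockbox-(\w+)[^" ]* HTTP')
counts = Counter()

with open("access.log") as log:
    for line in log:
        match = pattern.search(line)
        if match:
            counts[match.group(1)] += 1

for model, n in counts.most_common():
    print(model, n)
```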

Swedish Broadband Usage

The other day I stumbled over this interesting report published by ITIF, called Explaining International Broadband Leadership (108 pages, 3MB PDF), which listed the USA and 30 OECD countries and their broadband usage; the report came to numerous conclusions and offered advice on why the US is falling behind in the ranks, and so on. Quite an interesting read in general.

In their ranking table, Sweden is listed at #6. I immediately noticed the column called “Household penetration” (subscribers per household). Hm, isn’t that the share of households that have broadband? It says 0.54 for Sweden. 54% broadband users among the households in 2007?

We have an organization in Sweden called “Statistiska Centralbyrån” in Swedish and “Statistics Sweden” in English. They basically gather and present statistics on Sweden and Swedish matters. They’ve produced a huge report (in Swedish, 1MB, 256 pages PDF) called “Private citizens’ use of computers and internet 2007” (my translation). It mentions that during spring 2007, 71% of Swedes used broadband internet from their homes. (Over 80% had internet access in their homes, which leaves roughly 12% of those users not on broadband…)

Isn’t there a shockingly huge difference between 54 and 71? And this is just one number I could quickly check myself, for my own country. How far off are the other countries’ values, then? The ITIF report doesn’t even try to describe how they got their numbers, so it isn’t easy to tell. The Swedish report does in fact also contain a comparison with other European countries, and the numbers shown for them don’t match the ones in the ITIF report either! (But the order of the top broadband using countries is roughly the same.)

I’m also a bit curious about how they got the numbers for the “average download speed in Mbps” column, but I don’t have any numbers to cross-check that against.

Rockbox Downloads Jan 2008

It’s time again for a check and analysis of the download trends on the build.rockbox.org web site, with comparisons to how things were at my previous count from October 2007.


During this month, 112034 downloads were counted, which is almost a 10% increase on October’s 102127 – and as you’ll see below, almost the entire increase was due to a boosted interest in the Sansa E200. No new port has been offered for download during this time; there are still 26 packages. The downloads were distributed as follows (position changes are shown within () and the previous period’s download counts within []):

  1. (+1) sansae200 27325 [18788]
  2. (-1) ipodvideo 21453 [20721]
  3. (+1) ipodvideo64mb 13904 [12780]
  4. (-1) ipodnano 13419 [13228]
  5. (+7) sansac200 3490 [2841]
  6. (-) gigabeatf 3410 [3522]
  7. (+1) ipodcolor 3316 [3287]
  8. (-3) h300 3306 [3614]
  9. (+2) ipod4gray 3249 [2896]
  10. (-1) ipodmini2g 3087 [3083]
  11. (-4) iaudiox5 2933 [3340]
  12. (-2) h120 2521 [2924]
  13. (+1) ipod3g 1993 [1624]
  14. (-1) ipodmini1g 1713 [1647]
  15. (+1) h10_5gb 1458 [1524]
  16. (-1) h10 1413 [1624]
  17. (-) ipod1g2g 1246 [1384]
  18. (-) player 730 [834]
  19. (-) recorder 558 [692]
  20. (-) iaudiom5 380 [422]
  21. (+1) h100 328 [345]
  22. (-1) recorder8mb 292 [354]
  23. (+1) fmrecorder 189 [222]
  24. (-1) recorderv2 175 [222]
  25. (-) ondiofm 96 [113]
  26. (-) ondiosp 50 [96]

Of course, if we count the two different ipod video builds combined, they alone amount to 35357 downloads (31.6%)! Apart from the E200 climb, I think the only significant change in the table above is the other SanDisk player in the selection, the Sansa C200 series, which climbed 7 positions thanks to its 23% download increase.

The top-5 downloads are all portalplayer based, and here’s a more complete look at how the builds are split up by main architecture (October’s shares within parentheses):

  1. portalplayer 97066 downloads 86.6% (83.6%)
  2. coldfire 9468 downloads 8.45% (10.4%)
  3. samsung 3410 downloads 3.0% (3.4%)
  4. sh1 2533 downloads 1.9% (2.5%)

The harddrive-based builds are still more popular, but the flash-based ones are gaining:

  1. HDD models 67654 downloads 60.4% (65.7%)
  2. flash models 44380 downloads 39.6% (34.5%)

The top-8 downloads are all for targets featuring color LCDs, and they certainly are popular when checking the download spread over target LCD types (a sketch of how these roll-ups can be computed follows after the list):

  1. Color 92494 downloads (82.6%)
  2. Greyscale 17450 downloads (15.6%)
  3. Monochrome 1360 downloads (1.2%)
  4. Charcell 730 downloads (0.7%)
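These breakdowns are simple roll-ups of the per-model numbers: map each build to its category and sum. A minimal Python sketch, with a deliberately partial mapping (the full tables follow from the download list above):

```python
# Minimal sketch: roll per-model download counts up into categories.
# The dictionaries here are deliberately partial, for illustration.
downloads = {"ipodvideo": 21453, "sansae200": 27325, "player": 730}  # etc.
lcd_type = {"ipodvideo": "Color", "sansae200": "Color",
            "player": "Charcell"}  # etc.

totals = {}
for model, count in downloads.items():
    category = lcd_type[model]
    totals[category] = totals.get(category, 0) + count

grand_total = sum(downloads.values())
for category, count in totals.items():
    print(f"{category}: {count} downloads ({100 * count / grand_total:.1f}%)")
```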

Like last time, this doesn’t include any custom builds, builds from download.rockbox.org, or release builds from www.rockbox.org. Take all this as indications, not absolute facts.