Category Archives: Open Source

Open Source, Free Software, and similar

Embedded Linux Contest

During our embedded Linux hacking event in Stockholm on October 20th I ran a little contest for those who wanted to participate. I created it entirely by myself, so that as many people as possible could take part without them knowing me, or me knowing them, limiting the fun.

For your amusement I include the full contest here. If you want to try it out, make sure you don't google for any answers or otherwise use a computer to help.

img1

img2

img3

Here I just want to mention that, as shown in the example question above, 'ace' is the correct character sequence, and the letters should then be kept in that order in the final question. Also note that a character sequence can legally contain a dash. You will get 16 similar sequences of 1 to 3 letters, and those 16 sequences should be rearranged to form the 17th question.

img4

… at this point I fired off all the questions one by one, at about 15-20 seconds per question. In this blog post I'll take a shortcut and instead show you the final page I made that displayed all questions at once, which I then left up for the remainder of the competition time. Click the image for a full-resolution version that is perfectly readable:

all questions at once

the winners of the contest

My takeaway from this contest is that it was harder than I anticipated and took longer to crack than I expected. I gave away a few additional clues and hints as time went by, but in the end I believe several people were very close to breaking it at almost the same time. In the end, Klas and Jonas presented the correct answer first and won the bottle of Champagne. I'm sure you appreciate their efforts after having tried this yourself!

The answers? Are you really sure? The correct answers, and the final question with its answer, are available.

I had a great time creating the competition and I believe the competitors appreciated it.

Additional trivia: I created the letter sequences for the other alternatives by writing other English phrases and chopping them up, so that they came from actual English and hence looked more believable.

Embedded Linux hacking day

enea

On September 10th, just before lunch, I sent out the invite to the foss-sthlm community for an embedded hacking event. In just four hours, the 40 available tickets had been claimed and the waiting list started to fill up as well… I later increased the number to 46, we had some cancellations, I handed out more tickets, and we had 46 people signed up on the day of the event (I believe 3 of them didn't show up). When the event started, we still had another 20 people on the waiting list hoping for a spot!

(All photos in this post are scaled down versions, click the picture to see a slightly higher resolution version!)

In Enea we had found an excellent sponsor for this event. They provided the venue, the food, the Raspberry Pis, the coffee, the t-shirts, the infrastructure and everything else that had to be there to make it an awesome day.

the-room

We started off the event at 10:00 on October 20 in the Enea offices in Kista, Stockholm, Sweden. People dropped in one by one and were handed their welcome present containing a Raspberry Pi board, a 2GB SD card and a USB-to-serial cable to interface with and power the board. People then found their seats in the room.

There was fruit, candy, water and coffee to start off with and keep the mood high. We experienced some initial wifi and internet access problems, but luckily we had no less than two dedicated Enea IT support people present and they could swiftly fix the little hiccups that occurred.

coffee machines

Once everyone seemed to have landed, I welcomed everyone and gave a short overview of what to expect from the day, where the toilets were and so on.

Björn the cameraman

In order to try to please everyone who couldn't be with us at this event, whether due to other plans or due to simply not having got one of the attractive 40 “tickets”, Enea helped us arrange a video camera which we used during the entire day to film all the talks and the contest. I can't promise any delivery date for the videos, but I'll work on getting them made public as soon as possible. I'll make a separate blog post when there's something to see. (All talks were in Swedish!)

At 11:30 I started off the day for real by holding the first presentation. We used one of the conference rooms for this, just next to the big room where everyone sat hacking. For this day we had removed all the tables and only had chairs in the room, movie-theater style, and it turned out we could fit just about all attendees in the room this way. I think that was good, as almost everyone sat down to hear and see me:

Open Source in Embedded Systems

daniel talks open source

I did a rather non-technical talk about a couple of trends in the embedded operating systems market, how I see the upcoming future, and some additional numbers. The full presentation (with most of the text in Swedish) can be found on slideshare.

I got good questions and I think it turned into an interesting discussion about how things run and work these days.

After my talk (which I of course made longer than planned) we served lunch. Three different salads, bread and more were brought out. Several people approached me to say how much they appreciated the food, so I must say that Enea managed really well on that account too!

Development and trends in multicore CPUs

jonas talks about CPUs
Jonas Svennebring from Freescale was up next and talked about current multicore CPU development trends and what the challenges are for the manufacturers today. It was a very good and very technical talk, and he topped it off by showing off his board with the T4240 running, Freescale's latest flagship chip that is just now about to become available to companies outside of Freescale.

T4240 from Freescale

In this photo on the left you see the power supply in the foreground and the ATX board with a huge fan and cooler on top of the actual T4240 chip.

The T4240 is claimed to set a new world record in CoreMark performance and features 12 hyper-threaded PowerPC cores running at up to 1.8GHz.

There were some good questions for Jonas and he delivered good, well thought out answers. Then people walked back out into the big room to continue getting some actual hacking done.

We then took the opportunity to hand out the very nice-looking t-shirts to all attendees, again kindly provided by Enea.

The Contest

The next interruption was the contest, designed entirely by me so that everyone could participate, even my friends and Enea employees. In the photo on the right you can see that I'm now wearing the t-shirt of the day.

the contest
The contest was hard. I knew it was hard, because I really wanted to make it a race only for those who truly get embedded Linux and have their brains laid out properly!

I posted the entire contest in a separate blog post, but the gist of it was that I presented 16 questions, each with 3 answer alternatives. Each alternative had a sequence of letters, so after 16 questions you had 16 letter sequences that had to be put in the right order to form a 17th question. The first one to give a correct answer to that 17th question would win.

A whole bunch of people gave up immediately, but there was a core group who fought hard, long and bravely, and in the end we got a winner. The winners had paired up, so the bottle of champagne went jointly to Klas and Jonas. It was a very close call, as others were within seconds of figuring it out too.

The competition turned out even harder than I had expected. Possibly a little too hard…

Your own code on others’ hardware

linus talks

Linus from Haxx (who shouldn't be much of a stranger to readers of this blog) then gave some insights into how he reverse engineered mp3 players for the Rockbox project. Reverse engineering is a subject that attracts many people, and I believe it has some sort of magic aura around it. Again many good questions and interested people in the room.

Linus' bare targets as seen during his talk

In the photo on the right you can see Linus' stripped-down hardware, from which he had ripped off all the components in order to properly hunt down how things were connected on the PCB.

Coffee

We did not keep to the schedule, so we had to fit the coffee break in after Linus; there were buns and so on.

Yocto

Björn from Haxx then educated the room on the Yocto Project: what it is, why it is, who is behind it, and a little about how it is designed and how it works.

bjorn talks on yocto

I think people's brains had perhaps started to get a little soft by now, as we had blasted through all but one of the talks, and as a speaker finale we had Henrik…

u-boot on Allwinner A10

Henrik Nordström did a walk-through explaining some u-boot basics and then described what he had done for the Allwinner targets, with related info.

Henrik talks u-boot
I believe the talks were kind of the glue that made people stick around. Once Henrik was done and there were no more talks planned for the day, that was obviously sort of the signal for people to start calling it a day, even though there was still over an hour left until the official end time (20:00).

Henrik's hardware
Of course I don't blame anyone for that. I had hardly had any time to sit down or do anything relaxing during the day, so I was kind of exhausted myself…

Summary

I got a lot of very positive comments from people when they left the facilities with big smiles on their faces, asking for more of these sorts of events in the future.

The back of the Enea t-shirt

I am very happy with the overwhelmingly positive response, with the massive interest from our community in coming to such an event, and again, Enea was an awesome sponsor for this.

Talk audience

I didn't get anything done on the Raspberry Pi during the day. As a matter of fact I never even got around to booting my board, but I figure that wasn't a top priority for me this day.

The crowd size felt just about perfect for these facilities, and forty-something people still keeps the spirit of familiarity; it doesn't feel like a “big” event.

Will I work on making another event similar to this again? Sure. It might not happen immediately, but I don’t see why it can’t be made again under similar circumstances.

Credits

rpi accessed with tablet

All photos on this page were taken by me, Björn Stenberg, Kjell Ericson, Mats Lidell and Mia Åkerström.

Thanks to Jonas, Björn, Linus and Henrik for awesome talks.

Thanks to Enea for sponsoring this event, and to Mia in particular for being a good organizer.

WSAPoll is broken

Microsoft admits the WSAPoll function is broken, but won't do anything about it. Unless perhaps customers keep nagging them.

Doing portable socket programming has always meant using a bunch of #ifdefs and similar. A program needs to be built on many systems and slowly gets adjusted to work really well all over. For ages, for example, Windows only supported select() and not poll(), while all sensible systems[*] out there supported poll(). There are several reasons to prefer poll() to select() when writing code.
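To make that #ifdef dance concrete, here is a minimal sketch of waiting for one socket to become readable, using poll() where it exists and select() elsewhere. The HAVE_POLL define and the wait_readable() name are made up for this example:

    /* Wait for one socket to become readable, with a timeout in
       milliseconds. Returns >0 when readable, 0 on timeout, -1 on error.
       HAVE_POLL and wait_readable() are made-up names for this sketch. */
    #ifdef HAVE_POLL
    #include <poll.h>
    #else
    #include <sys/select.h>
    #endif

    int wait_readable(int sockfd, int timeout_ms)
    {
    #ifdef HAVE_POLL
      struct pollfd pfd;
      pfd.fd = sockfd;
      pfd.events = POLLIN;
      return poll(&pfd, 1, timeout_ms);
    #else
      fd_set readset;
      struct timeval tv;
      FD_ZERO(&readset);
      FD_SET(sockfd, &readset); /* breaks if sockfd >= FD_SETSIZE (often 1024) */
      tv.tv_sec = timeout_ms / 1000;
      tv.tv_usec = (timeout_ms % 1000) * 1000;
      return select(sockfd + 1, &readset, NULL, NULL, &tv);
    #endif
    }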

Then one day in 2006, Chad Charlin, a developer at Microsoft, wrote the following when talking about the new WSAPoll() function they introduced in Windows Vista:

Among the many improvements to the Winsock API shipping in Vista is the new WSAPoll function. Its primary purpose is to simplify the porting of a sockets application that currently uses poll() by providing an identical facility in Winsock for managing groups of sockets.

Great! Starting September 2006, curl started using it (shipped in the curl and libcurl 7.16.0 release). It seemed like a huge step forward, and as Chad wrote:

If you have experience developing applications using poll(), WSAPoll will be very familiar. It is designed to behave just like poll().

Emphasis added by me. It was (of course) made to work like poll(), and that's why the API is shaped the way it is. Why would you introduce something that is almost like poll() except in minor details?
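That drop-in promise is exactly what portable code builds on. A minimal sketch of the typical mapping, where the portable_ names are made up for this example:

    /* Map a poll()-style call onto WSAPoll() on Windows. The portable_
       names are made up for this sketch. */
    #ifdef _WIN32
    #include <winsock2.h>
    typedef WSAPOLLFD portable_pollfd;
    #define portable_poll(fds, nfds, timeout) WSAPoll(fds, nfds, timeout)
    #else
    #include <poll.h>
    typedef struct pollfd portable_pollfd;
    #define portable_poll(fds, nfds, timeout) poll(fds, nfds, timeout)
    #endif

With that in place, the rest of the code can fill in portable_pollfd entries and call portable_poll() unchanged on both platforms – assuming, of course, that WSAPoll() really behaves like poll().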

Since the new function was only available in Vista and later, it took a while until libcurl users got to use it on a wider scale, but over time Windows XP users slowly shifted away and more and more libcurl Windows users therefore used the WSAPoll-based builds. Life seemed to be good. Some users noticed funny things and reported bugs we couldn't repeat (on other platforms), but nothing really stood out and no big alarm bells went off.

In July 2012, a user of libcurl on Windows, Jan Koen Annot, experienced such problems, and he didn't just sigh and move on. He rolled up his sleeves and decided to get to the bottom of it. Perhaps he could fix a bug or two while at it? (It seems reasonable that he thought so; I haven't actually asked him!) What he found, however, was not a bug in libcurl. He found out that WSAPoll did indeed not work like poll() (his initial post on the problem to curl-library)! On August 1st he submitted a support issue to Microsoft about it. On August 7th we pushed the commit to curl that removed our use of WSAPoll.

A few days ago Jan reported back on how the case has gone, and where his journey down the support alleys took him.

It turns out Microsoft already knew about this bug, which they apparently have named “Windows 8 Bugs 309411 – WSAPoll does not report failed connections”. The ticket has been resolved as Won't Fix… (I haven't found any public access to it.)

Jan argued the case that since WSAPoll is designed and used as a plain poll() replacement, it would make sense to actually make it work the same way:

First, it will cost much time to find out that some ‘real-life’ issue can be traced back to this WSAPoll bug. In my case we were lucky to have a regression test which triggered when we started using a slightly different cURL-library configuration on Windows. Tracing back that the test was triggered because of this bug in WSAPoll took several hours. Imagine what it would cost, if some customer in the field reported annoying delays, to trace such a vague complaint back to a bug in the WSAPoll function!

Second, even if we know beforehand about this bug in WSAPoll, then it is difficult to determine in which situations in your code you can safely use WSAPoll and in which situations you might suffer from this bug. So a better recommendation would be to simply not use WSAPoll. (…)

Third, porting code which uses the poll() function to the Windows sockets API is made more complex. The introduction of WSAPoll was meant specifically for this, so it should have compatible behavior, without a recommendation to not use it in certain circumstances.

Fourth, your recommendation will only have effect when actively promoted to developers using WSAPoll. A much better approach would be to repair the bug and publish an update. Microsoft has some nice mechanisms in place for that.

So, my conclusion is that, even if in our case the business impact may be low because we found the bug in an early stage, it is still important that Microsoft fixes the bug and publishes an update.

In my eyes, all very good and sensible arguments. Perhaps not too surprisingly, these fine reasons didn't have any particular impact on how Microsoft views this old and known bug that “has been like this forever and people are already used to it”. It will remain closed, and Microsoft motivated this decision to Jan quite clearly and with arguments one can understand:

A discussion has been conducted around this topic and the taken decision was not to have the fix implemented due to the following reasons:

  • This issue since Vista
  • no other Microsoft customer has asked for a Hotfix since Vista timeframe
  • fixing this old issue might have some application compatibility risk (for those customers who might have somehow taken a dependency on WSAPoll failing with a timeout in the cases of connect failure as opposed to POLLERR).
  • This API will become more irrelevant as the Windows versions increase; the networking APIs will move away from classic select/poll to more advanced I/O completion mechanisms.

Arguments one and two are really weak and silly. Microsoft users very rarely complain to Microsoft, and most wouldn't even know how to. Also, this problem may certainly still affect many users even if nobody has asked for a fix.

The compatibility risk is a valid point, but that's a bit of a hard argument to make. Every bug that is about behavior will of course carry the risk that users have adapted to the wrong behavior, so a bug fix may break those. All of us who write and maintain stable APIs are used to this problem, but sticking to the buggy behavior because it has been that way for so long is in my eyes only correct if you document it with very large letters and emphasis in all documentation: WSAPoll does not fully emulate poll – beware!

The fact that they will focus more on other APIs is also understandable, but beside the point. We want reliable APIs that work as documented. Applications that are Windows-only probably very rarely use WSAPoll already; it will probably remain more important for porting socket-style programs to Windows.

Jan also especially highlights a funny line from this Microsoft person:

The best way to add pressure for a hotfix to be released would be to have the customers reporting it again on http://connect.microsoft.com.

Okay, so even if they have motives for not fixing this bug, they seem to hint that if more customers nag them about it they might change their minds. Fair enough. But the users of libcurl who for five years perhaps experienced funny effects are extremely unlikely to ever report and complain to Microsoft about this. They are way more likely to complain to us, or possibly to just work around the issue somehow.

Of course, users of WSAPoll can adapt to the differences and write conditional code that handles them, and that could be what we end up with in the curl project in the future, if we get volunteers to adapt the code accordingly. In the meantime we've just reverted to the old select()-using code, since Windows' select() does in fact mimic the “real” select() much better than WSAPoll mimics poll()…
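Such conditional code could, for example, double-check a connecting socket that WSAPoll() stayed silent about. A hedged sketch of one conceivable Windows-only workaround follows; this is an illustration, not what curl actually does:

    /* If WSAPoll() reports no events for a socket with a connect in
       progress, ask the socket itself whether the connect failed.
       A sketch only; connect_failed() is a made-up name. */
    #ifdef _WIN32
    #include <winsock2.h>

    static int connect_failed(SOCKET s)
    {
      int err = 0;
      int len = (int)sizeof(err);
      if(getsockopt(s, SOL_SOCKET, SO_ERROR, (char *)&err, &len) == 0 &&
         err != 0)
        return 1; /* the connect failed even though WSAPoll() said nothing */
      return 0;
    }
    #endif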

[*] = clearly Mac OS X is not a sensible system, since its poll() implementation is even worse than Windows' and is mostly broken or just unreliable. Subject for another blog post another time.

Update

In 2023, a user made me aware that the Microsoft documentation now says:

Note  As of Windows 10 version 2004, when a TCP socket fails to connect, (POLLHUP | POLLERR | POLLWRNORM) is indicated.

Maybe it is time to do new tests.

What, me over-analyzing?

fiber cable

I got fiber installed to my house a few months ago. 100/100 mbit is very nice.

My first speed checks using the Swedish bredbandskollen service were a bit disappointing, showing only something like 50 mbit down and 80 mbit up. I decided to ignore that for the moment, as things were new and I had some other, more annoying issues.

I noticed that sometimes, on some specific sites, I had problems getting HTTP to work at all! The TCP connection would get connected and data would get sent, but it would stall somewhere and then get disconnected. This showed up when, for example, my wife tried to download a Spotify client and my phone had trouble downloading some of my favorite podcasts. Some, not all. Possibly one out of ten or twenty sites showed the problem. Most of the sites I use frequently worked flawlessly, and I would only ever see the problem with HTTP.

Puzzling!

I could avoid the problem by setting up an SSH connection to my server and running it as a SOCKS proxy, so I could still get service even from the sites I apparently couldn't otherwise use. I collected the problematic sites and ran traceroute etc. on all the ones that failed, in order to gather data that would help me and my operator pinpoint the problem. I reported the problem to my ISP really early on, but they too were puzzled and it never got far on their end.

I was almost convinced they had some kind of traffic-shaping thing in the middle that was somehow broken for HTTP.

Time to fix it once and for all

Finally, one day I stopped being nice with the support people and demanded that they send someone over to fix it. It wasn't good enough that everything looked OK from their end. I clearly had problems, and I could escape them by switching over to my old ADSL, so I knew it wasn't due to my computers' configs or my own firewalls or routers.

I was also convinced my ISP would send a cable guy even though the problem wasn't really in the cable, but sure, it would be a necessary first step towards finding the real error.

They sent a cable guy, and it took him like three minutes to detect a bad signal level on the fiber, meaning the problem was certainly not inside my house anyway. He then drove down to the “station” that terminates my fiber, and after having “polished” that end of the fiber connection too, he called me and said that he couldn't spot any problem anymore and asked me to verify…

Now my checks showed 80-90 mbit in both directions, and the sites that used to give me problems all worked just fine.

It was the cable all along. Bad signal level. “A mystery it worked at all for you” my cable person said.

I'm left with my over-analyzed problem, having suspected all sorts of high-level stuff. But why did it only affect some sites? How come I could circumvent it with SOCKS? Gah, my brain hurts trying to answer these questions…

Anyway, now it all works and the family is happy again.

SSL verification still often disabled

SSL padlock

Back in 2002 I realized that having libcurl not do SSL server verification by default basically meant that everyone writing libcurl apps would inherit that flaw, simply because most people just let the defaults remain unless they really have to read up on what something does and then modify it. If things work, they remain as they are. So when we shipped libcurl 7.10 on the first of October that year, libcurl started verifying server certs by default.

Fast forward about ten years.

Surely SSL clients everywhere now do the right thing?

One day a couple of months ago, I was referred to this bug report for the pyssl module in Python, which identifies that it doesn't verify server certs by default! The default SSL handler in Python doesn't verify the certificate properly. This makes all Python programs that use it without special attention vulnerable to man-in-the-middle attacks.

So let's look at the state of another popular language: PHP. A plain standard PHP program opens an ssl:// or tls:// stream. Unless the author of said program knows and understands these things, it too runs without verifying server certs. If a program instead decides to use the PHP/CURL binding for HTTPS or similar, it will use libcurl's default, which verifies certs (as I explained above).
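For comparison, here is a minimal sketch in C of what that libcurl default amounts to. The options are set explicitly only for illustration, since they have been the defaults since 7.10; the URL and CA bundle path are placeholders:

    /* Minimal sketch; error handling trimmed. The URL and the CA bundle
       path are placeholders. */
    #include <curl/curl.h>

    int main(void)
    {
      CURL *curl = curl_easy_init();
      if(curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/");

        /* libcurl's defaults since 7.10, set explicitly for illustration */
        curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 1L); /* verify the cert */
        curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 2L); /* verify the name */

        /* a CA bundle to verify against, e.g. the Firefox-derived PEM file */
        curl_easy_setopt(curl, CURLOPT_CAINFO, "/path/to/ca-bundle.crt");

        curl_easy_perform(curl);
        curl_easy_cleanup(curl);
      }
      return 0;
    }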

But not everything is gloomy. Some parts of our community have decided to do the right thing:

I was told (and shown) that Ruby now does the right thing, but I don't know how recent that is and thus how many older Ruby programs still suffer.

The same problem existed in Perl's major HTTPS-using module, LWP, for a very long time. The Perl camp, however, already changed LWP to verify by default with the release of libwww-perl 6.00 in March 2011.

Side-note: in the curl project we make it easy for everyone on the Internet to use Firefox’s excellent CA cert bundle to verify server certs by providing the Firefox CA cert collection converted to PEM – the preferred format for OpenSSL, GnuTLS and others.

Conclusion:

Even today, lots and lots of applications and scripts remain insecure – even though they probably think they're fairly safe when they switch to an HTTPS- or SSL-using protocol – and may be subject to man-in-the-middle attacks without even being able to spot it. I think that is pretty sad.

Introducing curl_multi_wait

Facebook contributes fix to libcurl’s multi interface to overcome problem with more than 1024 file descriptors.

When we introduced the multi interface to libcurl (what feels like) about one hundred years ago, we kept it simple in some ways. One way it shows: an application that wants to do many transfers in parallel asks libcurl to do it, and then extracts the set of file descriptors (sockets!) from libcurl (using curl_multi_fdset) to wait for, as plain fd_sets. fd_set is the variable type made for select(), so this API choice pretty much forced applications to use select(). select() has its fair share of problems, where possibly the biggest one is that it has problems with file descriptors > 1024.
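To make the limitation concrete, here is a condensed sketch of one iteration of that classic pattern, with error handling trimmed and wait_and_perform() being a made-up name:

    #include <sys/select.h>
    #include <curl/curl.h>

    /* One iteration of the old fdset-based wait; a condensed sketch. */
    static void wait_and_perform(CURLM *multi, int *running)
    {
      fd_set fdread, fdwrite, fdexcep;
      int maxfd = -1;
      struct timeval timeout = {1, 0}; /* one second */

      FD_ZERO(&fdread);
      FD_ZERO(&fdwrite);
      FD_ZERO(&fdexcep);

      /* ask libcurl which descriptors it wants monitored */
      curl_multi_fdset(multi, &fdread, &fdwrite, &fdexcep, &maxfd);

      /* wait with select() - descriptors >= FD_SETSIZE break right here */
      if(maxfd >= 0)
        select(maxfd + 1, &fdread, &fdwrite, &fdexcep, &timeout);

      /* then let libcurl act on whatever happened */
      curl_multi_perform(multi, running);
    }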

Later on we introduced an enhanced version of the multi interface for libcurl that allows an application to use whatever waiting method it pleases. I tend to refer to that variation as the multi_socket API, after its main function curl_multi_socket_action. That's the high-performance, event-driven API.

As you may be aware, event-driven code makes things a bit more complicated at times, so many people still prefer to use the older and simpler multi interface, and thus they were forced to keep using select(). But now that era has ended. Now…

curl_multi_wait() is introduced!

This poll(3)-like function basically works as a replacement for curl_multi_fdset() + select(). Starting in libcurl 7.28.0 (strictly speaking in commit de24d7bd4c03ea3), this is a function that any application can use for this purpose, and thus avoid the problem with many file descriptors.

This new function doesn't use any struct from the “real” poll() or its associated headers, to make sure it works even on systems without a real poll() implementation. It instead uses private curl versions of both the struct and the defines. An application can of course also tell curl_multi_wait to wait for a set of private file descriptors, just like poll() or select().
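Here is a minimal sketch of a transfer loop built on the new function. The URL is a placeholder and error handling is trimmed:

    /* Drive transfers with curl_multi_wait() instead of
       curl_multi_fdset() + select(). A minimal sketch. */
    #include <curl/curl.h>

    int main(void)
    {
      curl_global_init(CURL_GLOBAL_DEFAULT);
      CURLM *multi = curl_multi_init();
      CURL *easy = curl_easy_init();
      curl_easy_setopt(easy, CURLOPT_URL, "https://example.com/");
      curl_multi_add_handle(multi, easy);

      int running = 1;
      while(running) {
        curl_multi_perform(multi, &running);
        if(running) {
          int numfds = 0;
          /* wait at most 1000 ms; no extra private descriptors here,
             but they could be passed via the second and third arguments */
          curl_multi_wait(multi, NULL, 0, 1000, &numfds);
        }
      }

      curl_multi_remove_handle(multi, easy);
      curl_easy_cleanup(easy);
      curl_multi_cleanup(multi);
      curl_global_cleanup();
      return 0;
    }

Note that no fd_set ever appears, so the FD_SETSIZE limit is simply gone.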

The patch set that brought this function was provided by Sara Golemon, a friend from a related project.


ptest because “make test” is insufficient

CAUTION: test in progress

Thanks largely to autoconf and automake, we have an established, more or less standardized way to build and install tools, libraries and other software. We build them with ‘make’ and we install them with ‘make install’. This works great, and it works equally well when we build things cross-compiled.

For testing, however, the established concept and procedure is not as good. For testing we have ‘make test’ or ‘make check’, which typically first builds whatever needs to get built for the tests to run and then runs all the tests.

This is not good enough

Why? Because in lots of use cases we build software using a cross-compiler on a build system that can't run the produced executables. Therefore we need to first build the tests, then install them (somewhere reachable from the target system) and finally execute them. These steps need to be runnable independently, since at least the building and installing will sometimes happen on a different host than the execution of the tests.

yocto-project

Introducing ptest

Within the Yocto project, Björn Stenberg has pushed for ptest to be the basis of this new reform and concept. The responses he's gotten so far have been positive, and there's a pending updated patch to be posted to the upstream oe-core list soon.

The work does not end there

Even if, or rather when, this gets incorporated into OpenEmbedded and Yocto – and I really think it is a matter of when, since I believe we can work out all the flaws and quirks until virtually everyone involved is happy – the bulk of the changes really should be done upstream, in hundreds and thousands of open source packages. We (as upstream open source projects) need to start doing testing in at least two separate steps: one step that builds everything that needs to be built for the tests, and a second step that runs the suite. In a cross-compiled scenario, the two steps can then be executed first on the host system and then on the target system.

I expect that this will mean a whole bunch of patches and scripts having to be maintained within OpenEmbedded for a while, while we try to get things merged into upstream projects, and I also foresee that a certain percentage of projects just won't accept this new approach and will reject all patches in this vein.

Output format

I think the most controversial part of these suggested “universal” changes is the common test suite output format. A common format is of course required so that we can “supervise” the output and results from any package without having to know any specifics.

While the ptest output format follows the automake test output syntax, I expect many projects that have already selected a particular output format to rather stick with that. Hopefully we can then get such projects to introduce a separate make target or option that runs the test suite with the standard output format.
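For reference, automake-style test output is simply one result line per test, along these lines (the test names here are made up):

    PASS: test-connect
    FAIL: test-resolve
    SKIP: test-ipv6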

One little step forward

Building full-fledged, cross-compiled Linux distributions that are completely tested on target will remain hard work for a while longer. But we are improving things, one step at a time.

Of course, ‘ptest’ is just what the system is currently called by Björn within the Yocto/OE environment; it is not supposed to be a catchy name for this idea outside of there. The ‘p’ refers to package – as opposed to, for example, system tests – and makes it less generic than simply ‘test’.

Screen scraping expert witness

This is a slightly edited version of a genuine email I received in May 2012:

Dear Mr. Stenberg –

I recently came across the text you co-authored with Michael Schrenk, Webbots, Spiders, and Screen Scrapers, and was wondering if you might be interested in being a paid expert witness in a lawsuit we’re handling.

One of the major claims in the suit is unauthorized computer access in the form of a massive, multi-year campaign of screen scraping, and we’re looking for a qualified expert who can make the activity make sense to a jury (in the unlikely event that this matter reaches trial; fewer than 2% of cases do, in federal court).

We’re in Los Angeles, California, as is the case (and naturally would cover travel expenses, an hourly or per diem expert witness fee, etc). If you’re interested (or even if you’re not), please let me know? You can reach me via email or at (xyz) xyz-xyzx.

Many thanks,
[withheld]

Link to the book.

I responded to this mail saying that I'd rather not, due to the distance and travel it'd require, but I never heard back from them again, so I have no idea what happened in this case or who got to be the expert in the end…

curl ‘techJobs’ SanFran

A few days ago I was sent this awesome curl sighting from the US 101 near SFO (click for a slightly larger version), showing curl on a very large billboard in an ad for Dice.

It is clearly meant to look like a unix-style command line that invokes curl. The command line would only make sense if the user truly has two hosts named “techJobs” and “SanFran”, in which case curl would attempt to get their root HTTP documents from them over port 80.

It is followed by a C++/C99-style comment, which adds a really odd bit of context.

To me, all the sign shows is a company that desperately tries to look techy but in trying too hard it fails really miserably. I’m really honored my work found its way to such a high profile spot though!

Follow-up, September 11th:

Much to my surprise, today I received an email from Dice responding to my “critique” above. I have to say that this response was much more than I expected, and I tip my hat to them for it:

Hi Daniel,

I’m the marketing director for Dice.com and I wanted to reach out to you to thank you for spotting our billboard error on the 101. We are deeply embarrassed by this mistake to say the least. In a classic coding scenario, our QA failed us. Unfortunately for us, we bought this spot long-term and we are trying to figure out how quickly we can replace the content.

We’ve been working this year to do a better job communicating in a more authentic way in our marketing efforts. Obviously, making a mistake like this makes it look pathetically inauthentic. At the end of the day, getting it wrong is inexcusable, particularly in such a visible way.

(photo by Stan van de Burgt, emphasis added by me)