Embedded Linux hacking day

eneaOn September 10th, I sent out the invite to the foss-sthlm community for an embedded hacking event just before lunch. In just four hours, the 40 available tickets had been claimed and the waiting list started to get filled up as well… I later increased the amount to 46, we had some cancellations and I handed out more tickets and we had 46 people signed up at the day of the event (I believe 3 of these didn’t show up). At the day the event started, we still had another 20 people in the waiting list with hopes of getting a spot!

(All photos in this post are scaled down versions, click the picture to see a slightly higher resolution version!)

In Enea we had found an excellent sponsor for this event. They provided the place, the food, the raspberry pis, the coffe, the tshirts, the infrastructure and everything else that had to be there to make it an awesome day.

the-roomWe started off the event at 10:00 on October 20 in the Enea offices in Kista, Stockholm Sweden. People dropped in one by one and were handed their welcome present containing a raspberry pi board, a 2GB SD card and a USB-to-serial cable to interface/power the board with. People then found their seats in the room.

There were fruit, candy, water and coffee to start off and keep the mood high. We experienced some initial wifi and internet access problems but luckily we had no less than two dedicated Enea IT support people present and they could swiftly fix the little hiccups that occurred.

coffee machines

Once everyone seemed to have landed, I welcomed everyone and just gave a short overview of what to expect from the day, where the toilets are and so on.

In order to try to please everyone who couldn’t be with us at this event, due to plans or due to simply not having got one of the attractive 40 “tickets”, Björn the cameramanEnea helped us arrange a video camera which we used during the entire day to film all talks and the contest. I can’t promise any delivery time for them but I’ll work on getting them made public as soon as possible. I’ll make a separate blog post when there’s something to see. (All talks were in Swedish!)

At 11:30 I started off the day for real by holding the first presentation. We used one of the conference rooms for this, just next to the big room where everyone say hacking. This day we had removed all tables and only had chairs in the room movie theater style and it turned out we could fit just about all attendees in the room this way. I think that was good as I think almost everyone sat down to hear and see me:

Open Source in Embedded Systems

daniel talks open source I did a rather non-technical talk about a couple of trends in the embedded operating systems market and how I see the upcoming future and then some additional numbers etc. The full presentation (with most of the text in Swedish) can be found on slideshare.

I got good questions and I think it turned out an interesting discussion on how things run and work these days.

After my talk (which I of course did longer than planned) we served lunch. Three different sallads, bread and stuff were brought out. Several people approached me to say how they appreciated the food so I must say that Enea managed really well on that account too!

Development and trends in multicore CPUs

jonas talks about CPUs
Jonas Svennebring from Freescale was up next and talked about current multicore CPU development trends and what the challenges are for the manufacturers are today. It was a very good and very technical talk and he topped it off by showing off his board with T4240 running, Freecale’s latest flagship chip that is just now about to become available for companies outside of Freescale.

T4240 from FreeescaleOn this photo on the left you see the power supply in the foreground and the ATX board with a huge fan and cooler on top of the actual T4240 chip.

T4240 is claimed to have a new world record in coremark performance, features 12 hyper-threaded ppc cores in up to 1.8GHz.

There were some good questions to Jonas and he delivered good and well thought out answers. Then people walked out in the big room again to continue getting some actual hacking done.

We then took the opportunity to hand out the very nice-looking tshirts to all attendees, again kindly done so by Enea.

The Contest

The next interruption was the contest. Designed entirely by me to allow everyone to participate, even my friends and Enea employees etc. On the photo on the right you can see I now wear the tshirt of the day.

the contest
The contest was hard. I knew it was hard as I wanted it really make it a race that was only for the ones who really get embedded linux and have their brain laid out properly!

I posted the entire contest in separate blog post, but the gist of it was that I presented 16 questions with 3 answer alternatives. Each alternative had a sequence of letters. So after 16 questions you had 16 letter sequences you had to put in the right order to get a 17th question. The first one to give a correct answer to that 17th question would win.

A whole bunch of people gave up immediately but there was a core group who really fought hard, long and bravely and in the end we got a winner. The winner had paired up so the bottle of champagne went jointly to Klas and Jonas. It was a very close call as others were within seconds of figuring it out too.

I think the competition was harder than I thought. Possibly a little too hard…

Your own code on others’ hardware

linus talksLinus from Haxx (who shouldn’t be much of a stranger to readers of this blog) then gave some insights on how he reversed engineered mp3 players for the Rockbox project. Reverse engineering is a subject that attracts many people and I believe it has some sort of magic aura around it. Again many good questions and interested people in the room.

Linus bare targets as seen during his talk On the photo on the right you can see Linus’ stripped down hardware which he explained he had ripped off all components from in order to properly hunt down how things were connected on the PCB.

Coffee

We did not keep the time schedule so we had to get the coffee break in after Linus, and there were buns and so on.

Yocto

Björn from Haxx then educated the room on the Yocto Project. What it is, why it is, who it is and a little about how it is designed and how it works etc.

bjorn talks on yocto

I think perhaps people started to get a little soft in their brain as we had now blasted through all but one of the talks, and as a speaker finale we had Henrik…

u-boot on Allwinner A10

Henrik Nordström did a walk-through explaining some u-boot basics and then explained what he had done for the Allwinner targets and related info.

Henrik talks u-boot
I believe the talks were kind of the glue that made people stick around. Once Henrik was done and there was no more talks planned for the day, it was obvious that it was sort of the signal for people to start calling it a day even though there was still over one hour left until the official end time (20:00).

Henriks hardware
Of course I don’t blame anyone for that. I had hardly had any time myself to sit down or do anything relaxing during the day so I was kind of exhausted myself…

Summary

I got a lot of very positive comments from people when they left the facilities with big smiles on their faces, asking for more of these sorts of events in the future.

The back of the Enea tshirtI am very happy with the overly positive response, with the massive interest from our community to come to such an event and again, Enea was an awesome sponsor for this.

Talk audienceI didn’t get anything done on the raspberry pi during this day. As a matter of fact I never even got around to booting my board, but I figure that wasn’t a top priority for me this day.

The crowd size felt really perfect for these facilities and 40 something also still keeps the spirit of familiarity and it doesn’t feel like a “big” event or so.

Will I work on making another event similar to this again? Sure. It might not happen immediately, but I don’t see why it can’t be made again under similar circumstances.

Credits

rpi accessed with tabletAll photos on this page were taken by me, Björn Stenberg, Kjell Ericson, Mats Lidell and Mia Åkerström.

Thanks to Jonas, Björn, Linus and Henrik for awesome talks.

Thanks to Enea for sponsoring this event, and Mia then in particular for being a good organizer.

decopperfied

copper

Today I disconnected my ADSL modem through which I still had the landline telephone service running. I’ve cancelled the service so at the end of the month it will die and I’m now instead signing up to one of them lesser known companies that offer cheap IP-telephony that is network independent: Affinity telecom. Staying with only phone service over the copper was not a good idea since the cheapest service is still expensive over that medium.

Why have a landline phone at all? I’ve decided to stick with a landline telephone for a few reasons: first, our phone number is distributed widely and it is convenient to keep it working and keeping it reach our house instead of a specific person. Secondly, we have two kids who like to phone at times and they can use this fine and it’ll end up cheaper than if they’d use their own mobile phones. And thirdly, the parents-in-law factor. My parents and my wife’s parents etc like calling landlines instead of mobile phones, I think partly due to old habits but also partly because it is cheaper for them…

Ping Communication Voice Catcher 201E

When I researched which service to pick I discovered that not a single operator exists in Sweden that only charges for usage. They all have a minimum monthly fee, so I went with one of the cheapest I could find on prisjakt for my usage pattern.

I simply yanked the RJ11 from the ADSL modem and inserted into my new “Ping Communications Voice Catcher 201E” device (see picture) that I received. The instructions that come with the device say I should connect it directly to internet and then let my existing LAN traffic route through it. I don’t want to add yet another box between me and the internet and I fear that a cheapo box like this might cause problems in one way or another if I do, so I just plugged this new toy into my router and after I while I could happily confirm that it worked just as nicely. It got an IP from my DHCP server and I can call my old-style analog phone now (I love number portability) and I can use it to call out to my mobile.

I’m curious to see how good/bad this is going to work…

Update: It didn’t work to make the box a regular DHCP client. Even if I made it into a DMZ it still wouldn’t work to accept calls. I would only get a signal and everything, but once I answered the phone there was no sound. In the end I moved it to sit between “internet” and my local router and now my phone seems to be doing fine.

WSAPoll is broken

Microsoft admits the WSApoll function is broken but won’t do anything about it. Unless perhaps if customers keep nagging them.

Doing portable socket programming has always meant using a bunch of #ifdefs and similar. A program needs to be built on many systems and slowly get adjusted to work really well all over. For ages, for example, Windows only supported select() and not poll() while all sensible systems[*] out there supported poll(). There are several reasons to prefer poll to select when writing code.

Then one day in 2006, Chad Charlin, a developer at Microsoft wrote the following when talking about the new WSApoll() function they introduced in Windows Vista:

Among the many improvements to the Winsock API shipping in Vista is the new WSAPoll function. Its primary purpose is to simplify the porting of a sockets application that currently uses poll() by providing an identical facility in Winsock for managing groups of sockets.

Great! Starting September 2006 curl started using it (shipped in the release curl and libcurl 7.16.0). It seemed like a huge step forward, and as Chad wrote:

If you have experience developing applications using poll(), WSAPoll will be very familiar. It is designed to behave just like poll().

Emphasis added by me. It was (of course) made to work like poll, and that’s why the API is made like that. Why would you introduce something that is almost like poll() except in minor details?

Since the new function only was available in Vista and later, it took a while until libcurl users in a more wider scale got to use it but over time Windows XP users are slowly shifting away and more and more libcurl Windows users therefore use the WSApoll based builds. Life seemed to be good. Some users noticed funny things and reported bugs we couldn’t repeat (on other platforms) but nothing really stood out and no big alarm bells went off.

During July 2012, a user of libcurl on Windows, Jan Koen Annot experienced such problems and he didn’t just sigh and move on. He rolled up his sleeves and decided to get to the bottom. Perhaps he could fix a bug or two while at it? (It seems reasonable that he thought so, I haven’t actually asked him!) What he found was however not a bug in libcurl. He found out that WSApoll did indeed not work like poll (his initial post to curl-library on the problem)! On August 1st he submitted a support issue to Microsoft about it. On August 7 we pushed the commit to curl that removed our use of WSApoll.

A few days go Jan reported back on how the case has gone, where his journey down the support alleys took him.

It turns out Microsoft already knew about this bug, which they apparently have named “Windows 8 Bugs 309411 – WSAPoll does not report failed connections”. The ticket has been resolved as Won’t Fix… (I haven’t found any public access of this.)

Jan argued for the case that since WSApoll is designed and used as a plain poll() replacement it would make sense to actually make it also work the same way:

First, it will cost much time to find out that some ‘real-life’ issue can be traced back to this WSAPoll bug. In my case we were lucky to have a regression test which triggered when we started using a slightly different cURL-library configuration on Windows. Tracing back that the test was triggered because of this bug in WSAPoll took several hours. Imagine what it would cost, if some customer in the field reported annoying delays, to trace such a vague complaint back to a bug in the WSAPoll function!

Second, even if we know beforehand about this bug in WSAPoll, then it is difficult to determine in which situations in your code you can safely use WSAPoll and in which situations you might suffer from this bug. So a better recommendation would be to simply not use WSAPoll. (…)

Third, porting code which uses the poll() function to the Windows sockets API is made more complex. The introduction of WSAPoll was meant specifically for this, so it should have compatible behavior, without a recommendation to not use it in certain circumstances.

Fourth, your recommendation will only have effect when actively promoted to developers using WSAPoll. A much better approach would be to repair the bug and publish an update. Microsoft has some nice mechanisms in place for that.

So, my conclusion is that, even if in our case the business impact may be low because we found the bug in an early stage, it is still important that Microsoft fixes the bug and publishes an update.

In my eyes all very good and sensible arguments. Perhaps not too surprisingly, these fine reasons didn’t have any particular impact on how Microsoft views this old and known bug that “has been like this forever and people are already used to it.”. It will remain closed, and Microsoft motivated this decision to Jan quite clearly and with arguments one can understand:

A discussion has been conducted around this topic and the taken decision was not to have the fix implemented due to the following reasons:

  • This issue since Vista
  • no other Microsoft customer has asked for a Hotfix since Vista timeframe
  • fixing this old issue might have some application compatibility risk (for those customers who might have somehow taken a dependency on WSAPoll failing with a timeout in the cases of connect failure as opposed to POLLERR).
  • This API will become more irrelevant as the Windows versions increase; the networking APIs will move away from classic select/poll to more advanced I/O completion mechanisms.

Argument one and two are really weak and silly. Microsoft users are very rarely complaining to Microsoft and most wouldn’t even know how to do it. Also, this problem may certainly still affect many users even if nobody has asked for a fix.

The compatibility risk is a valid point, but that’s a bit of a hard argument to have. All bugs that are about behavior will of course risk that users have adapted to the wrong behavior so a bug fix may break those. All of us who write and maintain stable APIs are used to this problem, but sticking to the buggy way of working because it has been doing this for so long is in my eyes only correct if you document this with very large letters and emphasis in all documentation: WSApoll is not fully emulating poll – beware!

The fact that they will focus more on other APIs is also understandable but besides the point. We want reliable APIs that work as documented. Applications that are Windows-only probably already very rarely use WSApoll, it will probably remain being more important for porting socket style programs to Windows.

Jan also especially highlights a funny line from this Microsoft person:

The best way to add pressure for a hotfix to be released would be to have the customers reporting it again on http://connect.microsoft.com.

Okay, so even if they have motives why they won’t fix this bug they seem to hint that if more customers nag them about it they might change their minds. Fair enough. But the users of libcurl who for five years perhaps experienced funny effects are extremely unlikely to ever report and complain to Microsoft about this. They are way more likely to complain to us, or possibly to just work around the issue somehow.

Of course, users of WSApoll can adapt to the differences and make conditional code that handles them and that could be what we end up with in the curl project in the future if we just get volunteers to adapt the code accordingly. In the mean time we’ve just reverted to the old select()-using code instead, since select() does in fact mimic the “real” select much better…

[*] = clearly Mac OS X is not a sensible system since its poll() implementation is even worse than Windows and is mostly broken or just unreliable. Subject for another blog post another time.

Update

In 2023, a user made me aware that the Microsoft documentation now says:

Note  As of Windows 10 version 2004, when a TCP socket fails to connect, (POLLHUP | POLLERR | POLLWRNORM) is indicated.

Maybe it is time to do new tests.

What, me over-analyzing?

fiber cableI got fiber installed to my house a few months ago. 100/100 mbit is very nice.

My first speed checks using the Swedish bredbandskollen service were a bit disappointing since it only showed something like 50mbit down and 80mbit up. I decided to ignore that fact for the moment as things were new and I had some other more annoying issues.

I detected that sometimes, on some specific sites I had problems to get HTTP! The TCP connection would get connected and data would get sent, but it would stall somewhere and then get disconnected. This showed up when for example my wife tried to download a Spotify client and my phone got trouble to download some of my favorite podcasts. Some, not all. Possibly one out of ten or twenty sites showed the problem. Most of the sites I use frequently worked flawlessly and I would only ever see the problem when I tried HTTP.

Puzzling!

I could avoid the problem by setting up a SSH connection to my server and running that as a SOCKS proxy, and so I could still get service even from the sites I apparently couldn’t quite use. I tried to collect the problematic sites and I tried traceroute etc on all the ones that failed in order to get data that would help me and my operator to pinpoint the problem. I reported this problem to my ISP really early on, but they too were puzzled and it never got far in their end.

I was almost convinced they had some kind of traffic shaping thing in the middle that was broken for HTTP somehow.

Time to fix it once and for all

Finally one day I stopped being nice with the support people and I demanded that they would send over a guy and fix it. It wasn’t good enough that they would find that everything is OK from remote. I clearly had problems and I could escape the problems by switching over to my old ADSL so I knew it wasn’t due to my computers’ configs or due to my own firewalls or routers.

I was also convinced my ISP would get me some cable guy coming over when the problem wasn’t really in the cable, but sure it would be a necessary first step towards finding the real error.

They sent a cable guy, and it took him like three minutes to detect a bad signal level on the fiber, meaning that the problem was certainly not in my house anyway. He then drove down to the “station” that terminates my fiber, and after having “polished” that end of the fiber connection too he called me and said that he couldn’t spot any problem anymore and asked me to verify…

Now my checks showed me 80-90 mbit in both directions and the sites that used to give me problems all worked just fine.

It was the cable all along. Bad signal level. “A mystery it worked at all for you” my cable person said.

I’m left with my over-analyzed problem suspecting all sorts of high level stuff. But why did this only affect some sites? How come I could circumvent this with SOCKS? Gah, my brain hurts trying to answer these questions…

Anyway, now it all works and the family is happy again.

SSL verification still often disabled

SSL padlockBack in 2002 I realized that having libcurl not do SSL server verification by default basically meant that everyone writing libcurl apps would inherit that flaw, simply because most people always just let the defaults remain unless they really have to read up on what something does and then modify them. If things work, things will just remain. So when we shipped libcurl 7.10 on the first of October that year, libcurl started verifying server certs by default.

Fast forward about ten years.

Surely SSL clients everywhere now do the right thing?

One day a couple of months ago, I was referred to this bug report for the pyssl module in Python which identifies that it doesn’t verify server certs by default! The default SSL handler in Python doesn’t verify the certificate properly. It makes all python programs that use this without special attention vulnerable for man in the middle attacks.

So let’s look at the state of another popular language: PHP. A plain standard PHP program opens a ssl:// or tls:// stream. Unless the author of said program knows and understands these things, it too runs without verifying server certs. If a program instead decides to use the PHP/CURL binding for HTTPS or similar, it will use libcurl’s default which verifies it (as I explained above).

But not everything is gloomy. Some parts of our community have decided to do the right thing:

I was told (and proven) that Ruby now does the right thing, but I don’t know how recent that is and thus how many older Ruby programs that suffer.

The same problem existed with perl’s major HTTPS using module, the LWP, for a very long time. The perl camp however already modified LWP to do verification by default with the release of libwww-perl 6.00, released in March 2011.

Side-note: in the curl project we make it easy for everyone on the Internet to use Firefox’s excellent CA cert bundle to verify server certs by providing the Firefox CA cert collection converted to PEM – the preferred format for OpenSSL, GnuTLS and others.

Conclusion:

Even today, lots and lots of applications and scripts will remain insecure – even though they probably think they’re fairly safe when they switch to a HTTPS or SSL using protocol –  and might be subject for man-in-the-middle attacks without even being able to spot it. I think it is pretty sad.

this vs that and ssh through proxy

Taken from the web stats for daniel.haxx.se during September 2012. The top-10 search phrases used to end up on a page on this site:

  1. ssh proxy (198)
  2. curl vs wget (145)
  3. ftp vs http (92)
  4. wget vs curl (91)
  5. ssh through proxy (72)
  6. http vs ftp (67)
  7. curl wget (55)
  8. wget curl (53)
  9. http ftp (46)
  10. difference between ftp and http (45)

The top-3 most visited pages on my site during the same month were:

  1. SSH Through or Over Proxy (viewed 4800 times)
  2. curl vs Wget (viewed 3000 times)
  3. FTP vs HTTP (viewed 2300 times)

I guess this tells me something. I’m not sure what…

Three years of Haxx

Haxx logoAt October first, another full year of work at Haxx has been spent since I last summed up the past year (my previous posts about Haxx’s first year and second year). Three years working for Haxx full-time, and it has been another great year with lots of fun, challenges and us enjoying being independent.

During this year I ended my previous engagement with that large chip company and got a new assignment for the same customer both Björn and Linus were working for at the time. It has been a big adventure for me as I dove straight into unknown territories and I’ve spent my work days since then as a product manager, making an embedded Linux distribution. In this role I’ve travelled to US, China and South Korea during the year and I’m serving as a member of an advisory board in a related organization on behalf of my customer! I recently agreed to extending this contract to at least April 2013. Partly due to this new assignment I’ve not worked very much on foss-sthlm activities recently, but after the summer I’ve really made an effort to get this back up to speed.

Birthdaycake

Later during the year, Linus changed assignment to a new customer when we signed a sort of partnership contract with a leading global embedded software company and he then continued to do a whole series of little projects for them. After the summer Linus has grabbed a couple of curl related projects, partly still in progress.

Björn stuck around at the same customer during the entire year, and he’s been working as an engineer and developer in the team that actually makes the product I am a manager for.

Haxx towelThis year we made more Haxx merchandise. Towels, stickers and jackets have now been sent out in the world to make our name more visible in a few weird corners of the universe.

We visisted FSCONS 2011 and FOSDEM 2012, two really nice conferences for FOSS fans like us and we got to meet a lot of friends and like-minded people there.

We continue to see a demand on the market for highly skilled embedded developers, including embedded Linux and open source related activities. We wouldn’t mind extending our merry team, so we decided to document a list of requirements of what to have in order to get hired by us. So far not a single person has applied…

Snaxx 27

A pint of guinnessGoing strong after 12 years in the making. For the 27th time we’re gathering friends in the Stockholm Sweden area who are interested in technology, open source, beers, slightly inaccurate Monty Python quotes, reverse engineering electronics and similar very important topics. We might also have a beer or two and talk rubbish.

On October 31st 2012 we invite all and every of our tech oriented friends to visit

Snaxx-27

We figured the 27th time would be the perfect time to do something new, so we now host the information on the fine snaxx.se domain.

Introducing curl_multi_wait

Facebook contributes fix to libcurl’s multi interface to overcome problem with more than 1024 file descriptors.

When we introduced the multi interface to libcurl about (what feels like) one hundred years ago, we went with simple in some ways. One way it shows: an application that wants to do many transfers in parallel asks libcurl to do it, and then it extracts the set of file descriptors (sockets!) from libcurl (using curl_multi_fdset) to wait for as plain fd_sets. fd_set is the variable type made for select(). This API choice made applications pretty much forced to use select. select() has its fair share of problems, where possibly the biggest one is that it has problems with file descriptors > 1024.

Later on we introduced an enhanced version of the multi interface for libcurl that allows an application to use whatever method it pleases. I tend to refer to that variation as the multi_socket API after its main function curl_multi_socket_action. That’s the high performance, event-driven API.

As you may be aware, event-driven code make things a bit more complicated at times so many people still prefer to use the older and simpler multi interface and thus they were forced to remain using select(). But now that era has ended. Now…

curl_multi_wait() is introduced!

This poll(3)-like function basically works as a replacement for curl_multi_fdset() + select(). Starting in libcurl 7.28.0 (strictly speaking in commit de24d7bd4c03ea3), this is a function that any application can use for this purpose, and thus avoid the problem with many file descriptors.

This new function doesn’t use any struct from the “real” poll() or associated headers to make sure that it works even for systems without a real poll() implementation. It instead uses private curl versions of both the struct and the defines used. An application can of course also tell curl_multi_wait to wait for a set of private file descriptors, just like poll() or select().

The patch set that brought this function was provided by Sara Golemon, a friend from from a related project

cURL

curl, open source and networking