Category Archives: Network

Internet. Networking.

cURL and libcurl, Network, Open Source, Web

http2 in curl

March 9, 2014 Daniel Stenberg

While the first traces of http2 support in curl was added already back in September 2013 it hasn’t been until recently it actually was made useful. There’s been a lot of http2 related activities in the curl team recently and in the late January 2014 we could run our first command line inter-op tests against public http2 (draft-09) servers on the Internet.

There’s a lot to be said about http2 for those not into its nitty gritty details, but I’ll focus on the curl side of this universe in this blog post. I’ll do separate posts and presentations on http2 “internals” later.

A quick http2 overview

http2 (without the minor version, as per what the IETF work group has decided on) is a binary protocol that allows many logical streams multiplexed over the same physical TCP connection, it features compressed headers in both directions and it has stream priorities and more. It is being designed to maintain the user concepts and paradigms from HTTP 1.1 so web sites don’t have to change contents and web authors won’t need to relearn a lot. The web will not break because of http2, it will just magically work a little better, a little smoother and a little faster.

In libcurl we build http2 support with the help of the excellent library called nghttp2, which takes care of all the binary protocol details for us. You’ll also have to build it with a new enough version of the SSL library of your choice, as http2 over TLS will require use of some fairly recent TLS extensions that not many older releases have and several TLS libraries still completely lack!

The need for an extension is because with speaking TLS over port 443 which HTTPS implies, the current and former web infrastructure assumes that we will speak HTTP 1.1 over that, while we now want to be able to instead say we want to talk http2. When Google introduced SPDY then pushed for a new extension called NPN to do this, which when taken through the standardization in IETF has been forked, changed and renamed to ALPN with roughly the same characteristics (I don’t know the specific internals so I’ll stick to how they appear from the outside).

So, NPN and especially ALPN are fairly recent TLS extensions so you need a modern enough SSL library to get that support. OpenSSL and NSS both support NPN and ALPN with a recent enough version, while GnuTLS only supports ALPN. You can build libcurl to use any of these threes libraries to get it to talk http2 over TLS.

http2 using libcurl

(This still describes what’s in curl’s git repository, the first release to have this level of http2 support is the upcoming 7.36.0 release.)

Users of libcurl who want to enable http2 support will only have to set CURLOPT_HTTP_VERSION to CURL_HTTP_VERSION_2_0 and that’s it. It will make libcurl try to use http2 for the HTTP requests you do with that handle.

For HTTP URLs, this will make libcurl send a normal HTTP 1.1 request with an offer to the server to upgrade the connection to version 2 instead. If it does, libcurl will continue using http2 in the clear on the connection and if it doesn’t, it’ll continue using HTTP 1.1 on it. This mode is what Firefox and Chrome will not support.

For HTTPS URLs, libcurl will use NPN and ALPN as explained above and offer to speak http2 and if the server supports it. there will be http2 sweetness from than point onwards. Or it selects HTTP 1.1 and then that’s what will be used. The latter is also what will be picked if the server doesn’t support ALPN and NPN.

Alt-Svc and ALTSVC are new things planned to show up in time for http2 draft-11 so we haven’t really thought through how to best support them and provide their features in the libcurl API. Suggestions (and patches!) are of course welcome!

http2 with curl

Hardly surprising, the curl command line tool also has this power. You use the –http2 command line option to switch on the libcurl behavior as described above.

Translated into old-style

To reduce transition pains and problems and to work with the rest of the world to the highest possible degree, libcurl will (decompress and) translate received http2 headers into http 1.1 style headers so that applications and users will get a stream of headers that look very much the way you’re used to and it will produce an initial response line that says HTTP 2.0 blabla.

Building (lib)curl to support http2

See the README.http2 file in the lib/ directory.

This is still a draft version of http2!

I just want to make this perfectly clear: http2 is not out “for real” yet. We have tried our http2 support somewhat at the draft-09 level and Tatsuhiro has worked on the draft-10 support in nghttp2. I expect there to be at least one more draft, but perhaps even more, before http2 becomes an official RFC. We hope to be able to stay on the frontier of http2 and deliver support for the most recent draft going forward.

PS. If you try any of this and experience any sort of problems, please speak to us on the curl-library mailing list and help us smoothen out whatever problem you got!

cURL and libcurl, Mozilla, Network

HTTPbis design team meeting London

March 8, 2014 Daniel Stenberg

I’m writing this just hours after the HTTPbis design team meeting in London 2014 has ended.

Around 30 people attended the meeting i Mozilla’s central London office. The fridge was filled up with drinks, the shelves were full of snacks and goodies. The day could begin. This is the Saturday after the IETF89 week so most people attending had already spent the whole or parts of the week before here in London doing other HTTP and network related work. The HTTPbis sessions at the IETF itself were productive and had already pushed us forward.

We started at 9:30 and we quickly got to work. Mark Nottingham guided us through the day with usual efficiency.

We all basically hang out in a huge room, some in chairs, some in sofas and a bunch of people on the floor or just standing up. We had mikes passed around and the http2 discussions were flowing back and forth depending on the topics and what people felt about them. Some of the issues that were nailed down this time and will end up detailed in the upcoming draft-11 are (strictly speaking, we only discussed the things and formed opinions, as by IETF guidelines we can’t decide things on an offline meeting like this):

Priorities of streams will have a dependency graph or direction, making individualÂ streams less or more important than other
A client can send headers without compression and tell the proxy that the header shouldn’t be compressed – used a way to mitigate some of the compression security problems
There will be no TLS renegotiation allowed mid-session. Basically a client will have to tear down the connection and negotiate again if suddenly a need to use a client certificate arises.
Alt-Svc is the way forward so ALTSVC will appear a new frame in draft-11. This is the way to signal to an application that there is another “route” to the same content on the same server. This will allow for what is popularly known as “opportunistic encryption” or at least one sort of that. In short, you can do “plain-text” HTTP over a TLS connection using this…
We decided that a server should support gzip contents from clients

There were some other things too handled, but I believe those are the main changes. When the afternoon started to turn long, beers and other beverages were brought out and we did enjoy a relaxing social finale of the day before we split up in smaller groups and headed out in the busy London night to get dinner…

Thanks everyone for a great day. I also appreciated meeting several people in real-life I never met before, only discussed with and read emails from online and of course some old friends I hadn’t seen in a long time!

Oh, there’s also a new rough time frame for http2 going forward. Nearest in time would be the draft-11 at the end of March and another interim in the beginning of June (Boston?).

As a reminder, here’s what was happened for draft-10, and here is http2 draft-10.

Out of all people present today, I believe Mozilla was the company with the largest team (8 attendees) – funnily enough none of us Mozillians there actually work in this office or even in this country.

Network, Web

HTTP2 – the next step

January 24, 2014 Daniel Stenberg

The HTTPbis working group of the IETF had an interim meeting in Zurich January 22nd to 24th. I participated from remote and I listened in on the discussions over webex and followed the jabber room while the meetings were going on, addressing HTTP2 protocol issues one by one ironing out quirks and progressing forward.

I won’t bore you with details why I wasn’t present in Zurich.

Here’s a couple of quick and brief reflections from my perspective:

Listening in from remote like this is not at all adequately compensating for not being there. A room full of people discussing something is really hard to follow from remote and completely impossible to interact with. It is better than not being able to listen in at all, but it is certainly not a replacement for being there.

It is amazing how much faster people can come to conclusions and fix issues when being in the same room. Issues that have been lingering in the tracker for a very long time could be dealt with and closed within minutes. Things like what to call the protocol in ALPN or to remove the ability to switched off flow control. Not all issues of course…

HTTP2 draft-09 that soon will become draft-10 to reflect the updates from this meeting and more, is from my perspective quite far in its process. It is clearly at a point that seems to be OK with most people and the discussions are now just about details. Of course the devil is in the details and I’m not saying it can’t take a long time to settle on them, but the structure and main concepts of the protocol are probably now established.

There were not very many proxy or server people at the interim. Most of the people seemed to be client-side oriented and some service oriented. I’m personally client-side biased myself but I hope this doesn’t lead to us deciding on things that the “other side” will have problems with down the line.

Firefox nightly supports HTTP2 draft-09 (for https:// URLs) and twitter supports it server side. Enable it in the browser by entering “about:config” in the URL bar and change the config entry called “network.http.spdy.enabled.http2draft” to true. Done.

Some of the biggest HTTP2 changes brought up compared to what draft-09 says include:

no more ability to switch off flow control
the prioritization field/concept “weighted dependency tree approach”
>= TLS 1.2 with ephemeral ciphers
MUST not use TLS compression
tolerant to TLS false start or at least must accept/buffer application layer data
padding

There was also a whole lot of discussions about TLS for http:// urls, proxys, MITM for SSL, opportunistic encryption and more but I believe most of those issues remained at the same position as before – I missed out on parts of the last afternoon so I may have missed some details. It’ll all be revealed in draft-10 and the mailing list I’m sure!

Update: the minutes from the interim meeting.

cURL and libcurl, Development, Mozilla, Network

My first Mozilla week

January 17, 2014 Daniel Stenberg 1 Comment

Working from home

I get up in the morning, shave, eat breakfast and make sure all family members get off as they should. Most days I walk my son to school (some 800 meters) and then back again. When they’re all gone, the house is quiet and then me and my cup of coffee go upstairs and my work day begins.

Systems and accounts

I have spent time this week to setup accounts and sign up for various lists and services. Created profiles, uploaded pictures, confirmed passwords. I’ve submitted stuff and I’ve signed things. There’s quite a lot of systems in use.

My colleagues

I’ve met a few. The Necko team isn’t very big but the entire company is huge and there are just so many people and names. I haven’t yet had any pressing reason to meet a lot of people nor learn a lot of names. I feel like I’m starting out this really slowly and gradually.

Code base

Firefox is a large chunk of code. It takes some 20 minutes to rebuild on my 3.5GHz quad-core Core-i7 with SSD. I try to pull code and rebuild every morning now so that I can dogfood and live on the edge. I also have a bunch of local patches now, some of them which I want to have stewing in my own browser for a while so that I know they at least don’t have any major negative impact!

Figuring out the threading, XPCOM, the JavaScript stuff and everything is a massive task. I really cannot claim to have done more than just scratched the surface so far, but at least I am scratching and I’ve “etagged” the whole lot and I’ve spent some time reading and reviewing code. Attaching a gdb to a running Firefox and checking out behavior and how it looks has also helped.

Netwerk code size

“Netwerk” is the directory name of the source tree where most of the network code is located. It is actually not so ridiculously large as one could fear. Counting only C++ and header files, it sums up to about 220K lines of code. Of course not everything interesting is in this tree, but still. Not mindbogglingly large.

Video conferencing

I’ll admit I’ve not participated in this sort of large scale video conferences before this. With Vidyo and all the different people and offices signed up at once – it is a quite impressive setup actually. My only annoyance so far is that I didn’t get the sound for Vidyo to work for me in Linux with my headphones. The other end could hear me but I couldn’t hear them! I had to defer to using Vidyo on a windows laptop instead.

Doing the video conferencing on a laptop instead of on my desktop machine has its advantages when I do them during the evenings when the rest of the family is at home since then I can move my machine somewhere and sit down somewhere where they won’t disturb me and I won’t disturb them.

Bugzilla

The bug tracker is really in the center for this project, or at least for how I view it and work with it right now. During my first week I’ve so far filed two bug reports and I’ve submitted a suggested patch for a third bug. One of my bugs (Bug 959100 – ParseChunkRemaining doesn’t detect chunk size overflow) has been reviewed fine and is now hopefully about to be committed.

I’ve requested commit access (#961018) as a “level 1” and I’ve signed the committer’s agreement. Level 1 is entry level and only lets me push to the Try server but still, I fully accept that there’s a process to follow and I’m in no hurry. I’ll get to level 3 soon enough I’m sure.

Mercurial

What can I say. After having used it a bit this week without any particularly fancy operations, I prefer git so much more. Of course I’m also much more used to git, but I find that for a lot of the stuff where both have similar concepts I prefer to git way. Oh well, its just a tool. I’ll get around. Possibly I’ll try out the git mirror soon and see if that provides a more convenient environment for me.

curl

What impact did all this new protocol and network code stuff during my work days have on my curl activities?

I got inspired to fix both the chunked encoding parser and the cookie parser’s handling of max-age in libcurl.

What didn’t happen

I feel behind in the implementing-http2 department. I didn’t get my new work laptop yet.

Next week

More of the same, land more patches and figure out more code. Grab more smallish bugs others have filed and work on fixing them as more practice.

Also, there’s a HTTPbis meeting in Zürich on Wednesday to Friday that I won’t go to (I’ll spare you the explanation why) but I’ll try to participate remotely.

cURL and libcurl, Haxx, Network, Open Source, Web

I go Mozilla

December 20, 2013 Daniel Stenberg 4 Comments

In January 2014, I start working for Mozilla

I’ve worked in open source projects for some 20 years and I’ve maintained curl and libcurl for over 15 years. I’m an internet protocol geek at heart and Mozilla seems like a perfect place for me to continue to explore this interest of mine and combine it with real open source in its purest form.

I plan to use my experiences from all my years of protocol fiddling and making stuff work on different platforms against random server implementations into the networking team at Mozilla and work on improving Firefox and more.

I’m putting my current embedded Linux focus to the side and I plunge into a worldwide known company with worldwide known brands to do open source within the internet protocols I enjoy so much. I’ll be working out of my home, just outside Stockholm Sweden. Mozilla has no office in my country and I have no immediate plans of moving anywhere (with a family, kids and all established here).

I intend to bring my mindset on protocols and how to do things well into the Mozilla networking stack and world and I hope and expect that I will get inspiration and input from Mozilla and take that back and further improve curl over time. My agreement with Mozilla also gives me a perfect opportunity to increase my commitment to curl and curl development. I want to maintain and possibly increase my involvement in IETF and the httpbis work with http2 and related stuff. With one foot in Firefox and one in curl going forward, I think I may have a somewhat unique position and attitude toward especially HTTP.

I’ve not yet met another Swedish Mozillian but I know I’m not the only one located in Sweden. I guess I now have a reason to look them up and say hello when suitable.

Björn and Linus will continue to drive and run Haxx with me taking a step back into the shadows (Haxx-wise). I’ll still be part of the collective Haxx just as I was for many years before I started working full-time for Haxx in 2009. My email address, my sites etc will remain on haxx.se.

I’m looking forward to 2014!

cURL and libcurl, Network

Monitoring my voip line

April 6, 2013 Daniel Stenberg

Ping Communication Voice Catcher 201E My “landline” phone in my house is connected over voip through my fiber and I’m using the service provided by Affinity Telecom. A company I never heard of before and I can only presume it is a fairly small one.

Everything is working out fine, apart from one annoying little glitch: every other month or so my phone reports itself as either busy to a caller (or just as if nobody picks up the phone) and the pingcom NetPhone Adapter 201E voip box I have needs to be restarted for the phone line to get back to normal (I haven’t figured out if the box or the service provider is the actual villain).

In my household we usually discover the problem after several days of this situation since we don’t get many calls and we don’t make many calls. (The situation is usually even notable on the voip box’s set of led lights as they are flashing when they are otherwise solid but the box is not put in a place where we notice that either.) Several days of the phone beeping busy to callers is a bit annoying and I’ve decided to try to remedy that somehow. Luckily the box has a web interface that allows me to admin it and check status etc, and after all, I know a tool I can use to script HTTP to the thing, extract the status and send me a message when it needs some love!

Okay, so I just need to “login” to the box and get the status page and extract the info for the phone line and I’m done. I’ve done this dozens if not hundreds of times on sites all over the net the last decade. I merrily transferred the device info page “http://pingcom/Status/Device_Info.shtml” with curl and gave it a glance…

Oh. My. God. This is a little excerpt from the javascript magic that handles the password I enter to login to the web interface:

    /*
     * Get the salt from the router
     */
    (code gets salt from a local URL)

    var salt = xml_doc.textdoc;
    /*
     * Append the password to the salt
     */
     var input = salt + password;
    /*
     * MD5 hash of the salt.
     */
    var hash = hex_md5(input);
    /*
     * Append the MD5 hash to the salt.
    */
    var login_hash = salt.concat(hash);
    /*
     * Send the login hash to the server.
     */
    login_request = new ajax_xmlhttp("/post_login.xml?user=" + escape(username) + "&hash=" +
         escape(login_hash), function(xml_doc)

    [cut]

Ugha! So it downloads a salt, does hashing, salting and md5ing on the data within the browser itself before it sends it off to the server. That’s is so annoying and sure I can probably replicate that logic in a script language of my choice but it is going to take some trial and error until the details are all sorted out.

Ok, so I do the web form login with my browser again and start to look at what requests it does and so on in order to be able to mimic them with curl instead. I then spot that when viewing that device info page, it makes a whole series of HTTP requests that aren’t for pictures and not for the main HTML… Hm, at a closer look it fetches data from a bunch of URLs ending with “.cgi”! And look, among those URLs there’s one in particular that is called “voip_line_state.cgi”. Let me try to get just that and see what that might offer and what funny auth dance I may need for it…

curl http://pingcom/voip_line_state.cgi

And what do you know? It returns a full XML of the voip status, entirely without any login or authentication required:

<LineStatus channel_count="2">
  <Channel index="0" enabled="1">
    <SIP state="Up">
      <Name>0123456789</Name>
      <Server>sip.example.org</Server>
    </SIP>
    <Call state="Idle"></Call>
  </Channel>
  <Channel index="1" enabled="0"></Channel>
</LineStatus>

Lovely! That ‘Idle’ string in there in the <Call> tag is the key. I now poll the status and check to see the state in order to mail myself when it looks wrong. Still needs to be proven to actually trigger during the problem but hey, why wouldn’t it work?

The final tip is probably the lovely tool xml2, which converts an XML input to a “flat” output. That output is perfect to use grep or sed on to properly catch the correct situation, and it keeps me from resorting to the error-prone concept of grepping or regexing actual XML. After xml2 the above XML looks like this:

/LineStatus/@channel_count=2
/LineStatus/Channel/@index=0
/LineStatus/Channel/@enabled=1
/LineStatus/Channel/SIP/@state=Up
/LineStatus/Channel/SIP/Name=012345679
/LineStatus/Channel/SIP/Server=sip.example.org
/LineStatus/Channel/Call/@state=Idle
/LineStatus/Channel
/LineStatus/Channel/@index=1
/LineStatus/Channel/@enabled=0

Now I’ll just have to wait until the problem hits again to see that my scripts actually work… Once proven to detect the situation, my next step will probably be to actuallyÂ maneuverÂ the web interface and reboot it. I’ll get back to that later..

cURL and libcurl, Network, Web

Better pipelining in libcurl 7.30.0

March 26, 2013 Daniel Stenberg 2 Comments

Back in October 2006, we added support for HTTP pipelining to libcurl. The implementation was naive and simple: it basically preferred to pipeline everything on the single connection to a given host if it could. It works only with the multi interface and if you do a second request to the same host it will try to pipeline that.

pipelines

Over the years the feature was bugfixed and improved slightly, which proved that at least a couple of applications actually used it – but it was never any particularly big hit among libcurl’s vast amount of features.

Related background information that gives details on some of the problems with pipelining in the wild can be found in Mark Nottingham’s Making HTTP Pipelining Usable on the Open Web internet-draft, Mozilla’s bug report “HTTP pipelining by default” and Chrome’s pipelining docs.

Now, more than six years later, Linus Nielsen Feltzing (a colleague and friend at Haxx) strikes back with a much improved and almost completely revamped HTTP pipelining support (merged into master just hours before the new-feature window closed for the pending 7.30.0 release). This time, the implementation features and provides:

a configurable number of connections and pipelines to each unique host name
a round-robin approach that favors starting new connections first, and then pipeline on existing connections once the maximum number of connections to the host is reached
a max-depth value that when filled makes the code not add any more requests on that connection/pipeline
a pipe penalization system that avoids adding new requests to pipes that are known to be receiving very large contents and thus possibly would stall subsequent requests for an extended period of time
a server blacklist that allows the application to specify a list of servers for which HTTP pipelining should not be attempted – real world tests has proven that some servers are too broken to be allowed to play the game

The code also adds a feature that helps out applications to do massive amount of requests in a controlled manner:

A hard maximum amount of connections, that when reached makes libcurl queue up easy handles internally until they can create a new connection or re-use a previously used one. That allows an application to for example set the limit to 50 and then add 400 handles to the multi handle but it will still only use 50 connections as a maximum so over time when requests get completed it will start new transfers on the requests that are waiting in line and thus shrinking the queue and keeping the maximum amount of connections until there’s less than 50 left to do…

Previously that kind of queuing had to be done by the application itself, but now with the much more extensive pipelining support it really isn’t as easy for an application to know when the new request can get pipelined or create a new connection so this logic is now provided by libcurl itself. It is likely going to be appreciated and used also by non-pipelining applications…

This implementation is accompanied with a bunch of new test cases for libcurl (and even a new HTTP test server for the purpose), and it has been tested in the wild for a while with libcurl as the engine in a web browser implementation (the company doing that has requested to remain anonymous). We believe it is in fairly decent state, but as this is a large step and the first release it is shipped with I expect there to be some hiccups along the way.

Two things to take note of:

pipelining is only available for users of libcurl’s multi interface, and only if explicitly enabled with CURLMOPT_PIPELINING
the curl command line tool does not use the multi interface and thus it will not use pipelining

Network, Technology

decopperfied

October 18, 2012 Daniel Stenberg 5 Comments

Today I disconnected my ADSL modem through which I still had the landline telephone service running. I’ve cancelled the service so at the end of the month it will die and I’m now instead signing up to one of them lesser known companies that offer cheap IP-telephony that is network independent: Affinity telecom. Staying with only phone service over the copper was not a good idea since the cheapest service is still expensive over that medium.

Why have a landline phone at all? I’ve decided to stick with a landline telephone for a few reasons: first, our phone number is distributed widely and it is convenient to keep it working and keeping it reach our house instead of a specific person. Secondly, we have two kids who like to phone at times and they can use this fine and it’ll end up cheaper than if they’d use their own mobile phones. And thirdly, the parents-in-law factor. My parents and my wife’s parents etc like calling landlines instead of mobile phones, I think partly due to old habits but also partly because it is cheaper for them…

When I researched which service to pick I discovered that not a single operator exists in Sweden that only charges for usage. They all have a minimum monthly fee, so I went with one of the cheapest I could find on prisjakt for my usage pattern.

I simply yanked the RJ11 from the ADSL modem and inserted into my new “Ping Communications Voice Catcher 201E” device (see picture) that I received. The instructions that come with the device say I should connect it directly to internet and then let my existing LAN traffic route through it. I don’t want to add yet another box between me and the internet and I fear that a cheapo box like this might cause problems in one way or another if I do, so I just plugged this new toy into my router and after I while I could happily confirm that it worked just as nicely. It got an IP from my DHCP server and I can call my old-style analog phone now (I love number portability) and I can use it to call out to my mobile.

I’m curious to see how good/bad this is going to work…

Update: It didn’t work to make the box a regular DHCP client. Even if I made it into a DMZ it still wouldn’t work to accept calls. I would only get a signal and everything, but once I answered the phone there was no sound. In the end I moved it to sit between “internet” and my local router and now my phone seems to be doing fine.

cURL and libcurl, Network

WSAPoll is broken

October 10, 2012 Daniel Stenberg

Microsoft admits the WSApoll function is broken but won’t do anything about it. Unless perhaps if customers keep nagging them.

Doing portable socket programming has always meant using a bunch of #ifdefs and similar. A program needs to be built on many systems and slowly get adjusted to work really well all over. For ages, for example, Windows only supported select() and not poll() while all sensible systems[*] out there supported poll(). There are several reasons to prefer poll to select when writing code.

Then one day in 2006, Chad Charlin, a developer at Microsoft wrote the following when talking about the new WSApoll() function they introduced in Windows Vista:

Among the many improvements to the Winsock API shipping in Vista is the new WSAPoll function. Its primary purpose is to simplify the porting of a sockets application that currently uses poll() by providing an identical facility in Winsock for managing groups of sockets.

Great! Starting September 2006 curl started using it (shipped in the release curl and libcurl 7.16.0). It seemed like a huge step forward, and as Chad wrote:

If you have experience developing applications using poll(), WSAPoll will be very familiar. It is designed to behave just like poll().

Emphasis added by me. It was (of course) made to work like poll, and that’s why the API is made like that. Why would you introduce something that is almost like poll() except in minor details?

Since the new function only was available in Vista and later, it took a while until libcurl users in a more wider scale got to use it but over time Windows XP users are slowly shifting away and more and more libcurl Windows users therefore use the WSApoll based builds. Life seemed to be good. Some users noticed funny things and reported bugs we couldn’t repeat (on other platforms) but nothing really stood out and no big alarm bells went off.

During July 2012, a user of libcurl on Windows, Jan Koen Annot experienced such problems and he didn’t just sigh and move on. He rolled up his sleeves and decided to get to the bottom. Perhaps he could fix a bug or two while at it? (It seems reasonable that he thought so, I haven’t actually asked him!) What he found was however not a bug in libcurl. He found out that WSApoll did indeed not work like poll (his initial post to curl-library on the problem)! On August 1st he submitted a support issue to Microsoft about it. On August 7 we pushed the commit to curl that removed our use of WSApoll.

A few days go Jan reported back on how the case has gone, where his journey down the support alleys took him.

It turns out Microsoft already knew about this bug, which they apparently have named “Windows 8 Bugs 309411 – WSAPoll does not report failed connections”. The ticket has been resolved as Won’t Fix… (I haven’t found any public access of this.)

Jan argued for the case that since WSApoll is designed and used as a plain poll() replacement it would make sense to actually make it also work the same way:

First, it will cost much time to find out that some ‘real-life’ issue can be traced back to this WSAPoll bug. In my case we were lucky to have a regression test which triggered when we started using a slightly different cURL-library configuration on Windows. Tracing back that the test was triggered because of this bug in WSAPoll took several hours. Imagine what it would cost, if some customer in the field reported annoying delays, to trace such a vague complaint back to a bug in the WSAPoll function!

Second, even if we know beforehand about this bug in WSAPoll, then it is difficult to determine in which situations in your code you can safely use WSAPoll and in which situations you might suffer from this bug. So a better recommendation would be to simply not use WSAPoll. (…)

Third, porting code which uses the poll() function to the Windows sockets API is made more complex. The introduction of WSAPoll was meant specifically for this, so it should have compatible behavior, without a recommendation to not use it in certain circumstances.

Fourth, your recommendation will only have effect when actively promoted to developers using WSAPoll. A much better approach would be to repair the bug and publish an update. Microsoft has some nice mechanisms in place for that.

So, my conclusion is that, even if in our case the business impact may be low because we found the bug in an early stage, it is still important that Microsoft fixes the bug and publishes an update.

In my eyes all very good and sensible arguments. Perhaps not too surprisingly, these fine reasons didn’t have any particular impact on how Microsoft views this old and known bug that “has been like this forever and people are already used to it.”. It will remain closed, and Microsoft motivated this decision to Jan quite clearly and with arguments one can understand:

A discussion has been conducted around this topic and the taken decision was not to have the fix implemented due to the following reasons:

This issue since Vista

no other Microsoft customer has asked for a Hotfix since Vista timeframe

fixing this old issue might have some application compatibility risk (for those customers who might have somehow taken a dependency on WSAPoll failing with a timeout in the cases of connect failure as opposed to POLLERR).

This API will become more irrelevant as the Windows versions increase; the networking APIs will move away from classic select/poll to more advanced I/O completion mechanisms.

Argument one and two are really weak and silly. Microsoft users are very rarely complaining to Microsoft and most wouldn’t even know how to do it. Also, this problem may certainly still affect many users even if nobody has asked for a fix.

The compatibility risk is a valid point, but that’s a bit of a hard argument to have. All bugs that are about behavior will of course risk that users have adapted to the wrong behavior so a bug fix may break those. All of us who write and maintain stable APIs are used to this problem, but sticking to the buggy way of working because it has been doing this for so long is in my eyes only correct if you document this with very large letters and emphasis in all documentation: WSApoll is not fully emulating poll – beware!

The fact that they will focus more on other APIs is also understandable but besides the point. We want reliable APIs that work as documented. Applications that are Windows-only probably already very rarely use WSApoll, it will probably remain being more important for porting socket style programs to Windows.

Jan also especially highlights a funny line from this Microsoft person:

The best way to add pressure for a hotfix to be released would be to have the customers reporting it again on http://connect.microsoft.com.

Okay, so even if they have motives why they won’t fix this bug they seem to hint that if more customers nag them about it they might change their minds. Fair enough. But the users of libcurl who for five years perhaps experienced funny effects are extremely unlikely to ever report and complain to Microsoft about this. They are way more likely to complain to us, or possibly to just work around the issue somehow.

Of course, users of WSApoll can adapt to the differences and make conditional code that handles them and that could be what we end up with in the curl project in the future if we just get volunteers to adapt the code accordingly. In the mean time we’ve just reverted to the old select()-using code instead, since select() does in fact mimic the “real” select much better…

[*] = clearly Mac OS X is not a sensible system since its poll() implementation is even worse than Windows and is mostly broken or just unreliable. Subject for another blog post another time.

Update

In 2023, a user made me aware that the Microsoft documentation now says:

Note As of Windows 10 version 2004, when a TCP socket fails to connect, (POLLHUP | POLLERR | POLLWRNORM) is indicated.

Maybe it is time to do new tests.

Network, Open Source

What, me over-analyzing?

October 7, 2012 Daniel Stenberg 4 Comments

fiber cable I got fiber installed to my house a few months ago. 100/100 mbit is very nice.

My first speed checks using the Swedish bredbandskollen service were a bit disappointing since it only showed something like 50mbit down and 80mbit up. I decided to ignore that fact for the moment as things were new and I had some other more annoying issues.

I detected that sometimes, on some specific sites I had problems to get HTTP! The TCP connection would get connected and data would get sent, but it would stall somewhere and then get disconnected. This showed up when for example my wife tried to download a Spotify client and my phone got trouble to download some of my favorite podcasts. Some, not all. Possibly one out of ten or twenty sites showed the problem. Most of the sites I use frequently worked flawlessly and I would only ever see the problem when I tried HTTP.

Puzzling!

I could avoid the problem by setting up a SSH connection to my server and running that as a SOCKS proxy, and so I could still get service even from the sites I apparently couldn’t quite use. I tried to collect the problematic sites and I tried traceroute etc on all the ones that failed in order to get data that would help me and my operator to pinpoint the problem. I reported this problem to my ISP really early on, but they too were puzzled and it never got far in their end.

I was almost convinced they had some kind of traffic shaping thing in the middle that was broken for HTTP somehow.

Time to fix it once and for all

Finally one day I stopped being nice with the support people and I demanded that they would send over a guy and fix it. It wasn’t good enough that they would find that everything is OK from remote. I clearly had problems and I could escape the problems by switching over to my old ADSL so I knew it wasn’t due to my computers’ configs or due to my own firewalls or routers.

I was also convinced my ISP would get me some cable guy coming over when the problem wasn’t really in the cable, but sure it would be a necessary first step towards finding the real error.

They sent a cable guy, and it took him like three minutes to detect a bad signal level on the fiber, meaning that the problem was certainly not in my house anyway. He then drove down to the “station” that terminates my fiber, and after having “polished” that end of the fiber connection too he called me and said that he couldn’t spot any problem anymore and asked me to verify…

Now my checks showed me 80-90 mbit in both directions and the sites that used to give me problems all worked just fine.

It was the cable all along. Bad signal level. “A mystery it worked at all for you” my cable person said.

I’m left with my over-analyzed problem suspecting all sorts of high level stuff. But why did this only affect some sites? How come I could circumvent this with SOCKS? Gah, my brain hurts trying to answer these questions…

Anyway, now it all works and the family is happy again.

daniel.haxx.se