Tag Archives: windows

deleting system32\curl.exe

Let me tell you a story about how Windows users are deleting files from their installation and as a consequence end up in tears.

Background

The real and actual curl tool has been shipped as part of Windows 10 and Windows 11 for many years already. It is called curl.exe and is located in the System32 directory.

Microsoft ships this bundled with its Operating system. They get the code from the curl project but Microsoft builds, tests, ships and are in all ways responsible for their operating system.

NVD inflation

As I have blogged about separately earlier, the next brick in the creation of this story is the fact that National Vulnerability Database deliberately inflates the severity levels of security flaws in its vast database. They believe scaremongering serves their audience.

In one particular case, CVE-2022-43552 was reported by the curl project in December 2022. It is a use-after-free flaw that we determined to be severity low and not higher mostly because of the very limited time window you need to make something happen for it to be exploited or abused. NVD set it to medium which admittedly was just one notch higher (this time).

This is not helpful.

“Security scanners”

Lots of Windows users everywhere runs security scanners on their systems with regular intervals in order to verify that their systems are fine. At some point after December 21, 2022, some of these scanners started to detect installations of curl that included the above mentioned CVE. Nessus apparently started this on February 23.

This is not helpful.

Panic

Lots of Windows users everywhere then started to panic when these security applications warned them about their vulnerable curl.exe. Many Windows users are even contractually “forced” to fix (all) such security warnings within a certain time period or risk bad consequences and penalties.

How do you fix this?

I have been asked numerous times about how to fix this problem. I have stressed at every opportunity that it is a horrible idea to remove the system curl or to replace it with another executable. It is very easy to download a fresh curl install for Windows from the curl site – but we still strongly discourage everyone from replacing system files.

But of course, far from everyone asked us. A seemingly large enough crowd has proceeded and done exactly what we would stress they should not: they deleted or replaced their C:\Windows\System32\curl.exe.

The real fix is of course to let Microsoft ship an update and make sure to update then. The exact update that upgrades curl to version 8.0.1 is called KB5025221 and shipped on April 11. (And yes, this is the first time you get the very latest curl release shipped in a Windows update)

The people who deleted or replaced the curl executable noticed that they cannot upgrade because the Windows update procedure detects that the Windows install has been tampered with and it refuses to continue.

I do not know how to restore this to a state that Windows update is happy with. Presumably if you bring back curl.exe to the exact state from before it could work, but I do not know exactly what tricks people have tested and ruled out.

Bad advice

I have been pointed to responses on the Microsoft site answers.microsoft.com done by “helpful volunteers” that specifically recommend removing the curl.exe executable as a fix.

This is not helpful.

I don’t want to help spreading that idea so I will not link to any such post. I have reported this to Microsoft contacts and I hope they can maybe edit or comment those posts soon.

We are not responsible

I just want to emphasize that if you install and run Windows, your friendly provider is Microsoft. You need to contact Microsoft for support and help with Windows related issues. The curl.exe you have in System32 is only provided indirectly by the curl project and we cannot fix this problem for you. We in fact fixed the problem in the source code already back in December 2022.

If you have removed curl.exe or otherwise tampered with your Windows installation, the curl project cannot help you.

Credits

Image by Alexa from Pixabay

Discussions

Hacker news

Warning: curl users on Windows using FILE://

The Windows operating system will automatically, and without any way for applications to disable it, try to establish a connection to another host over the network and access it (over SMB or other protocols), if only the correct file path is accessed.

When first realizing this, the curl team tried to filter out such attempts in order to protect applications for inadvertent probes of for example internal networks etc. This resulted in CVE-2019-15601 and the associated security fix.

However, we’ve since been made aware of the fact that the previous fix was far from adequate as there are several other ways to accomplish more or less the same thing: accessing a remote host over the network instead of the local file system.

The conclusion we have come to is that this is a weakness or feature in the Windows operating system itself, that we as an application and library cannot protect users against. It would just be a whack-a-mole race we don’t want to participate in. There are too many ways to do it and there’s no knob we can use to turn off the practice.

We no longer consider this to be a curl security flaw!

If you use curl or libcurl on Windows (any version), disable the use of the FILE protocol in curl or be prepared that accesses to a range of “magic paths” will potentially make your system try to access other hosts on your network. curl cannot protect you against this.

We have updated relevant curl and libcurl documentation to make users on Windows aware of what using FILE:// URLs can trigger (this commit) and posted a warning notice on the curl-library mailing list.

Previous security advisory

This was previously considered a curl security problem, as reported in CVE-2019-15601. We no longer consider that a security flaw and have updated that web page with information matching our new findings. I don’t expect any other CVE database to update since there’s no established mechanism for updating CVEs!

Credits

Many thanks to Tim Sedlmeyer who highlighted the extent of this issue for us.

openssl engine code injection in curl

This flaw is known as CVE-2019-5443.

If you downloaded and installed a curl executable for Windows from the curl project before June 21st 2019, go get an updated one. Now.

On Windows, using OpenSSL

The official curl builds for Windows – that the curl project offers – are built cross-compiled on Linux. They’re made to use OpenSSL by default as the TLS backend, the by far most popular TLS backend by curl users.

The curl project has provided official curl builds for Windows on and off through history, but most recently this has been going on since August 2018.

OpenSSL engines

These builds use OpenSSL. OpenSSL has a feature called “engines”. Described by the project itself like this:

“a component to support alternative cryptography implementations, most commonly for interfacing with external crypto devices (eg. accelerator cards). This component is called ENGINE”

More simply put, an “engine” is a plugin for OpenSSL that can be loaded and run dynamically. The particular engine is activated either built-in or by loading a config file that specifies what to do.

curl and OpenSSL engines

When using curl built with OpenSSL, you can specify an “engine” to use, which in turn allows users to use their dedicated hardware when doing TLS related communications with curl.

By default, the curl tool allows OpenSSL to load a config file and figure out what engines to load at run-time but it also provides a build option to make it possible to build curl/libcurl without the ability to load that config file at run time – which some users want, primarily for security reasons.

The mistakes

The primary mistake in the curl build for Windows that we offered, was that the disabling of the config file loading had a typo which actually made it not disable it (because the commit message had it wrong). The feature was therefore still present and would load the config file if present when curl was invoked, contrary to the intention.

The second mistake comes a little more from the OpenSSL side: by default if you build OpenSSL cross-compiled like we do, the default paths where it looks for the above mentioned config file is under the c:\usr\local tree. It is in fact even complicated and impossible to fix this path in the build without a patch.

What the mistakes enable

A non-privileged user or program (the attacker) with access to the host to put a config file in the directory where curl would look for a config file (and create the directory first as it probably didn’t already exist) and the suitable associated engine code.

Then, when an privileged user subsequently executes curl, it will run with more power and run the code, the engine, the attacker had put there. An engine is a piece of compiled code, it can do virtually anything on the machine.

The fix

Already three days ago, on June 21st, a fixed version of the curl executable for Windows was uploaded to the curl web site (“curl 7.65.1_2”). All older versions that had been provided in the past were removed to reduce the risk of someone still using an old lingering download link.

The fix now makes the curl build switch off the loading of the config file, as was already intended. But also, the OpenSSL build that is used for the build is now modified to only load the config file from a privileged path that isn’t world writable (C:/Windows/System32/OpenSSL/).

Widespread mistake

This problem is very widespread among projects on Windows that use OpenSSL. The curl project coordinated this publication with the postgres project and have worked with OpenSSL to make them improve their default paths. We have also found a few other openssl-using projects that already have fixed their builds for this flaw (like stunnel) but I think we have reason to suspect that there are more vulnerable projects out there still not fixed.

If you know of a project that uses OpenSSL and ships binaries for Windows, give them a closer look and make sure they’re not vulnerable to this.

The cat is already out of the bag

When we got this problem reported, we soon realized it had already been publicly discussed and published for other projects even before we got to know about it. Due to this, we took it to publication as quick as possible to minimize user impact as much as we can.

Only on Windows and only with OpenSSL

This flaw only exists on curl for Windows and only if curl was built to use OpenSSL with this bad path and behavior.

Microsoft ships curl as part of Windows 10, but it does not use OpenSSL and is not vulnerable.

Credits

This flaw was reported to us by Rich Mirch.

The build was fixed by Viktor Szakats.

The image on the blog post comes from pixabay.

curl 7.62.0 MOAR STUFF

This is a feature-packed release with more new stuff than usual.

Numbers

the 177th release
10 changes
56 days (total: 7,419)

118 bug fixes (total: 4,758)
238 commits (total: 23,677)
5 new public libcurl functions (total: 80)
2 new curl_easy_setopt() options (total: 261)

1 new curl command line option (total: 219)
49 contributors, 21 new (total: 1,808)
38 authors, 19 new (total: 632)
  3 security fixes (total: 84)

Security

New since the previous release is the dedicated curl bug bounty program. I’m not sure if this program has caused any increase in reports as it feels like a little too early to tell.

CVE-2018-16839 – an integer overflow case that triggers on 32 bit machines given extremely long input user name argument, when using POP3, SMTP or IMAP.

CVE-2018-16840 – a use-after-free issue. Immediately after having freed a struct in the easy handle close function, libcurl might write a boolean to that struct!

CVE-2018-16842 – is a vulnerability in the curl command line tool’s “warning” message display code which can make it read outside of a buffer and send unintended memory contents to stderr.

All three of these issues are deemed to have low severity and to be hard to exploit.

New APIs!

We introduce a brand new URL API, that lets applications parse and generate URLs, using libcurl’s own parser. Five new public functions in one go there! The link goes to the separate blog entry that explained it.

A brand new function is introduced (curl_easy_upkeep) to let applications maintain idle connections while no transfers are in progress! Perfect to maintain HTTP/2 connections for example that have a PING frame that might need attention.

More changes

Applications using libcurl’s multi interface will now get multiplexing enabled by default, and HTTP/2 will be selected for HTTPS connections. With these new changes of the default behavior, we hope that lots of applications out there just transparently and magically will start to perform better over time without anyone having to change anything!

We shipped DNS-over-HTTPS support. With DoH, your internet client can do secure and private name resolves easier. Follow the link for the full blog entry with details.

The good people at MesaLink has a TLS library written in rust, and in this release you can build libcurl to use that library. We haven’t had a new TLS backend supported since 2012!

Our default IMAP handling is slightly changed, to use the proper standards compliant “UID FETCH” method instead of just “FETCH”. This might introduce some changes in behavior so if you’re doing IMAP transfers, I advice you to mind your step into this upgrade.

Starting in 7.62.0, applications can now set the buffer size libcurl will use for uploads. The buffers used for download and upload are separate and applications have been able to specify the download buffer size for a long time already and now they can finally do it for uploads too. Most applications won’t need to bother about it, but for some edge case uses there are performance gains to be had by bumping this size up. For example when doing SFTP uploads over high latency high bandwidth connections.

curl builds that use libressl will now at last show the correct libressl version number in the “curl -V” output.

Deprecating legacy

CURLOPT_DNS_USE_GLOBAL_CACHE is deprecated! If there’s not a massive complaint uproar, this means this option will effectively be made pointless in April 2019. The global cache isn’t thread-safe and has been called obsolete in the docs since 2002!

HTTP pipelining support is deprecated! Starting in this version, asking for pipelining will be ignored by libcurl. We strongly urge users to switch to and use HTTP/2, which in 99% of the cases is the better alternative to HTTP/1.1 Pipelining. The pipelining code in libcurl has stability problems. The impact of disabled pipelining should be minimal but some applications will of course notice. Also note the section about HTTP/2 and multiplexing by default under “changes” above.

To get an overview of all things marked for deprecation in curl and their individual status check out this page.

Interesting bug-fixes

TLS 1.3 support for GnuTLS landed. Now you can build curl to support TLS 1.3 with most of the TLS libraries curl supports: GnuTLS, OpenSSL, BoringSSL, libressl, Secure Transport, WolfSSL, NSS and MesaLink.

curl got Windows VT Support and UTF-8 output enabled, which should make fancy things like “curl wttr.in” to render nice outputs out of the box on Windows as well!

The TLS backends got a little cleanup and error code use unification so that they should now all return the same error code for the same problem no matter which backend you use!

When you use curl to do URL “globbing” as for example “curl http://localhost/[1-22]” to fetch a range or a series of resources and accidentally mess up the range, curl would previously just say that it detected an error in the glob pattern. Starting now, it will also try to show exactly where in which pattern it found the error that made it stop processing it.

CI

The curl for Windows CI builds on AppVeyor are now finally also running the test suite! Actually making sure that the Windows build is intact in every commit and PR is a huge step forward for us and our aim to keep curl functional. We also build several additional and different build combinations on Windows in the CI than we did previously. All in an effort to reduce regressions.

We’ve added four new checks to travis (that run on every pull-request and commit):

  1. The “tidy” build runs clang-tidy on all sources in src/ and lib/.
  2. a –disable-verbose build makes sure this configure option still builds curl warning-free
  3. the “distcheck” build now scans all files for accidental unicode BOM markers
  4. a MesaLink-using build verifies this configuration

CI build times

We’re right now doing 40 builds on every commit, spending around 12 hours of CPU time for a full round. With >230 landed commits in the tree that originated from 150-something pull requests,  with a lot of them having been worked out using multiple commits, we’ve done perhaps 500 full round CI builds in these 56 days.

This of course doesn’t include all the CPU time developers spend locally before submitting PRs or even the autobuild system that currently runs somewhere in the order of 50 builds per day. If we assume an average time spent for each build+test to take 20 minutes, this adds another 930 hours of CI hours done from the time of the previous release until this release.

To sum up, that’s about 7,000 hours of CI spent in 56 days, equaling about 520% non-stop CPU time!

We are grateful for all the help we get!

Next release

The next release will ship on December 12, 2018 unless something urgent happens before that.

Note that this date breaks the regular eight week release cycle and is only six weeks off. We do this since the originally planned date would happen in the middle of Christmas when “someone” plans to be off traveling…

The next release will probably become 7.63.0 since we already have new changes knocking on the door waiting to get merged that will warrant another minor number bump. Stay tuned for details!

Blessed curl builds for Windows

The curl project is happy to introduce official and blessed curl builds for Windows for download on the curl web site.

This means we have a set of recommended curl packages that we advice users on Windows to download.

On Linux, macOS, cygwin and pretty much all the other alternatives you have out there, you don’t need to go to random sites on the Internet and download a binary package provided by a (to you) unknown stranger to get curl for your system. Unfortunately that is basically what we have forced Windows users into doing for a few years since our previous maintainer of curl builds for Windows dropped off the project.

These new official curl builds for Windows are the same set of builds Viktor Szakats has been building and providing to the community for a long time already. Now just with the added twist that he feeds his builds and information about them to the main curl site so that users can get them from the same site and thus lean on the same trust they already have in the curl brand in general.

These builds are reproducible, provided with sha256 hashes and a link to the full build log. Everything is public and transparently done.

All the hard work to get these builds in this great shape was done by Viktor Szakats.

Go get it!

much faster curl uploads on Windows with a single tiny commit

These days, operating system kernels provide TCP/IP stacks that can do really fast network transfers. It’s not even unusual for ordinary people to have gigabit connections at home and of course we want our applications to be able take advantage of them.

I don’t think many readers here will be surprised when I say that fulfilling this desire turns out much easier said than done in the Windows world.

Autotuning?

Since Windows 7 / 2008R2, Windows implements send buffer autotuning. Simply put, the faster transfer and longer RTT the connection has, the larger the buffer it uses (up to a max) so that more un-acked data can be outstanding and thus enable the system to saturate even really fast links.

Turns out this useful feature isn’t enabled when applications use non-blocking sockets. The send buffer isn’t increased at all then.

Internally, curl is using non-blocking sockets and most of the code is platform agnostic so it wouldn’t be practical to switch that off for a particular system. The code is pretty much independent of the target that will run it, and now with this latest find we have also started to understand why it doesn’t always perform as well on Windows as on other operating systems: the upload buffer (SO_SNDBUF) is fixed size and simply too small to perform well in a lot of cases

Applications can still enlarge the buffer, if they’re aware of this bottleneck, and get better performance without having to change libcurl, but I doubt a lot of them do. And really, libcurl should perform as good as it possibly can just by itself without any necessary tuning by the application authors.

Users testing this out

Daniel Jelinski brought a fix for this that repeatedly poll Windows during uploads to ask for a suitable send buffer size and then resizes it on the go if it deems a new size is better. In order to figure out that if this patch is indeed a good idea or if there’s a downside for some, we went wide and called out for users to help us.

The results were amazing. With speedups up to almost 7 times faster, exactly those newer Windows versions that supposedly have autotuning can obviously benefit substantially from this patch. The median test still performed more than twice as fast uploads with the patch. Pretty amazing really. And beyond weird that this crazy thing should be required to get ordinary sockets to perform properly on an updated operating system in 2018.

Windows XP isn’t affected at all by this fix, and we’ve seen tests running as VirtualBox guests in NAT-mode also not gain anything, but we believe that’s VirtualBox’s “fault” rather than Windows or the patch.

Landing

The commit is merged into curl’s master git branch and will be part of the pending curl 7.61.1 release, which is due to ship on September 5, 2018. I think it can serve as an interesting case study to see how long time it takes until Windows 10 users get their versions updated to this.

Table of test runs

The Windows versions, and the test times for the runs with the unmodified curl, the patched one, how much time the second run needed as a percentage of the first, a column with comments and last a comment showing the speedup multiple for that test.

Thank you everyone who helped us out by running these tests!

Version Time vanilla Time patched New time Comment speedup
6.0.6002 15.234 2.234 14.66% Vista SP2 6.82
6.1.7601 8.175 2.106 25.76% Windows 7 SP1 Enterprise 3.88
6.1.7601 10.109 2.621 25.93% Windows 7 Professional SP1 3.86
6.1.7601 8.125 2.203 27.11% 2008 R2 SP1 3.69
6.1.7601 8.562 2.375 27.74% 3.61
6.1.7601 9.657 2.684 27.79% 3.60
6.1.7601 11.263 3.432 30.47% Windows 2008R2 3.28
6.1.7601 5.288 1.654 31.28% 3.20
10.0.16299.309 4.281 1.484 34.66% Windows 10, 1709 2.88
10.0.17134.165 4.469 1.64 36.70% 2.73
10.0.16299.547 4.844 1.797 37.10% 2.70
10.0.14393 4.281 1.594 37.23% Windows 10, 1607 2.69
10.0.17134.165 4.547 1.703 37.45% 2.67
10.0.17134.165 4.875 1.891 38.79% 2.58
10.0.15063 4.578 1.907 41.66% 2.40
6.3.9600 4.718 2.031 43.05% Windows 8 (original) 2.32
10.0.17134.191 3.735 1.625 43.51% 2.30
10.0.17713.1002 6.062 2.656 43.81% 2.28
6.3.9600 2.921 1.297 44.40% Windows 2012R2 2.25
10.0.17134.112 5.125 2.282 44.53% 2.25
10.0.17134.191 5.593 2.719 48.61% 2.06
10.0.17134.165 5.734 2.797 48.78% run 1 2.05
10.0.14393 3.422 1.844 53.89% 1.86
10.0.17134.165 4.156 2.469 59.41% had to use the HTTPS endpoint 1.68
6.1.7601 7.082 4.945 69.82% over proxy 1.43
10.0.17134.165 5.765 4.25 73.72% run 2 1.36
5.1.2600 10.671 10.157 95.18% Windows XP Professional SP3 1.05
10.0.16299.547 1.469 1.422 96.80% in a VM runing on Linux 1.03
5.1.2600 11.297 11.046 97.78% XP 1.02
6.3.9600 5.312 5.219 98.25% 1.02
5.2.3790 5.031 5 99.38% Windows 2003 1.01
5.1.2600 7.703 7.656 99.39% XP SP3 1.01
10.0.17134.191 1.219 1.531 125.59% FTP 0.80
TOTAL 205.303 102.271 49.81% 2.01
MEDIAN 43.51% 2.30

Removing the PowerShell curl alias?

PowerShell is a spiced up command line shell made by Microsoft. According to some people, it is a really useful and good shell alternative.

Already a long time ago, we got bug reports from confused users who couldn’t use curl from their PowerShell prompts and it didn’t take long until we figured out that Microsoft had added aliases for both curl and wget. The alias had the shell instead invoke its own command called “Invoke-WebRequest” whenever curl or wget was entered. Invoke-WebRequest being PowerShell’s own version of a command line tool for fiddling with URLs.

Invoke-WebRequest is of course not anywhere near similar to neither curl nor wget and it doesn’t support any of the command line options or anything. The aliases really don’t help users. No user who would want the actual curl or wget is helped by these aliases, and users who don’t know about the real curl and wget won’t use the aliases. They were and remain pointless. But they’ve remained a thorn in my side ever since. Me knowing that they are there and confusing users every now and then – not me personally, since I’m not really a Windows guy.

Fast forward to modern days: Microsoft released PowerShell as open source on github yesterday. Without much further ado, I filed a Pull-Request, asking the aliases to be removed. It is a minuscule, 4 line patch. It took way longer to git clone the repo than to make the actual patch and submit the pull request!

It took 34 minutes for them to close the pull request:

“Those aliases have existed for multiple releases, so removing them would be a breaking change.”

To be honest, I didn’t expect them to merge it easily. I figure they added those aliases for a reason back in the day and it seems unlikely that I as an outsider would just make them change that decision just like this out of the blue.

But the story didn’t end there. Obviously more Microsoft people gave the PR some attention and more comments were added. Like this:

“You bring up a great point. We added a number of aliases for Unix commands but if someone has installed those commands on WIndows, those aliases screw them up.

We need to fix this.”

So, maybe it will trigger a change anyway? The story is ongoing…

curl on windows versions

I had to ask. Just to get a notion of which Windows versions our users are actually using, so that we could get an indication which versions we still should make an effort to keep working on. As people download and run libcurl on their own, we just have no other ways to figure this out.

As always when asking a question to our audience, we can’t really know which part of our users that responded and it is probably more safe to assume that it is not a representative distribution of our actual user base but it is simply as good as it gets. A hint.

I posted about this poll on the libcurl mailing list and over twitter. I had it open for about 48 hours. We received 86 responses. Click the image below for the full res version:

windows-versions-used-for-curlSo, Windows 10, 8 and 7 are very well used and even Vista and XP clocked in fairly high on 14% and 23%. Clearly those are Windows versions we should strive to keep supported.

For Windows versions older than XP I was sort of hoping we’d get a zero, but as you can see in the graph we have users claiming to use curl on as old versions as Windows NT 4. I even checked, and it wasn’t the same two users that checked all those three oldest versions.

The “Other” marks were for Windows 2008 and 2012, and bonus points for the user who added “Other: debian 7”. It is interesting that I specifically asked for users running curl on windows to answer this survey and yet 26% responded that they don’t use Windows at all..

Changing networks with Firefox running

Short recap: I work on network code for Mozilla. Bug 939318 is one of “mine” – yesterday I landed a fix (a patch series with 6 individual patches) for this and I wanted to explain what goodness that should (might?) come from this!

diffstat

diffstat reports this on the complete patch series:

29 files changed, 920 insertions(+), 162 deletions(-)

The change set can be seen in mozilla-central here. But I guess a proper description is easier for most…

The bouncy road to inclusion

This feature set and associated problems with it has been one of the most time consuming things I’ve developed in recent years, I mean in relation to the amount of actual code produced. I’ve had it “landed” in the mozilla-inbound tree five times and yanked out again before it landed correctly (within a few hours), every time of course reverted again because I had bugs remaining in there. The bugs in this have been really tricky with a whole bunch of timing-dependent and race-like problems and me being unfamiliar with a large part of the code base that I’m working on. It has been a highly frustrating journey during periods but I’d like to think that I’ve learned a lot about Firefox internals partly thanks to this resistance.

As I write this, it has not even been 24 hours since it got into m-c so there’s of course still a risk there’s an ugly bug or two left, but then I also hope to fix the pending problems without having to revert and re-apply the whole series…

Many ways to connect to networks

Firefox Nightly screenshotIn many network setups today, you get an environment and a network “experience” that is crafted for that particular place. For example you may connect to your work over a VPN where you get your company DNS and you can access sites and services you can’t even see when you connect from the wifi in your favorite coffee shop. The same thing goes for when you connect to that captive portal over wifi until you realize you used the wrong SSID and you switch over to the access point you were supposed to use.

For every one of these setups, you get different DHCP setups passed down and you get a new DNS server and so on.

These days laptop lids are getting closed (and the machine is put to sleep) at one place to be opened at a completely different location and rarely is the machine rebooted or the browser shut down.

Switching between networks

Switching from one of the networks to the next is of course something your operating system handles gracefully. You can even easily be connected to multiple ones simultaneously like if you have both an Ethernet card and wifi.

Enter browsers. Or in this case let’s be specific and talk about Firefox since this is what I work with and on. Firefox – like other browsers – will cache images, it will cache DNS responses, it maintains connections to sites a while even after use, it connects to some sites even before you “go there” and so on. All in the name of giving the users an as good and as fast experience as possible.

The combination of keeping things cached and alive, together with the fact that switching networks brings new perspectives and new “truths” offers challenges.

Realizing the situation is new

The changes are not at all mind-bending but are basically these three parts:

  1. Make sure that we detect network changes, even if just the set of available interfaces change. Send an event for this.
  2. Make sure the necessary parts of the code listens and understands this “network topology changed” event and acts on it accordingly
  3. Consider coming back from “sleep” to be a network changed event since we just cannot be sure of the network situation anymore.

The initial work has been made for Windows only but it allows us to smoothen out any rough edges before we continue and make more platforms support this.

The network changed event can be disabled by switching off the new “network.notify.changed” preference. If you do end up feeling a need for that, I really hope you file a bug explaining the details so that we can work on fixing it!

Act accordingly

So what is acting properly? What if the network changes in a way so that your active connections suddenly can’t be used anymore due to the new rules and routing and what not? We attack this problem like this: once we get a “network changed” event, we “allow” connections to prove that they are still alive and if not they’re torn down and re-setup when the user tries to reload or whatever. For plain old HTTP(S) this means just seeing if traffic arrives or can be sent off within N seconds, and for websockets, SPDY and HTTP2 connections it involves sending an actual ping frame and checking for a response.

The internal DNS cache was a bit tricky to handle. I initially just flushed all entries but that turned out nasty as I then also killed ongoing name resolves that caused errors to get returned. Now I instead added logic that flushes all the already resolved names and it makes names “in transit” to get resolved again so that they are done on the (potentially) new network that then can return different addresses for the same host name(s).

This should drastically reduce the situation that could happen before when Firefox would basically just freeze and not want to do any requests until you closed and restarted it. (Or waited long enough for other timeouts to trigger.)

The ‘N seconds’ waiting period above is actually 5 seconds by default and there’s a new preference called “network.http.network-changed.timeout” that can be altered at will to allow some experimentation regarding what the perfect interval truly is for you.

Firefox BallInitially on Windows only

My initial work has been limited to getting the changed event code done for the Windows back-end only (since the code that figures out if there’s news on the network setup is highly system specific), and now when this step has been taken the plan is to introduce the same back-end logic to the other platforms. The code that acts on the event is pretty much generic and is mostly in place already so it is now a matter of making sure the event can be generated everywhere.

My plan is to start on Firefox OS and then see if I can assist with the same thing in Firefox on Android. Then finally Linux and Mac.

I started on Windows since Windows is one of the platforms with the largest amount of Firefox users and thus one of the most prioritized ones.

More to do

There’s separate work going on for properly detecting captive portals. You know the annoying things hotels and airports for example tend to have to force you to do some login dance first before you are allowed to use the internet at that location. When such a captive portal is opened up, that should probably qualify as a network change – but it isn’t yet.

WSAPoll is broken

Microsoft admits the WSApoll function is broken but won’t do anything about it. Unless perhaps if customers keep nagging them.

Doing portable socket programming has always meant using a bunch of #ifdefs and similar. A program needs to be built on many systems and slowly get adjusted to work really well all over. For ages, for example, Windows only supported select() and not poll() while all sensible systems[*] out there supported poll(). There are several reasons to prefer poll to select when writing code.

Then one day in 2006, Chad Charlin, a developer at Microsoft wrote the following when talking about the new WSApoll() function they introduced in Windows Vista:

Among the many improvements to the Winsock API shipping in Vista is the new WSAPoll function. Its primary purpose is to simplify the porting of a sockets application that currently uses poll() by providing an identical facility in Winsock for managing groups of sockets.

Great! Starting September 2006 curl started using it (shipped in the release curl and libcurl 7.16.0). It seemed like a huge step forward, and as Chad wrote:

If you have experience developing applications using poll(), WSAPoll will be very familiar. It is designed to behave just like poll().

Emphasis added by me. It was (of course) made to work like poll, and that’s why the API is made like that. Why would you introduce something that is almost like poll() except in minor details?

Since the new function only was available in Vista and later, it took a while until libcurl users in a more wider scale got to use it but over time Windows XP users are slowly shifting away and more and more libcurl Windows users therefore use the WSApoll based builds. Life seemed to be good. Some users noticed funny things and reported bugs we couldn’t repeat (on other platforms) but nothing really stood out and no big alarm bells went off.

During July 2012, a user of libcurl on Windows, Jan Koen Annot experienced such problems and he didn’t just sigh and move on. He rolled up his sleeves and decided to get to the bottom. Perhaps he could fix a bug or two while at it? (It seems reasonable that he thought so, I haven’t actually asked him!) What he found was however not a bug in libcurl. He found out that WSApoll did indeed not work like poll (his initial post to curl-library on the problem)! On August 1st he submitted a support issue to Microsoft about it. On August 7 we pushed the commit to curl that removed our use of WSApoll.

A few days go Jan reported back on how the case has gone, where his journey down the support alleys took him.

It turns out Microsoft already knew about this bug, which they apparently have named “Windows 8 Bugs 309411 – WSAPoll does not report failed connections”. The ticket has been resolved as Won’t Fix… (I haven’t found any public access of this.)

Jan argued for the case that since WSApoll is designed and used as a plain poll() replacement it would make sense to actually make it also work the same way:

First, it will cost much time to find out that some ‘real-life’ issue can be traced back to this WSAPoll bug. In my case we were lucky to have a regression test which triggered when we started using a slightly different cURL-library configuration on Windows. Tracing back that the test was triggered because of this bug in WSAPoll took several hours. Imagine what it would cost, if some customer in the field reported annoying delays, to trace such a vague complaint back to a bug in the WSAPoll function!

Second, even if we know beforehand about this bug in WSAPoll, then it is difficult to determine in which situations in your code you can safely use WSAPoll and in which situations you might suffer from this bug. So a better recommendation would be to simply not use WSAPoll. (…)

Third, porting code which uses the poll() function to the Windows sockets API is made more complex. The introduction of WSAPoll was meant specifically for this, so it should have compatible behavior, without a recommendation to not use it in certain circumstances.

Fourth, your recommendation will only have effect when actively promoted to developers using WSAPoll. A much better approach would be to repair the bug and publish an update. Microsoft has some nice mechanisms in place for that.

So, my conclusion is that, even if in our case the business impact may be low because we found the bug in an early stage, it is still important that Microsoft fixes the bug and publishes an update.

In my eyes all very good and sensible arguments. Perhaps not too surprisingly, these fine reasons didn’t have any particular impact on how Microsoft views this old and known bug that “has been like this forever and people are already used to it.”. It will remain closed, and Microsoft motivated this decision to Jan quite clearly and with arguments one can understand:

A discussion has been conducted around this topic and the taken decision was not to have the fix implemented due to the following reasons:

  • This issue since Vista
  • no other Microsoft customer has asked for a Hotfix since Vista timeframe
  • fixing this old issue might have some application compatibility risk (for those customers who might have somehow taken a dependency on WSAPoll failing with a timeout in the cases of connect failure as opposed to POLLERR).
  • This API will become more irrelevant as the Windows versions increase; the networking APIs will move away from classic select/poll to more advanced I/O completion mechanisms.

Argument one and two are really weak and silly. Microsoft users are very rarely complaining to Microsoft and most wouldn’t even know how to do it. Also, this problem may certainly still affect many users even if nobody has asked for a fix.

The compatibility risk is a valid point, but that’s a bit of a hard argument to have. All bugs that are about behavior will of course risk that users have adapted to the wrong behavior so a bug fix may break those. All of us who write and maintain stable APIs are used to this problem, but sticking to the buggy way of working because it has been doing this for so long is in my eyes only correct if you document this with very large letters and emphasis in all documentation: WSApoll is not fully emulating poll – beware!

The fact that they will focus more on other APIs is also understandable but besides the point. We want reliable APIs that work as documented. Applications that are Windows-only probably already very rarely use WSApoll, it will probably remain being more important for porting socket style programs to Windows.

Jan also especially highlights a funny line from this Microsoft person:

The best way to add pressure for a hotfix to be released would be to have the customers reporting it again on http://connect.microsoft.com.

Okay, so even if they have motives why they won’t fix this bug they seem to hint that if more customers nag them about it they might change their minds. Fair enough. But the users of libcurl who for five years perhaps experienced funny effects are extremely unlikely to ever report and complain to Microsoft about this. They are way more likely to complain to us, or possibly to just work around the issue somehow.

Of course, users of WSApoll can adapt to the differences and make conditional code that handles them and that could be what we end up with in the curl project in the future if we just get volunteers to adapt the code accordingly. In the mean time we’ve just reverted to the old select()-using code instead, since select() does in fact mimic the “real” select much better…

[*] = clearly Mac OS X is not a sensible system since its poll() implementation is even worse than Windows and is mostly broken or just unreliable. Subject for another blog post another time.

Update

In 2023, a user made me aware that the Microsoft documentation now says:

Note  As of Windows 10 version 2004, when a TCP socket fails to connect, (POLLHUP | POLLERR | POLLWRNORM) is indicated.

Maybe it is time to do new tests.