Tag Archives: Network

Fewer mallocs in curl

Today I landed yet another small change to libcurl internals that further reduces the number of small mallocs we do. This time the generic linked list functions got converted to become malloc-less (the way linked list functions should behave, really).

Instrument mallocs

I started out my quest a few weeks ago by instrumenting our memory allocations. This is easy since we have our own memory debug and logging system in curl since many years. Using a debug build of curl I run this script in my build dir:

#!/bin/sh
export CURL_MEMDEBUG=$HOME/tmp/curlmem.log
./src/curl http://localhost
./tests/memanalyze.pl -v $HOME/tmp/curlmem.log

For curl 7.53.1, this counted about 115 memory allocations. Is that many or a few?

The memory log is very basic. To give you an idea what it looks like, here's an example snippet:

MEM getinfo.c:70 free((nil))
MEM getinfo.c:73 free((nil))
MEM url.c:294 free((nil))
MEM url.c:297 strdup(0x559e7150d616) (24) = 0x559e73760f98
MEM url.c:294 free((nil))
MEM url.c:297 strdup(0x559e7150d62e) (22) = 0x559e73760fc8
MEM multi.c:302 calloc(1,480) = 0x559e73760ff8
MEM hash.c:75 malloc(224) = 0x559e737611f8
MEM hash.c:75 malloc(29152) = 0x559e737a2bc8
MEM hash.c:75 malloc(3104) = 0x559e737a9dc8

Check the log

I then studied the log closer and I realized that there were many small memory allocations done from the same code lines. We clearly had some rather silly code patterns where we would allocate a struct and then add that struct to a linked list or a hash and that code would then subsequently add yet another small struct and similar - and then often do that in a loop.  (I say we here to avoid blaming anyone, but of course I myself am to blame for most of this...)

Those two allocations would always happen in pairs and they would be freed at the same time. I decided to address those. Doing very small (less than say 32 bytes) allocations is also wasteful just due to the very large amount of data in proportion that will be used just to keep track of that tiny little memory area (within the malloc system). Not to mention fragmentation of the heap.

So, fixing the hash code and the linked list code to not use mallocs were immediate and easy ways to remove over 20% of the mallocs for a plain and simple 'curl http://localhost' transfer.

At this point I sorted all allocations based on size and checked all the smallest ones. One that stood out was one we made in curl_multi_wait(), a function that is called over and over in a typical curl transfer main loop. I converted it over to use the stack for most typical use cases. Avoiding mallocs in very repeatedly called functions is a good thing.

Recount

Today, the script from above shows that the same "curl localhost" command is down to 80 allocations from the 115 curl 7.53.1 used. Without sacrificing anything really. An easy 26% improvement. Not bad at all!

But okay, since I modified curl_multi_wait() I wanted to also see how it actually improves things for a slightly more advanced transfer. I took the multi-double.c example code, added the call to initiate the memory logging, made it uses curl_multi_wait() and had it download these two URLs in parallel:

http://www.example.com/
http://localhost/512M

The second one being just 512 megabytes of zeroes and the first being a 600 bytes something public html page. Here's the count-malloc.c code.

First, I brought out 7.53.1 and built the example against that and had the memanalyze script check it:

Mallocs: 33901
Reallocs: 5
Callocs: 24
Strdups: 31
Wcsdups: 0
Frees: 33956
Allocations: 33961
Maximum allocated: 160385

Okay, so it used 160KB of memory totally and it did over 33,900 allocations. But ok, it downloaded over 512 megabytes of data so it makes one malloc per 15KB of data. Good or bad?

Back to git master, the version we call 7.54.1-DEV right now - since we're not quite sure which version number it'll become when we release the next release. It can become 7.54.1 or 7.55.0, it has not been determined yet. But I digress, I ran the same modified multi-double.c example again, ran memanalyze on the memory log again and it now reported...

Mallocs: 69
Reallocs: 5
Callocs: 24
Strdups: 31
Wcsdups: 0
Frees: 124
Allocations: 129
Maximum allocated: 153247

I had to look twice. Did I do something wrong? I better run it again just to double-check. The results are the same no matter how many times I run it...

33,961 vs 129

curl_multi_wait() is called a lot of times in a typical transfer, and it had at least one of the memory allocations we normally did during a transfer so removing that single tiny allocation had a pretty dramatic impact on the counter. A normal transfer also moves things in and out of linked lists and hashes a bit, but they too are mostly malloc-less now. Simply put: the remaining allocations are not done in the transfer loop so they're way less important.

The old curl did 263 times the number of allocations the current does for this example. Or the other way around: the new one does 0.37% the number of allocations the old one did...

As an added bonus, the new one also allocates less memory in total as it decreased that amount by 7KB (4.3%).

Are mallocs important?

In the day and age with many gigabytes of RAM and all, does a few mallocs in a transfer really make a notable difference for mere mortals? What is the impact of 33,832 extra mallocs done for 512MB of data?

To measure what impact these changes have, I decided to compare HTTP transfers from localhost and see if we can see any speed difference. localhost is fine for this test since there's no network speed limit, but the faster curl is the faster the download will be. The server side will be equally fast/slow since I'll use the same set for both tests.

I built curl 7.53.1 and curl 7.54.1-DEV identically and ran this command line:

curl http://localhost/80GB -o /dev/null

80 gigabytes downloaded as fast as possible written into the void.

The exact numbers I got for this may not be totally interesting, as it will depend on CPU in the machine, which HTTP server that serves the file and optimization level when I build curl etc. But the relative numbers should still be highly relevant. The old code vs the new.

7.54.1-DEV repeatedly performed 30% faster! The 2200MB/sec in my build of the earlier release increased to over 2900 MB/sec with the current version.

The point here is of course not that it easily can transfer HTTP over 20 Gigabit/sec using a single core on my machine - since there are very few users who actually do that speedy transfers with curl. The point is rather that curl now uses less CPU per byte transferred, which leaves more CPU over to the rest of the system to perform whatever it needs to do. Or to save battery if the device is a portable one.

On the cost of malloc: The 512MB test I did resulted in 33832 more allocations using the old code. The old code transferred HTTP at a rate of about 2200MB/sec. That equals 145,827 mallocs/second - that are now removed! A 600 MB/sec improvement means that curl managed to transfer 4300 bytes extra for each malloc it didn't do, each second.

Was removing these mallocs hard?

Not at all, it was all straight forward. It is however interesting that there's still room for changes like this in a project this old. I've had this idea for some years and I'm glad I finally took the time to make it happen. Thanks to our test suite I could do this level of "drastic" internal change with a fairly high degree of confidence that I don't introduce too terrible regressions. Thanks to our APIs being good at hiding internals, this change could be done completely without changing anything for old or new applications.

(Yeah I haven't shipped the entire change in a release yet so there's of course a risk that I'll have to regret my "this was easy" statement...)

Caveats on the numbers

There have been 213 commits in the curl git repo from 7.53.1 till today. There's a chance one or more other commits than just the pure alloc changes have made a performance impact, even if I can't think of any.

More?

Are there more "low hanging fruits" to pick here in the similar vein?

Perhaps. We don't do a lot of performance measurements or comparisons so who knows, we might do more silly things that we could stop doing and do even better. One thing I've always wanted to do, but never got around to, was to add daily "monitoring" of memory/mallocs used and how fast curl performs in order to better track when we unknowingly regress in these areas.

Addendum, April 23rd

(Follow-up on some comments on this article that I've read on hacker news, Reddit and elsewhere.)

Someone asked and I ran the 80GB download again with 'time'. Three times each with the old and the new code, and the "middle" run of them showed these timings:

Old code:

real    0m36.705s
user    0m20.176s
sys     0m16.072s

New code:

real    0m29.032s
user    0m12.196s
sys     0m12.820s

The server that hosts this 80GB file is a standard Apache 2.4.25, and the 80GB file is stored on an SSD. The CPU in my machine is a core-i7 3770K 3.50GHz.

Someone also mentioned alloca() as a solution for one of the patches, but alloca() is not portable enough to work as the sole solution, meaning we would have to do ugly #ifdef if we would want to use alloca() there.

poll on mac 10.12 is broken

When Mac OS X first launched they did so without an existing poll function. They later added poll() in Mac OS X 10.3, but we quickly discovered that it was broken (it returned a non-zero value when asked to wait for nothing) so in the curl project we added a check in configure for that and subsequently avoided using poll() in all OS X versions to and including Mac OS 10.8 (Darwin 12). The code would instead switch to the alternative solution based on select() for these platforms.

With the release of Mac OS X 10.9 "Mavericks" in October 2013, Apple had fixed their poll() implementation and we've built libcurl to use it since with no issues at all. The configure script picks the correct underlying function to use.

Enter macOS 10.12 (yeah, its not called OS X anymore) "Sierra", released in September 2016. Quickly we discovered that poll() once against did not act like it should and we are back to disabling the use of it in preference to the backup solution using select().

The new error looks similar to the old problem: when there's nothing to wait for and we ask poll() to wait N milliseconds, the 10.12 version of poll() returns immediately without waiting. Causing busy-loops. The problem has been reported to Apple and its Radar number is 28372390. (There has been no news from them on how they plan to act on this.)

poll() is defined by POSIX and The Single Unix Specification it specifically says:

If none of the defined events have occurred on any selected file descriptor, poll() waits at least timeout milliseconds for an event to occur on any of the selected file descriptors.

We pushed a configure check for this in curl, to be part of the upcoming 7.51.0 release. I'll also show you a small snippet you can use stand-alone below.

Apple is hardly alone in the broken-poll department. Remember how Windows' WSApoll is broken?

Here's a little code snippet that can detect the 10.12 breakage:

#include <poll.h>
#include <stdio.h>
#include <sys/time.h>

int main(void)
{
  struct timeval before, after;
  int rc;
  size_t us;

  gettimeofday(&before, NULL);
  rc = poll(NULL, 0, 500);
  gettimeofday(&after, NULL);

  us = (after.tv_sec - before.tv_sec) * 1000000 +
    (after.tv_usec - before.tv_usec);

  if(us < 400000) {
    puts("poll() is broken");
    return 1;
  }
  else {
    puts("poll() works");
  }
  return 0;
}

Follow-up, January 2017

This poll bug has been confirmed fixed in the macOS 10.12.2 update (released on December 13, 2016), but I've found no official mention or statement about this fact.

Summers are for HTTP

stockholm castle and ship
Stockholm City, as photographed by Michael Caven

In July 2015, 40-something HTTP implementers and experts of the world gathered in the city of Münster, Germany, to discuss nitty gritty details about the HTTP protocol during four intense days. Representatives for major browsers, other well used HTTP tools and the most popular HTTP servers were present. We discussed topics like how HTTP/2 had done so far, what we thought we should fix going forward and even some early blue sky talk about what people could potentially see being subjects to address in a future HTTP/3 protocol.

You can relive the 2015 version somewhat from my daily blog entries from then that include a bunch of details of what we discussed: day one, two, three and four.

http workshopThe HTTP Workshop was much appreciated by the attendees and it is now about to be repeated. In the summer of 2016, the HTTP Workshop is again taking place in Europe, but this time as a three-day event slightly further up north: in the capital of Sweden and my home town: Stockholm. During 25-27 July 2016, we intend to again dig in deep.

If you feel this is something for you, then please head over to the workshop site and submit your proposal and show your willingness to attend. This year, I'm also joining the Program Committee and I've signed up for arranging some of the local stuff required for this to work out logistically.

The HTTP Workshop 2015 was one of my favorite events of last year. I'm now eagerly looking forward to this year's version. It'll be great to meet you here!

Stockholm
The city of Stockholm in summer sunshine

daniel weekly

daniel weekly screenshot

My series of weekly videos, in lack of a better name called daniel weekly, reached episode 35 today. I'm celebrating this fact by also adding an RSS-feed for those of you who prefer to listen to me in an audio-only version.

As an avid podcast listener myself, I can certainly see how this will be a better fit to some. Most of these videos are just me talking anyway so losing the visual shouldn't be much of a problem.

A typical episode

I talk about what I work on in my open source projects, which means a lot of curl stuff and occasional stuff from my work on Firefox for Mozilla. I also tend to mention events I attend and HTTP/networking developments I find interesting and grab my attention. Lots of HTTP/2 talk for example. I only ever express my own personal opinions.

It is generally an extremely geeky and technical video series.

Every week I mention a (curl) "bug of the week" that allows me to joke or rant about the bug in question or just mention what it is about. In episode 31 I started my "command line options of the week" series in which I explain one or a few curl command line options with some amount of detail. There are over 170 options so the series is bound to continue for a while. I've explained ten options so far.

I've set a limit for myself and I make an effort to keep the episodes shorter than 20 minutes. I've not succeed every time.

Analytics

The 35 episodes have been viewed over 17,000 times in total. Episode two is the most watched individual one with almost 1,500 views.

Right now, my channel has 190 subscribers.

The top-3 countries that watch my videos: USA, Sweden and UK.

Share of viewers that are female: 3.7%

Changing networks with Linux

A rather long time ago I blogged about my work to better deal with changing networks while Firefox is running, and the change was then pushed for Android and I subsequently pushed the same functionality for Firefox on Mac.

Today I've landed yet another change, which detects network changes on Firefox OS and Linux.

Firefox Nightly screenshotAs Firefox OS uses a Linux kernel, I ended up doing the same fix for both the Firefox OS devices as for Firefox on Linux desktop: I open a socket in the AF_NETLINK family and listen on the stream of messages the kernel sends when there are network updates. This way we're told when the routing tables update or when we get a new IP address etc. I consider this way better than the NotifyIpInterfaceChange() API Windows provides, as this allows us to filter what we're interested in. The windows API makes that rather complicated and in fact a lot of the times when we get the notification on windows it isn't clear to me why!

The Mac API way is what I would consider even more obscure, but then I'm not at all used to their way of doing things and how you add things to the event handlers etc.

The journey to the landing of this particular patch was once again long and bumpy and full of sweat in this tradition that seem seems to be my destiny, and this time I ran into problems with the Firefox OS emulator which seems to have some interesting bugs that cause my code to not work properly and as a result of that our automated tests failed: occasionally data sent over a pipe or socketpair doesn't end up in the receiving end. In my case this means that my signal to the child thread to die would sometimes not be noticed and thus the thread wouldn't exit and die as intended.

I ended up implementing a work-around that makes it work even if the emulator eats the data by also checking a shared should-I-shutdown-now flag every once in a while. For more specific details on that, see the bug.

Changing networks on Mac with Firefox

Not too long ago I blogged about my work to better deal with changing networks while Firefox is running. That job was basically two parts.

A) generic code to handle receiving such a network-changed event and then

B) a platform specific part that was for Windows that detected such a network change and sent the event

Today I've landed yet another fix for part B called bug 1079385, which detects network changes for Firefox on Mac OS X.

mac miniI've never programmed anything before on the Mac so this was sort of my christening in this environment. I mean, I've written countless of POSIX compliant programs including curl and friends that certainly builds and runs on Mac OS just fine, but I never before used the Mac-specific APIs to do things.

I got a mac mini just two weeks ago to work on this. Getting it up, prepared and my first Firefox built from source took all-in-all less than three hours. Learning the details of the mac API world was much more trouble and can't say that I'm mastering it now either but I did find myself at least figuring out how to detect when IP addresses on the interfaces change and a changed address is a pretty good signal that the network changed somehow.

Pretending port zero is a normal one

Speaking the TCP protocol, we communicate between "ports" in the local and remote ends. Each of these port fields are 16 bits in the protocol header so they can hold values between 0 - 65535. (IPv4 or IPv6 are the same here.) We usually do HTTP on port 80 and we do HTTPS on port 443 and so on. We can even play around and use them on various other custom ports when we feel like it.

But what about port 0 (zero) ? Sure, IANA lists the port as "reserved" for TCP and UDP but that's just a rule in a list of ports, not actually a filter implemented by anyone.

In the actual TCP protocol port 0 is nothing special but just another number. Several people have told me "it is not supposed to be used" or that it is otherwise somehow considered bad to use this port over the internet. I don't really know where this notion comes from more than that IANA listing.

Frank Gevaerts helped me perform some experiments with TCP port zero on Linux.

In the Berkeley sockets API widely used for doing TCP communications, port zero has a bit of a harder situation. Most of the functions and structs treat zero as just another number so there's virtually no problem as a client to connect to this port using for example curl. See below for a printout from a test shot.

Running a TCP server on port 0 however, is tricky since the bind() function uses a zero in the port number to mean "pick a random one" (I can only assume this was a mistake done eons ago that can't be changed). For this test, a little iptables trickery was run so that incoming traffic on TCP port 0 would be redirected to port 80 on the server machine, so that we didn't have to patch any server code.

Entering a URL with port number zero to Firefox gets this message displayed:

This address uses a network port which is normally used for purposes other than Web browsing. Firefox has canceled the request for your protection.

... but Chrome accepts it and tries to use it as given.

The only little nit that remains when using curl against port 0 is that it seems glibc's getpeername() assumes this is an illegal port number and refuses to work. I marked that line in curl's output in red below just to highlight it for you. The actual source code with this check is here. This failure is not lethal for libcurl, it will just have slightly less info but will still continue to work. I claim this is a glibc bug.

$ curl -v http://10.0.0.1:0 -H "Host: 10.0.0.1"
* Rebuilt URL to: http://10.0.0.1:0/
* Hostname was NOT found in DNS cache
* Trying 10.0.0.1...
* getpeername() failed with errno 107: Transport endpoint is not connected
* Connected to 10.0.0.1 () port 0 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.38.1-DEV
> Accept: */*
> Host: 10.0.0.1
>
< HTTP/1.1 200 OK
< Date: Fri, 24 Oct 2014 09:08:02 GMT
< Server: Apache/2.4.10 (Debian)
< Last-Modified: Fri, 24 Oct 2014 08:48:34 GMT
< Content-Length: 22
< Content-Type: text/html

 

<html>testpage</html>

Why doing this experiment? Just for fun to to see if it worked.

(Discussion and comments on this post is also found at Reddit.)

Changing networks with Firefox running

Short recap: I work on network code for Mozilla. Bug 939318 is one of "mine" - yesterday I landed a fix (a patch series with 6 individual patches) for this and I wanted to explain what goodness that should (might?) come from this!

diffstat

diffstat reports this on the complete patch series:

29 files changed, 920 insertions(+), 162 deletions(-)

The change set can be seen in mozilla-central here. But I guess a proper description is easier for most...

The bouncy road to inclusion

This feature set and associated problems with it has been one of the most time consuming things I've developed in recent years, I mean in relation to the amount of actual code produced. I've had it "landed" in the mozilla-inbound tree five times and yanked out again before it landed correctly (within a few hours), every time of course reverted again because I had bugs remaining in there. The bugs in this have been really tricky with a whole bunch of timing-dependent and race-like problems and me being unfamiliar with a large part of the code base that I'm working on. It has been a highly frustrating journey during periods but I'd like to think that I've learned a lot about Firefox internals partly thanks to this resistance.

As I write this, it has not even been 24 hours since it got into m-c so there's of course still a risk there's an ugly bug or two left, but then I also hope to fix the pending problems without having to revert and re-apply the whole series...

Many ways to connect to networks

Firefox Nightly screenshotIn many network setups today, you get an environment and a network "experience" that is crafted for that particular place. For example you may connect to your work over a VPN where you get your company DNS and you can access sites and services you can't even see when you connect from the wifi in your favorite coffee shop. The same thing goes for when you connect to that captive portal over wifi until you realize you used the wrong SSID and you switch over to the access point you were supposed to use.

For every one of these setups, you get different DHCP setups passed down and you get a new DNS server and so on.

These days laptop lids are getting closed (and the machine is put to sleep) at one place to be opened at a completely different location and rarely is the machine rebooted or the browser shut down.

Switching between networks

Switching from one of the networks to the next is of course something your operating system handles gracefully. You can even easily be connected to multiple ones simultaneously like if you have both an Ethernet card and wifi.

Enter browsers. Or in this case let's be specific and talk about Firefox since this is what I work with and on. Firefox - like other browsers - will cache images, it will cache DNS responses, it maintains connections to sites a while even after use, it connects to some sites even before you "go there" and so on. All in the name of giving the users an as good and as fast experience as possible.

The combination of keeping things cached and alive, together with the fact that switching networks brings new perspectives and new "truths" offers challenges.

Realizing the situation is new

The changes are not at all mind-bending but are basically these three parts:

  1. Make sure that we detect network changes, even if just the set of available interfaces change. Send an event for this.
  2. Make sure the necessary parts of the code listens and understands this "network topology changed" event and acts on it accordingly
  3. Consider coming back from "sleep" to be a network changed event since we just cannot be sure of the network situation anymore.

The initial work has been made for Windows only but it allows us to smoothen out any rough edges before we continue and make more platforms support this.

The network changed event can be disabled by switching off the new "network.notify.changed" preference. If you do end up feeling a need for that, I really hope you file a bug explaining the details so that we can work on fixing it!

Act accordingly

So what is acting properly? What if the network changes in a way so that your active connections suddenly can't be used anymore due to the new rules and routing and what not? We attack this problem like this: once we get a "network changed" event, we "allow" connections to prove that they are still alive and if not they're torn down and re-setup when the user tries to reload or whatever. For plain old HTTP(S) this means just seeing if traffic arrives or can be sent off within N seconds, and for websockets, SPDY and HTTP2 connections it involves sending an actual ping frame and checking for a response.

The internal DNS cache was a bit tricky to handle. I initially just flushed all entries but that turned out nasty as I then also killed ongoing name resolves that caused errors to get returned. Now I instead added logic that flushes all the already resolved names and it makes names "in transit" to get resolved again so that they are done on the (potentially) new network that then can return different addresses for the same host name(s).

This should drastically reduce the situation that could happen before when Firefox would basically just freeze and not want to do any requests until you closed and restarted it. (Or waited long enough for other timeouts to trigger.)

The 'N seconds' waiting period above is actually 5 seconds by default and there's a new preference called "network.http.network-changed.timeout" that can be altered at will to allow some experimentation regarding what the perfect interval truly is for you.

Firefox BallInitially on Windows only

My initial work has been limited to getting the changed event code done for the Windows back-end only (since the code that figures out if there's news on the network setup is highly system specific), and now when this step has been taken the plan is to introduce the same back-end logic to the other platforms. The code that acts on the event is pretty much generic and is mostly in place already so it is now a matter of making sure the event can be generated everywhere.

My plan is to start on Firefox OS and then see if I can assist with the same thing in Firefox on Android. Then finally Linux and Mac.

I started on Windows since Windows is one of the platforms with the largest amount of Firefox users and thus one of the most prioritized ones.

More to do

There's separate work going on for properly detecting captive portals. You know the annoying things hotels and airports for example tend to have to force you to do some login dance first before you are allowed to use the internet at that location. When such a captive portal is opened up, that should probably qualify as a network change - but it isn't yet.

decopperfied

copper

Today I disconnected my ADSL modem through which I still had the landline telephone service running. I've cancelled the service so at the end of the month it will die and I'm now instead signing up to one of them lesser known companies that offer cheap IP-telephony that is network independent: Affinity telecom. Staying with only phone service over the copper was not a good idea since the cheapest service is still expensive over that medium.

Why have a landline phone at all? I've decided to stick with a landline telephone for a few reasons: first, our phone number is distributed widely and it is convenient to keep it working and keeping it reach our house instead of a specific person. Secondly, we have two kids who like to phone at times and they can use this fine and it'll end up cheaper than if they'd use their own mobile phones. And thirdly, the parents-in-law factor. My parents and my wife's parents etc like calling landlines instead of mobile phones, I think partly due to old habits but also partly because it is cheaper for them...

Ping Communication Voice Catcher 201E

When I researched which service to pick I discovered that not a single operator exists in Sweden that only charges for usage. They all have a minimum monthly fee, so I went with one of the cheapest I could find on prisjakt for my usage pattern.

I simply yanked the RJ11 from the ADSL modem and inserted into my new "Ping Communications Voice Catcher 201E" device (see picture) that I received. The instructions that come with the device say I should connect it directly to internet and then let my existing LAN traffic route through it. I don't want to add yet another box between me and the internet and I fear that a cheapo box like this might cause problems in one way or another if I do, so I just plugged this new toy into my router and after I while I could happily confirm that it worked just as nicely. It got an IP from my DHCP server and I can call my old-style analog phone now (I love number portability) and I can use it to call out to my mobile.

I'm curious to see how good/bad this is going to work...

Update: It didn't work to make the box a regular DHCP client. Even if I made it into a DMZ it still wouldn't work to accept calls. I would only get a signal and everything, but once I answered the phone there was no sound. In the end I moved it to sit between "internet" and my local router and now my phone seems to be doing fine.