One of the harder things to look out for in a software project is slow, gradual decay over a long period of time. For example, if we make a library 1% slower or make it use 2% more memory every other month.
Sometimes it is totally acceptable to make code slower or use more memory: everything we do is a balance, and sometimes a new feature or improved performance is worth a little extra memory.
We don’t want the growth or slowing down to happen without it being an explicit decision and known trade-off. If we know what the trade-off is, we can reconsider and turn down a feature because we deem the cost too high. Or we accept it because the feature is useful.
In the curl project we make a concerted effort to keep memory use and allocations to a minimum and we are proud of our work. But we also continuously try to encourage and involve more contributors, and it is easy to sometimes slip and do something in the code that is perhaps not the wisest idea, memory-wise.
Memory
In curl we have recently introduced a number of different checks to help us remain aware of the exact memory allocation and use situation.
An added complication for us is that curl builds and runs on numerous architectures, with lots of features switched on and off and with different sets of third party libraries. That means internal struct sizes are rarely exactly the same in two different builds, and different code paths may allocate data differently. We must therefore do all memory limit checks with a certain amount of flexibility and margin.
Per test-case
We have introduced a system where we can specify exact limits for a single test case: this test may not do more than N allocations and it may not have more than Z bytes allocated concurrently.
We do this in debug builds only, where we have wrapper functions for all memory functions used in curl, so this accounting is quite easy to do.
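To illustrate the principle, here is a minimal sketch of what such accounting wrappers could look like. The names and details are made up for this post and are not curl's actual debug memory code:

```c
/* A minimal sketch of allocation accounting with wrapper functions,
   in the spirit of what a debug build can do. Names and details are
   hypothetical and not curl's actual debug memory code. */
#include <stdlib.h>

union hdr {
  size_t size;       /* size of the user allocation */
  long double align; /* force alignment suitable for any object */
};

static size_t alloc_count;   /* total number of allocations made */
static size_t current_bytes; /* bytes allocated right now */
static size_t peak_bytes;    /* highest concurrent amount seen */

void *counting_malloc(size_t size)
{
  union hdr *h = malloc(sizeof(union hdr) + size);
  if(!h)
    return NULL;
  h->size = size;
  alloc_count++;
  current_bytes += size;
  if(current_bytes > peak_bytes)
    peak_bytes = current_bytes;
  return h + 1; /* hand out the memory after the header */
}

void counting_free(void *ptr)
{
  if(ptr) {
    union hdr *h = (union hdr *)ptr - 1;
    current_bytes -= h->size;
    free(h);
  }
}

/* after a test has run, the harness can compare the counters against
   the limits declared for that particular test case */
int within_limits(size_t max_allocs, size_t max_concurrent)
{
  return alloc_count <= max_allocs && peak_bytes <= max_concurrent;
}
```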
The idea is to set fairly strict memory limits in a number of selected, typical test cases. We do not use them in all test cases because if we decide in the future that we want to allow increased memory use, updating them all could easily become inconvenient and burdensome.
These checks also come with default limits, so that tests that really need many allocations (more than 1,000) or an unusually large amount of memory (more than 1MB concurrently) have to declare that in the test case, or they fail because of the suspicious behavior.
Primary struct sizes
A second size check was added in a new dedicated test case: it verifies that a number of important internal structs are sized within their allowed limits.
Keeping such struct sizes in check is important because we allocate one such struct for each easy handle, each multi handle, each concurrent connection and so on. Because applications sometimes want to use a lot of those (from hundreds to several thousands), it is important that we keep them small.
This new test case makes sure that we don’t accidentally enlarge these structs and make users suffer. Perhaps as a secondary effect, we can also come back to this test case in ten years and see how much the sizes have changed.
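The idea boils down to comparing sizeof() against a ceiling picked with some headroom. Here is a rough illustration with a made-up struct and limit; the real test case of course checks curl's own internal structs:

```c
/* A rough sketch of the struct size check idea. The struct and the
   limit are made up for illustration; the real test case verifies
   curl's internal structs against limits chosen with some margin,
   since exact sizes differ between builds. */
#include <stdio.h>

struct example_handle {
  char buffer[256];
  int flags;
  void *next;
};

/* the allowed ceiling, with headroom for architecture and feature
   differences */
#define EXAMPLE_HANDLE_MAX 320

int main(void)
{
  size_t size = sizeof(struct example_handle);
  if(size > EXAMPLE_HANDLE_MAX) {
    fprintf(stderr, "struct example_handle is %zu bytes, limit is %d\n",
            size, EXAMPLE_HANDLE_MAX);
    return 1; /* fail the test: the struct grew too big */
  }
  printf("struct example_handle is %zu bytes, within the %d byte limit\n",
         size, EXAMPLE_HANDLE_MAX);
  return 0;
}
```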
Memory allocated by others
While we work hard on reducing and keeping curl’s own memory use in check, curl also normally uses a number of third party libraries for fundamental parts of its operations: TLS, compression and more. The memory monitoring and checks I write about in this post are however explicitly designed not to check or include memory allocated and used by such third parties, because we cannot easily affect them. It is up to every such library’s dev team to work on their code towards their own goals, which may not be the same as ours.
This is of course frustrating at the same time. Downloading https://curl.se/ with the curl tool involves around 134 allocations done from curl and libcurl code. If curl is built with OpenSSL 3.5.0, the total number of allocations such a command performs is over 54,000. That is down from OpenSSL 3.4.1, which used over 200K!
Different TLS libraries clearly have totally different characteristics here. Rustls, for example, performed the same simple use case with just 2,176 allocations and at the same time a much smaller peak usage.
My friends working on wolfSSL have several different configure options to tweak and optimize the malloc patterns. The full build I tested made more allocations than OpenSSL 3.5.0 but used less than half the peak amount of memory.
Still worth it
I am a strong believer in each project doing its best and keeping its own backyard clean and tidy.
Sure, curl does less than 0.3% of the allocations by itself when downloading https://curl.se using the latest OpenSSL version for TLS. This is still not a reason for us to be sloppy or to lower our guard rails. Instead I hope that we can lead by example.
This is what makes us proud as engineers and it makes our users trust us and appreciate what we ship.
People can use other TLS libraries. TLS library developers can improve their allocation patterns. And perhaps most importantly: in many cases the number of allocations or the amount of memory used does not matter much.
Transfer speed checks next?
We want to add similar checks and verification for transfer speeds but that is an entirely different challenge and something that is being worked on separately from these changes.
Credits
Top image by LoggaWiggler from Pixabay