{"id":14145,"date":"2020-05-30T23:22:02","date_gmt":"2020-05-30T21:22:02","guid":{"rendered":"https:\/\/daniel.haxx.se\/blog\/?p=14145"},"modified":"2020-05-30T23:43:10","modified_gmt":"2020-05-30T21:43:10","slug":"on-demand-buffer-alloc-in-libcurl","status":"publish","type":"post","link":"https:\/\/daniel.haxx.se\/blog\/2020\/05\/30\/on-demand-buffer-alloc-in-libcurl\/","title":{"rendered":"on-demand buffer alloc in libcurl"},"content":{"rendered":"\n<p><em>Okay, so I&#8217;ll delve a bit deeper into the libcurl internals than usual here. Beware of low-level talk!<\/em><\/p>\n\n\n\n<p>There&#8217;s a never-ending stream of things to polish and improve in a software project and curl is no exception. Let me tell you what I fell over and worked on the other day.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Smaller than what holds Linux<\/h2>\n\n\n\n<p>We have users who are running curl on tiny devices, often put under the label of Internet of Things, IoT. These small systems typically have maybe a megabyte or two of RAM and flash and are often too small to even run Linux. They typically run one of the many different RTOS flavors instead.<\/p>\n\n\n\n<p>It is with these users in mind I&#8217;ve worked on the <a href=\"https:\/\/daniel.haxx.se\/blog\/2019\/05\/11\/tiny-curl\/\">tiny-curl<\/a> effort. To make curl a viable alternative even there. And believe me, the world of RTOSes and IoT is literally filled with really low quality and half-baked HTTP client implementations. Often certainly <em>very<\/em> small but equally often with really horrible shortcuts or protocol misunderstandings in them.<\/p>\n\n\n\n<p>Going with curl in your IoT device means going with decades of experience and reliability. But for libcurl to be an option for many IoT devices, a libcurl build has to be able to get really small. 
Both in the storage footprint and in the amount of dynamic memory used while executing.<\/p>\n\n\n\n<p>Being feature-packed and attractive for the high-end users and yet at the same time being able to get really small for the low-end is a challenge. And who doesn&#8217;t like a good challenge?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Reduce reduce reduce<\/h2>\n\n\n\n<p>I&#8217;ve set myself on a quest to make it possible to build libcurl smaller than before and to use less dynamic memory. The first tiny-curl releases were only the beginning and already then I aimed for a libcurl + TLS library within 100K of storage. I believe that goal was met, but I also think there&#8217;s more to gain.<\/p>\n\n\n\n<p>I will make tiny-curl smaller and use less memory by making sure that when we disable parts of the library or disable specific features and protocols at build-time, they no longer affect storage or dynamic memory sizes &#8211; as far as possible. Tiny-curl is a good step in this direction but the job isn&#8217;t done yet &#8211; there&#8217;s more &#8220;dead meat&#8221; to carve off.<\/p>\n\n\n\n<p>One example is my current work (<a href=\"https:\/\/github.com\/curl\/curl\/pull\/5466\">PR #5466<\/a>) on making sure there are far fewer proxy remnants left when libcurl is built without proxy support. This makes it smaller on disk but also makes it use less dynamic memory.<\/p>\n\n\n\n<p>To decrease the maximum amount of allocated memory for a typical transfer, and in fact for all kinds of transfers, we&#8217;ve just switched to a model with on-demand download buffer allocations (<a href=\"https:\/\/github.com\/curl\/curl\/pull\/5472\">PR #5472<\/a>). 
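From the application&#8217;s point of view nothing changes &#8211; the receive buffer size is still set with <a href=\"https:\/\/curl.haxx.se\/libcurl\/c\/CURLOPT_BUFFERSIZE.html\">CURLOPT_BUFFERSIZE<\/a> &#8211; it is only the moment of allocation that moves. A minimal sketch of a transfer under the new model (error handling omitted for brevity):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">#include &lt;curl\/curl.h&gt;\n\nint main(void)\n{\n  CURL *h = curl_easy_init();  \/* no receive buffer allocated here anymore *\/\n\n  \/* with the on-demand model this only records the wanted size *\/\n  curl_easy_setopt(h, CURLOPT_BUFFERSIZE, 16384L);\n  curl_easy_setopt(h, CURLOPT_URL, \"http:\/\/localhost\/8GB\");\n\n  \/* the buffer is allocated when the transfer starts and freed\n     again as soon as it is over *\/\n  curl_easy_perform(h);\n\n  curl_easy_cleanup(h);  \/* no buffer left to free at this point *\/\n  return 0;\n}<\/pre>\n\n\n\n<p>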
Previously, the download buffer for a transfer was allocated at the same time as the handle (in the <a href=\"https:\/\/curl.haxx.se\/libcurl\/c\/curl_easy_init.html\">curl_easy_init<\/a> call) and kept allocated until the handle was cleaned up again (with <a href=\"https:\/\/curl.haxx.se\/libcurl\/c\/curl_easy_cleanup.html\">curl_easy_cleanup<\/a>). Now, we instead lazy-allocate it when the transfer starts, and we free it again immediately when the transfer is over.<\/p>\n\n\n\n<p>This change has several benefits. For starters, the previous initial allocation would always first allocate the buffer using the default size, and the user could then set a smaller size that would realloc a new smaller buffer. That double allocation was of course unfortunate, especially on systems that really do want to avoid mallocs and want a minimum buffer size.<\/p>\n\n\n\n<p>The &#8220;price&#8221; of handling many handles went down drastically, as only transfers that are actively in progress will actually have a receive buffer allocated.<\/p>\n\n\n\n<p>A positive side-effect of this refactor is that we could also make sure the internal &#8220;closure handle&#8221; doesn&#8217;t use any buffer allocation at all. That&#8217;s the &#8220;spare&#8221; handle we create internally to associate certain connections with when there are no user-provided handles left but we still need one &#8211; for example to close down an FTP connection, as that involves a command\/response procedure.<\/p>\n\n\n\n<p>Downsides? It means a slight increase in the number of allocations and frees of dynamic memory when doing new transfers. We do however deem this a sensible trade-off.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Numbers<\/h2>\n\n\n\n<p>I always hesitate to bring up numbers since they vary so much depending on your particular setup, build, platform and more. But okay, with that said, let&#8217;s take a look at the numbers I could generate on my dev machine. 
A now rather dated x86-64 machine running Linux.<\/p>\n\n\n\n<p>For measurement, I perform a standard single transfer getting an 8GB file from http:\/\/localhost, with the downloaded data written to \/dev\/null:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">curl -s http:\/\/localhost\/8GB -o \/dev\/null<\/pre>\n\n\n\n<p>With all the memory calls instrumented, my script counts the number of alloc\/realloc\/free etc. calls made, as well as the maximum total amount of memory allocated.<\/p>\n\n\n\n<p>The curl tool itself sets the download buffer size to a &#8220;whopping&#8221; 100K buffer (as it actually makes a difference to users doing, for example, transfers from localhost or other <em>really<\/em> high-bandwidth setups, or SFTP over high-latency links). libcurl is more conservative and defaults it to 16K.<\/p>\n\n\n\n<p>This command line of course creates a single easy handle and makes a single HTTP transfer without any redirects.<\/p>\n\n\n\n<p>Before the <em>lazy-alloc<\/em> change, this operation would peak at <strong>168978 bytes<\/strong> allocated. As you can see, the 100K receive buffer is a significant share of the memory used.<\/p>\n\n\n\n<p>After the alloc work, the exact same transfer instead ended up using <strong>136188 bytes<\/strong>.<\/p>\n\n\n\n<p>102400 bytes of that is the receive buffer, meaning we reduced the amount of &#8220;extra&#8221; allocated data from 66578 to 33788 bytes. <strong>By 49%.<\/strong><\/p>\n\n\n\n<p>Even tinier tiny-curl: in a feature-stripped tiny-curl build that does HTTPS GET only with a mere 1K receive buffer, the total maximum amount of dynamically allocated memory is now below 25K.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Caveats<\/h2>\n\n\n\n<p>The numbers mentioned above only count allocations done by curl code. 
It does not include memory used by system calls or, when used, third party libraries.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Landed<\/h2>\n\n\n\n<p>The changes mentioned in this blog post have landed in the master branch and will ship in the next release: curl 7.71.0.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Okay, so I&#8217;ll delve a bit deeper into the libcurl internals than usual here. Beware of low-level talk! There&#8217;s a never-ending stream of things to polish and improve in a software project and curl is no exception. Let me tell you what I fell over and worked on the other day. Smaller than what holds &hellip; <a href=\"https:\/\/daniel.haxx.se\/blog\/2020\/05\/30\/on-demand-buffer-alloc-in-libcurl\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">on-demand buffer alloc in libcurl<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":5,"featured_media":14190,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[33,35,438,481],"class_list":["post-14145","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-curl","tag-curl-and-libcurl","tag-embedded","tag-internet-of-things","tag-tiny-curl"],"_links":{"self":[{"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/posts\/14145","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/comments?post=14145"}],"version-history":[{"count":26,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/posts\/14145\/revisions"}],"predecessor-version":[{"id":14198,"href":"https:\/\/daniel.hax
x.se\/blog\/wp-json\/wp\/v2\/posts\/14145\/revisions\/14198"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/media\/14190"}],"wp:attachment":[{"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/media?parent=14145"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/categories?post=14145"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/tags?post=14145"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}