{"id":13311,"date":"2021-09-27T09:28:34","date_gmt":"2021-09-27T07:28:34","guid":{"rendered":"https:\/\/daniel.haxx.se\/blog\/?p=13311"},"modified":"2021-09-27T09:38:42","modified_gmt":"2021-09-27T07:38:42","slug":"common-mistakes-when-using-libcurl","status":"publish","type":"post","link":"https:\/\/daniel.haxx.se\/blog\/2021\/09\/27\/common-mistakes-when-using-libcurl\/","title":{"rendered":"Common mistakes when using libcurl"},"content":{"rendered":"\n<p>I&#8217;ve been traveling this road for a while. Here&#8217;s my collection of <strong>15 of the most common mistakes<\/strong> and issues people will run into when writing applications and services that use libcurl. I&#8217;ve also done recorded presentations on this topic that you can watch if you prefer that medium.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"common mistakes when using libcurl\" width=\"474\" height=\"267\" src=\"https:\/\/www.youtube.com\/embed\/0KfDdIAirSI?start=3264&#038;feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>Most of these issues are shared among application authors independently of what language the program is written in &#8211; as libcurl bindings tend to be very thin and more or less expose the API in the same way the C API does. 
Some mistakes are, however, C and C++ specific.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2000\" height=\"1125\" src=\"https:\/\/daniel.haxx.se\/blog\/wp-content\/uploads\/2021\/09\/slide-libcurl-mistakes.jpg\" alt=\"\" class=\"wp-image-17221\"\/><figcaption>15 mistakes to look out for when using libcurl<\/figcaption><\/figure><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">1. Skipping the documentation!<\/h2>\n\n\n\n<p>Nothing in my list here is magic, hidden or unknown. Everything is documented and well-known. By far the most common mistakes are made by people not reading up, rushing a bit too fast and sometimes making a few too many assumptions. Of course there&#8217;s also occasional copy-and-pasting from bad examples going on. The web is full of questionable source snippets to get inspiration from.<\/p>\n\n\n\n<p>We spend a <em>significant<\/em> amount of time and energy on making sure the documentation is accurate, detailed and thorough. Many mistakes can be avoided by simply reading up a little more first!<\/p>\n\n\n\n<p>All the several hundred man pages and more are available in <a href=\"https:\/\/curl.se\/libcurl\/\">the libcurl section of the curl web site<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">2. Failure to check return codes<\/h2>\n\n\n\n<p>This sounds like such an obvious thing but we keep seeing this happen over and over again: users write code that uses libcurl functions but they don&#8217;t check the return codes.<\/p>\n\n\n\n<p>If libcurl detects an error it will return an error code. Whenever libcurl doesn&#8217;t do what you expected it to do, it very often turns out to have returned an error code to the application that explains the behavior. 
We work hard at making sure libcurl functions return the correct return codes!<\/p>\n\n\n\n<p>The <a href=\"https:\/\/curl.se\/libcurl\/c\/example.html\">libcurl examples<\/a> we host on the curl web site (and ship in curl tarballs) are mostly done without error checks &#8211; for the sole purpose of making them smaller and easier to read as that removes code that isn&#8217;t strictly about libcurl.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">3. Forgetting the verbose option<\/h2>\n\n\n\n<p><code><a href=\"https:\/\/curl.se\/libcurl\/c\/CURLOPT_VERBOSE.html\">CURLOPT_VERBOSE<\/a><\/code> is the libcurl user&#8217;s best friend. Whenever your transfer fails or somehow doesn&#8217;t do what you expected it to, switching on verbose mode should be one of the first actions as it often gives you a lot of clues about what&#8217;s going on under the hood.<\/p>\n\n\n\n<p>Of course, you can also go further and use <code><a href=\"https:\/\/curl.se\/libcurl\/c\/CURLOPT_DEBUGFUNCTION.html\">CURLOPT_DEBUGFUNCTION<\/a><\/code> to get even more details, but usually you can save that for the more complicated issues.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">4. There&#8217;s a global init function<\/h2>\n\n\n\n<p>You really should call <code><a href=\"https:\/\/curl.se\/libcurl\/c\/curl_global_init.html\">curl_global_init()<\/a><\/code> explicitly and early on, and understand that it isn&#8217;t thread-safe. (<a href=\"https:\/\/daniel.haxx.se\/blog\/2020\/03\/01\/imagining-a-thread-safe-curl_global_init\/\">We&#8217;re working on that<\/a>.)<\/p>\n\n\n\n<p>libcurl will detect if you failed to call it and then call it itself, but that&#8217;s not a practice we recommend, since you&#8217;ll then have a harder time making that call thread-safe.<\/p>\n\n\n\n<p>And there&#8217;s a corresponding <code><a href=\"https:\/\/curl.se\/libcurl\/c\/curl_global_cleanup.html\">curl_global_cleanup()<\/a><\/code> to call when all your libcurl use is done.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">5. 
Consider the redirects<\/h2>\n\n\n\n<pre class=\"wp-block-preformatted\">HTTP\/1.1 301 Moved Permanently<br>Server: M4gic server\/3000<br>Retry-After: 0<br>Location: https:\/\/curl.se\/<br>Content-Length: 0<br>Accept-Ranges: bytes<br>Date: Thu, 07 May 2020 08:59:56 GMT<br>Connection: close<\/pre>\n\n\n\n<p>When you let libcurl handle redirects, consider limiting which protocols you allow redirects <em>to<\/em> (<code><a href=\"https:\/\/curl.se\/libcurl\/c\/CURLOPT_REDIR_PROTOCOLS.html\">CURLOPT_REDIR_PROTOCOLS<\/a><\/code>), and of course you must remember that crafty users will figure out ways to redirect responses to potentially malicious servers given the chance.<\/p>\n\n\n\n<p><strong>Do not<\/strong> set custom HTTP methods on requests that follow redirects.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">6. Let users set (parts of) the URL<\/h2>\n\n\n\n<p>Don&#8217;t do that. Unless you have considered the consequences and make sure you deal with them appropriately.<\/p>\n\n\n\n<p>If you really insist that you need to let your users set the URL, restrict and carefully filter exactly which parts they can change and what they can change them to.<\/p>\n\n\n\n<p>The reason is of course that libcurl often supports other protocols than the one(s) you had in mind when you wrote your application. 
And users can do other crafty things to make host names point to other servers (which of course TLS based protocols will reject), abuse free-form URL input fields to pass on unexpected data (sometimes including newlines and other creative things) to your servers or have your application talk to malicious servers.<\/p>\n\n\n\n<p>You can limit what protocols your application supports with <code><a href=\"https:\/\/curl.se\/libcurl\/c\/CURLOPT_PROTOCOLS.html\">CURLOPT_PROTOCOLS<\/a><\/code> and you can parse URLs with the <a href=\"https:\/\/curl.se\/libcurl\/c\/curl_url_set.html\">curl_url_set()<\/a> function family before you pass them to curl to make sure the given URLs make sense!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">7. Setting HTTP method<\/h2>\n\n\n\n<p>Setting the custom HTTP request method with <code><a href=\"https:\/\/curl.se\/libcurl\/c\/CURLOPT_CUSTOMREQUEST.html\">CURLOPT_CUSTOMREQUEST<\/a><\/code> is most often completely unnecessary, frequently causes problems and is only very rarely done correctly.<\/p>\n\n\n\n<p>The primary problems with setting this option are:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>if you also ask libcurl to follow redirects, this custom method will be used in follow-up requests as well, even if the server indicates wanting a different one in the HTTP response code<\/li><li>it doesn&#8217;t actually change libcurl&#8217;s behavior or expectations, it only changes the string libcurl sends in the request.<\/li><\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">8. Disabled certificate checks<\/h2>\n\n\n\n<p>libcurl allows applications to disable TLS certificate checks with the two options <code><a href=\"https:\/\/curl.se\/libcurl\/c\/CURLOPT_SSL_VERIFYPEER.html\">CURLOPT_SSL_VERIFYPEER<\/a><\/code> and <code><a href=\"https:\/\/curl.se\/libcurl\/c\/CURLOPT_SSL_VERIFYHOST.html\">CURLOPT_SSL_VERIFYHOST<\/a><\/code>. This is powerful and at times very handy while developing and\/or experimenting. 
It is also a very bad thing to ship in your product or deploy in your live service.<\/p>\n\n\n\n<p><em><strong>Disabling the certificate check effectively removes the TLS protection from the connections!<\/strong><\/em><\/p>\n\n\n\n<p>Searching for these option names using source code search engines or just on GitHub will show you hundreds or thousands of applications that leave these checks disabled. Don&#8217;t be like them!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">9. Assume zero terminated data in callbacks<\/h2>\n\n\n\n<p>libcurl has a series of different callbacks in its API. Some of these callbacks deliver data to the application and that data is then typically offered with a pointer and a size of that data.<\/p>\n\n\n\n<p>The documentation very clearly stipulates that this data <em>is not zero terminated<\/em> &#8211; you cannot and should not use C functions on the data that work on &#8220;C strings&#8221; (that assume a terminating, trailing, zero byte). This mistake seems especially common when the delivered data is something like HTTP headers, which are text based and seem to lure people into assuming a zero terminator.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">10. C++ strings are not C strings<\/h2>\n\n\n\n<p>libcurl is a C library with a C API for maximum portability and availability, yet a large portion of libcurl users are actually writing their programs in C++.<\/p>\n\n\n\n<p>This is not a problem. You can use the libcurl API perfectly fine from C++.<\/p>\n\n\n\n<p>Passing &#8220;strings&#8221; to libcurl must however be done with the C approach: you pass a pointer to a zero terminated buffer. If you pass a reference to a C++ string object, libcurl will not know what it is and it will not get or use the string correctly. 
It will fail in mysterious ways!<\/p>\n\n\n\n<p>Instead, do something like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ Keep the URL as a C++ string object\nstd::string str(\"https:\/\/example.com\/\");\n\n\/\/ Pass it to curl as a C string!\ncurl_easy_setopt(curl, CURLOPT_URL, str.c_str());<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">11. Threading mistakes<\/h2>\n\n\n\n<p>libcurl is <a href=\"https:\/\/curl.se\/libcurl\/c\/threadsafe.html\">thread-safe<\/a>, but there are some basic rules and limitations that you need to follow, as detailed in the linked document:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li><a href=\"https:\/\/daniel.haxx.se\/blog\/2020\/03\/01\/imagining-a-thread-safe-curl_global_init\/\">curl_global_init is not thread-safe<\/a><\/li><li>you must not use any libcurl handle concurrently<\/li><li>if you use older TLS libraries, you must set up mutex locks<\/li><\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">12. Understanding CURLOPT_NOSIGNAL<\/h2>\n\n\n\n<p>Signals are a Unix concept where an asynchronous notification is sent to a process or to a specific thread within the same process in order to notify it of an event that occurred. 
<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What does libcurl use signals for?<\/h3>\n\n\n\n<p>When using the synchronous name resolver, libcurl uses alarm() to abort slow name resolves (if a timeout is set), which ultimately sends a SIGALRM to the process that is caught by libcurl.<\/p>\n\n\n\n<p>By default, libcurl installs its own sighandler while running, and restores the original one again on return \u2013 for SIGALRM and SIGPIPE.<\/p>\n\n\n\n<p>Closing TLS (with OpenSSL etc) can trigger a SIGPIPE if the connection is dead.<\/p>\n\n\n\n<p><strong>Unless CURLOPT_NOSIGNAL is set!<\/strong> (It is unset by default.)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What does CURLOPT_NOSIGNAL do?<\/h3>\n\n\n\n<p>It prevents libcurl from triggering signals.<\/p>\n\n\n\n<p>When set, it prevents libcurl from installing its own sighandler and&#8230;<\/p>\n\n\n\n<p>Generated signals must then be handled by the libcurl-using application itself.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">13. Forgetting -DCURL_STATICLIB<\/h2>\n\n\n\n<p>Creating and using libcurl statically is easy and convenient and seems especially popular on Windows.<\/p>\n\n\n\n<p>It requires the <code>CURL_STATICLIB<\/code> define to be set when building your application! This is a slightly unusual requirement and pattern, which is probably why people often miss it.<\/p>\n\n\n\n<p>Omitting that define causes linker errors:<br>&#8220;<code>unknown symbol __imp__curl_easy_init<\/code>&#8221;<\/p>\n\n\n\n<p>This requirement is present because Windows needs <code>__declspec<\/code> to be present or absent in the headers depending on how it links.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Static builds mean chasing deps<\/h3>\n\n\n\n<p>libcurl can use <em>many<\/em> 3rd party dependencies.<\/p>\n\n\n\n<p>When linking statically, all those need to be provided to the linker, so the curl build scripts (as well as your application linking) usually need manual help to find them all.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">14. 
C++ methods<\/h2>\n\n\n\n<p>C++ class methods look very much like functions, but <strong>C++ class methods cannot be used as callbacks with libcurl<\/strong><\/p>\n\n\n\n<p>\u2026 since they assume a \u2018this\u2019 pointer to the current object and a C program doesn&#8217;t pass on such a pointer.<\/p>\n\n\n\n<p>Static class member functions work though. You can thus work around this limitation with a trick like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"color:#a30003\" class=\"has-inline-color\">\/\/ f is the pointer to your object. Declare func as static inside YourClass.<\/span>\nsize_t YourClass::func(void *buffer, size_t sz, size_t n, void *f)\n{\n<span style=\"color:#a30003\" class=\"has-inline-color\">  \/\/ Call non-static member function.<\/span>\n  static_cast&lt;YourClass *&gt;(f)-&gt;nonStaticFunction();\n<span style=\"color:#a30003\" class=\"has-inline-color\">  \/\/ Tell libcurl that all the delivered bytes were handled.<\/span>\n  return sz * n;\n}<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"color:#a30003\" class=\"has-inline-color\">\/\/ This is how you pass a pointer to the static function:<\/span>\ncurl_easy_setopt(hcurl, CURLOPT_WRITEFUNCTION, YourClass::func);\ncurl_easy_setopt(hcurl, CURLOPT_WRITEDATA, this);<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">15. Write callback invokes<\/h2>\n\n\n\n<p>Data is delivered from libcurl to the callback set with <a href=\"https:\/\/curl.se\/libcurl\/c\/CURLOPT_WRITEFUNCTION.html\">CURLOPT_WRITEFUNCTION<\/a>.<\/p>\n\n\n\n<p>This callback might be called zero, one, two or many times. Never assume you will get a certain number of calls. The number of invocations is independent of the amount of data and varies because of network, server, kernel or other reasons. Don&#8217;t assume the same invocation pattern will repeat!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve been traveling this road for a while. Here&#8217;s my collection of 15 of the most common mistakes and issues people will run into when writing applications and services that use libcurl. 
I&#8217;ve also done recorded presentations on this topic that you can watch if you prefer that medium. Most of these issues are shared &hellip; <a href=\"https:\/\/daniel.haxx.se\/blog\/2021\/09\/27\/common-mistakes-when-using-libcurl\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Common mistakes when using libcurl<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":5,"featured_media":17350,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[33],"class_list":["post-13311","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-curl","tag-curl-and-libcurl"],"_links":{"self":[{"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/posts\/13311","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/comments?post=13311"}],"version-history":[{"count":64,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/posts\/13311\/revisions"}],"predecessor-version":[{"id":17425,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/posts\/13311\/revisions\/17425"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/media\/17350"}],"wp:attachment":[{"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/media?parent=13311"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/categories?post=13311"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/tags?post=13311"}],"curies":[{"name":"wp","href":"https:\/\/api
.w.org\/{rel}","templated":true}]}}