I’ve been traveling this road for a while. Here’s my collection of 15 of the most common mistakes and issues people will run into when writing applications and services that use libcurl. I’ve also done recorded presentations on this topic that you can watch if you prefer that medium.
Most of these issues are shared among application authors independently of what language the program is written in – as libcurl bindings tend to be very thin and more or less expose the API in the same way the C API does. Some mistakes are however C and C++ specific.
1. Skipping the documentation!
Nothing in my list here is magic, hidden or unknown. Everything is documented and well-known. The by far most common mistakes are done by people not reading up, rushing a bit too fast and sometimes making a little too many assumptions. Of course there’s also occasional copy-and-pasting from bad examples going on. The web is full of questionable source snippets to get inspiration from.
We spend a significant amount of time and energy on making sure the documentation is accurate, detailed and thorough. Many mistakes can be avoided by simply reading up a little more first!
All the several hundred man pages and more are available in the libcurl section of the curl web site.
2. Failure to check return codes
This sounds like such an obvious thing but we keep seeing this happen over and over again: users write code that uses libcurl functions but they don’t check the return codes.
If libcurl detects an error it will return an error code. Whenever libcurl doesn’t do what you expected it to do, it very often turns out to have returned an error code to the application that explains the behavior. We work hard at making sure libcurl functions return the correct return codes!
The libcurl examples we host on the curl web site (and ship in curl tarballs) are mostly done without error checks – for the sole purpose of making them smaller and easier to read as that removes code that isn’t strictly about libcurl.
3. Forgetting the verbose option
CURLOPT_VERBOSE
is the libcurl user’s best friend. Whenever your transfer fails or somehow doesn’t do what you expected it to, switching on verbose mode should be one of the first actions as it often gives you a lot of clues about what’s going on under the hood.
Of course, you can also go further and use CURLOPT_DEBUGFUNCTION
to get every more details, but usually you can save that for the more complicated issues.
4. There’s a global init function
You really should call curl_global_init()
expclicitly and early on and understand that it isn’t thread-safe. (We’re working on that.)
libcurl will detect if you missed to call it, and then call it itself, but that’s not a practice we recommend since then you’ll have a harder time to do it thread-safe.
And there’s a corresponding curl_global_cleanup()
to call when all your libcurl use is done.
5. Consider the redirects
HTTP/1.1 301 Moved Permanently
Server: M4gic server/3000
Retry-After: 0
Location: https://curl.se/
Content-Length: 0
Accept-Ranges: bytes
Date: Thu, 07 May 2020 08:59:56 GMT
Connection: close
When you let libcurl handle redirects, consider limiting to what protocols you should allow redirects (CURLOPT_REDIR_PROTOCOLS
), and of course you must remember that crafty users will figure out ways to redirect responses to potentially malicious servers given the chance.
Do not set custom HTTP methods on requests that follow redirects.
6. Let users set (parts of) the URL
Don’t do that. Unless you have considered the consequences and make sure you deal with them appropriately.
If you really insist that you need to let your users set the URL, restrict and carefully filter exact what parts and with what they can change it to.
The reason is of course that libcurl often supports other protocols than the one(s) you had in mind when you write your application. And users can do other crafty things to make host names point to other servers (which of course TLS based protocols will reject), abuse free-form URL input fields to pass on unexpected data (sometimes including newlines and other creative things) to your servers or have your application talk to malicious servers.
You can limit what protocols your application supports with CURLOPT_PROTOCOLS
and you can parse URLs with the curl_url_set() function family before you pass them to curl to make sure given URLs make sense!
7. Setting HTTP method
Setting the custom HTTP request method with CURLOPT_CUSTOMREQUEST
is most often done completely unnecessary, frequently causing problems and only very rarely actually done correctly.
The primarily problems with setting this option are:
- if you also ask libcurl to follow redirects, this custom method will be used in follow-up requests as well, even if the server indicate wanting a different one in the HTTP response code
- it doesn’t actually change libcurl’s behavior or expectations, it only changes the string libcurl sends in the request.
8. Disabled certificate checks
libcurl allows applications to disable TLS certificate checks with the two options CURLOPT_SSL_VERIFYPEER
and CURLOPT_SSL_VERIFYHOST
. This is powerful and at times very handy while developing and/or experimenting. It is also a very bad thing to ship in your product or deploy in your live service.
Disabling the certificate check effectively removes the TLS protection from the connections!
Searching for these option names using source code search engines or just on github will show you hundreds or thousands of applications that leave these checks disabled. Don’t be like them!
9. Assume zero terminated data in callbacks
libcurl has a series of different callbacks in its API. Some of these callbacks delivers data to the application and that data is then typically offered with a pointer and a size of that data.
The documentation very clearly stipulates that this data is not zero terminated – you cannot and should not use C functions on the data that works on “C strings” (that assume a terminating, trailing, zero byte). It seems especially common when the data that is delivered is something like HTTP headers, which is text based data and seems to lure people into assuming a zero terminator.
10. C++ strings are not C strings
libcurl is a C library with a C API for maximum portability and availability, yet a large portion of libcurl users are actually writing their programs in C++.
This is not a problem. You can use the libcurl API perfectly fine from C++.
Passing “strings” to libcurl must however be done with the C approach: you pass a pointer to a zero terminated buffer. If you pass a reference to a C++ string object, libcurl will not know what it is and it will not get or use the string correctly. It will fail in mysterious ways!
Something like this:
// Keep the URL as a C++ string object
std::string str("https://example.com/");
// Pass it to curl as a C string!
curl_easy_setopt(curl, CURLOPT_URL, str.c_str());
11. Threading mistakes
libcurl is thread-safe, but there are some basic rules and limitations that you need to follow and adhere to, as detailed in the document linked to:
- curl_global_init is not thread-safe
- you must not use any libcurl handle concurrently
- if you use older TLS libraries, you must setup mutex locks
12. Understanding CURLOPT_NOSIGNAL
Signals is a Unix concept where an asynchronous notification is sent to a process or to a specific thread within the same process in order to notify it of an event that occurred.
What does libcurl use signals for?
When using the synchronous name resolver, libcurl uses alarm() to abort slow name resolves (if a timeout is set), which ultimately sends a SIGALARM to the process and is caught by libcurl
By default, libcurl installs its own sighandler while running, and restores the original one again on return – for SIGALARM and SIGPIPE.
Closing TLS (with OpenSSL etc) can trigger a SIGPIPE if the connection is dead.
Unless CURLOPT_NOSIGNAL is set! (default)
What does CURLOPT_NOSIGNAL do?
It prevents libcurl from triggering signals
When disabled, it prevents libcurl from installing its own sighandler and…
Generated signals must then be handled by the libcurl-using application itself
13. Forgetting -DCURL_STATICLIB
Creating and using libcurl statically is easy and convenient and seems especially popular on Windows
It requires the CURL_STATICLIB
define to be set when building your application! This is a little unusual requirement and pattern which is probably why people often miss this.
Omission to use that define causes linker errors:
“unknown symbol __imp__curl_easy_init
”
This requirement is present because Windows need __declspec
to be present or absent in the headers depending on how it links.
Static builds mean chasing deps
libcurl can use many 3rd party dependencies
When linking statically, all those need to be provided to the linker, so the curl build scripts (as well as your application linking) usually need manual help to find them all
14. C++ methods
C++ class methods look very much like functions, but C++ class methods cannot be used as callbacks with libcurl
… since they assume a ‘this’ pointer to the current object and a C program doesn’t pass on such a pointer.
Static class member functions work though. You can thus work around this limitation with a trick like this:
// f is the pointer to your object.
static size_t YourClass::func(void *buffer, size_t
sz, size_t n, void *f){
// Call non-static member function.
static_cast(f)->nonStaticFunction();
}
// This is how you pass pointer to the static function:
curl_easy_setopt(hcurl, CURLOPT_XFERINFOFUNCTION, YourClass::func);
curl_easy_setopt(hcurl, CURLOPT_XEFRINFODATA, this);
15.Write callback invokes
Data is delivered from libcurl to the callback CURLOPT_WRITEFUNCTION
This callback might be called none, one, two or many times. Never assume you will get a certain amount of calls. The number of invokes is independent of the data amount and vary rather because of network, server, kernel or other reasons. Don’t assume the same invocation pattern will repeat!
I’m using cURL to retreive RSS feeds in my RSS reader. And as there are many blogs with RSS feeds out there, there are also many server configurations.
Some refuse connexions if you don’t have an user-agent. Other encode their response with GZip, others have a 301 or 302 redirections, or SSL errors…
As for me, for a RSS feed, I use among other, those options :
? CURLOPT_FOLLOWLOCATION to TRUE (transparently follow 301 & 302)
– CURLOPT_USERAGENT to a string (set a personalized UA)
– CURLOPT_SSL_VERIFYPEER to FALSE
– CURLOPT_SSL_VERIFYHOST to FALSE (I might remove those, but before LetsEncrypt, many websites had autosigned certificates, that allowed encryption alone)
– CURLOPT_ENCODING to “gzip” (allowing Gziped data to be decoded — some website return gziped data no matter what, even if the browser does not support it)
With this, I can retreive almost anything from anywhere without to much errors.
I would strongly suggest you go back to mistake (8). Disabling certificate checks is a really bad idea.