Tag Archives: API

Making libcurl init more thread-safe

Twenty-one years ago, in May 2001 we introduced the global initialization function to libcurl version 7.8 called curl_global_init().

The main reason we needed this separate function to get called before anything else was used in libcurl, was that several of libcurl’s dependencies at the time (including OpenSSL and GnuTLS) had themselves thread-unsafe initialization procedures.

This rather lame characteristic found in several third party dependencies made the libcurl function inherit that property: not thread-safe. A nasty “feature” in a library that otherwise prides itself for being thread-safe and in many ways working at “it should”. A function that is specifically marked as thread unsafe was not good. Is not good.

Still, we were victims of circumstances and if these were the dependencies we were going to use, this is what we needed to do.

Occasionally, this limitation has poked people in the eye and really hurt them since it makes some use cases really difficult to realize.

Dependencies improved

Over the following decades, the dependencies libcurl use have almost all shaped up and removed the thread-unsafe property of their initialization procedures.

We also slowly cleaned away other code that happened to also fall into the init function out of laziness and convenience because it was there and could be used (or perhaps abused).

Eventually, we were basically masters of our own faith again. The closet was all cleared out and the scrubby leftovers we had sloppily left in there had been cleaned up and gotten converted to proper thread-safe code.

The task of finally making curl_global_init() thread-safe was brought up and attempted a little half-assed a few times but was never pulled through all the way.

The challenges always included that we want to avoid relying on thread library and that we are supporting building libcurl with C89 compilers etc.

Finally, the spring cleaning of 2022

Thanks to work spear-headed by Thomas Guillem who came bursting in with a clear use-case in mind where he felt he really need this to work, and voila now the next libcurl release (7.84.0) features a thread-safe init.

If configure finds support for _Atomic (a C11 feature) or it runs on a new enough Windows version (this should cover a vast amount of platforms), libcurl can now do its own spinlock implementation that makes the init function thread-safe and independent of threading libraries.

A headers API for libcurl

For many years we’ve had this outstanding idea to add a new API to libcurl that would offer applications easy access to HTTP response headers.

Applications could already retrieve the headers using existing methods but that requires them to write a callback and to a certain amount of parsing and “understanding” HTTP that we always felt was a little unfortunate, a bit error-prone on the behalf of the applications and perhaps also a thing that forced a lot of applications out there having to write the same kind of extra function logic.

If libcurl provides this functionality, it would remove a lot of (duplicated) code from a lot of applications.

Designing the API

We started this process a while ago when I first wrote down a basic approach to an API for this and sent it off to the curl-library mailing list for feedback and critique.

/* first take */
char *curl_easy_header(CURL *easy,
                       const char *name);

The conversation that followed that first plea for help, made me realize that my first proposal had been far too basic and it wouldn’t at all work to satisfy the needs and use cases we could think of for this API.

Try again

I went back to mull over what I’ve learned and update my design proposal, trying to take the feedback into account in the best possible way. A few weeks later, I returned with a “proposal v2” and again I asked for comments and opinions on what I had put together.

/* second shot */
CURLHcode curl_easy_header(CURL *easy,
                           const char *name,
                           size_t index,
                           struct curl_header **h);

As I had already adjusted the API from feedback the first time around, the feedback this time was perhaps not calling for as big changes or radical differences as they did the first time around. I could adapt my proposal to what people asked and suggested. We arrived at something that seemed like a pretty solid API for offering HTTP headers to applications.

Let’s do this

As the API proposal feedback settled down and the interface felt good and sensible, I decided it was time for me to write up a first implementation so that we can offer code to people to give everyone a chance to try out the API in real life as well. There’s one thing to give feedback on a “paper product”, actually being able to use it and try it in an application is way better. I dove in.

The final take

When the code worked to the level that I started to be able to extract the first headers with the API, it proved to that we needed to adjust the API a little more, so I did. I then ran into more questions and thoughts about specifics that we hadn’t yet dealt with or nailed proper in the discussions up to that point and I took some questions back to the curl community. This became an iterative process and we smoothed out questions about how access different header “sources” as well as how to deal with multiple headers and “request sequences”. All supported now.

/* final version */
CURLHcode curl_easy_header(CURL *easy,
                           const char *name,
                           size_t index,
                           unsigned int origin,
                           int request,
                           struct curl_header **h);

Multiple headers

This API allows applications to extract all headers from a previous transfer. It can get one or many headers when there are duplicated ones, like Set-Cookie: commonly arrive as.

Sources

The application can ask for “normal” headers, for trailers (that arrive after the body), headers associated with the CONNECT request (if such a one was performed), pseudo headers (that might arrive when HTTP/2 and HTTP/3 is used) or headers associated with a HTTP 1xx “intermediate” response.

Multiple responses

The libcurl APIs typically work on transfers, which means that a single transfer may end up doing multiple transfers, multiple HTTP requests. Primarily when redirects are followed but it can also be due to other reasons. This header API therefore allows the caller to extract headers from the entire “chain” of requests a previous transfer was made with.

EXPERIMENTAL

This API is initially merged (in this commit) labeled “experimental” to be included in the upcoming 7.83.0 release. The experimental label means a few different things to us:

  • The API is disabled by default in the build and you need to explicitly ask for it with --enable-headers-api when you run configure
  • There are no ABI and API promises for these functions yet. We might change the functions based on feedback before we remove the label.
  • We strongly discourage anyone from shipping experimentally labeled functions in production.
  • We rely on people to enable and test this and provide feedback, to give us confidence enough to remove the experimental label as soon as possible.

We use the experimental “route” to lower the bar for merging new stuff, so that we get some extra chances to fix up mistakes before the rules and API are carved in stone and we are set to support that for a life time.

This setup relies on users actually trying out the experimental stuff as otherwise it isn’t method for improving the API, it will only delay the introduction of it to the general public. And it risks becoming be less good.

Documentation

The two new functions have detailed man pages: curl_easy_header and curl_easy_nextheader. If there is anything missing on unclear in there, let us know!

I have also created an initial example source snippet showing header API use. See headerapi.c.

This API deserves its own little section in the everything curl book, but I think I will wait for it to get landed “for real” before I work on adding that.

You wanted WebSockets?

WebSockets has been one of the most requested features and protocol to add to curl and libcurl in the annual user survey. Repeatedly, over the last few years.

WebSockets is not perfectly suitable to be done by libcurl since it’s not really an upload or download transfer protocol, but is more something like “a TCP for JavaScript”. It provides a bidirectional data stream over HTTP. (I was there when it was created, first mentioned on my blog here.)

Ignoring that technicality, WebSockets is often used more or less for a one-directional data stream. Commonly together with the use of other protocols that curl already supports. If libcurl would support it, there will be plenty of applications out there that could simplify their code.

Today, users use a mix of libcurl, custom code on top or “over” libcurl and other WebSockets libraries. There’s no single de-facto way or practice to do WebSockets with libcurl.

WebSockets for libcurl?

I took the topic of drafting a WebSockets API for libcurl to the libcurl mailing list a while ago and after a lot of back and forths and feedback from multiple people, we have a decent beginning of a WebSockets API that might work jotted down.

This is just a potential API described in a document. How it could be made to work. Nobody has actually implemented any of it.

Implement?

We know users ask for WebSockets, repeatedly and several people helped contributing to the tentative API design.

It’s just that this time I decided to pause and see if I couldn’t get some help in implementing this. To create a team of implementers willing to work before I dive in, alternatively to find someone who’d sponsor this work to allow me to spend more and dedicated time on it. I decided to do this, because I already have a lot of other things on my plate and I have to focus on my paying (curl) customers. I estimate that implementing WebSockets support is quite a lot of work.

If nobody is willing to put in the work or money to make it happen, then maybe that’s rather clear message that this is not a feature that is meant to be provided by curl. At least not now.

WebSockets future

WebSockets was created in the HTTP/1.1 era, and is probably still mostly done using that protocol as bootstrap. There are indications hinting that the future might hold less WebSockets.

It took a long time but eventually a way to do WebSockets over HTTP/2 was provided via RFC 8441, “Bootstrapping WebSockets with HTTP/2”, published in 2018. This allows a WebSockets connection to be done over a single HTTP/2 stream.

The next evolutionary step seems to rather be WebTransport. It is a new take and protocol and is meant to be used over HTTP/3 and QUIC. It is described to “send data to and receive data from servers. It can be used like WebSockets but with support for multiple streams, unidirectional streams, out-of-order delivery, and reliable as well as unreliable transport.”

Credits

Image by pisauikan from Pixabay

Heading towards curl eight

There’s plan for version 8 being forged! Let me just take you back a bit in time first..

The early days

When we first created libcurl, we bumped the major version number of the project from the previous version 6 to version 7. In late summer of 2000 we shipped curl and libcurl 7.1 as the first ever release that featured a separate library for Internet transfer powers. Everything before version 7 was just the command line tool, curl. That was the moment in time we decided we should leave kindergarten and we were ready to take on some tougher loads.

I had the main approach to the API worked out already from the start. It would be transfer-oriented, it would be built up around URLs and shouldn’t necessarily require that the users are themselves protocol experts to use it. I used inspiration from ioctl and fcntl when I made curl_easy_setopt and curl_easy_getinfo. They’re fixed functions with flexible arguments. The idea was that by doing that, we wouldn’t have to add new functions for new features but we would “just” add options and new options would simply just not work with older libcurls.

The first API we shipped also only provided synchronous single transfers. The “easy” interface.

I had no anticipations or particular hopes on the library then. It would be cool if someone found good use for it and it would be even cooler if someone would help out to improve it further.

It grew, I learned

I had never before developed and shipped a library for the world to use. I hadn’t really fully grasped and considered the impact of APIs and ABI stability etc.

We gradually improved the library over time. We bumped the SONAME several times in the first few years as we modified internals. In the same time the library caught on a bigger and bigger audience and in September 2006 as I ripped out code for what is commonly referred to as “FTP third party transfers” I once again bumped the ABI number (to 4) since the older libcurl was no longer compatible with the new release.

People don’t like SONAME bumps

That bump was met with quite a lot of resistance and objections among users. Changing SONAME of a widely used library it turns out causes a lot of pain, agony and squeaking. Possibly this was one of the earlier signs that libcurl had grown up and I decided that we should try to avoid going through this again. We shall not break ABI compatibility again. Ever.

No bumps, no worries

In this world with no SONAME bumps I wanted to keep that solidity visible in the major number of the project so even though the version number of the releases aren’t strictly related to the SONAME we kept shipping curl version seven. In September 2021, we reach curl 7.79.0.

We’ve managed to stick to our goals and a binary libcurl using application built after September 2006 can run with the latest libcurl with no modification needed! This is one of the biggest edges and “selling points” we have in libcurl: We take compatibility and unmodified behavior very seriously.

In 2013 I even wrote blog post emphasizing this and in there I said there won’t be a curl version 8 “in a long time”. But read on, things have changed a little!

An ever-growing minor number

We bump the minor version number every time we “change” something in the project, or add features. If we only do bug-fixes we only bump the patch number. You know, in classic “major,minor.patch” style.

As we do curl releases at least every 8 weeks and most releases have changes added, we bump the minor number very frequently, up to 6 times a year or so.

We provide version number information for libcurl provided as a 24 bit number, using 8 bits for each field. This implies that none of the numbers can ever go above 255. We can’t ship a 7.256.0 for this practical reason.

But also, from what I’ve seen people do with and think about version numbers before, I’m concerned that increasing the minor number beyond 99 will cause confusions. Version 7.100.0 risks gettinged confused and mixed up with 7.1 or 7.10.0, two versions that are terribly old. And frankly that’s a very large minor number and it starts becoming many digits in that release version.

There exists a solution to this!

Reset minor, bump major, keep SONAME

The idea is simply to do “a Linux kernel”. We change to version 8.0.0 at a given point in time, but we stick to the same SONAME as before.

We don’t break any compatibility, there will no no API or functionality cleanups. There will just be a version number bump to lower the minor number and let us start over that journey. Reset the counter so to speak.

When do we do this? We have roughly 20 releases left before the minor counter can reach 100, 20 releases take at least 6*20 = 120 weeks. 120 weeks is 2.3 years.

Another event within this period

On March 20, 2023. About 18 months into the future, the curl project turns 25 years old. Here’s a golden opportunity! Let’s top off the 25 year celebrations with a major version number bump!

The plan

Independently of what version number we have reached to at that point, independently of if we add features or not in that release, independently of exactly how that date fits within the pre-determined release cycle and without changing any APIs, we ship curl version 8.0.0 on that day.

Turning 25 and bumping the major version number on the same day should be fun.

Version 8

I hope that a small side-effect of bumping the major number will make users still left on version 7 to slightly faster feel outdated and push for getting up to 8. It could work as a minor push to get users to catch up a bit.

The minor number will of course immediately start “climbing” again and in a worst-case scenario, we risk reaching minor number 100 again within another 17 years. Maybe we can plan another bump for the 40th birthday?

Imagining a thread-safe curl_global_init

libcurl is thread-safe

That’s the primary message that we push and that’s important to remember. You can write a multi-threaded application that does concurrent Internet transfers with libcurl in as many threads as you like and they fly just fine.

But

But there are nuances and details of course and the devil is always in those. The main obstacle that then and again causes problems for users is the curl_global_init() function. But how come?

curl_global_init

Back in the day, libcurl developers realized that when we work with in particular a lot of third party TLS libraries, they feature init functions that need to be called first, before any other function in those libraries are called. And they typically are all marked as not thread-safe, we have to call those functions knowing that no other thread calls them. This was the case for GnuTLS (before version 3.3.0) and it was the case for OpenSSL (until they shipped version 1.1.1) etc.

In order for libcurl to adhere to those restrictions that weren’t our own inventions, we added a function to libcurl called curl_global_init() that then in itself inherited those non-thread safe characteristics. We documented the function as not thread-safe.

Time passed, and as we now had a function that is a global initialization function that is also marked not thread-safe, it was an attractive point to add more and other functionality for the library. Other global initializations that then weren’t thread-safe either – as that wasn’t any point in doing anyway since the entire thing wasn’t thread-safe to begin with.

The problems

Having the global init function not being thread-safe has caused problems to users, mostly in use cases where for example they use libcurl in a plugin-like cases where you can’t know if you’re the only user in the process.

We’ve then mostly been longing for better days and blamed the third party libraries that forced us into this corner.

Third parties shaped up, we didn’t

One day in recent times when we looked at what third party libraries a typical libcurl user uses in a modern system, we see that they’ve all fixed their init functions! OpenSSL and GnuTLS that once were part of the original reasons for this function have fixed their issues. They no longer have thread-unsafe init functions.

But libcurl still does! 🙁

While we were initially pushed into this unfortunate corner because of limitations in third party libraries, we had added our own init functions into that function that aren’t thread-safe and now, even though the third party libraries had done the right thing over time, we found ourselves no longer able to put the blame on others. Now we need to clean up our own backyard!

Fix it!

In libcurl 7.69.0 we’ve started this journey with two distinct changes. The goal is to make the function thread-safe under the condition that libcurl is built with only thread-safe dependencies, and we should make configure etc check if that’s the case.

1: EINTR handling

Since libcurl 7.30.0, we’ve provided a flag in the curl_global_init() function to let libcurl users ask for EINTR to actually abort internal loops. Starting now, that flag has no meaning and this is now default behavior. No need to store this state globally anywhere.

2: Working IPv6

At least in the past, it has been common with systems that are IPv6-capable at build-times but that can’t actually create IPv6 sockets and therefor they can’t actually use IPv6. This was previously checked for, once, in the global init and then IPv6 is disabled for everyone. Without a global state, we’ve been forced to move this check and it is now instead done for every created multi handle. A minuscule performance hit for thread safety.

Left to do until completely thread-safe

The transition isn’t completed. The low hanging fruit has been picked, here are some remaining issues to solve:

When is it thread-safe?

Since curl can be built with a number of different third party libraries, including version old versions, we need to make the configure script know what versions of what libraries that are safe so that it can tell. But how are libcurl application authors supposed to know? Can we figure how a way to tell them?

curl_version*

Both curl_version() and curl_version_info() store information in static buffers and return information pointing to that memory. They’re currently setup in the global init so they work safely from multiple threads today, but we probably need to create new, alternative versions of them, that instead allocate heap memory to return the info in. Or possibly store the info in memory associated with a handle.

Update: Patrick Monnerat made me realize that a possibly even better way to fix them is to make sure they generate the same output in a way that repeated or concurrent invokes are fine.

Reference counter

There’s a counter counting calls to curl_global_init() so that the corresponding number of calls to curl_global_cleanup() is required before things are actually cleaned up.

This is a hard nut to crack without a global context and no mutex locks. I haven’t yet figured out how to solve this. If you have ideas, I’m listening!

When?

There’s no fixed time schedule for when these remaining nits are supposed to be fixed, but I hope to work on them going forward and I will appreciate all the help I can get and if things just progress, I would imagine we can end 2020 with a libcurl with these flaws fixed!

Oh, and we also really need to make sure that we don’t simultaneously come up with or think of new thread unsafe functionality for the init function..

Credits

Top image by Andreas Lischka from Pixabay

This is your wake up curl

curl_multi_wakeup() is a new friend in the libcurl API family. It will show up for the first time in the upcoming 7.68.0 release, scheduled to happen on January 8th 2020.

Sleeping is important

One of the core functionalities in libcurl is the ability to do multiple parallel transfers in the same thread. You then create and add a number of transfers to a multi handle. Anyway, I won’t explain the entire API here but the gist of where I’m going with this is that you’ll most likely sooner or later end up calling the curl_multi_poll() function which asks libcurl to wait for activity on any of the involved transfers – or sleep and don’t return for the next N milliseconds.

Calling this waiting function (or using the older curl_multi_wait() or even doing a select() or poll() call “manually”) is crucial for a well-behaving program. It is important to let the code go to sleep like this when there’s nothing to do and have the system wake up it up again when it needs to do work. Failing to do this correctly, risk having libcurl instead busy-loop somewhere and that can make your application use 100% CPU during periods. That’s terribly unnecessary and bad for multiple reasons.

Wakey wakey

When your application calls libcurl to say “sleep for a second or until something happens on these N transfers” and something happens and the application for example needs to shut down immediately, users have been asking for a way to do a wake up call.

– Hey libcurl, wake up and return early from the poll function!

You could achieve this previously as well, but then it required you to write quite a lot of extra code, plus it would have to be done carefully if you wanted it to work cross-platform etc. Now, libcurl will provide this utility function for you out of the box!

curl_multi_wakeup()

This function explicitly makes a curl_multi_poll() function return immediately. It is designed to be possible to use from a different thread. You will love it!

curl_multi_poll()

This is the only call that can be woken up like this. Old timers may recognize that this is also a fairly new function call. We introduced it in 7.66.0 back in September 2019. This function is very similar to the older curl_multi_wait() function but has some minor behavior differences that also allow us to introduce this new wakeup ability.

Credits

This function was brought to us by the awesome Gergely Nagy.

Top image by Wokandapix from Pixabay

The future of HTTP Symposium

This year’s version of curl up started a little differently: With an afternoon of HTTP presentations. The event took place the same week the IETF meeting has just ended here in Prague so we got the opportunity to invite people who possibly otherwise wouldn’t have been here… Of course this was only possible thanks to our awesome sponsors, visible in the image above!

Lukáš Linhart from Apiary started out with “Web APIs: The Past, The Present and The Future”. A journey trough XML-RPC, SOAP and more. One final conclusion might be that we’re not quite done yet…

James Fuller from MarkLogic talked about “The Defenestration of Hypermedia in HTTP”. How HTTP web technologies have changed over time while the HTTP paradigms have survived since a very long time.

I talked about DNS-over-HTTPS. A presentation similar to the one I did before at FOSDEM, but in a shorter time so I had to talk a little faster!

Mike Bishop from Akamai (editor of the HTTP/3 spec and a long time participant in the HTTPbis work) talked about “The evolution of HTTP (from HTTP/1 to HTTP/3)” from HTTP/0.9 to HTTP/3 and beyond.

Robin Marx then rounded off the series of presentations with his tongue in cheek “HTTP/3 (QUIC): too big to fail?!” where we provided a long list of challenges for QUIC and HTTP/3 to get deployed and become successful.

We ended this afternoon session with a casual Q&A session with all the presenters discussing various aspects of HTTP, the web, REST, APIs and the benefits and deployment challenges of QUIC.

I think most of us learned things this afternoon and we could leave the very elegant Charles University room enriched and with more food for thoughts about these technologies.

We ended the evening with snacks and drinks kindly provided by Apiary.

(This event was not streamed and not recorded on video, you had to be there in person to enjoy it.)


QUIC and missing APIs

I trust you’ve heard by now that HTTP/3 is coming. It is the next destined HTTP version, targeted to get published as an RFC in July 2019. Not very far off.

HTTP/3 will not be done over TCP. It will only be performed over QUIC, which is a transport protocol replacement for TCP that always is done encrypted. There’s no clear-text version of QUIC.

TLS 1.3

The encryption in QUIC is based on TLS 1.3 technologies which I believe everyone thinks is a good idea and generally the correct decision. We need to successively raise the bar as we move forward with protocols.

However, QUIC is not only a transport protocol that does encryption by itself while TLS is typically (and designed as) a protocol that is done on top of TCP, it was also designed by a team of engineers who came up with a design that requires APIs from the TLS layer that the traditional TLS over TCP use case doesn’t need!

New TLS APIs

A QUIC implementation needs to extract traffic secrets from the TLS connection and it needs to be able to read/write TLS messages directly – not using the TLS record layer. TLS records are what’s used when we send TLS over TCP. (This was discussed and decided back around the time for the QUIC interim in Kista.)

These operations need APIs that still are missing in for example the very popular OpenSSL library, but also in other commonly used ones like GnuTLS and libressl. And of course schannel and Secure Transport.

Libraries known to already have done the job and expose the necessary mechanisms include BoringSSL, NSS, quicly, PicoTLS and Minq. All of those are incidentally TLS libraries with a more limited number of application users and less mainstream. They’re also more or less developed by people who are also actively engaged in the QUIC protocol development.

The QUIC libraries in progress now are typically using either one of the TLS libraries that already are adapted or do what ngtcp2 does: it hosts a custom-patched version of OpenSSL that brings the needed functionality.

Matt Caswell of the OpenSSL development team acknowledged this situation already back in September 2017, but so far we haven’t seen this result in updated code shipped in a released version.

curl and QUIC

curl is TLS library agnostic and can get built with around 12 different TLS libraries – one or many actually, as you can build it to allow users to select TLS backend in run-time!

OpenSSL is without competition the most popular choice to build curl with outside of the proprietary operating systems like macOS and Windows 10. But even the vendor-build and provided mac and Windows versions are also built with libraries that lack APIs for this.

With our current keen interest in QUIC and HTTP/3 support for curl, we’re about to run into an interesting TLS situation. How exactly is someone going to build curl to simultaneously support both traditional TLS based protocols as well as QUIC going forward?

I don’t have a good answer to this yet. Right now (assuming we would have the code ready in our end, which we don’t), we can’t ship QUIC or HTTP/3 support enabled for curl built to use the most popular TLS libraries! Hopefully by the time we get our code in order, the situation has improved somewhat.

This will slow down QUIC deployment

I’m personally convinced that this little API problem will be friction enough when going forward that it will slow down and hinder QUIC deployment at least initially.

When the HTTP/2 spec shipped in May 2015, it introduced a dependency on the fairly new TLS extension called ALPN that for a long time caused head aches for server admins since ALPN wasn’t supported in the OpenSSL versions that was typically installed and used at the time, but you had to upgrade OpenSSL to version 1.0.2 to get that supported.

At that time, almost four years ago, OpenSSL 1.0.2 was already released and the problem was big enough to just upgrade to that. This time, the API we’re discussing here is not even in a beta version of OpenSSL and thus hasn’t been released in any version yet. That’s far worse than the HTTP/2 situation we had and that took a few years to ride out.

Will we get these APIs into an OpenSSL release to test before the QUIC specification is done? If the schedule sticks, there’s about six months left…

Reducing 2038-problems in curl

tldr: we’ve made curl handle dates beyond 2038 better on systems with 32 bit longs.

libcurl is very portable and is built and used on virtually all current widely used operating systems that run on 32bit or larger architectures (and on a fair amount of not so widely used ones as well).

This offers some challenges. Keeping the code stellar and working on as many platforms as possible at the same time is hard work.

How long is a long?

The C variable type “long” has existed since the dawn of time and used to be 32 bit big already back in the days most systems were 32 bits. With the introduction of 64 bit systems in the 1990s, something went wrong and when most operating systems went with 64 bit longs, some took the odd route and stuck with a 32 bit long… The windows world even chose to not support “long long” for 64 bit types but instead it insists on calling them “__int64”!

(Thankfully, ints have at least remained 32 bit!)

Two less clever API decisions

Back in the days when humans still lived in caves, we decided for the libcurl API to use ‘long’ for a whole range of function arguments. In hindsight, that was naive and not too bright. (I say “we” to make it less obvious that it of course is mostly me who’s to blame for this.)

Another less clever design idea was to use vararg functions to set (all) options. This is convenient in the way we have one function to set a huge amount of different option, but it is also quirky and error-prone because when you pass on a numeric expression in C it typically gets sent as an ‘int’ unless you tell it otherwise. So on systems with differently sized ints vs longs, it is destined to cause some mistakes that, thanks to use of varargs, the compiler can’t really help us detect! (We actually have a gcc-only hack that provides type-checking even for the varargs functions, but it is not portable.)

libcurl has both an option to pass time to libcurl using a long (CURLOPT_TIMEVALUE) and an option to extract a time from libcurl using a long (CURLINFO_FILETIME).

We stick to using our “not too bright” API for stability and compatibility. We deem it to be even more work and trouble for us and our users to change to another API rather than to work and live with the existing downsides.

Time may exist after 2038

There’s also this movement to transition the time_t variable type from 32 to 64 bit. time_t of course being the preferred type for C and C++ programs to store timestamps in. It is the number of seconds since January 1st, 1970. Sometimes called the unix epoch. A signed 32 bit time_t can be used to store timestamps with second accuracy from roughly 1903 to 2038. As more and more things will start to refer to dates after 2038, this is of course becoming a problem. We need to move to 64 bit time_t all over.

We’re now less than 20 years away from the signed 32bit tip-over point: 03:14:07 UTC, 19 January 2038.

To complicate matters even more, there are odd systems out there with unsigned time_t variables. Such systems then cannot easily refer to dates before 1970, but can instead hold dates up to the year 2106 even with just 32 bits. Oh and there are some systems with 64 bit long that feature a 32 bit time_t, and 32 bit systems with 64 bit time_t!

Most modern systems today have 64 bit time_t – including win64, and 64 bit time_t can handle dates up to about year 292,471,210,647.

int – long – time_t

  1. We cannot move data between ints and longs in the code and assume it doesn’t overflow
  2. We can’t move data losslessly between ints and time_t
  3. We must not move data between long and time_t

Recently we’ve been working on making sure we live up to these three rules in libcurl. One could say it was about time! (pun intended)

In particular number (3) has required us to add new entry points to the API so that even 32 bit long systems can set/read 64 bit time. Starting in libcurl 7.59.0, applications can pass 64 bit times to libcurl with CURLOPT_TIMEVALUE_LARGE and extract 64 bit times with CURLINFO_FILETIME_T. For compatibility reasons, the old versions will of course be kept around but newer applications should really consider the new options.

We also recently did an overhaul of our time and date parser (externally accessible as curl_getdate() ) which we learned erroneously used a ‘long’ in the calculation which made it not work proper beyond 2038 on systems with 32 bit longs. This fix will also ship in 7.59.0 (planned release date: March 21, 2018).

If you find anything in curl that doesn’t deal with times after 2038 correctly, please file a bug!

fcurl is fread and friends for URLs

This whole family of functions, fopen, fread, fwrite, fgets, fclose and more are defined in the C standard since C89. You can’t really call yourself a C programmer without knowing them and probably even using them in at least a few places.

The charm with these is that they’re standard, they’re easy to use and they’re available everywhere where there’s a C compiler.

A basic example that just reads a file from disk and writes it to stdout could look like this:

FILE *file;

file = fopen("hello.txt", "r");
if(file) {
  char buffer [256];
  while(1) {
    size_t rc = fread(buffer, sizeof(buffer),
                1, file);
    if(rc > 0)
      fwrite(buffer, rc, 1, stdout);
    else
      break;
  }
  fclose(file);
}

Imagine you’d like to switch this example, or one of your actual real world programs that use the fopen() family of functions to read or write files, and instead read and write files from and to the Internet instead using your favorite Internet protocols. How would you do that without having to change your code a lot and do a major refactoring job?

Enter fcurl

I’ve started to work on a library that provides a look-alike API with matching functions and behaviors, but that allows fopen() to instead specify a URL instead of a file name. I call it fcurl. (Much inspired by the libcurl example fopen.c, which I wrote the first version of already back in 2002!)

It is of course open source and is powered by libcurl.

The project is in its early infancy. I think it would be interesting to try it out and I’ve mentioned the idea to a few people that have shown interest. I really can’t make this happen all on my own anyway so while I’ve created a first embryo, it will take some time before it gets truly useful. Help from others would be greatly appreciated of course.

Using this API, a version of the above example that instead reads data from a HTTPS site instead of a local file could look like:

FCURL *file;

file = fcurl_open("https://daniel.haxx.se/",
                  "r");
if(file) {
  char buffer [256];
  while(1) {
    size_t rc = fcurl_read(buffer,         
                           sizeof(buffer), 1, 
                           file);
    if(rc > 0)
      fwrite(buffer, rc, 1, stdout);
    else
      break;
  }
  fcurl_close(file);
}

And it could even actually also read a local file using the file:// sheme.

Drop-in replacement

The idea here is to make the alternative functions have new names but as far as possible accept the same input arguments, return the same return codes and so on.

If we do it right, you could possibly even convert an existing program with just a set of #defines at the top without even having to change the code!

Something like this:

#define FILE FCURL
#define fopen(x,y) fcurl_open(x, y)
#define fclose(x) fcurl_close(x)

I think it is worth considering a way to provide an official macro set like that for those who’d like to switch easily (?) and quickly.

Fun things to consider

1. for non-scheme input, use normal fopen?

An interesting take is probably to make fcurl_open() treat input specified without a “scheme://” to be a local file, and then passed to fopen() instead under the hood. That would then enable even more code to switch to fcurl since all the existing use cases with local file names would just continue to work.

2. LD_PRELOAD

An interesting area of deeper research around this could be to provide a way to LD_PRELOAD replacements for the functions so that not even any source code would need be changed and already built existing binaries could be given this functionality.

3. fopencookie

There’s also the GNU libc’s fopencookie concept to figure out if that is something for fcurl to support/use. BSD and OS X have something similar called funopen.

4. merge in official libcurl

If this turns out useful, appreciated and good. We could consider moving the API in under the curl project’s umbrella and possibly eventually even making it part of the actual libcurl. But hey, we’re far away from that and I’m not saying that is even the best idea…

Your input is valuable

Please file issues or pull-requests. Let’s see where we can take this!