Tag Archives: API

Reducing 2038-problems in curl

tldr: we've made curl handle dates beyond 2038 better on systems with 32 bit longs.

libcurl is very portable and is built and used on virtually all current widely used operating systems that run on 32bit or larger architectures (and on a fair amount of not so widely used ones as well).

This offers some challenges. Keeping the code stellar and working on as many platforms as possible at the same time is hard work.

How long is a long?

The C variable type "long" has existed since the dawn of time and used to be 32 bit big already back in the days most systems were 32 bits. With the introduction of 64 bit systems in the 1990s, something went wrong and when most operating systems went with 64 bit longs, some took the odd route and stuck with a 32 bit long... The windows world even chose to not support "long long" for 64 bit types but instead it insists on calling them "__int64"!

(Thankfully, ints have at least remained 32 bit!)

Two less clever API decisions

Back in the days when humans still lived in caves, we decided for the libcurl API to use 'long' for a whole range of function arguments. In hindsight, that was naive and not too bright. (I say "we" to make it less obvious that it of course is mostly me who's to blame for this.)

Another less clever design idea was to use vararg functions to set (all) options. This is convenient in the way we have one function to set a huge amount of different option, but it is also quirky and error-prone because when you pass on a numeric expression in C it typically gets sent as an 'int' unless you tell it otherwise. So on systems with differently sized ints vs longs, it is destined to cause some mistakes that, thanks to use of varargs, the compiler can't really help us detect! (We actually have a gcc-only hack that provides type-checking even for the varargs functions, but it is not portable.)

libcurl has both an option to pass time to libcurl using a long (CURLOPT_TIMEVALUE) and an option to extract a time from libcurl using a long (CURLINFO_FILETIME).

We stick to using our "not too bright" API for stability and compatibility. We deem it to be even more work and trouble for us and our users to change to another API rather than to work and live with the existing downsides.

Time may exist after 2038

There's also this movement to transition the time_t variable type from 32 to 64 bit. time_t of course being the preferred type for C and C++ programs to store timestamps in. It is the number of seconds since January 1st, 1970. Sometimes called the unix epoch. A signed 32 bit time_t can be used to store timestamps with second accuracy from roughly 1903 to 2038. As more and more things will start to refer to dates after 2038, this is of course becoming a problem. We need to move to 64 bit time_t all over.

We're now less than 20 years away from the signed 32bit tip-over point: 03:14:07 UTC, 19 January 2038.

To complicate matters even more, there are odd systems out there with unsigned time_t variables. Such systems then cannot easily refer to dates before 1970, but can instead hold dates up to the year 2106 even with just 32 bits. Oh and there are some systems with 64 bit long that feature a 32 bit time_t, and 32 bit systems with 64 bit time_t!

Most modern systems today have 64 bit time_t - including win64, and 64 bit time_t can handle dates up to about year 292,471,210,647.

int - long - time_t

  1. We cannot move data between ints and longs in the code and assume it doesn't overflow
  2. We can't move data losslessly between ints and time_t
  3. We must not move data between long and time_t

Recently we've been working on making sure we live up to these three rules in libcurl. One could say it was about time! (pun intended)

In particular number (3) has required us to add new entry points to the API so that even 32 bit long systems can set/read 64 bit time. Starting in libcurl 7.59.0, applications can pass 64 bit times to libcurl with CURLOPT_TIMEVALUE_LARGE and extract 64 bit times with CURLINFO_FILETIME_T. For compatibility reasons, the old versions will of course be kept around but newer applications should really consider the new options.

We also recently did an overhaul of our time and date parser (externally accessible as curl_getdate() ) which we learned erroneously used a 'long' in the calculation which made it not work proper beyond 2038 on systems with 32 bit longs. This fix will also ship in 7.59.0 (planned release date: March 21, 2018).

If you find anything in curl that doesn't deal with times after 2038 correctly, please file a bug!

fcurl is fread and friends for URLs

This whole family of functions, fopen, fread, fwrite, fgets, fclose and more are defined in the C standard since C89. You can't really call yourself a C programmer without knowing them and probably even using them in at least a few places.

The charm with these is that they're standard, they're easy to use and they're available everywhere where there's a C compiler.

A basic example that just reads a file from disk and writes it to stdout could look like this:

FILE *file;

file = fopen("hello.txt", "r");
if(file) {
  char buffer [256];
  while(1) {
    size_t rc = fread(buffer, sizeof(buffer),
                1, file);
    if(rc > 0)
      fwrite(buffer, rc, 1, stdout);
    else
      break;
  }
  fclose(file);
}

Imagine you'd like to switch this example, or one of your actual real world programs that use the fopen() family of functions to read or write files, and instead read and write files from and to the Internet instead using your favorite Internet protocols. How would you do that without having to change your code a lot and do a major refactoring job?

Enter fcurl

I've started to work on a library that provides a look-alike API with matching functions and behaviors, but that allows fopen() to instead specify a URL instead of a file name. I call it fcurl. (Much inspired by the libcurl example fopen.c, which I wrote the first version of already back in 2002!)

It is of course open source and is powered by libcurl.

The project is in its early infancy. I think it would be interesting to try it out and I've mentioned the idea to a few people that have shown interest. I really can't make this happen all on my own anyway so while I've created a first embryo, it will take some time before it gets truly useful. Help from others would be greatly appreciated of course.

Using this API, a version of the above example that instead reads data from a HTTPS site instead of a local file could look like:

FCURL *file;

file = fcurl_open("https://daniel.haxx.se/",
                  "r");
if(file) {
  char buffer [256];
  while(1) {
    size_t rc = fcurl_read(buffer,         
                           sizeof(buffer), 1, 
                           file);
    if(rc > 0)
      fwrite(buffer, rc, 1, stdout);
    else
      break;
  }
  fcurl_close(file);
}

And it could even actually also read a local file using the file:// sheme.

Drop-in replacement

The idea here is to make the alternative functions have new names but as far as possible accept the same input arguments, return the same return codes and so on.

If we do it right, you could possibly even convert an existing program with just a set of #defines at the top without even having to change the code!

Something like this:

#define FILE FCURL
#define fopen(x,y) fcurl_open(x, y)
#define fclose(x) fcurl_close(x)

I think it is worth considering a way to provide an official macro set like that for those who'd like to switch easily (?) and quickly.

Fun things to consider

1. for non-scheme input, use normal fopen?

An interesting take is probably to make fcurl_open() treat input specified without a "scheme://" to be a local file, and then passed to fopen() instead under the hood. That would then enable even more code to switch to fcurl since all the existing use cases with local file names would just continue to work.

2. LD_PRELOAD

An interesting area of deeper research around this could be to provide a way to LD_PRELOAD replacements for the functions so that not even any source code would need be changed and already built existing binaries could be given this functionality.

3. fopencookie

There's also the GNU libc's fopencookie concept to figure out if that is something for fcurl to support/use. BSD and OS X have something similar called funopen.

4. merge in official libcurl

If this turns out useful, appreciated and good. We could consider moving the API in under the curl project's umbrella and possibly eventually even making it part of the actual libcurl. But hey, we're far away from that and I'm not saying that is even the best idea...

Your input is valuable

Please file issues or pull-requests. Let's see where we can take this!

Using APIs without reading docs

This morning, my debug session was interrupted for a brief moment when two friends independently of each other pinged me to inform me about a talk at the current SEC-T conference going on here in Stockholm right now. It was yet again time to bring up the good old fun called libcurl API bashing. Again from the angle that users who don't read the API docs might end up using it wrong.

Updated: You can see Meredith Patterson's talk here, and the libcurl parts start at 24:15.

The specific libcurl topic at hand once again mostly had the CURLOPT_VERIFYHOST option in focus, with basically is the same argument that was thrown at us two years ago when libcurl was said to be dangerous. It is not a boolean. It is an option that takes (or took) three different values, where 2 is the secure level and 0 is disabled.

SEC-T on curl API

(This picture is a screengrab from the live stream off youtube, I don't have any link to a stored version of it yet. Click it for slightly higher resolution.)

Speaker Meredith L. Patterson actually spoke for quite a long time about curl and its options to verify server certificates. While I will agree that she has a few good points, it was still riddled with errors and I think she deliberately phrased things in a manner to make the talk good and snappy rather than to be factually correct and trying to understand why things are like they are.

The VERIFYHOST option apparently sounds as if it takes a boolean (accordingly), but it doesn't. She says verifying a certificate has to be a Yes/No question so obviously it is a boolean. First, let's be really technical: the libcurl options that take numerical values always accept a 'long' and all documentation specify which values you can pass in. None of them are boolean, not by actual type in the C language and not described like that in the man pages. There are however language bindings running on top of libcurl that may use booleans for the values that take 0 or 1, but there's no guarantee we won't add more values in a future to numerical options.

I wrote down a few quotes from her that I'd like to address.

"In order for it to do anything useful, the value actually has to be set to two"

I get it, she wants a fun presentation that makes the audience listen and grin cheerfully. But this is highly inaccurate. libcurl has it set to verify by default. An application doesn't have to set it to anything. The only reason to set this value is if you're not happy with checking the cert unconditionally, and then you've already wondered off the secure route.

"All it does when set to to two is to check that the common name in the cert matches the host name in the URL. That's literally all it does."

No, it's not. It "only" verifies the host name curl connects to against the name hints in the server cert, yes, but that's a lot more than just the common name field.

"there's been 10 versions and they haven't fixed this yet [...] the docs still say they're gonna fix this eventually [...] I wanna know when eventually is"

Qualified BS and ignorance of details. Let's see the actual code first: it ignores the 1 value and returns an error and thus leaves the internal default 2, Alas, code that sets 1 or 2 gets the same effect == verified certificate. Why is this a problem?

Then, she says she really wants to know when "eventually" is. (The docs say "Future versions will...") So if she was so curious you'd think she would've tried to ask us? We're an accessible bunch, on mailing lists, on IRC and on twitter. No she didn't ask.

But perhaps most importantly: did she really consider why it returns an error for 1? Since libcurl silently accepted 1 as a value for something like 10 years, there are a lot of old installations "out there" in the wild, and by returning an error for 1 we try to make applications notice and adjust. By silently accepting 1 without errors, there would be no notice and people will keep using 1 in new applications as well and thus when running such an newly written application with an older libcurl - you'd be back to having the security problem again. So, we have the error there to improve the situation.

"a peer is someone like you [...] a host is a server"

I'm a networking guy since 20+ years and I'm not used to people having a hard time to understand these terms. While perhaps there are rookies out in the world who don't immediately understand some terms in the curl option names, should we really be criticized for that? I find that a hilarious critique. Also, these names were picked 13 years ago and we have them around for compatibility and API stability.

"why would you ever want to ..."

Welcome to the real world. Why would an application author ever want to have these options to something else than just full check and no check? Because people and software development is a large world with many different desires and use case scenarios and curl is more widely used and abused than what many people consider. Lots of people have wanted something else than just a Yes/No to server cert verification. In fact, I've had many users ask for even more switches and fine-grained ways to fiddle with verification. Yes/No is a lay mans simplified view of certificate verification.

SEC-T curl slide

(This picture is the slide from the above picture, just zoomed and straightened out a bit.)

API age, stability and organic growth

We started working on libcurl in spring 1999, we added the CURLOPT_SSL_VERIFYPEER option in October 2000 and we added CURLOPT_SSL_VERIFYHOST in August 2001. All that quite a long time ago.

Then add thousands of hours, hundreds of hackers, thousands of applications, a user count that probably surpasses one billion users by now. Then also add the fact that option names are sticky in the way we write docs, examples pop up all over the internet and everyone who's close to the project learns them by name and spirit and we quite simply grow attached to them and the way they work. Changing the name of an option is really painful and cause of a lot of confusion.

I've instead tried to more and more emphasize the functionality in the docs, to stress what the options do and how to do server cert verifications with curl the safe way.

I can't force users to read docs. I can't forbid users to blindly assume something and I'm not in control of, nor do I want to affect, the large population of third party bindings that exist for using on top of libcurl to cater for every imaginable programming language - and some of them may of course themselves have documentation problems and what not.

Would I change some of the APIs and names for options we have in libcurl if I would redo them today? Yes I would.

So what do we do about it?

I think this is the only really interesting question to take from all this. Everyone wants stable APIs. Everyone wants sensible and easy to understand APIs and as we can see they should also basically be possible to figure out without reading any documentation. And yet the API has to be powerful and flexible enough to be really useful for all those different applications.

At this point where we have these options that we do, when you've done your mud slinging and the finger of blame is firmly pointed at us. How exactly do you suggest we move forward to fix these claimed problems?

Taking it personally

Before anyone tells me to not take it personally: curl is my biggest hobby and a project I've spent many years and thousands of hours on. Of course I take it personally, otherwise I would've stopped working in the project a long time ago. This is personal to me. I give it my loving care and personal energy and then someone comes here and throw ill-founded and badly researched criticisms at me. I think criticizers of open source projects should learn to discuss the matters with the projects as their primary way instead of using it to make their conference presentations become more feisty.