Reducing 2038-problems in curl

tldr: we’ve made curl handle dates beyond 2038 better on systems with 32 bit longs.

libcurl is very portable and is built and used on virtually all current widely used operating systems that run on 32bit or larger architectures (and on a fair amount of not so widely used ones as well).

This offers some challenges. Keeping the code stellar and working on as many platforms as possible at the same time is hard work.

How long is a long?

The C variable type “long” has existed since the dawn of time and was already 32 bits wide back in the days when most systems were 32 bit. With the introduction of 64 bit systems in the 1990s, something went wrong: while most operating systems went with 64 bit longs, some took the odd route and stuck with a 32 bit long… The Windows world even chose to not support “long long” for 64 bit types but instead insists on calling them “__int64”!

(Thankfully, ints have at least remained 32 bit!)

Two less clever API decisions

Back in the days when humans still lived in caves, we decided for the libcurl API to use ‘long’ for a whole range of function arguments. In hindsight, that was naive and not too bright. (I say “we” to make it less obvious that it of course is mostly me who’s to blame for this.)

Another less clever design idea was to use vararg functions to set (all) options. This is convenient in the way we have one function to set a huge number of different options, but it is also quirky and error-prone because when you pass on a numeric expression in C it typically gets sent as an ‘int’ unless you tell it otherwise. So on systems with differently sized ints vs longs, it is destined to cause some mistakes that, thanks to use of varargs, the compiler can’t really help us detect! (We actually have a gcc-only hack that provides type-checking even for the varargs functions, but it is not portable.)
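
To make the pitfall concrete, here is a minimal sketch of a vararg option setter. It is not libcurl's code: set_option, struct settings and OPT_TIMEOUT are made-up names for illustration. The callee has to guess the type of the extra argument, so the caller must pass exactly that type.

```c
#include <stdarg.h>

/* Hypothetical option identifiers, for illustration only */
#define OPT_TIMEOUT 1
#define OPT_URL     2

struct settings {
  long timeout;
  const char *url;
};

/* Set one option. The type of the variadic argument depends on `option`,
   and the compiler cannot check that the caller passed the right type. */
int set_option(struct settings *s, int option, ...)
{
  va_list ap;
  int rc = 0;
  va_start(ap, option);
  switch(option) {
  case OPT_TIMEOUT:
    s->timeout = va_arg(ap, long); /* reads a long, whatever was passed */
    break;
  case OPT_URL:
    s->url = va_arg(ap, const char *);
    break;
  default:
    rc = -1; /* unknown option */
    break;
  }
  va_end(ap);
  return rc;
}
```

Calling set_option(&s, OPT_TIMEOUT, 30L) is fine. Calling it with a plain 30 passes an int where a long is read, which is undefined behavior and tends to bite on platforms where the two types differ in size. This is the reason real libcurl calls are written with an explicit suffix, as in curl_easy_setopt(handle, CURLOPT_TIMEOUT, 30L).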

libcurl has both an option to pass time to libcurl using a long (CURLOPT_TIMEVALUE) and an option to extract a time from libcurl using a long (CURLINFO_FILETIME).

We stick to using our “not too bright” API for stability and compatibility. We deem it to be even more work and trouble for us and our users to change to another API rather than to work and live with the existing downsides.

Time may exist after 2038

There’s also this movement to transition the time_t variable type from 32 to 64 bit. time_t of course being the preferred type for C and C++ programs to store timestamps in. It is the number of seconds since January 1st, 1970. Sometimes called the unix epoch. A signed 32 bit time_t can be used to store timestamps with second accuracy from roughly 1901 to 2038. As more and more things will start to refer to dates after 2038, this is of course becoming a problem. We need to move to 64 bit time_t all over.

We’re now less than 20 years away from the signed 32bit tip-over point: 03:14:07 UTC, 19 January 2038.
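
As a small sketch of where that tip-over point comes from, the helper below formats a number of seconds since the epoch as a UTC date. The name format_utc is made up for this example, and it assumes the platform's time_t counts seconds since 1970, which POSIX guarantees but plain C does not.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Format a Unix timestamp as "YYYY-MM-DD HH:MM:SS" in UTC.
   Returns 0 on success, -1 if the value cannot be represented or converted. */
int format_utc(int64_t seconds, char *buf, size_t len)
{
  time_t t = (time_t)seconds;
  struct tm *tm;
  if((int64_t)t != seconds)
    return -1; /* does not fit in this platform's time_t */
  tm = gmtime(&t);
  if(!tm)
    return -1;
  return strftime(buf, len, "%Y-%m-%d %H:%M:%S", tm) ? 0 : -1;
}
```

Feeding it INT32_MAX (2147483647) produces the date and time quoted above. Feeding it INT32_MAX + 1 only works where time_t is wider than 32 bits; elsewhere the helper reports failure instead of silently wrapping.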

To complicate matters even more, there are odd systems out there with unsigned time_t variables. Such systems then cannot easily refer to dates before 1970, but can instead hold dates up to the year 2106 even with just 32 bits. Oh and there are some systems with 64 bit long that feature a 32 bit time_t, and 32 bit systems with 64 bit time_t!

Most modern systems today have 64 bit time_t – including win64, and 64 bit time_t can handle dates up to about year 292,277,026,596.
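
The year limits mentioned here can be approximated with a single division. The sketch below assumes an average Gregorian year of 365.2425 days; last_year and SECONDS_PER_YEAR are names invented for the example.

```c
#include <stdint.h>

/* Average Gregorian year: 365.2425 days of 86400 seconds each */
#define SECONDS_PER_YEAR 31556952ULL

/* Approximate calendar year in which a seconds-since-1970 counter that
   tops out at max_seconds runs out. */
uint64_t last_year(uint64_t max_seconds)
{
  return 1970 + max_seconds / SECONDS_PER_YEAR;
}
```

With this, last_year(INT32_MAX) gives 2038 for a signed 32 bit time_t and last_year(UINT32_MAX) gives 2106 for an unsigned one. For INT64_MAX the result lands roughly 292 billion years into the future.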

int – long – time_t

  1. We cannot move data between ints and longs in the code and assume it doesn’t overflow
  2. We can’t move data losslessly between ints and time_t
  3. We must not move data between long and time_t

Recently we’ve been working on making sure we live up to these three rules in libcurl. One could say it was about time! (pun intended)
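
One way to live up to rule (3) when a long is all an old API offers is to convert explicitly and refuse values that do not fit. This is only a sketch of the idea, not code from libcurl, and time_to_long is a hypothetical helper name.

```c
#include <limits.h>
#include <stdint.h>

/* Store a 64 bit timestamp in a long only if it fits.
   Returns 0 on success, -1 if the value is out of range for long. */
int time_to_long(int64_t t, long *out)
{
  if(t < (int64_t)LONG_MIN || t > (int64_t)LONG_MAX)
    return -1; /* would overflow on this platform */
  *out = (long)t;
  return 0;
}
```

On a system with 64 bit longs every timestamp passes. On a system with 32 bit longs a date in the year 2100 is rejected rather than truncated, which is the point where a caller needs an API that takes a wider type.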

In particular number (3) has required us to add new entry points to the API so that even 32 bit long systems can set/read 64 bit time. Starting in libcurl 7.59.0, applications can pass 64 bit times to libcurl with CURLOPT_TIMEVALUE_LARGE and extract 64 bit times with CURLINFO_FILETIME_T. For compatibility reasons, the old versions will of course be kept around but newer applications should really consider the new options.

We also recently did an overhaul of our time and date parser (externally accessible as curl_getdate()) which we learned erroneously used a ‘long’ in the calculation which made it not work properly beyond 2038 on systems with 32 bit longs. This fix will also ship in 7.59.0 (planned release date: March 21, 2018).

If you find anything in curl that doesn’t deal with times after 2038 correctly, please file a bug!

isalnum() is not my friend

The other day we noticed some curl test case failures that only happened on macOS and not on Linux. Curious!

The failures were detected in our unit test 1307, when testing a particular internal pattern matching function (Curl_fnmatch). Both targets run almost identical code but somehow they ended up with different results! Test cases acting differently on different platforms isn’t an extremely rare situation, but in this case it is just a pattern matching function and there’s really nothing timing dependent or anything that I thought could explain different behaviors. It piqued my interest, so I dug in.

The isalnum() return value

Eventually I figured out that the libc function isalnum(), when it got the 8 bit input value hexadecimal c3 (decimal 195), would return true on the macOS machine and false on the box running Linux with glibc!

int value = isalnum(0xc3);

Setting LANG=C before running the test on macOS made its isalnum() return false. The input became c3 because the test program has a UTF-8 encoded character in it and the function works on bytes, not “characters”.

Or in the words of the opengroup.org documentation:

The isalnum() function shall test whether c is a character of class alpha or digit in the program’s current locale.

It’s all documented – of course. It was just me not really considering the impact of this.

Avoiding this

I don’t like different behaviors on different platforms given the same input. I don’t like having string functions in curl act differently depending on locale, mostly because curl and libcurl can very well be used with many different locales and I prefer having a stable fixed behavior that we can document and stand by. Also, the libcurl functionality has never been documented to vary due to locale so it would be a surprise (bug!) to users anyway.

We’ve now introduced a private version of isalnum() and the rest of the ctype family of functions for curl. Hopefully this will make the tests more stable now. And make our functions behave the same independent of locale.
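
A locale-independent replacement can be as small as the sketch below. This is an illustration and not curl's actual private implementation; ascii_isalnum is a made-up name and the code assumes an ASCII-compatible character set where letters are contiguous.

```c
/* Locale-independent check: true only for ASCII letters and digits.
   Assumes an ASCII-compatible execution character set. */
int ascii_isalnum(int c)
{
  return (c >= '0' && c <= '9') ||
         (c >= 'a' && c <= 'z') ||
         (c >= 'A' && c <= 'Z');
}
```

Since it never consults the locale, ascii_isalnum(0xc3) is false on every platform, regardless of what LANG is set to.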

See also: strcasecmp in Turkish.

Time’s up to shut up and sign up for curl up

We have just opened up the registration site for curl up 2018, the annual curl developers meeting that this year takes place in Stockholm, Sweden, over the weekend April 14-15. There’s a limited number of seats available, so if you want to join in the fun it might be a good idea to decide early on.

Sign up!

Also, to get into the proper curl up spirit, here’s the curl quiz we ran last year. I hope to run something similar again this year, but of course with a different set of questions.

Cheers for curl 7.58.0

Here’s to another curl release!

curl 7.58.0 is the 172nd curl release and it contains, among other things, 82 bug fixes thanks to 54 contributors (22 new). All this done with 131 commits in 56 days.

The bug fix rate is slightly lower than in the last few releases, which I attribute mostly to me having been away on vacation for a month during this release cycle. I retain my position as “committer of the Month” and January 2018 is my 29th consecutive month where I’ve done most commits in the curl source code repository. In total, almost 58% of the commits have been done by me (if we limit the count to all commits done since 2014, I’m at 43%). We now count a total of 545 unique commit authors and 1,685 contributors.

So what’s new this time? (full changelog here)

libssh backend

Introducing the pluggable SSH backend, and libssh is now the new alternative SSH backend to libssh2 that has been supported since late 2006. This change alone brought thousands of new lines of code.

Tell configure to use it with --with-libssh and you’re all set!

The libssh backend work was done by Nikos Mavrogiannopoulos, Tomas Mraz, Stanislav Zidek, Robert Kolcun and Andreas Schneider.

Security

Yet again we announce security issues that we’ve found and fixed. Two of them to be exact:

  1. We found a problem with how HTTP/2 trailers were handled, which could lead to crashes or even information leakage.
  2. We addressed a problem for users sending custom Authorization: headers to HTTP servers and who are then redirected to another host that shouldn’t receive those Authorization headers.

Progress bar refresh

A minor thing, but we refreshed the progress bar layout for when no total size is known.

Next?

March 21 is the date set for next release. Unless of course we find an urgent reason to fix and release something before then…

A flying curl progress bar

curl features an alternative progress bar. When you invoke it with -# or the longer version --progress-bar, curl will show the transfer progress using a single “bar” on the screen instead of the default meter that shows a lot of data like amount of data, transfer speeds and times.

$ curl -# -O https://example.com/coolfile.tar.gz
############################################ 100.0%

The alternative progress bar works great when the amount of data to transfer is known since then it can actually know how large a part of the transfer is done. If the amount of data is unknown – which is not a super rare situation – the progress bar output instead used to output one ‘#’ per kilobyte of data so that it would still show something. That could then end up filling up the screen and more if you did a large transfer.

$ curl -# -O https://example.com/nosize.html
###########################################################################################################
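
The known-size case boils down to scaling the transferred amount to the width of the bar. Here is a minimal sketch of that calculation; bar_fill is a hypothetical name and this is not how the curl tool's source is actually structured.

```c
/* Number of '#' characters to draw in a bar that is `width` columns wide.
   Returns 0 when the total is unknown (<= 0), since no fraction exists. */
int bar_fill(long long done, long long total, int width)
{
  if(total <= 0 || width <= 0 || done <= 0)
    return 0;
  if(done >= total)
    return width; /* complete: fill the whole bar */
  return (int)((double)done / (double)total * width);
}
```

When the total is unknown there is no fraction to compute, which is why the old output fell back to printing one ‘#’ per kilobyte without any upper bound.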

The space ship bar

Starting in curl 7.58.0 (to be released on January 24, 2018), this latter progress bar layout is modified. If the total size is unknown, it will now instead display a small space ship flying across the line, back and forth – and it will only move as long as there is data being transferred. If it stalls, the little ship stops.
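
The back-and-forth movement can be sketched as a simple triangle wave over the line width. The function below is an illustration only, not the code that shipped; ship_column and the notion of a tick counter are assumptions made for the example.

```c
/* Column (0-based) of the ship at a given animation tick, bouncing
   between column 0 and column width - 1. */
int ship_column(unsigned int tick, int width)
{
  unsigned int period, phase;
  if(width < 2)
    return 0; /* nowhere to move */
  period = 2u * (unsigned int)(width - 1); /* there and back again */
  phase = tick % period;
  return phase < (unsigned int)width ? (int)phase : (int)(period - phase);
}
```

Advancing the tick only when new data has arrived makes the ship stop when the transfer stalls.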

“Over” the space ship there are four nonsensical flying hashes (‘#’) that are simply moving across the line on a sine wave, following each other. They move independently of there being data transferred or not.

It can then end up looking similar to this:

Pointless

There’s no real “meaning” behind this new progress bar output mode. I wanted it to

  1. only use a single line, even in the no-total size known case
  2. somehow indicate when there’s no data flying (ie space ship stops)
  3. make it slightly more interesting to watch than just one # per kilobyte

Since this new bar has just landed and this is the first time we ship a release with it, I wouldn’t be surprised if we end up polishing it further later on.

Can you tell I started out my programming life as a demo programmer on the Commodore 64? 🙂

Inspect curl’s TLS traffic

Since a long time back, the venerable network analyzer tool Wireshark (screenshot above) has provided a way to decrypt and inspect TLS traffic when sent and received by Firefox and Chrome.

You do this by making the browser tell Wireshark the SSL secrets:

  1. set the environment variable named SSLKEYLOGFILE to a file name of your choice before you start the browser
  2. set the same file name path in the Master-secret field in Wireshark. Go to Preferences->Protocols->SSL and edit the path as shown in the screenshot below.

Having done this simple operation, you can now inspect your browser’s HTTPS traffic in Wireshark. Just super handy and awesome.

Just remember that if you record TLS traffic and want to save it for analyzing later, you need to also save the file with the secrets so that you can decrypt that traffic capture at a later time as well.

curl

Adding curl to the mix. curl can be built using a dozen different TLS libraries and not just a single one as the browsers do. It complicates matters a bit.

The NSS library for example, which is the TLS library curl is typically built with on Red Hat and CentOS, handles the SSLKEYLOGFILE magic all by itself so by extension you have been able to do this trick with curl for a long time – as long as you use curl built with NSS. A pretty good argument to use that build really.

Since curl version 7.57.0 the SSLKEYLOGFILE feature can also be enabled when built with GnuTLS, BoringSSL or OpenSSL. In the latter two libs, the feature is powered by new APIs in those libraries and in GnuTLS by the library’s own logic, similar to how NSS does it. Since OpenSSL is by far the most popular TLS backend for curl, this feature is now brought to users more widely.

In curl 7.58.0 (due to ship on January 24, 2018), this feature is built by default also for curl with OpenSSL; in 7.57.0 you need to define ENABLE_SSLKEYLOGFILE to enable it for OpenSSL and BoringSSL.

And what’s even cooler? This feature is at the same time also brought to every single application out there that is built against this or later versions of libcurl. In one single blow, a whole world suddenly opens to make it easier for you to debug, diagnose and analyze your applications’ TLS traffic when powered by libcurl!

Like the description above for browsers, you

  1. set the environment variable SSLKEYLOGFILE to a file name to store the secrets in
  2. tell Wireshark to use that same file to find the TLS secrets (Preferences->Protocols->SSL), as the screenshot showed above
  3. run the libcurl-using application (such as curl) and Wireshark will be able to inspect TLS-based protocols just fine!

trace options

Of course, as a lightweight alternative: you may opt to use the --trace or --trace-ascii options with the curl tool and be fully satisfied with that. Using those command line options, curl will log everything sent and received in the protocol layer without the TLS applied. With HTTPS you’ll see all the HTTP traffic for example.

Credits

Most of the curl work to enable this feature was done by Peter Wu and Ray Satiro.

Microsoft curls too

On December 19 2017, Microsoft announced that since insider build 17063 of Windows 10, curl is now a default component. I’ve been away from home since then so I haven’t really had time to sit down and write and explain to you all what this means, so while I’m a bit late, here it comes!

I see this as a pretty huge step in curl’s road to conquer the world.

curl was already existing on Windows

Ever since we started shipping curl, it has been possible to build curl for Windows and run it on Windows. It has been working fine on all Windows versions since at least Windows 95. Running curl on Windows is not new to us. Users with a little bit of interest and knowledge have been able to run curl on Windows for almost 20 years already.

Then we had the known debacle with Microsoft introducing a curl alias to PowerShell that has put some obstacles in the way for users of curl.

Default makes a huge difference

Having curl shipped by default by the manufacturer of an operating system of course makes a huge difference. Once this goes out to the general public, all of a sudden several hundred million users will get a curl command line tool installed for them without having to do anything. Installing curl yourself on Windows still requires some skill and knowledge and in places like Stack Overflow, there are many questions and users showing how it can be problematic.

I expect this to accelerate the curl command line use in the world. I expect this to increase the number of questions on how to do things with curl.

Lots of people mentioned how curl is a “good” new tool to use for malicious downloads of files to windows machines if you manage to run code on someone’s Windows computer. curl is quite a capable thing that you truly do not want to get invoked involuntarily. But sure, any powerful and capable tool can of course be abused.

About the installed curl

This is what it looks like when you check out the curl version on this Windows build:

(screenshot from Steve Holme)

I don’t think this means that this is necessarily exactly what curl will look like once this reaches the general windows 10 installation, and I also expect Microsoft to update and upgrade curl as we go along.

Some observations from this simple screenshot, and if you work for Microsoft you may feel free to see this as some subtle hints on what you could work on improving in future builds:

  1. They ship 7.55.1, while 7.57.0 was the latest version at the time. That’s just three releases away so I consider that pretty good. Lots of distros and others ship (much) older releases. It’ll be interesting to see how they will keep this up in the future.
  2. Unsurprisingly, they use a build that uses the WinSSL backend for TLS.
  3. They did not build it with IDN support.
  4. They’ve explicitly disabled support for a whole range of protocols that curl supports natively by default (gopher, smb, rtsp etc), but they still have a few rare protocols enabled (like dict).
  5. curl supports LDAP using the windows native API, but that’s not used.
  6. The Release-Date line shows they built curl from unreleased sources (most likely directly from a git clone).
  7. No HTTP/2 support is provided.
  8. There’s no automatic decompression support for gzip or brotli content.
  9. The build doesn’t support metalink and no PSL (public suffix list).

(curl gif from the original Microsoft curl announcement blog post)

Independent

Finally, I’d like to add that like all operating system distributions that ship curl (macOS, Linux distros, the BSDs, AIX, etc) Microsoft builds, packages and ships the curl binary completely independently from the actual curl project.

Sure I’ve been in contact with the good people working on this from their end, but they are working totally independently of us in the curl project. They mostly get our code, build it and ship it.

I of course hope that we will get bug fixes and improvement from their end going forward when they find problems or things to polish.

The future looks as great as ever before!

Update: in March 2018, they mentioned that curl comes in Windows 10 version 1803.

The curl year 2017

I’m about to take an extended vacation for the rest of the year and into the beginning of the next, so I decided I’d sum up the year from a curl angle already now, a few weeks early. (So some numbers will grow a bit more after this post.)

2017

So what did we do this year in the project, how did curl change?

The first curl release of the year was version 7.53.0 and the last one was 7.57.0. In the separate blog posts on 7.55.0, 7.56.0 and 7.57.0 you’ll note that we kept up adding new goodies and useful features. We produced a total of 9 releases containing 683 bug fixes. We announced twelve security problems. (Down from 24 last year.)

At least 125 different authors wrote code that was merged into curl this year, in the 1500 commits that were made. We never had this many different authors during a single year before in the project’s entire life time! (The 114 authors during 2016 was the previous all-time high.)

We added more than 160 new names to the THANKS document for their help in improving curl. The total amount of contributors is now over 1660.

This year we truly started to use travis for CI builds and grew from a mere two builds per commit and PR up to nineteen (with additional ones run on appveyor and elsewhere). The current build set is a very good verification that most things still compile and work after a PR is merged. (see also the testing curl article).

Mozilla announced that they too will use colon-slash-slash in their logo. Of course we all know who had that in their logo first… =)

In March 2017, we had our first ever curl get-together as we arranged curl up 2017 a weekend in Nuremberg, Germany. It was very inspiring and meeting parts of the team in real life was truly a blast. This was so good we intend to do it again: curl up 2018 will happen.

curl turned 19 years old in March. In May it surpassed 5,000 stars on github.

Also in May, we moved over the official curl site (and my personal site) to get hosted by Fastly. We were beginning to get problems to handle the bandwidth and load, and in one single step all our worries were graciously taken care of!

We got curl entered into the OSS-fuzz project, and Max Dymond even got a reward from Google for his curl-fuzzing integration work. Thanks to that project throwing heaps of junk at libcurl’s APIs, we’ve found and fixed many issues.

The source code (for the tool and library only) is now at about 143,378 lines of code. It grew around 7,057 lines during the year. The primary reasons for the code growth were:

  1. the new libssh-powered SSH backend (not yet released)
  2. the new mime API (in 7.56.0) and
  3. the new multi-SSL backend support (also in 7.56.0).

Your maintainer’s view

Oh what an eventful year it has been for me personally.

The first interim meeting for QUIC took place in Japan, and I participated from remote. After all, I’m all set on having curl support QUIC and I’ll keep track of where the protocol is going! I’ve participated in more interim meetings after that, all from remote so far.

I talked curl on the main track at FOSDEM in early February (and about HTTP/2 in the Mozilla devroom). I’ve then followed that up and have also had the pleasure to talk in front of audiences in Stockholm, Budapest, Jönköping and Prague throughout the year.

I went to London and “represented curl” in the third edition of the HTTP workshop, where HTTP protocol details were discussed and disassembled, and new plans for the future of HTTP were laid out.

In late June I meant to go to San Francisco to a Mozilla “all hands” conference but instead I was denied to board the flight. That event got a crazy amount of attention and I received massive amounts of love from new and old friends. I have not yet tried to enter the US again, but my plan is to try again in 2018…

I wrote and published my h2c tool, meant to help developers convert a set of HTTP headers into a working curl command line.

The single occasion that overshadows all other events and happenings for me this year by far, was without doubt when I was awarded the Polhem Prize and got a gold medal from none other than his majesty the King of Sweden himself. For all my work and years spent on curl no less.

Not really curl related, but in November I was also glad to be part of the huge Firefox Quantum release. The biggest Firefox release ever, and one that has been received really well.

I’ve managed to commit over 800 changes to curl through the year, which is 54% of the totals and more commits than I’ve done in curl during a single year since 2005 (in which I did 855 commits). I attribute this increase mostly to inspiration from curl up and the prize, but I think it also happened thanks to excellent feedback and motivation brought by my fellow curl hackers.

We’re running towards the end of 2017 with me being the individual who did most commits in curl every single month for the last 28 months.

2018?

More things to come!

curl 7.57.0 happiness

The never-ending series of curl releases continued today when we released version 7.57.0. The 171st release since the beginning, and the release that follows 37 days after 7.56.1. Remember that 7.56.1 was an extra release that fixed a few most annoying regressions.

We bump the minor number to 57 and clear the patch number in this release due to the changes introduced. None of them very ground breaking, but fun and useful and detailed below.

41 contributors helped fix 69 bugs in these 37 days since the previous release, using 115 separate commits. 23 of those contributors were new, making the total list of contributors now contain 1649 individuals! 25 individuals authored commits since the previous release, making the total number of authors 540 persons.

The curl web site currently sends out 8GB data per hour to over 2 million HTTP requests per day.

Support RFC7616 – HTTP Digest

This allows HTTP Digest authentication to use the much better SHA256 algorithm instead of the old, and deemed unsuitable, MD5. This should be a transparent improvement so curl should just be able to use this without any particular new option having to be set, but the server-side support for this version seems to still be a bit lacking.

(Side-note: I’m credited in RFC 7616 for having contributed my thoughts!)

Sharing the connection cache

In this modern age with multi core processors and applications using multi-threaded designs, we of course want libcurl to enable applications to be able to get the best performance out of libcurl.

libcurl is already thread-safe so you can run parallel transfers multi-threaded perfectly fine if you want to, but it doesn’t allow the application to share handles between threads. Before this specific change, this limitation has forced multi-threaded applications to be satisfied with letting libcurl have a separate “connection cache” in each thread.

The connection cache, sometimes also referred to as the connection pool, is where libcurl keeps live connections that were previously used for a transfer and still haven’t been closed, so that a subsequent request might be able to re-use one of them. Getting a re-used connection for a request is much faster than having to create a new one. Having one connection cache per thread is inefficient.

Starting now, libcurl’s “share concept” allows an application to specify a single connection cache to be used cross-thread and cross-handles, so that connection re-use will be much improved when libcurl is used multi-threaded. This will significantly benefit the most demanding libcurl applications, but it will also allow more flexible designs as now the connection pool can be designed to survive individual handles in a way that wasn’t previously possible.

Brotli compression

The popular browsers have supported the brotli compression method for a while and it has already become widely supported by servers.

Now, curl supports it too and the command line tool’s --compressed option will ask for brotli as well as gzip, if your build supports it. Similarly, libcurl supports it with its CURLOPT_ACCEPT_ENCODING option. The server can then opt to respond using either compression format, depending on what it knows.

According to CertSimple, who ran tests on the top-1000 sites of the Internet, brotli gets contents 14-21% smaller than gzip.

As with other compression algorithms, libcurl uses a 3rd party library for brotli compression and you may find that Linux distributions and others are a bit behind in shipping packages for a brotli decompression library. Please join in and help this happen. At the moment of this writing, the Debian package is only available in experimental.

(Readers may remember my libbrotli project, but that effort isn’t really needed anymore since the brotli project itself builds a library these days.)

Three security issues

In spite of our hard work and best efforts, security issues keep getting reported and we fix them accordingly. This release has three new ones and I’ll describe them below. None of them are alarmingly serious and they will probably not hurt anyone badly.

Two things can be said about the security issues this time:

1. You’ll note that we’ve changed naming convention for the advisory URLs, so that they now have a random component. This is to reduce potential information leaks based on the name when we pass these around before releases.

2. Two of the flaws happen only on 32 bit systems, which reveals a weakness in our testing. Most of our CI tests, torture tests and fuzzing are made on 64 bit architectures. We have no immediate and good fix for this, but this is something we must work harder on.

1. NTLM buffer overflow via integer overflow

(CVE-2017-8816) Limited to 32 bit systems, this is a flaw where curl takes the combined length of the user name and password, doubles it, and allocates a memory area that big. If that doubling ends up larger than 4GB, an integer overflow makes a very small buffer be allocated instead and then curl will overwrite that.

Yes, having user name plus password be longer than two gigabytes is rather excessive and I hope very few applications would allow this.
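
The general defence against this class of bug is to check before multiplying. The sketch below shows the idea and is not the actual patch that went into curl; doubled_total is a made-up helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Compute (userlen + passlen) * 2 without wrapping around.
   Returns 0 and stores the size in *out, or -1 if it does not fit. */
int doubled_total(size_t userlen, size_t passlen, size_t *out)
{
  if(userlen > SIZE_MAX - passlen)
    return -1; /* the sum itself would wrap */
  if(userlen + passlen > SIZE_MAX / 2)
    return -1; /* doubling would wrap */
  *out = (userlen + passlen) * 2;
  return 0;
}
```

On a 32 bit system SIZE_MAX is about 4GB, so a combined length above 2GB is rejected here instead of turning into a tiny allocation.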

2. FTP wildcard out of bounds read

(CVE-2017-8817) curl’s wildcard functionality for FTP transfers is not a very widely used feature, but it was discovered that the default pattern matching function could erroneously read beyond the URL buffer if the match pattern ends with an open bracket ‘[’!

This problem was detected by the OSS-Fuzz project! This flaw has existed in the code since this feature was added, over seven years ago.
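
To illustrate the kind of check a pattern parser needs, here is a simplified sketch that looks for the end of a bracket set and never steps past the terminating zero byte. It is not curl's matching code: find_set_end is a hypothetical name and real matchers have more rules, for example for a ‘]’ that appears first in the set.

```c
#include <stddef.h>

/* Return the index of the ']' that closes a set starting at pattern[0],
   or -1 if pattern does not start with '[' or the set is never closed. */
long find_set_end(const char *pattern)
{
  size_t i;
  if(pattern[0] != '[')
    return -1;
  /* stop at the terminating zero, never read beyond it */
  for(i = 1; pattern[i] != '\0'; i++) {
    if(pattern[i] == ']')
      return (long)i;
  }
  return -1; /* unterminated set, such as a pattern ending in '[' */
}
```

A caller that gets -1 can treat the ‘[’ as a literal character or reject the pattern, rather than continuing to parse a set that is not there.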

3. SSL out of buffer access

(CVE-2017-8818) In July this year we introduced multissl support in libcurl. This allows an application to select which TLS backend libcurl should use, if it was built to support more than one. It was a fairly large overhaul to the TLS code in curl and unfortunately it also brought this bug.

Also, only happening on 32 bit systems, libcurl would allocate a buffer that was 4 bytes too small for the TLS backend’s data which would lead to the TLS library accessing and using data outside of the heap allocated buffer.

Next?

The next release will ship no later than January 24th 2018. I think that one will as well add changes and warrant the minor number to bump. We have fun pending stuff such as: a new SSH backend, modifiable happy eyeballs timeout and more. Get involved and help us do even more good!

HTTPS-only curl mirrors

We’ve had volunteers donating bandwidth to the curl project basically since its inception. They mirror our download archives so that you can download them directly from their server farms instead of hitting the main curl site.

On the main site we check the mirrors daily and offer convenient download links from the download page. It has historically been especially useful for the rare occasions when our site has been down for administrative purposes or other reasons.

Since May 2017 the curl site is fronted by Fastly which then has reduced the bandwidth issue as well as the downtime problem. The mirrors are still there though.

Starting now, we will only link to download mirrors that offer the curl downloads over HTTPS in our continued efforts to help our users to stay secure and avoid malicious manipulation of data. I’ve contacted the mirror admins and asked if they can offer HTTPS instead.

The curl download page still contains links to HTTP-only packages and pages, and we would really like to fix them as well. But at the same time we’ve reasoned that it is better to still help users to find packages than not, so for the packages where there are no HTTPS linkable alternatives we still link to HTTP-only pages. For now.

If you host curl packages anywhere, for anyone, please consider hosting them over HTTPS for all the users’ sake.
