Tag Archives: C89

curl is C

For some reason, this post got picked up again and is being debated today in 2021, almost 4 years after I wrote it. Some things have changed in the meantime and I might’ve phrased a few things differently if I had written this today. But still, what’s here below is what I wrote back then. Enjoy!

Every once in a while someone suggests to me that curl and libcurl would do better if rewritten in a “safe language”. Rust is one such alternative language commonly suggested. This happens especially often when we publish new security vulnerabilities. (Update: I think Rust is a fine language! This post and my stance here has nothing to do with what I think about Rust or other languages, safe or not.)

curl is written in C

The curl code guidelines mandate that we stick to using C89 for any code to be accepted into the repository. C89 (sometimes also called C90) – the original ANSI C standard. Ancient and conservative.

C is everywhere

This fact has made it possible for projects, companies and people to adopt curl into things using basically any known operating system and whatever CPU architecture you can think of (at least if it was 32bit or larger). No other programming language is as widespread and easily available for everything. This has made curl one of the most portable projects out there and is part of the explanation for curl’s success.

The curl project was also started in the 90s, long before most of the alternative languages you’d suggest even existed. Heck, for a truly stable project it wouldn’t be responsible to go with a language that isn’t even old enough to start school yet.

Everyone knows C

Perhaps not necessarily true anymore, but knowledge of C is at least very widespread, whereas the existing alternative languages for sure have narrower audiences and fewer people who master them.

C is not a safe language

Does writing safe code in C require more carefulness and more “tricks” than writing the same code in a more modern language better designed to be “safe”? Yes it does. But we’ve done most of that job already and maintaining that level isn’t as hard or troublesome.

We keep scanning the curl code regularly with static code analyzers (we maintain a zero Coverity problems policy) and we run the test suite with valgrind and address sanitizers.

C is not the primary reason for our past vulnerabilities

There. The simple fact is that most of our past vulnerabilities happened because of logical mistakes in the code. Logical mistakes that aren’t really language bound and that would not be fixed simply by changing language.

Of course that leaves a share of problems that could’ve been avoided if we had used another language: buffer overflows, double frees, out-of-bounds reads and so on. But the bulk of our security problems have not happened due to curl being written in C.
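
To illustrate, here is a small made-up C snippet (not from the curl code) showing what those three bug classes look like; a memory-safe language would refuse or catch all of them:

#include <stdlib.h>
#include <string.h>

/* Illustrative only, not curl code: the classic memory-unsafety bug
   classes that C happily lets you write. */
void bug_classes(const char *input)
{
  char buf[8];
  char *p = malloc(8);
  if(!p)
    return;

  strcpy(buf, input);  /* buffer overflow if input holds 8 or more chars */

  free(p);
  free(p);             /* double free: p was already released above */

  if(buf[20] == 'x')   /* out-of-bounds read: buf only has 8 elements */
    return;
}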

C is not a new dependency

It is easy for projects to add a dependency on a library that is written in C since that’s what operating systems and system libraries are written in, still today in 2017. That’s the default. Everyone can build and install such libraries, they’re widely used and people know how they work.

A library in another language would add that language (and compiler, debugger and whatever other dependencies a libcurl written in that language would need) as a new dependency for a large number of projects that are themselves written in C or C++ today. Those projects would in many cases downright ignore and reject projects written in “an alternative language”.

curl sits in the boat

In the curl project we’re deliberately conservative and we stick to old standards, to remain a viable and reliable library for everyone. Right now and for the foreseeable future. Things that worked in curl 15 years ago still work like that today. The same way. Users can rely on curl. We stick around. We don’t knee-jerk react to modern trends. We sit still in the boat. We don’t rock it.

Rewriting means adding heaps of bugs

The plain fact, which also isn’t really about languages but about plain old software engineering: translating or rewriting curl into a new language will introduce a lot of bugs. Bugs that we don’t have today.

Not to mention that rewriting would take a huge effort and a lot of time. That energy can instead be spent today on improving curl further.

What if

If I were to start the project today, would I have picked another language? Maybe. Maybe not. If memory safety and related issues were my primary concern, then sure. But as I’ve mentioned above there are several other concerns too, so it would really depend on my priorities.

Finally

At the end of the day the question that remains is: would we gain more than we would pay, and over which time frame? Who would gain and who would lose?

I’m sure that there will be, or may even already exist, curl and libcurl competitors and potent alternatives written in most of these new alternative languages. Some of them are absolutely really good and will get used and reach fame and glory. Some of them will be crap. Just like how software always works. Let a thousand curl competitors bloom!

Will curl be rewritten at some point in the future? I won’t rule it out, but I find it unlikely. I find it even more unlikely that it will happen in the short term or within the next few years.

Discuss this post on Hacker news or Reddit!

Follow-up post: Yes, C is unsafe, but…

curl wants to QUIC

The interesting Google transfer protocol known as QUIC is being passed through the IETF grinding machines, hopefully to end up as a proper “spec” that has been reviewed and agreed to by many peers: a protocol that is thoroughly documented and backed by broad consensus among protocol people. Follow the IETF QUIC mailing list for all the action.

I’d like us to join the fun

Similarly to how we implemented HTTP/2 support early on for curl, I would like us to get “on the bandwagon” early for QUIC. That way we can both aid the protocol development and serve as a testing tool for the protocol and the server implementations, and of course also end up with a solid implementation for users who’d like a proper QUIC-capable client for data transfers.

implementations

The current version of the QUIC protocol (made entirely by Google and not the output of the work they’re now doing on it within the IETF) is already widely used, as Chrome speaks it with Google’s services in preference to HTTP/2 and other protocol options. There exist only a few other implementations of QUIC outside of the official ones Google offers as open source. Caddy offers a separate server implementation, for example.

the Google code base

For curl’s sake, we can’t use the Google code as a basis for a QUIC implementation since it is C++, and code used within the Chrome browser is really too entangled with the browser and its particular environment to become very good when converted into a library. There’s a libquic project attempting exactly that conversion.

for curl and others

The ideal way to implement QUIC for curl would be to create an “nghttp2” alternative that does QUIC. An ngquic, if you will! A library that handles the low level protocol fiddling, the binary framing and so on. Done that way, a QUIC library could be used by more projects that would like QUIC support, and everyone who’d like to see this protocol supported in those tools and libraries could join in and make it happen. Such a library would need to be written in plain C and be suitably licensed for it to be really interesting for curl use.

a needed QUIC library

I’m hoping my post here will inspire someone to get such a project going. I will not hesitate to join in and help it get somewhere! I haven’t started such a project myself because I think I already have enough projects on my plate, and I fear I wouldn’t be a good leader or maintainer of yet another one. But of course, if nobody else does it, I will do it myself eventually. If I can think of a good name for it.

some wishes for such a library

  • Written in C, to offer the same level of portability as curl itself and to allow it to be wrapped as extensions by other languages etc
  • FOSS-licensed suitably
  • It should preferably not “own” the socket but work in-memory, allowing applications to do many parallel connections etc. (A rough API sketch follows after this list.)
  • Non-blocking. It shouldn’t wait for things on its own but let the application do that.
  • Should probably offer both client and server functionality for maximum use.
  • What else?
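
To make the wish list a little more concrete, here is a rough sketch of what such an API could look like. Every name in it (ngquic and friends) is made up for illustration; no such library exists as far as I know:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch only: every name below is invented and does not
   refer to an existing library. */
typedef struct ngquic_conn ngquic_conn;  /* opaque connection handle */

ngquic_conn *ngquic_conn_new(int is_server);
void ngquic_conn_free(ngquic_conn *conn);

/* The library never owns a socket: the application feeds it received UDP
   datagrams and asks it for datagrams to send, all in memory. */
int ngquic_recv(ngquic_conn *conn, const uint8_t *data, size_t len);

/* Returns the number of bytes written to buf for the next datagram to
   transmit, or 0 when there is nothing to send right now. */
size_t ngquic_send(ngquic_conn *conn, uint8_t *buf, size_t buflen);

/* Never blocks: instead it tells the application how long it may wait
   before it needs to call the library again. */
int64_t ngquic_next_timeout_ms(const ngquic_conn *conn);

The point of this feed-it-bytes style is that sockets, event loops and all waiting stay in the application, which is what the non-blocking and not-own-the-socket wishes above are about.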

Three static code analyzers compared

I’m a fan of static code analysis. With the use of fancy scanner tools we can get detailed reports about source code mishaps and quite decently pinpoint which source code is suspicious and may contain bugs. In the old days we used different lint versions, but they were all annoying and very often just puked out far too many warnings and errors to be really useful.

By coincidence, I ended up getting analyses done (by helpful volunteers) on the curl 7.26.0 source base with three different tools. An excellent opportunity for me to compare them all and to share the outcome and my insights with you, my friends. Perhaps I should add that the analyzed code base is 100% pure C89 compatible C code.

Some general observations

First, each of the three tools detected several issues the other two didn’t spot. I would say this indicates that these tools still have a lot of room to improve, and also that it actually is worth running multiple tools against the same source code as an extra precaution.

Secondly, the libcurl source code has some known peculiarities that admittedly are hard for static analyzers to figure out without raising false positives. For example, we have several macros that look like functions, and on several platforms and build combinations they evaluate to nothing, which leaves dead code behind. Another example is that we have several cases of vararg-style functions, and these functions are documented to work in ways that the analyzers don’t always figure out (both clang-analyzer and Coverity show problems with these).
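
As a made-up illustration (the names below are invented and this is not actual libcurl code), this is roughly what the function-like macro case looks like to an analyzer:

#include <stdio.h>

#ifdef DEBUGBUILD
#define LOGMSG(msg) fprintf(stderr, "%s\n", msg)
#else
#define LOGMSG(msg) /* expands to nothing in non-debug builds */
#endif

int process(int value)
{
  LOGMSG("processing");        /* looks like a call, may be nothing at all */
  if(value < 0) {
    LOGMSG("negative input");  /* in non-debug builds this block becomes
                                  empty, so analyzers may flag dead code */
    return -1;
  }
  return 0;
}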

Thirdly, the same lesson we knew from the lint days is still true. Tools that generate too many false positives are really hard to work with, since going through hundreds of issues that after analysis turn out to be nothing makes your eyes sore and your head hurt.

Fortify

The first report I got was done with Fortify. I had heard about this commercial tool before, but I had never seen the results of a run until now. The report I got was a PDF containing 629 pages listing 1924 possible issues among the 130,000 lines of code in the project.


Fortify claimed 843 possible buffer overflows. I quickly got bored trying to find even one that could lead to a problem. It turns out Fortify has a very short attention span and warns very easily on lots of places where a very quick glance by a human tells us there’s nothing to be worried about. Having hundreds and hundreds of these is really tedious and hard to work with.

If we’re kind we call them all false positives. But sometimes it is more than that: some of the alerts are plain bugs in the tool, like when it warns about a buffer overflow on this line, claiming it may write beyond the buffer. All variables are ‘int’ and as we know sscanf() writes an integer to the passed-in variable for each %d instance.

int int1, int2, int3, int4;  /* each %d writes one plain int, nothing can overflow */
sscanf(ptr, "%d.%d.%d.%d", &int1, &int2, &int3, &int4);

I ended up finding and correcting two flaws detected with Fortify, both of which were cases where memory allocation failures weren’t handled properly.
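
As a simplified, made-up illustration of that class of flaw (not the actual curl code), the pattern is an allocation result that gets used without checking it first:

#include <stdlib.h>
#include <string.h>

/* Simplified, made-up example of the flaw class: using an allocation
   without first checking that it succeeded. */
char *copy_string(const char *input)
{
  char *copy = malloc(strlen(input) + 1);
  if(!copy)      /* this is the check that was missing in the flawed code */
    return NULL;
  strcpy(copy, input);
  return copy;
}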

LLVM, clang-analyzer

Given the exact same code base, clang-analyzer reported 62 potential issues. clang is an awesome and free tool. It really stands out in the way it clearly and very descriptively explains exactly how the code is executed and which code paths are selected when it reaches the passage it thinks might be problematic.

clang-analyzer report (screenshot)

The reports from clang-analyzer are in HTML, with a single file for each issue, and it generates nice looking source code with embedded comments about which flow was followed all the way down to the problem. A little snippet from a genuine issue in the curl code is shown in the screenshot I include above.

Coverity

Given the exact same code base, Coverity detected and reported 118 issues. In this case I got the report from a friend as a text file, which I’m sure is just one of its possible output formats. Similar to Fortify, this is a proprietary tool.

Coverity curl report (screenshot)

As you can see in the example screenshot, it does provide a rather fancy and descriptive analysis of the exact code flow that leads to the problem it suggests exists in the code. The function referenced in this shot is a very large function with a state machine featuring many states.

Out of the 118 issues, many were actually the same error reached via different code paths. The report made me fix at least 4 accurate problems, but those fixes will probably silence over 20 warnings.

Coverity runs scans on open source code regularly, as I’ve mentioned before, including curl, so I’ve had reason to appreciate their tool before as well.

Conclusion

From this test of a single source base, I rank them in this order:

  1. Coverity – very accurate reports and few false positives
  2. clang-analyzer – awesome reports, missed slightly too many issues and reported slightly too many false positives
  3. Fortify – the good parts drown in all those hundreds of false positives