curl is C

Every once in a while someone suggests to me that curl and libcurl would do better if rewritten in a “safe language”. Rust is one such alternative language commonly suggested. This happens especially often when we publish new security vulnerabilities. (Update: I think Rust is a fine language! This post and my stance here have nothing to do with what I think about Rust or other languages, safe or not.)

curl is written in C

The curl code guidelines mandate that we stick to using C89 for any code to be accepted into the repository. C89 (sometimes also called C90) is the oldest ANSI C standard. Ancient and conservative.
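To give a feel for what sticking to C89 means in practice, here is a minimal sketch. It is not taken from the curl source; the function name count_char is made up for illustration. It only shows two of the habits C89 imposes: variables are declared at the top of a block, and comments use the /* */ form.

```c
#include <stddef.h>

/* Illustrative only, not curl code: count how often `ch` occurs in the
   first `len` bytes of `buf`. */
size_t count_char(const char *buf, size_t len, char ch)
{
  size_t i;        /* C89: declarations come first in the block */
  size_t hits = 0;

  for(i = 0; i < len; i++) { /* C89: no `for(size_t i = 0; ...)` */
    if(buf[i] == ch)
      hits++;
  }
  return hits;
}
```

Code written this way builds with old and new compilers alike, which is the point of the rule.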

C is everywhere

This fact has made it possible for projects, companies and people to adopt curl into things using basically any known operating system and whatever CPU architecture you can think of (at least if it was 32-bit or larger). No other programming language is as widespread and easily available for everything. This has made curl one of the most portable projects out there and is part of the explanation for curl’s success.

The curl project was also started in the 90s, long before most of these alternative languages you’d suggest even existed. Heck, for a truly stable project it wouldn’t be responsible to go with a language that isn’t even old enough to start school yet.

Everyone knows C

Perhaps not necessarily true anymore, but at least the knowledge of C is very widespread, whereas the current alternative languages for sure have narrower audiences and fewer people who master them.

C is not a safe language

Does writing safe code in C require more carefulness and more “tricks” than writing the same code in a more modern language better designed to be “safe”? Yes it does. But we’ve done most of that job already and maintaining that level isn’t as hard or troublesome.

We keep scanning the curl code regularly with static code analyzers (we maintain a zero Coverity problems policy) and we run the test suite with valgrind and address sanitizers.

C is not the primary reason for our past vulnerabilities

There. The simple fact is that most of our past vulnerabilities happened because of logical mistakes in the code. Logical mistakes aren’t really language bound and they would not be fixed simply by changing language.

Of course that leaves a share of problems that could’ve been avoided if we used another language: buffer overflows, double frees, out-of-bounds reads and so on. But the bulk of our security problems has not happened due to curl being written in C.
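As an illustration of the extra carefulness C asks for around buffers, here is a minimal sketch of a copy that refuses to overflow its destination. The helper bounded_copy is hypothetical and not a curl function; it only shows the kind of explicit length check that a memory-safe language would perform for you.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of curl: copy `src` into `dst` only if it
   fits, including the terminating NUL. Returns 0 on success and -1 when
   the arguments are invalid or the string does not fit. */
int bounded_copy(char *dst, size_t dst_size, const char *src)
{
  size_t len;

  if(!dst || !src || dst_size == 0)
    return -1;

  len = strlen(src);
  if(len >= dst_size)
    return -1; /* would overflow: refuse rather than write past the end */

  memcpy(dst, src, len + 1); /* len + 1 copies the NUL terminator too */
  return 0;
}
```

The check is one line, but it has to be written and kept correct by hand at every such spot, and that is the share of problems a different language could take off our hands.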

C is not a new dependency

It is easy for projects to add a dependency on a library that is written in C since that’s what operating systems and system libraries are written in, still today in 2017. That’s the default. Everyone can build and install such libraries and they’re used and people know how they work.

A library in another language will add that language (and compiler, and debugger and whatever dependencies a libcurl written in that language would need) as a new dependency to a large number of projects that are themselves written in C or C++ today. Those projects would in many cases downright ignore and reject projects written in “an alternative language”.

curl sits in the boat

In the curl project we’re deliberately conservative and we stick to old standards, to remain a viable and reliable library for everyone. Right now and for the foreseeable future. Things that worked in curl 15 years ago still work like that today. The same way. Users can rely on curl. We stick around. We don’t knee-jerk react to modern trends. We sit still in the boat. We don’t rock it.

Rewriting means adding heaps of bugs

The plain fact, that also isn’t really about languages but is about plain old software engineering: translating or rewriting curl into a new language will introduce a lot of bugs. Bugs that we don’t have today.

Not to mention how rewriting would take a huge effort and a lot of time. That energy can instead be spent today on improving curl further.

What if

If I started the project today, would I pick another language? Maybe. Maybe not. If memory safety and related issues were the primary concern I had, then sure. But as I’ve mentioned above there are several other concerns too, so it would really depend on my priorities.

Finally

At the end of the day the question that remains is: would we gain more than we would pay, and over which time frame? Who would gain and who would lose?

I’m sure that there will be, or may even already exist, curl and libcurl competitors and potent alternatives written in most of these new alternative languages. Some of them are absolutely really good and will get used and reach fame and glory. Some of them will be crap. Just like software always works. Let a thousand curl competitors bloom!

Will curl be rewritten at some point in the future? I won’t rule it out, but I find it unlikely. I find it even more unlikely that it will happen in the short term or within the next few years.

Discuss this post on Hacker news or Reddit!

Followup-post: Yes, C is unsafe, but…

29 thoughts on “curl is C”

    1. I wouldn’t call it “good enough” but I’ll agree that for other languages that’s an interesting “middle way” that might not require too many dependencies to get installed everywhere. But generated code is usually (much) harder to read than carefully crafted hand-written code, and debugging such code bases where nobody is truly familiar with the C code (since all developers would use the higher level language) would also be far from easy.

      1. But you need to be able to debug it everywhere, so if the source isn’t there you debug the generated code. I’ve worked on some projects that used generated code in my days, and maybe they were bad examples but they did not reach the level of comfort for me that a “run the real language” approach does.

      1. GCC and Clang do, so that really just means that it’s Windows lagging behind. It’s a bit disingenuous to say “not even Windows” as if Microsoft are always first to the table with this.

      2. I said “not even windows”, because that’s a huge platform and their primary C compiler didn’t support it until recently. They do now. And sure, gcc and clang do and were available, but limiting builds on windows to those compilers is very restrictive.

  1. I just poked around at a few source files for curl really quick, and the code quality looks pretty good. It doesn’t fall into the C trap where there’s this idea that variable names have to be as short as possible. When they are shortened, they are variable names that most C programmers would understand like “len” and “ptr”

    If the code base was a mess to where I couldn’t figure things out, then yes, you should refactor or rewrite. For code like this, I don’t think it’s necessary, and I don’t blame anyone for acting conservatively.

  2. Ultimately no new developer will be able (or wish) to maintain the project without jeopardizing security: more architectures to address, e.g. ARM64, and fewer skills on low level memory management. Meaning less and less workforce to implement new features.

    In theory good automated tests should allow a painless refactoring (yes I do agree with Uncle Bob), and by extension rewriting it to a safe language.

  3. Great article.

    I too really like C, but because it allows for a lot of unsafe behavior, you have to treat it differently than other languages. It’s a trade-off whenever you want to work close to bare metal. However, especially in open source with many eyes on the project, I think it’s a good choice even if you had to rewrite today.

  4. Part of the reason I started writing libdynvol in C is because I saw how curl was written and I thought to myself, “Look at this. Look at how beautiful and elegant the code is. I can see why everyone and their grandma’s dog’s fleas use curl now.”

    Recently, though, I’ve come to the realization that if I’m going to ever finish this project without losing what little bit of my mind I have left, I need to do the bulk of it in C++, not C.

  5. I would have thought C is exactly the right language to choose for something like Curl. It is when I see CRUD apps written in C that I start to worry.

  6. I don’t think it is a given that translating your code produces more bugs. It depends on the process and the language you are going to. I rewrote a program from Objective-C to Swift and, while not without problems, it made me catch quite a lot of bugs without even running the program. The compiler caught all sorts of issues that the Objective-C compiler couldn’t.

  7. You might consider SaferCPlusPlus[1]. It allows you to achieve memory safety by replacing C’s unsafe elements (pointers and arrays mostly) with safe, compatible substitutes. It requires a modern C++ compiler, but doesn’t require the adoption of any C++ paradigms. Your code could remain mostly unchanged.

    Adopting SaferCPlusPlus is straightforward, involving what might be described as a glorified “search-and-replace” action. A tool to mostly automate the task is in early development. But even doing it manually would be much less effort than a rewrite in another language.

    Some other features:
    – no additional dependencies, the library is basically a bunch of header files written in pure, portable C++
    – can be “disabled” (i.e. elements aliased back to their native counterparts) with a compile-time directive
    – can be adopted completely incrementally (i.e. partially converted code continues to build and run just fine (and receive the corresponding partial safety benefit))
    – minimal runtime overhead [2]

    [1] https://github.com/duneroadrunner/SaferCPlusPlus
    [2] https://github.com/duneroadrunner/SaferCPlusPlus#simple-benchmarks

  8. It would be really wonderful if curl could implement a [potentially optional compile-time flag for a] “jail” mechanism to increase security.

    These seem to work quite well.

    http://undeadly.org/cgi?action=article&sid=20151013161745

    “First of all, on the positive side, privileges separation, chrooting and the message passing design have proven fairly efficient at protecting us from a complete disaster. [The] Worst attacks resulted in [the] unprivileged process being compromised, the privileged process remained untouched, so did the queue process which runs as a separate user too, preventing data loss… This is good news, we’re not perfect and bugs will creep in, but we know that these lines of defense work, and they do reduce considerably how we will suffer from a bug, turning a bug into a nuisance rather than a full catastrophe. No root were harmed during this audit as far as we know.”

    This could be done by installing curl setuid to root. Upon launch, curl forks, and the network-facing process could chroot() to /var/empty, then setuid() to nobody.

    The pair of curl processes then coordinate the activity, and the process touching the user’s files doesn’t directly access the network.

    This would be a “nice to have” option that might get picked up by many distros that repackage your work. It’s certainly a popular configure option for OpenSSH.

    1. Feel free and welcome to join in and drive the discussion for that within the project! But libcurl is the larger product we provide and that’s a library and I don’t see how we can practically make that run in any sort of jail for users…

  9. “Everyone knows C” This is plain wrong. A lot of programmers believe they know C, but actually they don’t know anything about it. (Hence the lot of bugs produced, and if they really knew it, they’d choose another programming language).

    “C is not a safe language” This is wrong too. Granted, there are a few features that are unsafe in C, but it is the C implementations that are unsafe actually, providing unsound and unchecked environments.

  10. I completely sympathise. I write scientific and engineering applications. The bulk of our code is written in C – yes even the front end user interfaces and logic (it is a pain to develop and maintain). If we were doing it from scratch today we’d probably look at something like C++ or Java for the GUI, housekeeping and front-end systems. But the project has been going for nearly 30 years now and it would be crazy to start thinking of migrating to another language. Besides, we’d still probably use C for all the numerical implementations and quant libraries.

    You make a good point regarding trends too. Originally all our 3D rendering was done in software and later OpenGL. Vulkan looks as if the API will be written in C too. Writing bindings for another language would be an absolute pain to develop and maintain. In short, C is still the go to language for such systems, so using C in a way future-proofs your product.

  11. C89 is a withdrawn standard. There is no reason to write code in it for user space applications.
    C99 and C11 were and are well portable. I have a C99, C11 and C++14 codebase, and all of the code is easily buildable for all major desktop platforms, including Windows (using mingw, previously with gcc, and now with clang).

    1. Are you aware that HP-UX continues to ship a K&R compiler for C in the base operating system, that is not ANSI-C compliant, let alone C99?

      K&R-standard code continues to be the only allowed input in many environments. We aspire to ANSI-C in many cases, and we fail to reach it. I’m still finding bugs in mission-critical K&R C.
