Microsoft FOSS Fund Winner: curl

Hi Daniel,

My name is Emma Irwin, and I am a program manager at Microsoft, specifically working in the open source program’s office (OSPO). One of the programs I lead is the FOSS fund.

This may sound a bit odd, but normally I send an email notice to project winners, yet someone pointed out that I have missed notifying curl of their won (and I could not find any history of reaching out). As a result, I offer my sincere apologies now! – I will happily provide notice now, although the payments themselves have already started!

The Free and Open Source Software Fund (FOSS Fund) is a way for our employees to collectively select open source projects to receive $10,000 sponsorship awards throughout the year.

Microsoft’s engineers select projects they are super passionate about. Only employees who contribute to open source projects can participate in the selection process.

curl was selected in January for $10, 000.00 provided one month, for ten months through GitHub Sponsors.

Thanks very much and congratulations!

-Emma Irwin

Emma Irwin (she/her)
Senior Program Manager
Open Source Programs Office
Microsoft

Update

Discussed on reddit.

I don’t know who uses my code

When I (in spite of knowing better) talk to ordinary people about what I do for a living and the project I work on, one of the details about it that people have the hardest time to comprehend, is the fact that I really and truly don’t know a lot about who uses my code. (Or where. Or what particular features they use.)

I work on curl full-time and we ship releases frequently. Users download the curl source code from us, build curl and put it to use. Most of “my” users never tell me or anyone else in the curl project that they use curl or libcurl. This is of course perfectly fine and I probably could not even handle the flood if every user would tell me.

This not-knowing is a most common situation for Open Source authors and projects. It is not unique for me.

The not knowing your users is otherwise unusual in a world of products and software, and quite frankly, sometimes it is an obstacle for us as well since we lack a good way to communicate with users about plans, changes or ideas. It also makes it really hard to estimate our own success and the always-recurring question: how many users do you have?

To be fair: I do know quite a lot of users as well. But I don’t know how representative they are, nor how big fraction of the totals they consist of etc.

How to find out that someone uses your code

I wrote a section in Uncurled about How to find out that someone uses your code, and these are three main ways I learn about users of my code:

  1. A user of my project runs into an issue and asks for help in a project forum – without hiding where they come from or what product they make.
  2. While vanity-searching I find my project’s Open Source license mentioned for or bundled with a product or tool or device.
  3. A user emails me asking funny questions about a product I never heard of, and then I realize they found my email because that product uses my code and the license has my email in it.

curl is 8888 days old

Today on July 20 2022, curl turns exactly 8888 days old. It was born on March 20, 1998 when curl 4.0 was shipped.

The number 8 is considered a lucky number in several Asian cultures and I figure we can view this as a prequel to the planned curl version 8 release we intend to ship on curl’s 25th birthday.

Of course the precursors to curl existed well over a year before curl’s birthday, but let’s not get into that right now.

predef is our friend

For C programmers like me, who want to write portable programs that can get built and run on the widest possible array of machines and platforms, we often need to write conditional code. Code that use #ifdefs for particular conditions.

Such ifdefs expressions often need to check for a particular compiler, an operating system or perhaps even a specific version of of one those things.

Back in April 2002, Bjørn Reese created the predef project and started collecting this information on a page on Sourceforge. I found it a super useful idea and I have tried to contribute what I have learned through the years to the effort.

The collection of data has been maintained and slowly expanded over the years thanks to contributions from many friends.

The predef documents have turned out to be a genuine goldmine and a resource I regularly come back to time and time again, when I work on projects such as curl, c-ares and libssh2 and need to make sure they remain functional on a plethora of systems. I can only imagine how many others are doing the same for their projects. Or maybe should do the same…

Now, in July 2022, the team is moving the predef documents over to a brand new GitHub organization. The point being to making it more accessible and allow edits and future improvements using standard pull-requests to make ease the maintenance of them.

You find all the documentation here:

https://github.com/cpredef/predef

If you find errors, mistakes or omissions, we will of course be thrilled to see issues and pull requests filed.

5 years on OSS-Fuzz

On July 1st 2017, exactly five years ago today, the OSS-Fuzz project “adopted” curl into their program and started running fuzz tests against it.

OSS-Fuzz is a project run by Google and they do fuzzing on a large amount of open source projects: OSS-Fuzz aims to make common open source software more secure and stable by combining modern fuzzing techniques with scalable, distributed execution.

That initial adoption of curl into OSS-Fuzz was done entirely by Google themselves and its fuzzing integration was rough and not ideal but it certainly got the ball rolling.

Later in in the fall of 2017, Max Dymond stepped up and seriously improved the curl-fuzzer so that it would better test protocols and libcurl options deeper and to a higher degree. (Max subsequently got a grant from Google for his work.)

Fuzzing curl

Fuzzing is really the next-level thing all projects should have a go at when there is no compiler warning left to fix and the static code analyzers all report zero defects, because fuzzing has a way to find silly mistakes, off-by-ones etc in a way no analyzers can and that is very hard for human reviewers to match.

curl is a network tool and library written in C, it is meant to handle non-trusted data properly and doing it wrongly can have serious repercussions.

For the curl project, OSS-Fuzz has found several silly things and bad code paths especially after that rework Max put in. We have no less than 6 curl CVEs registered in 2017 and 2018 giving OSS-Fuzz (at least partial) credit for their findings.

In the first year or two, OSS-Fuzz truly kept us on our toes and we fixed numerous other flaws as well that weren’t CVE worthy but still bugs. At least 36 separate ones to date.

The beauty of OSS-Fuzz

For a little Open Source project such as curl, this service is great and in case you never worked with it let me elaborate what makes me enjoy it so much.

  1. Google runs all the servers and keeps the service running. We don’t have to admin it, spend energy on maintaining etc.
  2. They automatically pull the latest curl code from our master branch and keep fuzzing it. All day, every day, non-stop.
  3. When they find a problem, we get a bug submitted and an email sent. The bug report they create contains stack traces and a minimized and reproducible test case. In most cases, that means I can download a small file that is often less than a hundred bytes, and run my locally built fuzzer code with this input and see the same issue happen in my end.
  4. The bug they submit is initially closed for the public and only accessible to select team of curl developers.
  5. 90 days after the initial bug was filed, it is made public automatically.

The number of false positives are remarkably low, meaning that almost every time the system emails me, it spotted a genuine problem.

Fuzzing per commit

In 2020, the OSS-Fuzz team introduced CIFuzz, which allows us to run a little bit of fuzzing – for a limited time – on a PR as part of the CI pipeline. Using this, we catch the most obvious mistakes even before we merge the code into our master branch!

After the first few years of us fixing bad code paths, and with the introduction of CIFuzz, we nowadays rarely get reports from OSS-Fuzz anymore. And when we do, we have it complain on code that hasn’t been released yet. The number of CVEs detected in curl by OSS-Fuzz has therefore seemingly stopped at 6.

Taking fuzzing further

After five years of fuzzing, we would benefit from extending and improving the “hooks” where and how the fuzzing is made so that we can help it find and reach code paths with junk that it so far hasn’t reached. There are of course still potentially many undetected flaws in curl remaining because of the limited set of entry point we have for the fuzzer.

I hope OSS-Fuzz continues and lives on for a long time. It certainly helps curl a lot!

Thanks!