This pops up just a little over three years since we reached our first 1,000 forks. Also, 10,000 stars no too long ago.
A typical reason why people fork a project on GitHub, is so that they can make a change in their own copy of the source code and then suggest that change to the project in the form of a pull-request.
The curl project has almost 700 individual commit authors, which makes at least 2,300 forks done who still haven’t had their pull-requests accepted! Of course those are 700 contributors who actually managed to work all the way through to inclusion. We can imagine that there is a huge number of people who only ever thought about doing a change, some who only ever just started to do it, many who ditched the idea before it was completed, some who didn’t actually manage to implement it properly, some who got their idea and suggestion shut down by the project and of course, lots of people still have their half-finished change sitting there waiting for inspiration.
Then there are people who just never had the intention of sending any change back. Maybe they just wanted to tinker with the code and have fun. Some want to do private changes they don’t want to offer or perhaps they already know the upstream project won’t accept.
We just can’t tell.
Is 3,000 forks a lot or a little? Both. It is certainly more forks than we’ve ever had before in this project. But compared to some of the most popular projects on GitHub, even comparing to some other C projects (on GitHub the most popular projects are never written in C) our numbers are dwarfed by the really popular ones. You can probably guess which ones they are.
In the end, this number is next to totally meaningless as it doesn’t say anything about the project nor about what contributions we get or will get in the future. It tells us we have (or had) the attention of a lot of users and that’s about it.
I will continue to try to make sure we’re worth the attention, both now and going forward!
As some of you already found out, I’ve tried live-streaming curl development recently. If you want to catch previous and upcoming episodes subscribe on my twitch page.
For the fun of it. I work alone from home most of the time and this is a way for me to interact with others.
To show what’s going on in curl right now. By streaming some of my development I also show what kind of work that’s being done, showing that a lot of development and work are being put into curl and I can share my thoughts and plans with a wider community. Perhaps this will help getting more people to help out or to tickle their imagination.
For the feedback and interaction. It is immediately notable that one of the biggest reasons I enjoy live-streaming is the chat with the audience and the instant feedback on mistakes I do or thoughts and plans I express. It becomes a back-and-forth and it is not at all just a one-way broadcast. The more my audience interact with me, the more fun I have! That’s also the reason I show the chat within the stream most of the time since parts of what I say and do are reactions and follow-ups to what happens there.
I can only hope I get even more feedback and comments as I get better at this and that people find out about what I’m doing here.
And really, by now I also think of it as a really concentrated and devoted hacking time. I can get a lot of things done during these streaming sessions! I’ll try to keep them going a while.
I decided to go with twitch simply because it is an established and known live-streaming platform. I didn’t do any deeper analyses or comparisons, but it seems to work fine for my purposes. I get a stream out with video and sound and people seem to be able to enjoy it.
As of this writing, there are 1645 people following me on twitch. Typical recent live-streams of mine have been watched by over a hundred simultaneous viewers. I also archive all past streams on Youtube, so you can get almost the same experience my watching back issues there.
I announce my upcoming streaming sessions as “events” on Twitch, and I announce them on twitter (@bagder you know). I try to stick to streaming on European day time hours basically because then I’m all alone at home and risk fewer interruptions or distractions from family members or similar.
It’s not as easy as it may look trying to write code or debug an issue while at the same time explaining what I do. I learnt that the sessions get better if I have real and meaty issues to deal with or features to add, rather than to just have a few light-weight things to polish.
I also quickly learned that it is better to now not show an actual screen of mine in the stream, but instead I show a crafted set of windows placed on the output to look like it is a screen. This way there’s a much smaller risk that I actually show off private stuff or other content that wasn’t meant for the audience to see. It also makes it easier to show a tidy, consistent and clear “desktop”.
Streaming makes me have to stay focused on the development and prevents me from drifting off and watching cats or reading amusing tweets for a while
So far we’ve been spared from the worst kind of behavior and people. We’ve only had some mild weirdos showing up in the chat and nothing that we couldn’t handle.
Equipment and software
I do all development on Linux so things have to work fine on Linux. Luckily, OBS Studio is a fine streaming app. With this, I can setup different “scenes” and I can change between them easily. Some of the scenes I have created are “emacs + term”, “browser” and “coffee break”.
When I want to show off me fiddling with the issues on github, I switch to the “browser” scene that primarily shows a big browser window (and the chat and the webcam in smaller windows).
When I want to show code, I switch to “emacs + term” that instead shows a terminal and an emacs window (and again the chat and the webcam in smaller windows), and so on.
OBS has built-in support for some of the major streaming services, including twitch, so it’s just a matter of pasting in a key in an input field, press ‘start streaming’ and go!
The rest of the software is the stuff I normally use anyway for developing. I don’t fake anything and I don’t make anything up. I use emacs, make, terminals, gdb etc. Everything this runs on my primary desktop Debian Linux machine that has 32GB of ram, an older i7-3770K CPU at 3.50GHz with a dual screen setup. The video of me is captured with a basic Logitech C270 webcam and the sound of my voice and the keyboard is picked up with my Sennheiser PC8 headset.
The other day we noticed some curl test case failures, that only happened on macos and not on Linux. Curious!
The failures were detected in our unit test 1307, when testing a particular internal pattern matching function (Curl_fnmatch). Both targets run almost identical code but somehow they ended up with different results! Test cases acting differently on different platforms isn’t an extremely rare situation, but in this case it is just a pattern matching function and there’s really nothing timing dependent or anything that I thought could explain different behaviors. It piqued my interest, so I dug in.
The isalnum() return value
Eventually I figured out that the libc function isalnum(), when it got the 8 input value hexadecimal c3 (decimal 195), would return true on the macos machine and false on the box running Linux with glibc!
int value = isalnum(0xc3);
Setting LANG=C before running the test on macos made its isalnum() return false. The input became c3 because the test program has an UTF-8 encoded character in it and the function works on bytes, not “characters”.
Or in the words of the opengroup.org documentation:
The isalnum() function shall test whether c is a character of class alpha or digit in the program’s current locale.
It’s all documented – of course. It was just me not really considering the impact of this.
I don’t like different behaviors on different platforms given the same input. I don’t like having string functions in curl act differently depending on locale, mostly because curl and libcurl can very well be used with many different locales and I prefer having a stable fixed behavior that we can document and stand by. Also, the libcurl functionality has never been documented to vary due to locale so it would be a surprise (bug!) to users anyway.
We’ve now introduced a private version of isalnum() and the rest of the ctype family of functions for curl. Hopefully this will make the tests more stable now. And make our functions work more similar and independent of locale.
In order to ship a quality product – once every eight weeks – we need lots of testing. This is what we do to test curl and libcurl.
We have basic script that verifies that the source code adheres to our code standard. It doesn’t catch all possible mistakes, but usually it complains with enough details to help contributors to write their code to match the style we already use. Consistent code style makes the code easier to read. Easier reading makes less bugs and quicker debugging.
By doing this check with a script (that can be run automatically when building curl), it makes it easier for everyone to ship properly formatted code.
We have not (yet) managed to convince clang-format or other tools to reformat code to correctly match our style, and we don’t feel like changing it just for the sake of such a tool. I consider this a decent work-around.
The test suite that we bundle with the source code in the git repository has a large number of tests that test…
curl – it runs the command line tool against test servers for a large range of protocols and verifies error code, the output, the protocol details and that there are no memory leaks
libcurl – we then build many small test programs that use the libcurl API and perform tests against test servers and verifies that they behave correctly and don’t leak memory etc.
unit tests – we build small test programs that use libcurl internal functions that aren’t exposed in the API and verify that they behave correctly and generate the presumed output.
valgrind – all the tests above can be run with and without valgrind to better detect memory issues
“torture” – a special mode that can run the tests above in a way that first runs the entire test, counts the number of memory related functions (malloc, strdup, fopen, etc) that are called and then runs the test again that number of times and for each run it makes one of the memory related functions fail – and makes sure that no memory is leaked in any of those situations and no crash occurs etc. It runs the test over and over until all memory related functions have been made to fail once each.
Right now, a single “make test” runs over 1100 test cases, varying a little depending on exactly what features that are enabled in the build. Without valgrind, running those tests takes about 8 minutes on a reasonably fast machine but still over 25 minutes with valgrind.
Then we of course want to run all tests with different build options…
For every pull request and for every source code commit done, the curl source is built for Linux, mac and windows. With a large set of different build options and TLS libraries selected, and all the tests mentioned above are run for most of these build combinations. Running ‘checksrc’ on the pull requests is of course awesome so that humans don’t have to remark on code style mistakes much. There are around 30 different builds done and verified for each commit.
If any CI build fails, the pull request on github gets a red X to signal that something was not OK.
We also run test case coverage analyses in the CI so that we can quickly detect if we for some reason significantly decrease test coverage or similar.
We use Travis CI, Appveyor and Coveralls.io for this.
Independently of the CI builds, volunteers run machines that regularly update from git, build and run the entire test suite and then finally email the results back to a central server. These setups help us cover even more platforms, architectures and build combinations. Just with a little longer turn around time.
With millions of build combinations and support for virtually every operating system and CPU architecture under the sun, we have to accept that not everything can be fully tested. But since almost all code is shared for many platforms, we can still be reasonably sure about the code even for targets we don’t test regularly.
Static code analyzing
We run the clang scan-build on the source code daily and we run Coverity scans on the code “regularly”, about once a week.
We always address defects detected by these analyzers immediately when notified.
We’re happy to be part of Google’s OSS-fuzz effort, which with a little help with integration from us keeps hammering our code with fuzz to make sure we’re solid.
OSS-fuzz has so far resulted in two security advisories for curl and a range of other bug fixes. It hasn’t been going on for very long and based on the number it has detected so far, I expect it to keep finding flaws – at least for a while more into the future.
Fuzzing is really the best way to hammer out bugs. When we’re down to zero detected static analyzer detects and thousands of test cases that all do good, the fuzzers can still continue to find holes in the net.
Independently of what we test, there are a large amount of external testing going on, for each curl release we do.
In a presentation by Google at curl up 2017, they mentioned their use of curl in “hundreds of applications” and how each curl release they adopt gets tested more than 400,000 times. We also know a lot of other users also have curl as a core component in their systems and test their installations extensively.
We have a large set of security interested developers who run tests and fuzzers on curl at their own will.
I posted curl is C a few days ago and it raced on hacker news, reddit and elsewhere and got well over a thousand comments in those forums alone. The blog post has been read more than 130,000 times so far.
Addendum a few days later
Many commenters of my curl is C post struck down on my claim that most of our security flaws aren’t due to curl being written in C. It turned out into some sort of CVE counting game in some of the threads.
I think that’s missing the point I was trying to make. Even if 75% of them happened due to us using C, that fact alone would still not be a strong enough reason for me to reconsider our language of choice (at this point in time). We use C for a whole range of reasons as I tried to lay out there in spite of the security challenges the language brings. We know C has tricky corners and we know we are likely to do more mistakes going forward.
curl is currently one of the most distributed and most widely used software components in the universe, be it open or proprietary and there are easily way over three billion instances of it running in appliances, servers, computers and devices across the globe. Right now. In your phone. In your car. In your TV. In your computer. Etc.
If we then have had 40, 50 or even 60 security problems because of us using C, through-out our 19 years of history, it really isn’t a whole lot given the scale and time we’re talking about here.
Using another language would’ve caused at least some problems due to that language, plus I feel a need to underscore the fact that none of the memory safe languages anyone would suggest we should switch to have been around for 19 years. A portion of our security bugs were even created in our project before those alternatives you would suggest were available! Let alone as stable and functional alternatives.
This is of course no guarantee that there isn’t still more ugly things to discover or that we won’t mess up royally in the future, but who will throw the first stone when it comes to that? We will continue to work hard on minimizing risks, detecting problems early by ourselves and work closely together with everyone who reports suspected problems to us.
Number of problems as a measurement
The fact that we have 62 CVEs to date (and more will follow surely) is rather a proof that we work hard on fixing bugs, that we have an open process that deals with the problems in the most transparent way we can think of and that people are on their toes looking for these problems. You should not rate a project in any way purely based on the number of CVEs – you really need to investigate what lies behind the numbers if you want to understand and judge the situation.
Let me clarify this too: I can very well imagine a future where we transition to another language or attempt various others things to enhance the project further – security wise and more. I’m not really ruling anything out as I usually only have very vague ideas of what the future might look like. I just don’t expect it to be happening within the next few years.
These “you should switch language” remarks are strangely enough from the backseat drivers of the Internet. Those who can tell us with confidence how to run our project but who don’t actually show us any code.
What perhaps made me most sad in the aftermath of said previous post, is everyone who failed to hold more than one thought at a time in their heads. In my post I wrote 800 words on some of the reasoning behind us sticking to the language C in the curl project. I specifically did not say that I dislike certain other languages or that any of those alternative languages are bad or should be avoided. Please friends, I wrote about why curl uses C. There are many fine languages out there and you should all use them as much as you possibly can, and I will too – but not in the curl project (at the moment). So no, I don’t hate language XXXX. I didn’t say so, and I didn’t imply it either. Don’t put that label on me, thanks.
Every once in a while someone suggests to me that curl and libcurl would do better if rewritten in a “safe language”. Rust is one such alternative language commonly suggested. This happens especially often when we publish new security vulnerabilities. (Update: I think Rust is a fine language! This post and my stance here has nothing to do with what I think about Rust or other languages, safe or not.)
curl is written in C
The curl code guidelines mandate that we stick to using C89 for any code to be accepted into the repository. C89 (sometimes also called C90) – the oldest possible ANSI C standard. Ancient and conservative.
C is everywhere
This fact has made it possible for projects, companies and people to adopt curl into things using basically any known operating system and whatever CPU architecture you can think of (at least if it was 32bit or larger). No other programming language is as widespread and easily available for everything. This has made curl one of the most portable projects out there and is part of the explanation for curl’s success.
The curl project was also started in the 90s, even long before most of these alternative languages you’d suggest, existed. Heck, for a truly stable project it wouldn’t be responsible to go with a language that isn’t even old enough to start school yet.
Everyone knows C
Perhaps not necessarily true anymore, but at least the knowledge of C is very widespread, where as the current existing alternative languages for sure have more narrow audiences or amount of people that master them.
C is not a safe language
Does writing safe code in C require more carefulness and more “tricks” than writing the same code in a more modern language better designed to be “safe” ? Yes it does. But we’ve done most of that job already and maintaining that level isn’t as hard or troublesome.
We keep scanning the curl code regularly with static code analyzers (we maintain a zero Coverity problems policy) and we run the test suite with valgrind and address sanitizers.
C is not the primary reason for our past vulnerabilities
There. The simple fact is that most of our past vulnerabilities happened because of logical mistakes in the code. Logical mistakes that aren’t really language bound and they would not be fixed simply by changing language.
Of course that leaves a share of problems that could’ve been avoided if we used another language. Buffer overflows, double frees and out of boundary reads etc, but the bulk of our security problems has not happened due to curl being written in C.
C is not a new dependency
It is easy for projects to add a dependency on a library that is written in C since that’s what operating systems and system libraries are written in, still today in 2017. That’s the default. Everyone can build and install such libraries and they’re used and people know how they work.
A library in another language will add that language (and compiler, and debugger and whatever dependencies a libcurl written in that language would need) as a new dependency to a large amount of projects that are themselves written in C or C++ today. Those projects would in many cases downright ignore and reject projects written in “an alternative language”.
curl sits in the boat
In the curl project we’re deliberately conservative and we stick to old standards, to remain a viable and reliable library for everyone. Right now and for the foreseeable future. Things that worked in curl 15 years ago still work like that today. The same way. Users can rely on curl. We stick around. We don’t knee-jerk react to modern trends. We sit still in the boat. We don’t rock it.
Rewriting means adding heaps of bugs
The plain fact, that also isn’t really about languages but is about plain old software engineering: translating or rewriting curl into a new language will introduce a lot of bugs. Bugs that we don’t have today.
Not to mention how rewriting would take a huge effort and a lot of time. That energy can instead today be spent on improving curl further.
If I would start the project today, would I’ve picked another language? Maybe. Maybe not. If memory safety and related issues was the primary concern I had, then sure. But as I’ve mentioned above there are several others concerns too so it would really depend on my priorities.
At the end of the day the question that remains is: would we gain more than we would pay, and over which time frame? Who would gain and who would lose?
I’m sure that there will be or it may even already exist, curl and libcurl competitors and potent alternatives written in most of these new alternative languages. Some of them are absolutely really good and will get used and reach fame and glory. Some of them will be crap. Just like software always work. Let a thousand curl competitors bloom!
Will curl be rewritten at some point in the future? I won’t rule it out, but I find it unlikely. I find it even more unlikely that it will happen in the short term or within the next few years.
“Probably the only person in the whole of Sweden whose code is used by all people in the world using a computer /smartphone /ATM/ etc …every day.His contribution to the world is so large that it is impossible tounderstand the breadth.“
Thank you everyone who nominated me. I’m truly grateful, honored and humbled. You, my community, is what makes me keep doing what I do. I love you all!
To list “Sweden’s best developers” (the list and site is in Swedish) seems like a rather futile task, doesn’t it? Yet that’s something the Swedish IT and technology news site Techworld has been doing occasionally for the last several years. With two, three year intervals since 2008.
Everyone reading this will of course immediately start to ponder on what developers they speak of or how they define developers and how on earth do you judge who the best developers are? Or even who’s included in the delimiter “Sweden” – is that people living in Sweden, born in Sweden or working in Sweden?
I’m certainly not alone in having chuckled to these lists when they have been published in the past, as I’ve never seen anyone on the list be even close to my own niche or areas of interest. The lists have even worked a little as a long-standing joke in places.
It always felt as if the people on the lists were found on another planet than mine – mostly just Java and .NET people. and they very rarely appeared to be developers who actually spend their days surrounded by code and programming. I suppose I’ve now given away some clues to some characteristics I think “a developer” should posses…
This year, their fifth time doing this list, they changed the way they find candidates, opened up for external nominations and had a set of external advisors. This also resulted in me finding several friends on the list that were never on it in the past.
Tonight I got called onto the stage during the little award ceremony and I was handed this diploma and recognition for landing at second place in the best developer in Sweden list.
And just to keep things safe for the future, this is how the listing looks on the Swedish list page:
Yes I’m happy and proud and humbled. I don’t get this kind of recognition every day so I’ll take this opportunity and really enjoy it. And I’ll find a good spot for my diploma somewhere around the house.
I’ll keep a really big smile on my face for the rest of the day for sure!
(Photo from the award ceremony by Emmy Jonsson/IDG)
At times when I’ve gone out (yes it happens), faced an audience and talked about my primary spare time project curl, I’ve said a few times in the past that we have one billion users.
OK, as this is open source I’m talking about, I can’t actually count my users and what really constitutes “a user” anyway?
If the same human runs multiple copies of curl (in different devices and applications), is that human then counted once or many times? If a single developer writes an application that uses libcurl and that application is used by millions of humans, is that one user or are they millions of curl users?
What about pure machine “users”? In the subway in one of the world’s largest cities, there’s an automated curl transfer being done for every person passing the ticket check point. Yet I don’t think we can count the passing (and unknowing) passengers as curl users…
I’ve had a few people approach me to object to my “curl has one billion users” statement. Surely not one in every seven humans on earth are writing curl command lines! We’re engineers and we’re picky with the definitions.
Because of this, I’m trying to stop talking about “number of users”. That’s not a proper metric for a project whose primary product is a library that is used by applications or within devices. I’m instead trying to assess the number of humans that are using services, tools or devices that are powered by curl. Fun challenge, right?
Who isn’t using?
I’ve tried to imagine of what kind of person that would not have or use any piece of hardware or applications that include curl during a typical day. I certainly can’t properly imagine all humans in this vast globe and how they all live their lives, but I quite honestly think that most internet connected humans in the world own or use something that runs my code. Especially if we include people who use online services that use curl.
curl is used in basically all modern TVs, a large percentage of all car infotainment systems, routers, printers, set top boxes, mobile phones and apps on them, tablets, video games, audio equipment, Blu-ray players, hundreds of applications, even in fridges and more. Apple alone have said they have one billion active devices, devices that use curl! Facebook uses curl extensively and they have 1.5 billion users every month. libcurl is commonly used by PHP sites and PHP empowers no less than 82% of the sites w3techs.com has figured out what they run (out of the 10 million most visited sites in the world).
There are about 3 billion internet users worldwide. I seriously believe that most of those use something that is running curl, every day. Where Internet is less used, so is of course curl.
Every human in the connected world, use something powered by curl every day
It is an amazing feeling when I stop and really think about it. When I pause to let it sink in properly. My efforts and code have spread to almost every little corner of the connected world. What an amazing feat and of course I didn’t think it would reach even close to this level. I still have hard time fully absorbing it! What a collaborative success story, because I could never have gotten close to this without the help from others and the community we have around the project.
But it isn’t something I think about much or that make me act very different in my every day life. I still work on the bug reports we get, respond to emails and polish off rough corners here and there as we go forward and keep releasing new curl releases every 8 weeks. Like we’ve done for years. Like I expect us and me to continue doing for the foreseeable future.
It is also a bit scary at times to think of the massive impact it could have if or when a really terrible security flaw is discovered in curl. We’ve had our fair share of security vulnerabilities so far through our history, but we’ve so far been spared from the really terrible ones.
So I’m rich, right?
If I ever start to describe something like this to “ordinary people” (and trust me, I only very rarely try that), questions about money is never far away. Like how come I give it away free and the inevitable “what if everyone using curl would’ve paid you just a cent, then…“.
I’m sure I don’t need to tell you this, but I’ll do it anyway: I give away curl for free as open source and that is a primary reason why it has reached to the point where it is today. It has made people want to help out and bring the features that made it attractive and it has made companies willing to use and trust it. Hadn’t it been open source, it would’ve died off already in the 90s. Forgotten and ignored. And someone else would’ve made the open source version and instead filled the void a curlless world would produce.
It is time for our annual survey on how you use curl and libcurl. Your chance to tell us how you think we’ve done and what we should do next. The survey will close on midnight (central European time) May 27th, 2016.
If you use curl or libcurl from time to time, please consider helping us out with providing your feedback and opinions on a few things: