Category Archives: Development

UndefinedBehaviorSanitizer’s unexpected behavior

October 17, 2024 Daniel Stenberg 3 Comments

The transition from Ubuntu 22 to 24 for ubuntu-latest on GitHub actions started recently with the associated version bumps of a lot of applications. As expected.

One of the version bumps is for clang: it now uses clang 18 by default. clang 18 introduced some changes that turned out to be relevant for me and other curl developers. Yeah, surely for some others as well.

clang and gcc

In my daily developer life I just typically use gcc for building local stuff – mostly out of old habits. I rebuild and test curl dozens of times every day. In my normal work process I use a couple of different build combinations that enable a lot of third party dependencies and I almost always build curl and libcurl with debug enabled and only statically. It is a debug-friendly setup.

Of course I also have clang installed so that I can try out building with it when I want to, and I have a large set of alternative config setups that I use when I have a particular reason to check or debug such a build.

CI to the rescue

There are literally many millions of build combinations of curl, and we do some of the most important ones automatically for every pull request and commit in the source repository. They help us avoid regressions. Currently we do almost two hundred different jobs.

Two of those CI jobs build curl using clang and enable some sanitizers: address, memory, undefined and signed-integer-overflow and use those builds to run through the test suite to help us verify that everything still looks fine.

Since it gets done in the CI for every change, I don’t have to run it myself locally very often. We have thus been using the default clang version shipped in Ubuntu 22.04 for this for quite some time now.

Undefined behavior sanitizer

When the clang version for the Ubuntu jobs on GitHub was bumped up to version 18, the undefined behavior sanitizer job suddenly found plenty of new problems in curl.

In code that had been running without problems for a long time (decades in some cases) on countless systems and on almost every imaginable architecture. Unexpected.

Picky function prototypes

Here is the reason:

The sanitizer now keeps track of exactly how a function pointer prototype is declared and verifies that the function actually called via that pointer is using an identical prototype.

This is generally probably a good idea and a sound sanity check for most programs but since the checker insists on identical prototypes, I believe it goes beyond what is undefined behavior – I believe some discrepancies are handled just fine. For example a signed vs unsigned pointer or void vs char pointers. I am however not a compiler developer and neither am I an expert in the C language specifications so maybe I am wrong.

Example

A function pointer defined to call a function that returns a void and gets a single char pointer input

void (*name)(char *ptr);
typedef void (*name_func)(char *ptr);

Such a pointer can be made to point to a function that instead takes a void pointer:

void target(void *ptr)
{
   printf("Input %p\n", ptr);
}

This construct works perfectly fine in C:

char *data = "string";
name = (name_func)target;
name(data);

In libcurl we set function pointers (callbacks) via a setopt() style function, which cannot validate the pointer at compile time.

When the code example above is tested with the undefined behavior sanitizer and its -fsanitize=function check (I believe), it complains about the mismatching prototypes between the pointer and the actually called function.

How this became annoying

For the example above, the sanitizer report is most welcome, even if I think it goes beyond what is actually undefined behavior. It helps us clean up the code.

For libcurl, we have a CURL * type returned for a handle from curl_easy_init(). This handle is used as an input argument to multiple functions and it is also used as an input argument to several callbacks an application can tell libcurl to call, etc.

That type was originally done like this:

typedef void CURL;

But in 2016, we changed it to instead become:

#if defined(BUILDING_LIBCURL)
typedef struct Curl_easy CURL;
#else
typedef void CURL;
#endif

This made us get a more descriptive pointer for the type when we build libcurl. For convenience.

The function pointer is defined internally for libcurl as a struct pointer, but outside in the application land as a void pointer. This works great.

Until this new sanitizer check. Now it complaints loudly because the prototypes for the function called does not match the prototype for the function pointer. The struct vs the void pointers. The sanitizer stores and uses “resolved” typedefs in its checks, not the name of the types visible in code.

The fix

Since we can’t have build breakage in the CI jobs, I fixed this.

We are back to how we did it in the past. With a plain

typedef void CURL;

… even when we build libcurl. To make sure the pointer and the final function have the same prototypes. To hush up the undefined behavior sanitizer.

This is now in master and how the code in the pending curl 8.11.0 release will look.

Disabling the check is not enough

While we could disable this particular check in our CI jobs, that would not suffice since we want everyone to be able to run these tools against curl without any warnings or errors.

We also want application authors in general who use libcurl to be able to similarly run this sanitizer against their tools to not get error reports like this.

Is this a clang issue?

Maybe. I just can’t see how this could happen by mistake, and since it is a feature that has existed for quite a while now already I have not bothered to submit an issue or have any argument or discussion with the clang team. I have simply accepted that this is the way they want to play this and adapt accordingly.

A historic footnote

In 2016 I wanted to change the type universally to just

typedef struct Curl_easy CURL;

… as I thought we could do that without breaking neither API nor ABI. I still believe I was right, but the change still caused an “uproar” among some users who had already built code and done things based on the assumption that it was and would always remain a void pointer. Changing the type thus caused build errors at places to a level that made us retract that change and revert back to the #ifdef version showed above.

And now we had to retract even the #ifdef and thus we are back to the pre 2016 way.

Post-publish update

It has been pointed out to me that the way the C standard is phrased, this tool seems to be correct. More discussions around that can be found in a long OpenSSL issue from last year.

cURL and libcurl, Development

Why curl closes PRs on GitHub

June 11, 2024 Daniel Stenberg 23 Comments

Contributors to the curl project on GitHub tend to notice the above sequence quite quickly: pull requests submitted do not generally appear as “merged” with its accompanying purple blob, instead they are said to be “closed”. This has been happening since 2015 and is probably not going to change anytime soon.

Let me explain why this happens.

I blame GitHub

GitHub’s UI does not allow us to review or comment on commit messages for pull requests. Therefore, it is hard to insist on contributors to provide the correct message, using the proper language in the correct format.

If you make a pull request based on a single commit, the initial PR message is based on the commit message but when follow-up fixes are done and perhaps force-pushed, the PR message is not updated accordingly with the commit message’s updates.

Commit messages with style

I believe having good commit messages following a fixed style and syntax helps the project. It makes the git history better and easier to browse. It allows us to write tools and scripts around git and the git history. Like how we for example generate release notes and project stat graphs based on git log basically.

We also like and use a strictly linear history in curl, meaning that all commits are rebased on the master branch. Lots of the scripting mentioned above depends on this fact.

Manual merges

In order to make sure the commit message is correct, and in fact that the entire commit looks correct, we merge pull requests manually. That means that we pull down the pull request into a local git repository, clean up the commit message to adhere to project standards.

And then we push the commit to git. One or more of the commit messages in such a push then typically contains lines like:

Fixes #[number] and Closes #[number]. Those are instructions to GitHub and we use them like this:

Fixes means that this commit fixed an issue that was reported in the GitHub issue with that id. When we push a commit with that instruction, GitHub closes that issue.

Closes means that we merged a pull request with this id. (GitHub has no way for us to tell it that we merged the pull request.) This instruction makes GitHub closes the corresponding pull request: “[committer] closed this in [commit hash]”.

We do not let GitHub dictate how we do git. We use git and expect GitHub to reflect our git activity.

We COULD but we won’t

We could in theory fix and cleanup the commits locally and manually exactly the way we do now and then force-push them to the remote branch and then use the merge button on the GitHub site and then they would appear as “merged”.

That is however a clunky, annoying and time-consuming extra-step that not only requires that we (always) push code to other people’s branches, it also triggers a whole new round of CI jobs. This, only to get a purple blob instead of a red one. Not worth it.

If GitHub would allow it, I would disable the merge button in the GitHub PR UI for curl since it basically cannot be used correctly in the project.

Squashing all the commits in the PR is also not something we want since in many cases the changes should be kept as more than one commit and they need their own dedicated and correct commit message.

What GitHub could do

GitHub could offer a Merged keyword in the exact same style as Fixed and Closes, that just tells the service that we took care of this PR and merged it as this commit. It’s on me. My responsibility. I merged it. It would help users and contributors to better understand that their closed PR was in fact merged as that commit.

It would also have saved me from having to write this blog post.

Discussion

Hacker news

Addendum

In some post-publish discussions I have seen people ask about credits. This method to merge commits does not break or change how the authors are credited for their work. The commit authors remain the commit authors, and the one doing the commits (which is I when I do them) is stored separately. Like git always do. Doing the pushes manually this way does in no way change this model. GitHub will even count the commits correctly for the committer – assuming they use an email address their GitHub account does (I think).

cURL and libcurl, Development

curl’s use of many CI services

February 1, 2023 Daniel Stenberg

In the beginning and for many years, the curl project used no CI services at all. It instead used a distributed build and test systems where volunteers ran machines that pulled the latest code repeatedly, built curl, ran the tests and reported back the results to a central server.

One

In 2013, the year curl turned 15, we created our first CI jobs on Travis CI. With only a single CI service life was easy for a few years.

Two, three

This single service had a limited feature set and in particular a limited set of supported platforms. To also do automatic testing on FreeBSD and Windows we had to use two additional services because Travis did not support them. Now they were three, early 2019. Cirrus CI and AppVeyor.

Four

When we use free services, we need to live with the limitations of what the good providers offer for free or at low cost. In the case of CI services, they tend to reduce CPU time and parallelisms for users of the free tier and so did Travis.

When the number of CI jobs on Travis surpassed 30, and we had already gotten a small performance boost just because of their good will, we created the next few new CI jobs on GitHub Actions instead to increase the parallelism for no extra money. If I recall things correctly, the macOS support was also much better on GitHub since it was rather limited on Travis.

GitHub later graciously bumped our service level for even more power and parallelism. Increased parallelism, not the least thanks to the use of several independent CI services, made sure that the complete set of CI jobs would still complete within a reasonable time.

Five

When working on extending and improving our Windows CI testing in late 2019, our previous Windows CI provider AppVeyor was not good enough so we opted to add jobs on Azure Pipelines. This was also because GitHub Actions could not run the images we have and wanted to use for this purpose.

Redundancy

When we entered the year 2020 we were at 60 CI jobs and having them run on several different CI services often turned out useful when one of them acted up: at least a lot of other jobs would still work and help us assess and verify proposed changes. No all eggs in the same basket problem.

Services come and go

Redundancy also helps soften the blow when a service goes away. If you are in the race long enough, all services will go away or go sour eventually. This includes CI services.

In 2021, Travis CI changed their policies and suddenly we could not keep using them unless we paid up a few K USD per year and we would rather avoid that.

We had to move the 30+ CI jobs from Travis to something else. Thanks to a generous offer, volunteers showed up and helped transition the Travis jobs over to a new service: Zuul CI. It softened the repercussions from the “jump” and the CI jobs kept helping us ship quality code.

Five, Six

To manage the Travis CI eviction, Zuul took over most of the curl CI jobs and a few of them were added on Circle CI, which then appeared as CI service number six. Primarily because of their at the time early and convenient support for arm.

Zuul CI

We were grateful for the help we got to move over to Zuul from Travis, but soon it became apparent to us that Zuul CI is more “crude” than some of the other services and it left us wanting more. It’s UI is way less sophisticated, to the level that it is almost difficult for a casual PR submitted to read and understand build errors. Also, it was slightly buggy, which could result in Zuul jobs not showing up in the GitHub UI at all or simply failing to trigger the new jobs. When the responses from the Zuul side to our problems were somewhere between slow to non-existent I felt with had no other choice but to transition away from this service as well.

The change took its time. At the end of 2021 we had 30 CI jobs on Zuul, and just days ago in late January 2023, we removed the final curl jobs from it.

Five

We use five services now and we could possibly consolidate down to four if we really wanted to, but I see no reason to do that now when things are working and huffing along.

GitHub Actions have really taken off as our primary CI service and now runs almost half of the entire set. Thanks to it being convenient, well integrated, well documented and us having good parallelism on it.

We do what we need

Whatever is good for the project we will consider doing. We have gotten to this point with this set of CI services because they help the project. If someone proposes a change that improve things and that change reduces the number of CI services, then we might go that way next. Or maybe we add one? We have not planned what comes next.

What we run in CI

We build curl and run tests with numerous different build configurations on several architectures on different operating systems. With and without debug enabled. With and without using valgrind. Most builds also run checksrc , which verifies source code style.
We run dedicated jobs that do “deeper” testing, such as building with address and undefined behavior-analyzers and running the complete curl test suite in “torture mode”.
We run markdown and man page spell checkers
We run English prose checking (using proselint) of markdown files
We run static code analyzers and fuzzers
We confirm the copyright and license situation of all files in git
We verify links within markdowns
We have a few “bot services” that can set the “hacktoberfest-accepted” label, and a labeler service that tries to automatically set proper categories for pull requests.
We verify that the release tarball looks right and works when generated from the current set of files in git
… and probably a few other things I have forgot now

Of course we have graphs

These graphs were screenshotted from the dashboard on February 1st, 2023.

The total number of CI jobs done for each PR and commit, over time

Number of CI jobs running on which CI service, over time

Future

Whatever helps the project and whatever someone offers to help us make that happen, we might do. That may mean using more services, it might mean using less.

The important part is that these services are used to improve and strengthen the curl project and the products we ship.

Development, Open Source

Uncurled – the presentation

August 18, 2022 Daniel Stenberg

Uncurled – everything I know and learned about running and maintaining Open Source projects for three decades.

This is me, doing a live English-speaking presentation/webinar on these topics that I cover in my book: Uncurled.

Recording

Date: Tuesday August 23, 2022

Time: 10: 00 UTC (12:00 CEST)

Where: over zoom [~~Sign up~~]

The plan is to record this session and make it available after the fact on YouTube. This post will be updated with a link to that once it exists.

Agenda

Here’s the outlook on what I hope to be able to cover in a 40 minutes talk.

This will be followed by a Q&A-session with me answering any questions you might have. Feel most welcome and encouraged to submit your questions ahead of time if you already have some! (comment here, email me, comment or DM on Twitter, send a carrier pigeon, anything!)

I have not done this presentation before. I know the subject very intimately so I have no worries about that. The timing of the thing is what is going to be my bigger challenge I think. I aim for no more than 40 minutes of me blabbing.

cURL and libcurl, Development

How I merge PRs in curl

August 8, 2022 Daniel Stenberg 3 Comments

The preferred method of providing changes to the curl project, be it source code, documentation or web site contents, is by submitting a pull-request. A “PR”. On the curl repository on GitHub.

When a proposed curl change, bugfix or improvement is submitted as a PR on GitHub, it gets built, checked, tested and verified in countless ways and a few hundred developers get a notification about it.

The first thing I do every morning and the last thing I do every night before I go to bed, is eyeing through the list of open PRs, and especially the ones with recent activities. I of course also get back to them during the day if there is activity that might need my attention – not to mention that I also personally submit by own PRs to curl at an average rate of a few per day.

You can rightly say that my day and my work with curl is extremely PR centered. And by extension, also extremely git and GitHub focused.

Merging the work

When all the CI checks run green and all review comments and concerns have been addressed, the PR has become ripe for merge.

As a general rule, we always work in the master branch as the current development branch that is destined to become the next release when the time comes. An approved PR thus gets merged into the master branch and pushed there. We keep the git history linear and clean, meaning we do merge + rebase before pushing.

Commit messages

One of the key properties of a git commit, is the commit message. The text that describes the particular change in the repository.

In the curl project we have a standard style or template for how to write these messages. This makes them consistent and helps making them useful and get populated with informational meta-data that future versions of ourselves might appreciate when we look back at the changes decades into the future.

The PR review UI on GitHub unfortunately totally ignores commit messages, which means that we cannot comment on them or insist or check that users write them in our preferred style. Arguably, it is also just quicker and easier not to do that and instead just gently brush up the message in the commit locally before the commit is pushed.

Adding correct fixes #xxx and closes #xxx information to the message also helps making sure GitHub closes the associated issues/pull-requests and allows us to map git repository commits with GitHub activities at will – like when running statistic scripts etc.

This strictness and way of working makes it impossible to use the “merge” button in the GitHub UI since we simply cannot ensure that the commit is good enough with it. The button is however not possible to remove from the UI with any settings. We just prohibit its use.

They appear as “closed this in”

This way of working makes most PRs appear as “closed this in #[commit hash]” in the GitHub UI instead of saying “merged by” that same hash. Because in GitHub’s view, the PR was not merged.

GitHub could fix this with a merges keyword, which has been suggested since forever, but clearly they do not see this as a problem.

Occasional users of course get surprised or even puzzled by us closing the PR “instead of merging”, because that is what it looks like. When it really is not.

Ultimately, I consider the state and contents of our git repository and history more important than how the PRs appear on GitHub so I stick to our way of working.

Signed commits

As a little side-note, I of course also always GPG sign my commits so you can verify on GitHub and with git that my commits are genuinely done by me.

Stats

To give you an idea of the issue and pull request frequency and management in curl (the source code repository). Snapshots from the curl dashboard from August 8, 2022.

Update

Several people have mentioned to me after I posted this article that I could squash and edit the commits and force-push the work back to the user’s branch used for the PR before using the “merge” button or auto-merge feature of GitHub.

Sure, I could do it that way, but I think that is an even worse solution and a poor work-around. It is very intrusive method that would make me force-push in every user’s PR branch and I am certain a certain amount of users will be surprised by that and some even downright upset because it removes/overwrites their work. For the sake of working around a GitHub quirk. Nope, I will not do that.

Development

predef is our friend

July 13, 2022 Daniel Stenberg

For C programmers like me, who want to write portable programs that can get built and run on the widest possible array of machines and platforms, we often need to write conditional code. Code that use #ifdefs for particular conditions.

Such ifdefs expressions often need to check for a particular compiler, an operating system or perhaps even a specific version of of one those things.

Back in April 2002, Bjørn Reese created the predef project and started collecting this information on a page on Sourceforge. I found it a super useful idea and I have tried to contribute what I have learned through the years to the effort.

The collection of data has been maintained and slowly expanded over the years thanks to contributions from many friends.

The predef documents have turned out to be a genuine goldmine and a resource I regularly come back to time and time again, when I work on projects such as curl, c-ares and libssh2 and need to make sure they remain functional on a plethora of systems. I can only imagine how many others are doing the same for their projects. Or maybe should do the same…

Now, in July 2022, the team is moving the predef documents over to a brand new GitHub organization. The point being to making it more accessible and allow edits and future improvements using standard pull-requests to make ease the maintenance of them.

You find all the documentation here:

https://github.com/cpredef/predef

If you find errors, mistakes or omissions, we will of course be thrilled to see issues and pull requests filed.

Development, IT Politics, Open Source

Meeting the Cyber Safety Review Board

May 5, 2022 Daniel Stenberg 1 Comment

Three Open Source hackers were invited to this meeting with the CSRB and I was one of them.

The board with this name is part of CISA, a US government effort that received a presidential order to work on “Improving the Nation’s Cybersecurity“. Where “the Nation” here is the US.

I’m not in the US and I’m not a US citizen but I felt I should help out when asked and I was able to.

On April 21 2022, I joined the video meeting together with an OpenSSL and a Tomcat contributor and several members of the board. (I am not naming any names of participants in this post because I have not asked for permission nor do I think the names are important here.)

For about an hour we talked to the board how we develop Open Source, how we take on security problems and how we work on making sure we do things as securely as we can. It was striking how similarly the three of us looked at the issues and how we work in our project, despite our projects all being different and having our own specifics.

As projects, we believe we have pretty well-established and working procedures for getting problems reported and we think we fix the issues fairly swiftly. We ship fixes, advisories and updates not long after the issues get known. The CVE system where we register and publish security vulnerabilities in a global registry is working adequately. (I’m not saying things are perfect.)

The main problem

It was pretty clear to me that we agreed that the biggest problem in the Open Source supply chain today is the slow uptake in patching vulnerable software.

Lots of vendors and products have not been made or have any plans for how to handle upgrades when vulnerabilities are found. Many of those that do act, do that with such glacier like speeds that users of such products remain exposed for attackers for a long period after the flaws are already fixed and have become known.

My own analysis of this is that such vendors of course do this because its the cheapest way. Plain capitalistic reasons.

Addressing this is hard

If we had any easy fixes for this, we would already have them in progress. We were also asked by the board what kind of systems that we would not like to see.

Will Software Bill Of Materials (SBOM) fix this? Maybe it can help, by exposing to the world what software and versions are used in products, but it will certainly depend on how it is used and enforced. If done too heavy-handed, it risks causing overhead and added complications but in the other end it might end up too wishy-washy.

Ended there

This was just an hour of conversation with a few follow-up clarifying emails. I hope that we were able to provide insights into how Open Source is made but I have no illusions of us changing anything in drastic ways.

I felt honored to represent “my kind” and help sharing knowledge of Open Source to areas of the world that might not always get informed about it.

cURL and libcurl, Development

My work on tool vs library

February 10, 2022 Daniel Stenberg

I’m the lead developer in the curl project. We make the command line tool curl and the library libcurl, for doing Internet transfers. The command line tool uses the library for all the internet transfer heavy lifting.

The command line tool is somewhat of a shell binding to access libcurl.

We make these things. We recently surpassed 1,000 authors. I lead the project and I have done the most number of commits per month in curl for the last 79 months, and in fact in 222 of the 267 months we have stats for.

The tool came first

curl was born in 1998 as a command line tool only. Two years in, we (well, mostly me actually) remodeled some internals and shipped the first libcurl to the world in August 2000. The idea was (and still is) to provide the same Internet transfer powers the tool has to any application or device out there that need it.

Code sizes

The library side of things is right now about 85% of the total product code. A little over 120,000 lines right now, including comments.

Pie chart showing code distribution between tool and library

Users at scale

Many users think of the curl project as equivalent to the curl tool and the command line tool certainly has a lot of users. It is available for and on all the popular platforms. It is impossible to count curl command users, but millions should not be an exaggeration.

While it seems likely that more users are using the tool than is writing applications that use libcurl, each product, service or device that uses libcurl can themselves scale up the volumes. A single libcurl user can use libcurl in an application used by billions. A few hundreds or thousands of libcurl users populate the world with things transferring data with our library. (A noticeable share of the current Internet traffic is likely driven by libcurl.)

The net result is therefore that libcurl runs in several thousand times more installations than the tool.

If we visualize the number of curl users as a yellow ball and the libcurl installations as a green ball, putting them next to each other would look something like this:

Planet curl is barely visible next to planet libcurl. Install and user numbers are **estimated**.

(Image math: 3 million curl users, 10 billion libcurl installations. Yellow sphere radius is 90 vs the green’s 1340)

Complexity

Making a command line tool is much easier than doing a library. The command line tool has just one entry point and the interface is limited. A command line and associated files and pipes to read from. In our case, the tool lets libcurl deal with a lot of platform specifics which makes the command line code generic and mostly identical on all platforms it runs on.

A library has many more entry points, each that needs to be written to care for what the users might pass in to them. libcurl has code to work with millions of build combinations, and (right now) up to 35 different external libraries. (13 different TLS libraries, 3 different SSH libraries, 2 different QUIC/HTTP/3 libraries, 3 different compression libraries etc.)

libcurl is used in many more challenging environments such as niche operating systems, scaled up to thousands of concurrent transfers, systems that never exits and in builds with a creative set of features disabled at build-time.

Compared to the curl tool, libcurl is way more advanced. A bug in the command line tool is often much easier to fix than one in the library. Such bugs are also rarer since it is much simpler and a smaller amount of code.

Where it matters

Because of what I have outlined above, I focus my curl work on library related changes. Both when it comes to bug-fixes and adding features. It scales better to the world, and as one of the designers and architects of most solutions used in libcurl I am in many cases a suitable engineer to work on many of the more complicated problems.

It is smarter for the entire project for me to leave the slightly less complicated problems and more easily understood features to be fixed and added by others. It scales better. After all, I have a finite number of hours per week to spend on curl. I want to make the best use of my time as I can.

Therefore, I tend to leave tool-related things for others to work on.

Where it pays

I work on curl for a living. The companies that pay for support have higher priority than almost any other bug reports, and they tend to be libcurl related. Keeping my paying customers happy is crucial to me. And in a funny way also to others, since that work usually end up benefiting all curl users.

Where its fun

As I work on curl full-time during my workdays and also during a good chunk of my spare time, I need to “lighten up” my work at times and get some variation. Sometimes I can go about and find new little things to work on in the project that maybe isn’t top priority by any means, but are things that could use some polish and are different enough from what I spent the rest of my week on. To give me variation. To keep the fun.

Also, sometimes scratching the surface on a new somewhat forgotten place brings up more important stuff.

Theory meets practice

Years ago I even for a while considered to hand over maintenance of the curl tool to someone else and more distinctly say that I would only work on the library as they are separate entities and could possibly benefit from being worked on independently from each other.

My idea of focusing my work on the more complicated issues, to work on design and architecture and help newcomers find their way into the code doesn’t always work out.

I never gave up maintenance of the tool and a lot of things that someone else could implement or fix in the project aren’t, which makes me eventually get to work on that too anyway. For the good of the project. Also, it makes my work day more varied and fun if I take occasional strolls around the project every now and then and put on some new paint on areas I find could use some.

Summary

Most of my work efforts go into libcurl matters. But I work on the tool too.

cURL and libcurl, Development

I am an 80 column purist

November 30, 2020 Daniel Stenberg 8 Comments

I write and prefer code that fits within 80 columns in curl and other projects – and there are reasons for it. I’m a little bored by the people who respond and say that they have 400 inch monitors already and they can use them.

I too have multiple large high resolution screens – but writing wide code is still a bad idea! So I decided I’ll write down my reasoning once and for all!

Narrower is easier to read

There’s a reason newspapers and magazines have used narrow texts for centuries and in fact even books aren’t using long lines. For most humans, it is simply easier on the eyes and brain to read texts that aren’t using really long lines. This has been known for a very long time.

Easy-to-read code is easier to follow and understand which leads to fewer bugs and faster debugging.

Side-by-side works better

I never run windows full sized on my screens for anything except watching movies. I frequently have two or more editor windows next to each other, sometimes also with one or two extra terminal/debugger windows next to those. To make this feasible and still have the code readable, it needs to fit “wrapless” in those windows.

Sometimes reading a code diff is easier side-by-side and then too it is important that the two can fit next to each other nicely.

Better diffs

Having code grow vertically rather than horizontally is beneficial for diff, git and other tools that work on changes to files. It reduces the risk of merge conflicts and it makes the merge conflicts that still happen easier to deal with.

It encourages shorter names

A side effect by strictly not allowing anything beyond column 80 is that it becomes really hard to use those terribly annoying 30+ letters java-style names on functions and identifiers. A function name, and especially local ones, should be short. Having long names make them really hard to read and makes it really hard to spot the difference between the other functions with similarly long names where just a sub-word within is changed.

I know especially Java people object to this as they’re trained in a different culture and say that a method name should rather include a lot of details of the functionality “to help the user”, but to me that’s a weak argument as all non-trivial functions will have more functionality than what can be expressed in the name and thus the user needs to know how the function works anyway.

I don’t mean 2-letter names. I mean long enough to make sense but not be ridiculous lengths. Usually within 15 letters or so.

Just a few spaces per indent level

To make this work, and yet allow a few indent levels, the code basically have to have small indent-levels, so I prefer to have it set to two spaces per level.

Many indent levels is wrong anyway

If you do a lot of indent levels it gets really hard to write code that still fits within the 80 column limit. That’s a subtle way to suggest that you should not write functions that needs or uses that many indent levels. It should then rather be split out into multiple smaller functions, where then each function won’t need that many levels!

Why exactly 80?

Once upon the time it was of course because terminals had that limit and these days the exact number 80 is not a must. I just happen to think that the limit has worked fine in the past and I haven’t found any compelling reason to change it since.

It also has to be a hard and fixed limit as if we allow a few places to go beyond the limit we end up on a slippery slope and code slowly grow wider over time – I’ve seen it happen in many projects with “soft enforcement” on code column limits.

Enforced by a tool

In curl, we have ‘checksrc’ which will yell errors at any user trying to build code with a too long line present. This is good because then we don’t have to “waste” human efforts to point this out to contributors who offer pull requests. The tool will point out such mistakes with ruthless accuracy.

Credits

Image by piotr kurpaska from Pixabay

cURL and libcurl, Development, Open Source

curl: 3K forks

May 30, 2019 Daniel Stenberg

It’s just another meaningless number, but today there are 3,000 forks done of the curl GitHub repository.

This pops up just a little over three years since we reached our first 1,000 forks. Also, 10,000 stars no too long ago.

Why fork?

A typical reason why people fork a project on GitHub, is so that they can make a change in their own copy of the source code and then suggest that change to the project in the form of a pull-request.

The curl project has almost 700 individual commit authors, which makes at least 2,300 forks done who still haven’t had their pull-requests accepted! Of course those are 700 contributors who actually managed to work all the way through to inclusion. We can imagine that there is a huge number of people who only ever thought about doing a change, some who only ever just started to do it, many who ditched the idea before it was completed, some who didn’t actually manage to implement it properly, some who got their idea and suggestion shut down by the project and of course, lots of people still have their half-finished change sitting there waiting for inspiration.

Then there are people who just never had the intention of sending any change back. Maybe they just wanted to tinker with the code and have fun. Some want to do private changes they don’t want to offer or perhaps they already know the upstream project won’t accept.

We just can’t tell.

Many?

Is 3,000 forks a lot or a little? Both. It is certainly more forks than we’ve ever had before in this project. But compared to some of the most popular projects on GitHub, even comparing to some other C projects (on GitHub the most popular projects are never written in C) our numbers are dwarfed by the really popular ones. You can probably guess which ones they are.

In the end, this number is next to totally meaningless as it doesn’t say anything about the project nor about what contributions we get or will get in the future. It tells us we have (or had) the attention of a lot of users and that’s about it.

I will continue to try to make sure we’re worth the attention, both now and going forward!

(Picture from pixabay.)

clang and gcc

CI to the rescue

Undefined behavior sanitizer

Picky function prototypes

Example

How this became annoying

The fix

Disabling the check is not enough

Is this a clang issue?

A historic footnote

Post-publish update

I blame GitHub

Commit messages with style

Manual merges

We COULD but we won’t

What GitHub could do

Discussion

Addendum

One

Two, three

Four

Five

Redundancy

Services come and go

Five, Six

Zuul CI

Five

We do what we need

What we run in CI

Of course we have graphs

Future

Recording

Agenda

Merging the work

Commit messages

They appear as “closed this in”

Signed commits

Stats

Update

The main problem

Addressing this is hard

Ended there

The tool came first

Code sizes

Users at scale

Complexity

Where it matters

Where it pays

Where its fun

Theory meets practice

Summary

Narrower is easier to read

Side-by-side works better

Better diffs

It encourages shorter names

Just a few spaces per indent level

Many indent levels is wrong anyway

Why exactly 80?

Enforced by a tool

Credits

Why fork?

Many?

curl, open source and networking