Tag Archives: Development

predef is our friend

For C programmers like me, who want to write portable programs that can get built and run on the widest possible array of machines and platforms, we often need to write conditional code. Code that use #ifdefs for particular conditions.

Such ifdefs expressions often need to check for a particular compiler, an operating system or perhaps even a specific version of of one those things.

Back in April 2002, Bjørn Reese created the predef project and started collecting this information on a page on Sourceforge. I found it a super useful idea and I have tried to contribute what I have learned through the years to the effort.

The collection of data has been maintained and slowly expanded over the years thanks to contributions from many friends.

The predef documents have turned out to be a genuine goldmine and a resource I regularly come back to time and time again, when I work on projects such as curl, c-ares and libssh2 and need to make sure they remain functional on a plethora of systems. I can only imagine how many others are doing the same for their projects. Or maybe should do the same…

Now, in July 2022, the team is moving the predef documents over to a brand new GitHub organization. The point being to making it more accessible and allow edits and future improvements using standard pull-requests to make ease the maintenance of them.

You find all the documentation here:

https://github.com/cpredef/predef

If you find errors, mistakes or omissions, we will of course be thrilled to see issues and pull requests filed.

case insensitive string comparisons in C

Back in 2008, I had a revelation when it dawned on me that the POSIX function called strcasecmp() compares strings case insensitively, but locale dependent. Because of this, “file” and “FILE” is not actually a case insensitive match in Turkish but is a match in most other locales. curl would sometimes fail in mysterious ways due to this. Mysterious to the users, now we know why.

Of course this behavior was no secret. The knowledge about this problem was widespread already then. It was just me who hadn’t realized this yet.

A custom replacement

To work around that problem for curl, we immediately implemented our own custom comparison replacement function that doesn’t care about locales. Internet protocols work the same way no matter which locale the user happens to prefer.

We did not go the POSIX route. The POSIX function for case insensitive string comparisons that ignores the locale is called strcasecmp_l() but that uses a special locale argument and also doesn’t exist on non-POSIX platforms.

curl has used its custom set of functions since 7.19.1, released in early November 2008.

OpenSSL 3.0.3

Fast forward to May 2022. OpenSSL released their version 3.0.3. In the change-log for this release we learned that they now offer public functions for case insensitive string comparisons. Whatdoyouknow! They too have learned about the Turkish locale. Apparently to the degree that they feel they need to offer those functions in their already super-huge API set. Oh well, that is certainly their choice.

I can relate since we too have such functions in libcurl, but I have always regretted that we added them to the API since comparing strings is not libcurl’s core business. We did wrong then and we still live with the consequences several decades later.

OpenSSL however took the POSIX route and based their implementation on strcasecmp_l() and use a global variable for the locale and an elaborate system to initialize that global and even a way to make things work if string comparisons are needed before that global variable is initialized etc.

This new system was complicated to the degree that it broke the library on several platforms, which curl users running Windows 7 figured out almost instantly. curl with OpenSSL 3.0.3 simply does not work on Windows 7 – at all.

Reasons for not exposing a string compare API

Libraries should only provide functions that are within their core objective. Not fluffy might be useful things. Reasons for this include:

  • It adds to the complexity to users. Yet another function in the ever expanding set of function calls in the API.
  • It increases the documentation size even more and makes the real things harder to find somewhere in there.
  • It adds “attack surface” and areas where you can make errors and introduce security problems.
  • You get more work since now you have additional functions to keep ABI and API stable for all eternity and you have to spend developer time and effort on making sure they remain so.

Do a custom one for OpenSSL?

I think there is a software law that goes something like this

eventually, all C libraries implement their own case insensitive string comparison functions

When I proposed they should implement their own custom function in discussions in one of the issues about this OpenSSL problem, the suggestion was shot down fairly quickly because of how hard it is to implement a such function that is as fast as the glibc version.

In my ears, that sounds like they prefer to stick with an overworked and complicated error-prone system, because an underlying function is faster, rather than going with simplicity and functionality at the price of sightly slower performance. In fairness, they say that case insensitive string comparisons are “6-7%” of the time spent in some (to me unknown) performance test they referred to. I have no way or intention to argue with that.

I think maybe they couldn’t really handle that idea from an outsider and they might just need a little more time to figure out the right way forward on their own. Then go with simple.

I am of course not in the best position to say how they should act on this. I’m just a spectator here. I may be completely wrong.

Update (May 23)

In a separate PR (4 days after this blog post went live), OpenSSL suddenly implemented their own and it was deemed that it would not hurt performance noticeably. Merged on May 23. Almost like they followed my recommendation!

OpenSSL’s current tolower() implementation used in the comparison function is similar to curl’s old one so I suspect curl’s current function is a tad bit faster.

Custom vs glibc performance

glibc truly has really fast string comparison implementations, with optimized assembly versions for the common architectures. Versions written in plain C tend to be slower.

However, the API and way to use those functions to make them locale independent is horrific because of the way it forces the caller to provide a locale argument (which could be the “C” locale – the equivalent of no locale).

The curl custom function

That talk about the slowness of custom string functions made us start discussing this topic a little in the curl IRC channel and we bounced around some ideas of what things the curl function does not already do and what it could do and how it compares against the glibc assembly version.

Also: the string comparisons in curl are certainly not that performance critical as they seem to be in OpenSSL and while used a lot in curl they are not used in the most important performance critical transfer-data code paths.

Optimizations

Frank Gevaerts took the lead and after some rounds and discussions backed up with tests, he ended up with an updated function that is 1.6 to 1.7 times faster than before his work. We dropped non-ASCII support in curl a while ago, which also made this task more straight-forward.

The two improvements:

  1. Use a lookup table for our own toupper() implementation instead of the previous simple condition + math.
  2. Better end of loop handling: return immediately on mismatch, and a minor touch-up of the final check when the loop goes all the way to the end.

Measurements

The glibc assembler versions are still faster than curl’s custom functions and the exact speed improvements the above mentioned changes provide will of course depend both on platform and the test set.

Ships in 7.84.0

The faster libcurl functions will ship in curl 7.84.0. I doubt anyone will notice the difference!

Meeting the Cyber Safety Review Board

Three Open Source hackers were invited to this meeting with the CSRB and I was one of them.

The board with this name is part of CISA, a US government effort that received a presidential order to work on “Improving the Nation’s Cybersecurity“. Where “the Nation” here is the US.

I’m not in the US and I’m not a US citizen but I felt I should help out when asked and I was able to.

On April 21 2022, I joined the video meeting together with an OpenSSL and a Tomcat contributor and several members of the board. (I am not naming any names of participants in this post because I have not asked for permission nor do I think the names are important here.)

For about an hour we talked to the board how we develop Open Source, how we take on security problems and how we work on making sure we do things as securely as we can. It was striking how similarly the three of us looked at the issues and how we work in our project, despite our projects all being different and having our own specifics.

As projects, we believe we have pretty well-established and working procedures for getting problems reported and we think we fix the issues fairly swiftly. We ship fixes, advisories and updates not long after the issues get known. The CVE system where we register and publish security vulnerabilities in a global registry is working adequately. (I’m not saying things are perfect.)

The main problem

It was pretty clear to me that we agreed that the biggest problem in the Open Source supply chain today is the slow uptake in patching vulnerable software.

Lots of vendors and products have not been made or have any plans for how to handle upgrades when vulnerabilities are found. Many of those that do act, do that with such glacier like speeds that users of such products remain exposed for attackers for a long period after the flaws are already fixed and have become known.

My own analysis of this is that such vendors of course do this because its the cheapest way. Plain capitalistic reasons.

Addressing this is hard

If we had any easy fixes for this, we would already have them in progress. We were also asked by the board what kind of systems that we would not like to see.

Will Software Bill Of Materials (SBOM) fix this? Maybe it can help, by exposing to the world what software and versions are used in products, but it will certainly depend on how it is used and enforced. If done too heavy-handed, it risks causing overhead and added complications but in the other end it might end up too wishy-washy.

Ended there

This was just an hour of conversation with a few follow-up clarifying emails. I hope that we were able to provide insights into how Open Source is made but I have no illusions of us changing anything in drastic ways.

I felt honored to represent “my kind” and help sharing knowledge of Open Source to areas of the world that might not always get informed about it.

What curl expects from dependencies

curl supports a large number of third party libraries. In a build, those libraries become “dependencies”. These components offer functionality and features that we don’t implement ourselves but still have been deemed interesting or even crucial to support to do Internet transfers the way we want.

A curl build done today can use one or more out of 35 different libraries. No build can actually use all of them at once as many are mutually exclusive and most of them only work on one or a subset of platforms.

The green boxes illustrate the third party dependencies curl can use as direct dependencies.

Keeping our backyard tidy

Every now and then we learn that one of the 3rd party libraries we can build curl to use has ceased development or has in some other way started to decay into a state where we feel is no longer healthy to the level that we can no longer recommend our users to use it.

We do this as a service to our users. If users build curl with a dependency we support, I think we should at least have some rudimentary knowledge that the dependencies we help users to use are not terrible. It’s not a guarantee, but we try. To help strengthen the ecosystem. To sweep our own backyard.

Also, getting rid of old code is good.

The different third party direct dependencies supported by curl by the time of the initial added support. A minus prefix means the support was dropped.

Indirect dependencies

There are of course also indirect dependencies in the form of libraries our direct dependencies use (or even libraries the indirect used libraries use), and we try to also include them in the “package” when we consider dependencies, but especially if they are optional we need to put less attention on them.

What is a healthy dependency?

We have no automatic checks or even fixed set of rules or conditions to help us make this distinction. It would of course be cool to have that, but we don’t.

Ideally, it would be awesome if all dependencies would be top-rated on bestpractices, as that would greatly help us figure this out. But unfortunately too many projects are still not even added to that effort so this doesn’t work – plus we also support a number of proprietary dependencies that can’t be rated there.

Instead, we need to rely on old-fashioned human checks and asking users and maintainers.

Maybe not add it to begin with

We have declined to add functionality to curl in the past just because the proposed 3rd party dependency it would use just didn’t live up to our standards. I don’t mean that we need to raise the bar to ridiculous levels, but if a casual browsing of the 3rd party library found issues and there were not satisfying answers in a reasonable time on how those should be addressed, then that library is probably not ready to be used by curl. There’s no need to “lure” curl users into a possibly bad situation if we can save them from it.

Abandoned

Sometimes work officially stops on a library we support. That’s a strong sign we should also stop.

curl users actually using it

Since curl is being developed, extended and bug-fixed at a fairly high pace, we can be fairly sure that if a dependency is actually being used, it needs to get fixed every now and then to keep up. If support code for a dependency hasn’t been updated or touched for many years, there’s a strong suspicion that there aren’t many users of it in modern curl.

Sometimes that can be verified to be the case when we notice a blatant bug that’s been present in the code for a good while without anyone noticing, but more often we need to ask users. Anyone using this anymore? (Which also is complicated because we often lack connections to users who don’t read any of our mailing lists and generally only upgrade curl once every decade so it might take a while until those users notice changes…)

Releases

A dependency that has stopped making new releases can be a signal that it on its way downwards. It could also be a sign that it has matured and doesn’t need much more to be done to it.

How do we even know they stopped? Maybe they just take forever from the previous release…

Developer activity

A library that is used by curl is almost required to have some level of developer activity over time. Nobody writes bug-free code unless its scope is razor-sharp-narrow and the project spent a lot of time perfecting it. No commits or developer activity for a long time means that clearly nobody takes care of the bug reports.

Slowly deteriorating projects are probably the hardest to handle. Are they still good enough?

Maintainers ultimately decide

But we just ship source code.

In the end of the day, the people who package curl or libcurl decide what third party libraries to actually get used. They are the ones who decide what dependencies users of their build rely on. In many cases this means the maintainers of the curl packages in Linux distros and other operating systems. Manufacturers of devices and tools that use libcurl often build their own and then they can decide and cherry-pick individually between all provided choices.

This makes it possible for such maintainers to add extra conditions and checks and only go with the dependencies they like.

The only binary packages the curl project itself provides, are the ones for Windows, and we try to go with only solid, reliable and conservative choices for those.

My work on tool vs library

I’m the lead developer in the curl project. We make the command line tool curl and the library libcurl, for doing Internet transfers. The command line tool uses the library for all the internet transfer heavy lifting.

The command line tool is somewhat of a shell binding to access libcurl.

We make these things. We recently surpassed 1,000 authors. I lead the project and I have done the most number of commits per month in curl for the last 79 months, and in fact in 222 of the 267 months we have stats for.

The tool came first

curl was born in 1998 as a command line tool only. Two years in, we (well, mostly me actually) remodeled some internals and shipped the first libcurl to the world in August 2000. The idea was (and still is) to provide the same Internet transfer powers the tool has to any application or device out there that need it.

Code sizes

The library side of things is right now about 85% of the total product code. A little over 120,000 lines right now, including comments.

Pie chart showing code distribution between tool and library

Users at scale

Many users think of the curl project as equivalent to the curl tool and the command line tool certainly has a lot of users. It is available for and on all the popular platforms. It is impossible to count curl command users, but millions should not be an exaggeration.

While it seems likely that more users are using the tool than is writing applications that use libcurl, each product, service or device that uses libcurl can themselves scale up the volumes. A single libcurl user can use libcurl in an application used by billions. A few hundreds or thousands of libcurl users populate the world with things transferring data with our library. (A noticeable share of the current Internet traffic is likely driven by libcurl.)

The net result is therefore that libcurl runs in several thousand times more installations than the tool.

If we visualize the number of curl users as a yellow ball and the libcurl installations as a green ball, putting them next to each other would look something like this:

Planet curl is barely visible next to planet libcurl. Install and user numbers are estimated.

(Image math: 3 million curl users, 10 billion libcurl installations. Yellow sphere radius is 90 vs the green’s 1340)

Complexity

Making a command line tool is much easier than doing a library. The command line tool has just one entry point and the interface is limited. A command line and associated files and pipes to read from. In our case, the tool lets libcurl deal with a lot of platform specifics which makes the command line code generic and mostly identical on all platforms it runs on.

A library has many more entry points, each that needs to be written to care for what the users might pass in to them. libcurl has code to work with millions of build combinations, and (right now) up to 35 different external libraries. (13 different TLS libraries, 3 different SSH libraries, 2 different QUIC/HTTP/3 libraries, 3 different compression libraries etc.)

libcurl is used in many more challenging environments such as niche operating systems, scaled up to thousands of concurrent transfers, systems that never exits and in builds with a creative set of features disabled at build-time.

Compared to the curl tool, libcurl is way more advanced. A bug in the command line tool is often much easier to fix than one in the library. Such bugs are also rarer since it is much simpler and a smaller amount of code.

Where it matters

Because of what I have outlined above, I focus my curl work on library related changes. Both when it comes to bug-fixes and adding features. It scales better to the world, and as one of the designers and architects of most solutions used in libcurl I am in many cases a suitable engineer to work on many of the more complicated problems.

It is smarter for the entire project for me to leave the slightly less complicated problems and more easily understood features to be fixed and added by others. It scales better. After all, I have a finite number of hours per week to spend on curl. I want to make the best use of my time as I can.

Therefore, I tend to leave tool-related things for others to work on.

Where it pays

I work on curl for a living. The companies that pay for support have higher priority than almost any other bug reports, and they tend to be libcurl related. Keeping my paying customers happy is crucial to me. And in a funny way also to others, since that work usually end up benefiting all curl users.

Where its fun

As I work on curl full-time during my workdays and also during a good chunk of my spare time, I need to “lighten up” my work at times and get some variation. Sometimes I can go about and find new little things to work on in the project that maybe isn’t top priority by any means, but are things that could use some polish and are different enough from what I spent the rest of my week on. To give me variation. To keep the fun.

Also, sometimes scratching the surface on a new somewhat forgotten place brings up more important stuff.

Theory meets practice

Years ago I even for a while considered to hand over maintenance of the curl tool to someone else and more distinctly say that I would only work on the library as they are separate entities and could possibly benefit from being worked on independently from each other.

My idea of focusing my work on the more complicated issues, to work on design and architecture and help newcomers find their way into the code doesn’t always work out.

I never gave up maintenance of the tool and a lot of things that someone else could implement or fix in the project aren’t, which makes me eventually get to work on that too anyway. For the good of the project. Also, it makes my work day more varied and fun if I take occasional strolls around the project every now and then and put on some new paint on areas I find could use some.

Summary

Most of my work efforts go into libcurl matters. But I work on the tool too.

What if GitHub is the devil?

Some critics think the curl project shouldn’t use GitHub. The reasons for being against GitHub hosting tend to be one or more of:

  1. it is an evil proprietary platform
  2. it is run by Microsoft and they are evil
  3. GitHub is American thus evil

Some have insisted on craziness like “we let GitHub hold our source code hostage”.

Why GitHub?

The curl project switched to GitHub (from Sourceforge) almost eleven years ago and it’s been a smooth ride ever since.

We’re on GitHub not only because it provides a myriad of practical features and is a stable and snappy service for hosting and managing source code. GitHub is also a developer hub for millions of developers who already have accounts and are familiar with the GitHub style of developing, the terms and the tools. By being on GitHub, we reduce friction from the contribution process and we maximize the ability for others to join in and help. We lower the bar. This is good for us.

I like GitHub.

Self-hosting is not better

Providing even close to the same uptime and snappy response times with a self-hosted service is a challenge, and it would take someone time and energy to volunteer that work – time and energy we now instead can spend of developing the project instead. As a small independent open source project, we don’t have any “infrastructure department” that would do it for us. And trust me: we already have enough infrastructure management to deal with without having to add to that pile.

… and by running our own hosted version, we would lose the “network effect” and convenience for people that already are on and know the platform. We would also lose the easy integration with cool services like the many different CI and code analyzer jobs we run.

Proprietary is not the issue

While git is open source, GitHub is a proprietary system. But the thing is that even if we would go with a competitor and get our code hosting done elsewhere, our code would still be stored on a machine somewhere in a remote server park we cannot physically access – ever. It doesn’t matter if that hosting company uses open source or proprietary code. If they decide to switch off the servers one day, or even just selectively block our project, there’s nothing we can do to get our stuff back out from there.

We have to work so that we minimize the risk for it and the effects from it if it still happens.

A proprietary software platform holds our code just as much hostage as any free or open source software platform would, simply by the fact that we let someone else host it. They run the servers our code is stored on.

If GitHub takes the ball and goes home

No matter which service we use, there’s always a risk that they will turn off the light one day and not come back – or just change the rules or licensing terms that would prevent us from staying there. We cannot avoid that risk. But we can make sure that we’re smart about it, have a contingency plan or at least an idea of what to do when that day comes.

If GitHub shuts down immediately and we get zero warning to rescue anything at all from them, what would be the result for the curl project?

Code. We would still have the entire git repository with all code, all source history and all existing branches up until that point. We’re hundreds of developers who pull that repository frequently, and many automatically, so there’s a very distributed backup all over the world.

CI. Most of our CI setup is done with yaml config files in the source repo. If we transition to another hosting platform, we could reuse them.

Issues. Bug reports and pull requests are stored on GitHub and a sudden exit would definitely make us lose some of them. We do daily “extractions” of all issues and pull-requests so a lot of meta-data could still be saved and preserved. I don’t think this would be a terribly hard blow either: we move long-standing bugs and ideas over to documents in the repository, so the currently open ones are likely possible to get resubmitted again within the nearest future.

There’s no doubt that it would be a significant speed bump for the project, but it would not be worse than that. We could bounce back on a new platform and development would go on within days.

Low risk

It’s a rare thing, that a service just suddenly with no warning and no heads up would just go black and leave projects completely stranded. In most cases, we get alerts, notifications and get a chance to transition cleanly and orderly.

There are alternatives

Sure there are alternatives. Both pure GitHub alternatives that look similar and provide similar services, and projects that would allow us to run similar things ourselves and host locally. There are many options.

I’m not looking for alternatives. I’m not planning to switch hosting anytime soon! As mentioned above, I think GitHub is a net positive for the curl project.

Nothing lasts forever

We’ve switched services several times before and I’m expecting that we will change again in the future, for all sorts of hosting and related project offerings that we provide to the work and to the developers and participators within the project. Nothing lasts forever.

When a service we use goes down or just turns sour, we will figure out the best possible replacement and take the jump. Then we patch up all the cracks the jump may have caused and continue the race into the future. Onward and upward. The way we know and the way we’ve done for over twenty years already.

Credits

Image by Elias Sch. from Pixabay

Updates

After this blog post went live, some users remarked than I’m “disingenuous” in the list of reasons at the top, that people have presented to me. This, because I don’t mention the moral issues staying on GitHub present – like for example previously reported workplace conflicts and their association with hideous American immigration authorities.

This is rather the opposite of disingenuous. This is the truth. Not a single person have ever asked me to leave GitHub for those reasons. Not me personally, and nobody has asked it out to the wider project either.

These are good reasons to discuss and consider if a service should be used. Have there been violations of “decency” significant enough that should make us leave? Have we crossed that line in the sand? I’m leaning to “no” now, but I’m always listening to what curl users and developers say. Where do you think the line is drawn?

I am an 80 column purist

I write and prefer code that fits within 80 columns in curl and other projects – and there are reasons for it. I’m a little bored by the people who respond and say that they have 400 inch monitors already and they can use them.

I too have multiple large high resolution screens – but writing wide code is still a bad idea! So I decided I’ll write down my reasoning once and for all!

Narrower is easier to read

There’s a reason newspapers and magazines have used narrow texts for centuries and in fact even books aren’t using long lines. For most humans, it is simply easier on the eyes and brain to read texts that aren’t using really long lines. This has been known for a very long time.

Easy-to-read code is easier to follow and understand which leads to fewer bugs and faster debugging.

Side-by-side works better

I never run windows full sized on my screens for anything except watching movies. I frequently have two or more editor windows next to each other, sometimes also with one or two extra terminal/debugger windows next to those. To make this feasible and still have the code readable, it needs to fit “wrapless” in those windows.

Sometimes reading a code diff is easier side-by-side and then too it is important that the two can fit next to each other nicely.

Better diffs

Having code grow vertically rather than horizontally is beneficial for diff, git and other tools that work on changes to files. It reduces the risk of merge conflicts and it makes the merge conflicts that still happen easier to deal with.

It encourages shorter names

A side effect by strictly not allowing anything beyond column 80 is that it becomes really hard to use those terribly annoying 30+ letters java-style names on functions and identifiers. A function name, and especially local ones, should be short. Having long names make them really hard to read and makes it really hard to spot the difference between the other functions with similarly long names where just a sub-word within is changed.

I know especially Java people object to this as they’re trained in a different culture and say that a method name should rather include a lot of details of the functionality “to help the user”, but to me that’s a weak argument as all non-trivial functions will have more functionality than what can be expressed in the name and thus the user needs to know how the function works anyway.

I don’t mean 2-letter names. I mean long enough to make sense but not be ridiculous lengths. Usually within 15 letters or so.

Just a few spaces per indent level

To make this work, and yet allow a few indent levels, the code basically have to have small indent-levels, so I prefer to have it set to two spaces per level.

Many indent levels is wrong anyway

If you do a lot of indent levels it gets really hard to write code that still fits within the 80 column limit. That’s a subtle way to suggest that you should not write functions that needs or uses that many indent levels. It should then rather be split out into multiple smaller functions, where then each function won’t need that many levels!

Why exactly 80?

Once upon the time it was of course because terminals had that limit and these days the exact number 80 is not a must. I just happen to think that the limit has worked fine in the past and I haven’t found any compelling reason to change it since.

It also has to be a hard and fixed limit as if we allow a few places to go beyond the limit we end up on a slippery slope and code slowly grow wider over time – I’ve seen it happen in many projects with “soft enforcement” on code column limits.

Enforced by a tool

In curl, we have ‘checksrc’ which will yell errors at any user trying to build code with a too long line present. This is good because then we don’t have to “waste” human efforts to point this out to contributors who offer pull requests. The tool will point out such mistakes with ruthless accuracy.

Credits

Image by piotr kurpaska from Pixabay

This is how I git

Every now and then I get questions on how to work with git in a smooth way when developing, bug-fixing or extending curl – or how I do it. After all, I work on open source full time which means I have very frequent interactions with git (and GitHub). Simply put, I work with git all day long. Ordinary days, I issue git commands several hundred times.

I have a very simple approach and way of working with git in curl. This is how it works.

command line

I use git almost exclusively from the command line in a terminal. To help me see which branch I’m working in, I have this little bash helper script.

brname () {
  a=$(git rev-parse --abbrev-ref HEAD 2>/dev/null)
  if [ -n "$a" ]; then
    echo " [$a]"
  else
    echo ""
  fi
}
PS1="\u@\h:\w\$(brname)$ "

That gives me a prompt that shows username, host name, the current working directory and the current checked out git branch.

In addition: I use Debian’s bash command line completion for git which is also really handy. It allows me to use tab to complete things like git commands and branch names.

git config

I of course also have my customized ~/.gitconfig file to provide me with some convenient aliases and settings. My most commonly used git aliases are:

st = status --short -uno
ci = commit
ca = commit --amend
caa = commit -a --amend
br = branch
co = checkout
df = diff
lg = log -p --pretty=fuller --abbrev-commit
lgg = log --pretty=fuller --abbrev-commit --stat
up = pull --rebase
latest = log @^{/RELEASE-NOTES:.synced}..

The ‘latest’ one is for listing all changes done to curl since the most recent RELEASE-NOTES “sync”. The others should hopefully be rather self-explanatory.

The config also sets gpgsign = true, enables mailmap and a few other things.

master is clean and working

The main curl development is done in the single curl/curl git repository (primarily hosted on GitHub). We keep the master branch the bleeding edge development tree and we work hard to always keep that working and functional. We do our releases off the master branch when that day comes (every eight weeks) and we provide “daily snapshots” from that branch, put together – yeah – daily.

When merging fixes and features into master, we avoid merge commits and use rebases and fast-forward as much as possible. This makes the branch very easy to browse, understand and work with – as it is 100% linear.

Work on a fix or feature

When I start something new, like work on a bug or trying out someone’s patch or similar, I first create a local branch off master and work in that. That is, I don’t work directly in the master branch. Branches are easy and quick to do and there’s no reason to shy away from having loads of them!

I typically name the branch prefixed with my GitHub user name, so that when I push them to the server it is noticeable who is the creator (and I can use the same branch name locally as I do remotely).

$ git checkout -b bagder/my-new-stuff-or-bugfix

Once I’ve reached somewhere, I commit to the branch. It can then end up one or more commits before I consider myself “done for now” with what I was set out to do.

I try not to leave the tree with any uncommitted changes – like if I take off for the day or even just leave for food or an extended break. This puts the repository in a state that allows me to easily switch over to another branch when I get back – should I feel the need to. Plus, it’s better to commit and explain the change before the break rather than having to recall the details again when coming back.

Never stash

“git stash” is therefore not a command I ever use. I rather create a new branch and commit the (temporary?) work in there as a potential new line of work.

Show it off and get reviews

Yes I am the lead developer of the project but I still maintain the same work flow as everyone else. All changes, except the most minuscule ones, are done as pull requests on GitHub.

When I’m happy with the functionality in my local branch. When the bug seems to be fixed or the feature seems to be doing what it’s supposed to do and the test suite runs fine locally.

I then clean up the commit series with “git rebase -i” (or if it is a single commit I can instead use just “git commit --amend“).

The commit series should be a set of logical changes that are related to this change and not any more than necessary, but kept separate if they are separate. Each commit also gets its own proper commit message. Unrelated changes should be split out into its own separate branch and subsequent separate pull request.

git push origin bagder/my-new-stuff-or-bugfix

Make the push a pull request

On GitHub, I then make the newly pushed branch into a pull request (aka “a PR”). It will then become visible in the list of pull requests on the site for the curl source repository, it will be announced in the #curl IRC channel and everyone who follows the repository on GitHub will be notified accordingly.

Perhaps most importantly, a pull request kicks of a flood of CI jobs that will build and test the code in numerous different combinations and on several platforms, and the results of those tests will trickle in over the coming hours. When I write this, we have around 90 different CI jobs – per pull request – and something like 8 different code analyzers will scrutinize the change to see if there’s any obvious flaws in there.

CI jobs per platform over time. Graph snapped on November 5, 2020

A branch in the actual curl/curl repo

Most contributors who would work on curl would not do like me and make the branch in the curl repository itself, but would rather do them in their own forked version instead. The difference isn’t that big and I could of course also do it that way.

After push, switch branch

As it will take some time to get the full CI results from the PR to come in (generally a few hours), I switch over to the next branch with work on my agenda. On a normal work-day I can easily move over ten different branches, polish them and submit updates in their respective pull-requests.

I can go back to the master branch again with ‘git checkout master‘ and there I can “git pull” to get everything from upstream – like when my fellow developers have pushed stuff in the mean time.

PR comments or CI alerts

If a reviewer or a CI job find a mistake in one of my PRs, that becomes visible on GitHub and I get to work to handle it. To either fix the bug or discuss with the reviewer what the better approach might be.

Unfortunately, flaky CI jobs is a part of life so very often there ends up one or two red markers in the list of CI jobs that can be ignored as the test failures in them are there due to problems in the setup and not because of actual mistakes in the PR…

To get back to my branch for that PR again, I “git checkout bagder/my-new-stuff-or-bugfix“, and fix the issues.

I normally start out by doing follow-up commits that repair the immediate mistake and push them on the branch:

git push origin bagder/my-new-stuff-or-bugfix

If the number of fixup commits gets large, or if the follow-up fixes aren’t small, I usually end up doing a squash to reduce the number of commits into a smaller, simpler set, and then force-push them to the branch.

The reason for that is to make the patch series easy to review, read and understand. When a commit series has too many commits that changes the previous commits, it becomes hard to review.

Ripe to merge?

When the pull request is ripe for merging (independently of who authored it), I switch over to the master branch again and I merge the pull request’s commits into it. In special cases I cherry-pick specific commits from the branch instead. When all the stuff has been yanked into master properly that should be there, I push the changes to the remote.

Usually, and especially if the pull request wasn’t done by me, I also go over the commit messages and polish them somewhat before I push everything. Commit messages should follow our style and mention not only which PR that it closes but also which issue it fixes and properly give credit to the bug reporter and all the helpers – using the right syntax so that our automatic tools can pick them up correctly!

As already mentioned above, I merge fast-forward or rebased into master. No merge commits.

Never merge with GitHub!

There’s a button GitHub that says “rebase and merge” that could theoretically be used for merging pull requests. I never use that (and if I could, I’d disable/hide it). The reasons are simply:

  1. I don’t feel that I have the proper control of the commit message(s)
  2. I can’t select to squash a subset of the commits, only all or nothing
  3. I often want to cleanup the author parts too before push, which the UI doesn’t allow

The downside with not using the merge button is that the message in the PR says “closed by [hash]” instead of “merged in…” which causes confusion to a fair amount of users who don’t realize it means that it actually means the same thing! I consider this is a (long-standing) GitHub UX flaw.

Post merge

If the branch has nothing to be kept around more, I delete the local branch again with “git branch -d [name]” and I remove it remotely too since it was completely merged there’s no reason to keep the work version left.

At any given point in time, I have some 20-30 different local branches alive using this approach so things I work on over time all live in their own branches and also submissions from various people that haven’t been merged into master yet exist in branches of various maturity levels. Out of those local branches, the number of concurrent pull requests I have in progress can be somewhere between just a few up to ten, twelve something.

RELEASE-NOTES

Not strictly related, but in order to keep interested people informed about what’s happening in the tree, we sync the RELEASE-NOTES file every once in a while. Maybe every 5-7 days or so. It thus becomes a file that explains what we’ve worked on since the previous release and it makes it well-maintained and ready by the time the release day comes.

To sync it, all I need to do is:

$ ./scripts/release-notes.pl

Which makes the script add suggested updates to it, so I then load the file into my editor, remove the separation marker and all entries that don’t actually belong there (as the script adds all commits as entries as it can’t judge the importance).

When it looks okay, I run a cleanup round to make it sort it and remove unused references from the file…

$ ./scripts/release-notes.pl cleanup

Then I make sure to get a fresh list of contributors…

$ ./scripts/contributors.sh

… and paste that updated list into the RELEASE-NOTES. Finally, I get refreshed counters for the numbers at the top of the file by running

$ ./scripts/delta

Then I commit the update (which needs to have the commit message RELEASE-NOTES: synced“) and push it to master. Done!

The most up-to-date version of RELEASE-NOTES is then always made available on https://curl.se/dev/release-notes.html

Credits

Picture by me, taken from the passenger seat on a helicopter tour in 2011.

Restored complete curl changelog

For a long time, the curl changelog on the web site showed the history of changes in the curl project all the way back to curl 6.0. Released on September 13 1999. Older changes were not displayed.

The reason for this was always basically laziness. The page in its current form was initially created back in 2001 and then I just went back a little in history and filled up with a set of previous releases. Since we don’t have pre-1999 code in our git tree (because of a sloppy CVS import), everything before 1999 is a bit of manual procedure to extract so we left it like that.

Until now.

I decided to once and for all fix this oversight and make sure that we get a complete changelog from the first curl release all the way up until today. The first curl release was called 4.0 and was shipped on March 20, 1998.

Before 6.0 we weren’t doing very careful release notes and they were very chatty. I got the CHANGES file from the curl 6.0 tarball and converted them over to the style of the current changelog.

Notes on the restoration work

The versions noted as “beta” releases in the old changelog are not counted or mentioned as real releases.

For the released versions between 4.0 and 4.9 there are no release dates recorded, so I’ve “estimated” the release dates based on the knowledge that we did them fairly regularly and that they probably were rather spread out over that 200 day time span. They won’t be exact, but close enough.

Complete!

The complete changelog is now showing on the site, and in the process I realized that I have at some point made a mistake and miscounted the total number of curl releases. Off-by one actually. The official count now says that the next release will become the 188th.

As a bonus from this work, the “releaselog” page is now complete and shows details for all curl releases ever. (Also, note that we provide all that info in a CSV file too if you feel like playing with the data.)

There’s a little caveat on the updated vulnerability information there: when we note how far vulnerabilities go, we have made it a habit to sometimes mark the first vulnerable version as “6.0” if the bad code exists in the first ever git imported code – simply because going back further and checking isn’t easy and usually isn’t worth the effort because that old versions are not used anymore.

Therefore, we will not have accurate vulnerability information for versions before 6.0. The vulnerability table will only show versions back to 6.0 for that reason.

Many bug-fixes

With the complete data, we also get complete numbers. Since the birth of curl until version 7.67.0 we have fixed exactly 5,664 bugs shipped in releases, and there were exactly 7,901 days between the 4.0 the 7.67.0 releases.

Internetmuseum

The Internet Museum translated to Swedish becomes “internetmuseum“. It is a digital, online-only, museum that collects Internet- and Web related historical information, especially focused on the Swedish angle to all of this. It collects stories from people who did the things. The pioneers, the ground breakers, the leaders, the early visionaries. Most of their documentation is done in the form of video interviews.

I was approached and asked to be part of this – as an Internet Pioneer. Me? Internet Pioneer, really?

Internetmuseum’s page about me.

I’m humbled and honored to be considered and I certainly had a lot of fun doing this interview. To all my friends not (yet) fluent in Swedish: here’s your grand opportunity to practice, because this is done entirely in this language of curl founders and muppet chefs.

Photo from Internetmusuem

Back in the morning of October 18th 2019, two guys showed up as planned at my door and I let them in. One of my guests was a photographer who set up his gear in my living room for the interview, and then me and and guest number two, interviewer Jörgen, sat down and talked for almost an hour straight while being recorded.

The result can be seen here below.

The Science museum was first

This is in fact the second Swedish museum to feature me.

I have already been honored with a display about me, at the Tekniska Museet in Stockholm, the “Science museum” which has an exhibition about past Polhem Prize award winners.

Information displayed about me at the Swedish Science museum in Stockholm. I have a private copy of the cardboard posters.

(Top image by just-pics from Pixabay)