Tag Archives: Open Source

3,000 contributors

Thank you everyone who has helped out in making curl into what it is today.

We make an effort to note the names of and say thanks to every single individual who ever reported bugs, fixed problems, ran tests, wrote code, polished the website, spell-fixed documentation, assisted debug sessions, helped interpret protocol standards, reported security problems or co-authored code etc.

In 2005 I decided to go back through the project history and make sure all names that had been involved up to that point in time would also be mentioned in the THANKS file. This is the reason for the visible bump in the graph.

Since then, we add the names of all the helpers. We say thanks and give credits in commit messages and we have scripts to help us collect them and mention them as contributors. We probably miss occasional ones but I hope and believe that most of all the awesome people that ever helped us are recorded accordingly and given credit.

We are nothing without out dear contributors.

Today, this list of people we are thankful for, reached 3,000 entries when Alex Klyubin’s pull request was merged. (The list of names on the website is synced every once in a while so it actually typically shows slightly fewer people than we have logged in git.)

The team behind curl. 3,000 persons over almost 27 years.

We reached 2,000 contributors less than four years ago, in October 2019, so we have added about 250 new names per year to this list the last four years.

We reached 1,000 contributors in March 2013, meaning from then it took about 6.5 years to get the next 1,000. Roughly 150 new names per year.

The first 1,000 took over 16 years to reach.

You too can see it quite clearly in the graph above as well: the rate of which we get new contributors to help out in the project is increasing.

curl has existed for 9338 days. That equals one new contributor every 3.1 days for over 25 years, on average.

Non-code or code alike

It is oftentimes said that Open Source projects in general have a hard time to properly recognize and appreciate non-code contributions. In the curl project, we try hard to not differentiate between help and help.

If a person helps out to take the curl project further, even if just by a little, we say thank you very much and add their name to the list. We need and are grateful for code and non-code contributions alike. One of the most important parts of the curl project is the documentation, and that is clearly not code.

About the contributors

We cannot say much about the contributors because in an effort to lower the bars and reduce friction, we also do not ask them about details. We only know the name they provide the help under, which could be a pseudonym and in some cases clearly are nicknames. We do not know where in the world they originate from or which company they work for, we can’t tell their gender, skin color, religion or other human “properties”. And we don’t care much about those specifics. Our job is to bring curl forward.

This said: there is a risk that we have added the same contributors twice or more, if they have helped out using several different names. That is just not something we can detect or avoid. Unless the contributor themself informs us.

There is also a risk that some of the persons that contributed to curl are not nice people or that they work for reprehensible organizations. We focus on the quality of their submissions and if they hide who they are, we will never know if they actually are animal-hurting nazis hiding behind pseudonyms.

(Lack of) Diversity

As far as I can tell, we have a lousy contributor diversity. I am pretty sure the majority of all help come from old white middle-class western men. Like myself.

I cannot fully know this for sure because I only actually know a small fraction of all the contributors, but out of the ones I have met this is true and I believe I have met or at least communicated with the ones who have done the vast majority of all the changes.

I would much rather see us have many more contributors from other parts of the world, female, and with non-christian backgrounds, but I cannot control who comes to us. I can only do my best to take care of all and appreciate every contribution without discrimination.

Commit authors

This day when we reach 3,000 contributors, we also count 1,201 commit authors. Persons with their names as authors of at least one commit in the curl source repository. 40% of the contributors are committers. Almost 65% of the committers only ever committed once.

3,000 visualized

The top image of this blog post is a photo from FOSDEM a few years back when I did a presentation in front of a packed room with some 1,400 attendees. The largest room at FOSDEM is said to fit 1415 persons. So not even two such giant rooms would be enough to hold all the curl contributors if it would have been possible to get them all in one place…

You too?

You too can be a curl contributor. We are friendly. It is not hard. There is lots to do. Your contributions can end up getting used by literally billions of humans.

Google Open Source Peer Bonus award 2023

I am honored to yet again receive a peer bonus award from Google. This is a Google program for which persons like me can be nominated by Googlers and as a result receive grants.

I previously received such an award in 2020.

Update

A few people noticed and have commented on the fact that this letter is signed by Chris DiBona and dated April 19th 2023, while sources say he was let go from Google back in January. Which means one or two of those things are wrong.

trurl manipulates URLs

trurl is a tool in a similar spirit of tr but for URLs. Here, tr stands for translate or transpose.

trurl is a small command line tool that parses and manipulates URLs, designed to help shell script authors everywhere.

URLs are tricky to parse and there are numerous security problems in software because of this. trurl wants to help soften this problem by taking away the need for script and command line authors everywhere to re-invent the wheel over and over.

trurl uses libcurl’s URL parser and will thus parse and understand URLs exactly the same as curl the command line tool does – making it the perfect companion tool.

I created trurl on March 31, 2023.

Some command line examples

Given just a URL (even without scheme), it will parse it and output a normalized version:

$ trurl ex%61mple.com/
http://example.com/

The above command will guess on a http:// scheme when none was provided. The guess has basic heuristics, like for example FTP server host names often starts with ftp:

$ trurl ftp.ex%61mple.com/
ftp://ftp.example.com/

A user can output selected components of a provided URL. Like if you only want to extract the path or the query components from it.:

$ trurl https://curl.se/?search=foobar --get '{path}'
/

Or both (with extra text intermixed):

$ trurl https://curl.se/?search=foobar --get 'p: {path} q: {query}'
p: / q: search=foobar

A user can create a URL by providing the different components one by one and trurl outputs the URL:

$ trurl --set scheme=https --set host=fool.wrong
https://fool.wrong/

Reset a specific previously populated component by setting it to nothing. Like if you want to clear the user component:

$ trurl https://daniel@curl.se/--set user=
https://curl.se/

trurl tells you the full new URL when the first URL is redirected to a second relative URL:

$ trurl https://curl.se/we/are/here.html --redirect "../next.html"
https://curl.se/we/next.html

trurl provides easy-to-use options for adding new segments to a URL’s path and query components. Not always easily done in shell scripts:

$ trurl https://curl.se/we/are --append path=index.html
https://curl.se/we/are/index.html
$ trurl https://curl.se?info=yes --append query=user=loggedin
https://curl.se/?info=yes&user=loggedin

trurl can work on a single URL or any amount of URLs passed on to it. The modifications and extractions are then performed on them all, one by one.

$ trurl https://curl.se localhost example.com 
https://curl.se/
http://localhost/
http://example.com/

trurl can read URLs to work on off a file or from stdin, and works on them in a streaming fashion suitable for filters etc.

$ cat many-urls.yxy | trurl --url-file -
...

More or different

trurl was born just a few days ago, this is what we have made it do so far. There is a high probability that it will change further going forward before it settles on exactly how things ideally should work.

It also means that we are extra open for and welcoming to feedback, ideas and pull-requests. With some luck, this could become a new everyday tool for all of us.

Tell us on GitHub!

Fossified pilot episode

Henrik, Johan, Magnus and I are all Swedish “FOSS people” and friends since many years back.

We like open source. We work with Open Source. We have contributed to Open Source since a long time back. We also have slightly different backgrounds and areas of expertise so we don’t all just totally overlap.

We decided we wanted to try putting together a podcast and talk about all things FOSS: from lightweight news down to more deep dives and interviews and discussions with peeps who know more. With our takes and personal views applied of course.

We named it Fossified. We have recorded a first pilot where we test the concept a little, but more importantly this is just the beginning and we have created a GitHub repository where we collect program ideas and proposals.

We certainly need and appreciate your help. With ideas for topics and guests. Perhaps even with a logo or why not an intro song?

The Pilot

We have recorded our first episode. You can find it on our very fancy website fossified.com.

Uncurled – the presentation

Uncurled – everything I know and learned about running and maintaining Open Source projects for three decades.

This is me, doing a live English-speaking presentation/webinar on these topics that I cover in my book: Uncurled.

Recording

Date: Tuesday August 23, 2022

Time: 10: 00 UTC (12:00 CEST)

Where: over zoom [Sign up]

The plan is to record this session and make it available after the fact on YouTube. This post will be updated with a link to that once it exists.

Agenda

Here’s the outlook on what I hope to be able to cover in a 40 minutes talk.

This will be followed by a Q&A-session with me answering any questions you might have. Feel most welcome and encouraged to submit your questions ahead of time if you already have some! (comment here, email me, comment or DM on Twitter, send a carrier pigeon, anything!)

I have not done this presentation before. I know the subject very intimately so I have no worries about that. The timing of the thing is what is going to be my bigger challenge I think. I aim for no more than 40 minutes of me blabbing.

I don’t know who uses my code

When I (in spite of knowing better) talk to ordinary people about what I do for a living and the project I work on, one of the details about it that people have the hardest time to comprehend, is the fact that I really and truly don’t know a lot about who uses my code. (Or where. Or what particular features they use.)

I work on curl full-time and we ship releases frequently. Users download the curl source code from us, build curl and put it to use. Most of “my” users never tell me or anyone else in the curl project that they use curl or libcurl. This is of course perfectly fine and I probably could not even handle the flood if every user would tell me.

This not-knowing is a most common situation for Open Source authors and projects. It is not unique for me.

The not knowing your users is otherwise unusual in a world of products and software, and quite frankly, sometimes it is an obstacle for us as well since we lack a good way to communicate with users about plans, changes or ideas. It also makes it really hard to estimate our own success and the always-recurring question: how many users do you have?

To be fair: I do know quite a lot of users as well. But I don’t know how representative they are, nor how big fraction of the totals they consist of etc.

How to find out that someone uses your code

I wrote a section in Uncurled about How to find out that someone uses your code, and these are three main ways I learn about users of my code:

  1. A user of my project runs into an issue and asks for help in a project forum – without hiding where they come from or what product they make.
  2. While vanity-searching I find my project’s Open Source license mentioned for or bundled with a product or tool or device.
  3. A user emails me asking funny questions about a product I never heard of, and then I realize they found my email because that product uses my code and the license has my email in it.

curl is REUSE compliant

The REUSE project is an effort to make Open Source projects provide copyright and license information (for all files) in a machine readable way.

When a project is fully REUSE compliant, you can easily figure out the copyright and license situation for every single file it holds.

The easiest way to accomplish this is to make sure that all files have the correct header with the appropriate copyright info and SPDX-License-Identifier specified, but it also has ways to provide that meta data in adjacent files – for files where prepending that info isn’t sensible.

What we needed to do

We were already in a fairly good place before this push. We have a script that verifies the presence of copyright header in files (including checking the end year vs the latest git commit), with a list of files that were deliberately skipped.

The biggest things we needed to do were

  1. Add the SPDX identifier all over
  2. Make sure that the skipped files also have copyright and licensing info provided
  3. Add a CI job that verifies that we remain compliant

I also ended up adjusting our own copyright scan script to use the REUSE metadata files instead of its own ignore filters which also made it even easier for us to make sure we are and remain compatible — that every single files in the curl git repository has a known and documented license and copyright situation.

As a bonus, the cleanup work helped us detect an example file that stood out which we got relicensed and we removed two older files that had their own unique licenses (without any good reason).

There are 3518 files in the curl git repository this exact moment.

Compliant!

Starting mid-June 2022, curl is 100% REUSE compliant. curl 7.84.0 will be the first release done in this status.

Motivation

I think it is a good idea to have perfect control over the copyright and license situation for every single file, and to make sure that the situation is documented enough and to a level that allows anyone and everyone to check it out and learn how things lie. No surprises.

Companies have obviously figured out this info before to a degree that they have been satisfied with since curl is widely used even commercial since a long time. But I believe that by providing the information in an even easier and more descriptive way makes things even better. For existing and future users.

I also think that the low threshold for us to reach this compliance was a factor. We were almost there already. We just need to polish up some small details and I think it made it worth it.

This cleanup also makes sure we have perfect control and knowledge of the license situation, now and going forward. I think this can be expected from a project aiming for gold standard.

The curl SPDX license identifier

Keen readers will notice that curl has its own license identifier. It is called the curl license. Not MIT, X or a BSD variation. curl.

The reason for this is good old stupidity. In January 2001 we adopted the MIT license for use in the project because we believed it better matches what we want compared to the previous license situation. We started out with a dual license situation together with the MPL license we used previously, but the MPL part was removed completely in October 2002.

For reasons that have since been forgotten, we thought it was a good idea to edit the license text. To trim it a little. Since August 2002, the license text that started out as an MIT/X license is no longer a perfect copy. It is a derivative . Very similar and almost identical. But it’s not the same.

When the SPDX project created their set of identifiers for well-used licenses out in the FOSS world they decided that the curl license is different enough from the MIT/X license to treat it separately and give it its own identifier. I know of no other project than curl that uses this particular edited version of the MIT license.

In hindsight, I believe the editing of the license text back in 2002 was dumb. I regret it, but I will not change it again. I think we can live with this situation pretty good.

Credits

Most of the heavy lifting necessary to make curl compliant was done by Max Mehl.

Meeting the Cyber Safety Review Board

Three Open Source hackers were invited to this meeting with the CSRB and I was one of them.

The board with this name is part of CISA, a US government effort that received a presidential order to work on “Improving the Nation’s Cybersecurity“. Where “the Nation” here is the US.

I’m not in the US and I’m not a US citizen but I felt I should help out when asked and I was able to.

On April 21 2022, I joined the video meeting together with an OpenSSL and a Tomcat contributor and several members of the board. (I am not naming any names of participants in this post because I have not asked for permission nor do I think the names are important here.)

For about an hour we talked to the board how we develop Open Source, how we take on security problems and how we work on making sure we do things as securely as we can. It was striking how similarly the three of us looked at the issues and how we work in our project, despite our projects all being different and having our own specifics.

As projects, we believe we have pretty well-established and working procedures for getting problems reported and we think we fix the issues fairly swiftly. We ship fixes, advisories and updates not long after the issues get known. The CVE system where we register and publish security vulnerabilities in a global registry is working adequately. (I’m not saying things are perfect.)

The main problem

It was pretty clear to me that we agreed that the biggest problem in the Open Source supply chain today is the slow uptake in patching vulnerable software.

Lots of vendors and products have not been made or have any plans for how to handle upgrades when vulnerabilities are found. Many of those that do act, do that with such glacier like speeds that users of such products remain exposed for attackers for a long period after the flaws are already fixed and have become known.

My own analysis of this is that such vendors of course do this because its the cheapest way. Plain capitalistic reasons.

Addressing this is hard

If we had any easy fixes for this, we would already have them in progress. We were also asked by the board what kind of systems that we would not like to see.

Will Software Bill Of Materials (SBOM) fix this? Maybe it can help, by exposing to the world what software and versions are used in products, but it will certainly depend on how it is used and enforced. If done too heavy-handed, it risks causing overhead and added complications but in the other end it might end up too wishy-washy.

Ended there

This was just an hour of conversation with a few follow-up clarifying emails. I hope that we were able to provide insights into how Open Source is made but I have no illusions of us changing anything in drastic ways.

I felt honored to represent “my kind” and help sharing knowledge of Open Source to areas of the world that might not always get informed about it.

Uncurled

– Everything I know and learned about running and maintaining Open Source projects for three decades.

For several years now, I have had a blog post series in mind to describe something about what people could expect to happen in Open Source projects. I had a few already half-started blog post drafts for some sub topics.

I couldn’t really make up my mind how to craft a series of blog posts about this wide topic in a sensible way so I kept postponing it for later. I did this for years.

A book, it has to be a book

It just dawned on my one day: the only way to get all this into a comprehensible way that also can hold all the thoughts I would like it to have, is to put it into a book. By book, I mean a document. An essay. A collection of pages. A booklet maybe. I don’t know how many words it might end up to become and I have no illusions of it ever ending up in print.

I mean to write the document in the open and provide it for free, online. Open Source style.

Day one

I grabbed my original draft for my blog series “You can expect this in your Open Source project”. I had worked on that document in the background for a long time, adding some little thing here and there over years – and it now had maybe twenty-five “lessons” listed with a short paragraph of text next to each.

I also had started three blog posts based on such lessons that were in pending state here on daniel.haxx.se in my queue of drafts.

I first copied the blog post content back into the text file from those potential blog posts, before I deleted them, and converted the entire file to markdown.

I then grouped the “lessons” I had listed in the markdown file and moved them into a few different sections. Like what to expect, code, money, people and project. I put subtitles into separate files for those five main areas.

How hard can it be?

I didn’t want to do a lot of work before I put the thing into git, and I didn’t want to run any private git repository so I had to make a new repo with a name. I went with “How hard can it be” as a working title and created the repo on GitHub. On April 6 I made the first git push with initial contents to that repository.

The first external contributor appeared after just a few minutes with the first pull-request fixing typos. Clearly people are following me on GitHub and spotted the creating of the repository and checked out what it was. I hadn’t told anyone or given any pointers.

I started expanding on subjects in the book.

Let’s get a real title

In the evening of April 7 I posted this question on Twitter:

"If I write a booklet collecting everything I know and learned about running and maintaining Open Source projects for three decades, what should I call it?"

I got a flood of replies. Lots of good ones and also lots of fun and sarcastic ones. The one that I think really talked to me the best was also the shortest: Uncurled.

  • It’s short and sweet
  • It includes a reference to curl without saying it is “a curl book” (it isn’t)
  • The topic is a bit about “untangling” and curl is a project that probably has taught me the most of what I include here
  • It sounds a little like “debriefed” from the curl project, and it is…
  • I can put it up on the domain name un.curl.dev

I figured I could possibly go with a longer subtitle that could explain the book more: “Everything I know and learned about running and maintaining Open Source projects”.

A name

I renamed the GitHub repository and added a description there. I created the URL (by adding the “un” CNAME entry in the “curl.dev” domain) and I setup gitbook.com to render the content to appear on un.curl.dev.

With a little more thoughts and then spilling some beans about my plans in my weekly report on April 8 (but not leaking the URL or repo to anyone yet) that made people provide some more ideas, I added more content.

10,000 words

By the evening of April 9, I surpassed 10,000 words of contents. Still having the contents and the order of everything pretty much in flux and not yet sorted out.

20,000 words

On April 25, I surpassed 20,000 words. It starts to look like something I can announce soon.

Getting there, but not done

The uncurled book is now in a state I think I can show off without feeling embarrassed. I believe I will still need to work on it more going forward to add and polish content and make it more coherent and less of a collection of snippets. I hope that I over time can settle down and gradually slow down the change pace. It will of course also depend a lot on the feedback I get.

Cover

Since it doesn’t exist physically and probably never will, I don’t think it actually needs a cover image, but it would probably be cool to still have one to use as an image and symbol for the book. If someone has a good idea or feels artistically inclined to make one, let me know!

More steel

I didn’t expect this, and this year I wasn’t asked ahead of time if I wanted to receive this gift. It is however something of a collector’s item that I find very enjoyable.

I received my GitHub contribution matrix printed in steel. This is my 2021 contribution skyline. (Click the images for higher resolution.)

The thing is surprisingly heavy and sturdy. If I had papers lying around, this would an awesome paper press.

You might remember that I got a similar gift last year, so it felt natural to do a comparison shot of 2021 and 2020.

For 2020, GitHub counted 2,466 contributions, while I reached 2,543 in 2021. Very similar numbers, but clearly distributed very differently. The two matrix images look like this.

2020

2021

Letter

Enclosed with this gift was also a friendly and encouraging letter.

Oh, and you can of course also see a rendered version in your browsers or download it in STL format so that you can print using your own 3d-printer.

Thank you, all my friends at GitHub!