Tag Archives: cURL and libcurl

Age is just a number or two

Kjell, a friend of mine, mailed me a zip file this morning saying he’d found an earlier version of “urlget” lying around. Meaning: an older version than what we provide on the curl download page. urlget was the name we used for the command line tool before we changed the name to curl in March 1998.

I’ve been reckless with some of the source code and keeping track of early history so this made me curios and when I glanced through the source code for urlget 2.4, shipped in October 1997. Kjell had found a project of his own where he’d imported the urlget sources as that was from before the days curl was also a library.

In this source code I also found the original URL to the home page for urlget and its predecessor, httpget: http://www.inf.ufrgs.br/~sagula/urlget.html

I don’t know if I have this info stored somewhere else too, but the important thing here is that it then struck me that I hadn’t checked the Internet Archive for what it has archived for this URL!

The earliest archived version of the urlget page is from February 16 1998. I checked, but there’s no archived version from slightly earlier when the tool was named just httpget. I did however find source code from httpget that was older than I had saved from before: httpget 1.3! 320 lines of hopelessly naive code. From April 14, 1997.

That date is also the only one that has content as the next archived one is just a redirect to the first curl web site over at http://www.fts.frontec.se/~dast/curl/

Two birthdays?

I used this newly found gem to update the curl history page with exact dates for some of the earliest releases and events, as it was previously not very specific there as I hadn’t kept notes.

I could now also once and for all note that the first release of HttpGet (version 0.1) was done on November 11, 1996. My personal participation in the project began at some days/weeks after that, as it is recorded that I provided improvements in the HttpGet 0.2 release that was done on December 17 the same year.

I’ve always counted the age of curl from March 20, 1998 which is when I first released something under the name “curl”, but since we released it as curl 4.0 that is certainly a sign that the time up to that point could possibly also be counted into its age.

It’s not terribly important of when to start the count.

What’s more fun with the particular HttpGet 0.1 release date, is that it is the exact same date Wget was released the first time under that name! It had previously been developed and released under a different name (“geturl”) and exactly on November 11, 1996 Hrvoje Nikši? released Wget 1.4.0 to the world.

Why not go with Wget?

People sometimes ask me why I didn’t use wget to get currencies that winter day back in 1996 when I found HttpGet and started to work on that HTTP client, but the fact is then that not only was the search engines and software hosting alternatives clearly inferior back in those days so finding software could be difficult, wget was also very new to the world. I didn’t learn about the existence of wget until many months later – although I can’t recall exactly when or how.

I also think, looking back at myself in that time, that if I would’ve found wget then, I would probably have thought it to be overkill for my use case and opted to use something else anyway. I mean, I was “just getting a HTTP page” and the wget package was 171KB compressed, while HttpGet 1.3 was still less than 8K in a single source code file… I’m not saying that way of thinking was right!

The curl year 2020

As we’re approaching the end of the year, I just want to sum up the curl year with a few words.

2020 has been another glorious year in the curl project. We’ve seen a series of accomplishments and introductions of new things during this the year of the plague.

Accomplishments

I personally have done more commits in the git repository since any year after 2004 (890 so far).

The total number of commits done in git is the largest since 2014 (1445 plus some).

The number of published curl related CVEs is the lowest since 2013 (6). For the ones we announced, we could reward record amounts in our bug bounty program!

139 authors wrote commits that were merged (so far).

We did nine curl releases, out of which two unfortunately were quicker “panic releases” that patched up problems in the previous release.

Seven changes to remember

We’ve logged no less than 905 bug-fixes and 30 changes in the releases of this year, but the seven perhaps most memorable things we’ve introduced in 2020 are…

Videos

This year I’ve introduced the concept of doing a “release presentation” for every release. Those are videos where I go through and discuss the changes, the security releases and some interesting bug-fixes. Each release links to those from the changelog page on the website.

New home

This is the year when we finally got ourselves a curl domain. curl.se is our new home.

What didn’t happen

We cancelled curl up 2020 due to Covid-19. It was planned to happen in Berlin. We did it purely online instead. We’re not planning any new physical curl up for 2021 either. Let’s just wait and see what happens with the pandemic next year and hope that we might be able to go back and have a physical meetup in 2022…

It is a curl world

curl supports NASA

Not everyone understands how open source is made. I received the following email from NASA a while ago.

Subject: Curl Country of Origin and NDAA Compliance

Hello, my name is [deleted] and I am a Supply Chain Risk Management Analyst at NASA. As such, I ensure that all NASA acquisitions of Covered Articles comply with Section 208 of the Further Consolidated Appropriations Act, 2020, Public Law 116-94, enacted December 20, 2019. To do so, the Country of Origin (CoO) information must be obtained from the company that develops, produces, manufactures, or assembles the product(s). To do so, please provide an email response or a formal document (a PDF on company letterhead is preferred, but a simple statement is sufficient) specifically identifying the country, or countries, in which Curl is developed and maintained

If the country of origin is outside the United States, please provide any information you may have stating that testing is performed in the United States prior to supplying products to customers. Additionally, if available, please identify all authorized resellers of the product in question.

Lastly, please confirm that Curl is not developed by, contain components developed by, or receive substantial influence from entities prohibited by Section 889 of the 2019 NDAA. These entities include the following companies and any of their subsidiaries or affiliates:

Hytera Communications Corporation
Huawei Technologies Company
ZTE Corporation
Dahua Technology Company
Hangzhou Hikvision Digital Technology Company

Finally, we have a time frame of 5 days for a response.
Thank you,

My answer

Okay, I first considered going with strong sarcasm in my reply due to the complete lack of understanding, and the implied threat in that last line. What would happen if I wouldn’t respond in time?

Then it struck me that this could be my chance to once and for all get a confirmation if curl is already actually used in space or not. So I went with informative and a friendly tone.

Hi [name],

I will answer to these questions below to the best of my ability, and maybe you can answer something for me?

curl (https://curl.se) is an open source project that creates two products, curl the command line tool and libcurl the library. I am the founder, lead developer and core maintainer of the project. To this date, I have done about 57% of the 26,000 changes in the source code repository. The remaining 43% have been done by 841 different volunteers and contributors from all over the world. Their names can be extracted from our git repository: https://github.com/curl/curl

You can also see that I own most, but not all, copyrights in the project.

I am a citizen of Sweden and I’ve been a citizen of Sweden during the entire time I’ve done all and any work on curl. The remaining 841 co-authors are from all over the world, but primarily from western European countries and the US. You could probably say that we live primarily “on the Internet” and not in any particular country.

We don’t have resellers. I work for an American company (wolfSSL) where we do curl support for customers world-wide.

Our testing is done universally and is not bound to any specific country or region. We test our code substantially before release.

Me knowingly, we do not have any components or code authored by people at any of the mentioned companies.

So finally my question: can you tell me anything about where or for what you use curl? Is it used in anything in space?

Regards,
Daniel

Used in space?

Of course my attempt was completely in vain and the answer back was very brief and it just said…

“We are using curl to support NASA’s mission and vision.”

Credits

Space ship image by Elias Sch. from Pixabay

the critical curl

Google has, as part of their involvement in the Open Source Security Foundation (OpnSSF), come up with a “Criticality Score” for open source projects.

It is a score between 0 (least critical) and 1 (most critical)

The input variables are:

  • time since project creation
  • time since last update
  • number of committers
  • number or organizations among the top committers
  • number of commits per week the last year
  • number of releases the last year
  • number of closed issues the last 90 days
  • number of updated issues the last 90 days
  • average number of comments per issue the last 90 days
  • number of project mentions in the commit messages

The best way to figure out exactly how to calculate the score based on these variables is to check out their github page.

The top-10 C based projects

The project has run the numbers on projects hosted on GitHub (which admittedly seriously limits the results) and they host these generated lists of the 200 most critical projects written in various languages.

Checking out the top list for C based projects, we can see the top 10 projects with the highest criticality scores being:

  1. git
  2. Linux (raspberry pi)
  3. Linux (torvald version)
  4. PHP
  5. OpenSSL
  6. systemd
  7. curl
  8. u-boot
  9. qemu
  10. mbed-os

What now then?

After having created the scoring system and generated lists, step 3 is said to be “Use this data to proactively improve the security posture of these critical projects.“.

Now I think we have a pretty strong effort on security already in curl and Google helped us strengthen it even more recently, but I figure we can never have too much help or focus on improving our project.

Credits

Image by Thaliesin from Pixabay

curl 7.74.0 with HSTS

Welcome to another curl release, 56 days since the previous one.

Release presentation

Numbers

the 196th release
1 change
56 days (total: 8,301)

107 bug fixes (total: 6,569)
167 commits (total: 26,484)
0 new public libcurl function (total: 85)
6 new curl_easy_setopt() option (total: 284)

1 new curl command line option (total: 235)
46 contributors, 22 new (total: 2,292)
22 authors, 8 new (total: 843)
3 security fixes (total: 98)
1,600 USD paid in Bug Bounties (total: 4,400 USD)

Security

This time around we have no less than three vulnerabilities fixed and as shown above we’ve paid 1,600 USD in reward money this time, out of which the reporter of the CVE-2020-8286 issue got the new record amount 900 USD. The second one didn’t get any reward simply because it was not claimed. In this single release we doubled the number of vulnerabilities we’ve published this year!

The six announced CVEs during 2020 still means this has been a better year than each of the six previous years (2014-2019) and we have to go all the way back to 2013 to find a year with fewer CVEs reported.

I’m very happy and proud that we as an small independent open source project can reward these skilled security researchers like this. Much thanks to our generous sponsors of course.

CVE-2020-8284: trusting FTP PASV responses

When curl performs a passive FTP transfer, it first tries the EPSV command and if that is not supported, it falls back to using PASV. Passive mode is what curl uses by default.

A server response to a PASV command includes the (IPv4) address and port number for the client to connect back to in order to perform the actual data transfer.

This is how the FTP protocol is designed to work.

A malicious server can use the PASV response to trick curl into connecting back to a given IP address and port, and this way potentially make curl extract information about services that are otherwise private and not disclosed, for example doing port scanning and service banner extractions.

If curl operates on a URL provided by a user (which by all means is an unwise setup), a user can exploit that and pass in a URL to a malicious FTP server instance without needing any server breach to perform the attack.

There’s no really good solution or fix to this, as this is how FTP works, but starting in curl 7.74.0, curl will default to ignoring the IP address in the PASV response and instead just use the address it already uses for the control connection. In other words, we will enable the CURLOPT_FTP_SKIP_PASV_IP option by default! This will cause problems for some rare use cases (which then have to disable this), but we still think it’s worth doing.

CVE-2020-8285: FTP wildcard stack overflow

libcurl offers a wildcard matching functionality, which allows a callback (set with CURLOPT_CHUNK_BGN_FUNCTION) to return information back to libcurl on how to handle a specific entry in a directory when libcurl iterates over a list of all available entries.

When this callback returns CURL_CHUNK_BGN_FUNC_SKIP, to tell libcurl to not deal with that file, the internal function in libcurl then calls itself recursively to handle the next directory entry.

If there’s a sufficient amount of file entries and if the callback returns “skip” enough number of times, libcurl runs out of stack space. The exact amount will of course vary with platforms, compilers and other environmental factors.

The content of the remote directory is not kept on the stack, so it seems hard for the attacker to control exactly what data that overwrites the stack – however it remains a Denial-Of-Service vector as a malicious user who controls a server that a libcurl-using application works with under these premises can trigger a crash.

CVE-2020-8286: Inferior OCSP verification

libcurl offers “OCSP stapling” via the CURLOPT_SSL_VERIFYSTATUS option. When set, libcurl verifies the OCSP response that a server responds with as part of the TLS handshake. It then aborts the TLS negotiation if something is wrong with the response. The same feature can be enabled with --cert-status using the curl tool.

As part of the OCSP response verification, a client should verify that the response is indeed set out for the correct certificate. This step was not performed by libcurl when built or told to use OpenSSL as TLS backend.

This flaw would allow an attacker, who perhaps could have breached a TLS server, to provide a fraudulent OCSP response that would appear fine, instead of the real one. Like if the original certificate actually has been revoked.

Change

There’s really only one “change” this time, and it is an experimental one which means you need to enable it explicitly in the build to get to try it out. We discourage people from using this in production until we no longer consider it experimental but we will of course appreciate feedback on it and help to perfect it.

The change in this release introduces no less than 6 new easy setopts for the library and one command line option: support HTTP Strict-Transport-Security, also known as HSTS. This is a system for HTTPS hosts to tell clients to attempt to contact them over insecure methods (ie clear text HTTP).

One entry-point to the libcurl options for HSTS is the CURLOPT_HSTS_CTRL man page.

Bug-fixes

Yet another release with over one hundred bug-fixes accounted for. I’ve selected a few interesting ones that I decided to highlight below.

enable alt-svc in the build by default

We landed the code and support for alt-svc: headers in early 2019 marked as “experimental”. We feel the time has come for this little baby to grow up and step out into the real world so we removed the labeling and we made sure the support is enabled by default in builds (you can still disable it if you want).

8 cmake fixes bring cmake closer to autotools level

In curl 7.73.0 we removed the “scary warning” from the cmake build that warned users that the cmake build setup might be inferior. The goal was to get more people to use it, and then by extension help out to fix it. The trick might have worked and we’ve gotten several improvements to the cmake build in this cycle. More over, we’ve gotten a whole slew of new bug reports on it as well so now we have a list of known cmake issues in the KNOWN_BUGS document, ready for interested contributors to dig into!

configure now uses pkg-config to find openSSL when cross-compiling

Just one of those tiny weird things. At some point in the past someone had trouble building OpenSSL cross-compiled when pkg-config was used so it got disabled. I don’t recall the details. This time someone had the reversed problem so now the configure script was fixed again to properly use pkg-config even when cross-compiling…

curl.se is the new home

You know it.

curl: only warn not fail, if not finding the home dir

The curl tool attempts to find the user’s home dir, the user who invokes the command, in order to look for some files there. For example the .curlrc file. More importantly, when doing SSH related protocol it is somewhat important to find the file ~/.ssh/known_hosts. So important that the tool would abort if not found. Still, a command line can still work without that during various circumstances and in particular if -k is used so bailing out like that was nothing but wrong…

curl_easy_escape: limit output string length to 3 * max input

In general, libcurl enforces an internal string length limit that prevents any string to grow larger than 8MB. This is done to prevent mistakes or abuse. Due a mistake, the string length limit was enforced wrongly in the curl_easy_escape function which could make the limit a third of the intended size: 2.67 MB.

only set USE_RESOLVE_ON_IPS for Apple’s native resolver use

This define is set internally when the resolver function is used even when a plain IP address is given. On macOS for example, the resolver functions are used to do some conversions and thus this is necessary, while for other resolver libraries we avoid the resolver call when we can convert the IP number to binary internally more effectively.

By a mistake we had enabled this “call getaddrinfo() anyway”-logic even when curl was built to use c-ares on macOS.

fix memory leaks in GnuTLS backend

We used two functions to extract information from the server certificate that didn’t properly free the memory after use. We’ve filed subsequent bug reports in the GnuTLS project asking them to make the required steps much clearer in their documentation so that perhaps other projects can avoid the same mistake going forward.

libssh2: fix transport over HTTPS proxy

SFTP file transfers didn’t work correctly since previous fixes obviously weren’t thorough enough. This fix has been confirmed fine in use.

make curl –retry work for HTTP 408 responses too

Again. We made the --retry logic work for 408 once before, but for some inexplicable reasons the support for that was accidentally dropped when we introduced parallel transfer support in curl. Regression fixed!

use OPENSSL_init_ssl() with >= 1.1.0

Initializing the OpenSSL library the correct way is a task that sounds easy but always been a source for problems and misunderstandings and it has never been properly documented. It is a long and boring story that has been going on for a very long time. This time, we add yet another chapter to this novel when we start using this function call when OpenSSL 1.1.0 or later (or BoringSSL) is used in the build. Hopefully, this is one of the last chapters in this book.

“scheme-less URLs” not longer accept blank port number

curl operates on “URLs”, but as a special shortcut it also supports URLs without the scheme. For example just a plain host name. Such input isn’t at all by any standards an actual URL or URI; curl was made to handle such input to mimic how browsers work. curl “guesses” what scheme the given name is meant to have, and for most names it will go with HTTP.

Further, a URL can provide a specific port number using a colon and a port number following the host name, like “hostname:80” and the path then follows the port number: “hostname:80/path“. To complicate matters, the port number can be blank, and the path can start with more than one slash: “hostname://path“.

curl’s logic that determines if a given input string has a scheme present checks the first 40 bytes of the string for a :// sequence and if that is deemed not present, curl determines that this is a scheme-less host name.

This means [39-letter string]:// as input is treated as a URL with a scheme and a scheme that curl doesn’t know about and therefore is rejected as an input, while [40-letter string]:// is considered a host name with a blank port number field and a path that starts with double slash!

In 7.74.0 we remove that potentially confusing difference. If the URL is determined to not have a scheme, it will not be accepted if it also has a blank port number!

I am an 80 column purist

I write and prefer code that fits within 80 columns in curl and other projects – and there are reasons for it. I’m a little bored by the people who respond and say that they have 400 inch monitors already and they can use them.

I too have multiple large high resolution screens – but writing wide code is still a bad idea! So I decided I’ll write down my reasoning once and for all!

Narrower is easier to read

There’s a reason newspapers and magazines have used narrow texts for centuries and in fact even books aren’t using long lines. For most humans, it is simply easier on the eyes and brain to read texts that aren’t using really long lines. This has been known for a very long time.

Easy-to-read code is easier to follow and understand which leads to fewer bugs and faster debugging.

Side-by-side works better

I never run windows full sized on my screens for anything except watching movies. I frequently have two or more editor windows next to each other, sometimes also with one or two extra terminal/debugger windows next to those. To make this feasible and still have the code readable, it needs to fit “wrapless” in those windows.

Sometimes reading a code diff is easier side-by-side and then too it is important that the two can fit next to each other nicely.

Better diffs

Having code grow vertically rather than horizontally is beneficial for diff, git and other tools that work on changes to files. It reduces the risk of merge conflicts and it makes the merge conflicts that still happen easier to deal with.

It encourages shorter names

A side effect by strictly not allowing anything beyond column 80 is that it becomes really hard to use those terribly annoying 30+ letters java-style names on functions and identifiers. A function name, and especially local ones, should be short. Having long names make them really hard to read and makes it really hard to spot the difference between the other functions with similarly long names where just a sub-word within is changed.

I know especially Java people object to this as they’re trained in a different culture and say that a method name should rather include a lot of details of the functionality “to help the user”, but to me that’s a weak argument as all non-trivial functions will have more functionality than what can be expressed in the name and thus the user needs to know how the function works anyway.

I don’t mean 2-letter names. I mean long enough to make sense but not be ridiculous lengths. Usually within 15 letters or so.

Just a few spaces per indent level

To make this work, and yet allow a few indent levels, the code basically have to have small indent-levels, so I prefer to have it set to two spaces per level.

Many indent levels is wrong anyway

If you do a lot of indent levels it gets really hard to write code that still fits within the 80 column limit. That’s a subtle way to suggest that you should not write functions that needs or uses that many indent levels. It should then rather be split out into multiple smaller functions, where then each function won’t need that many levels!

Why exactly 80?

Once upon the time it was of course because terminals had that limit and these days the exact number 80 is not a must. I just happen to think that the limit has worked fine in the past and I haven’t found any compelling reason to change it since.

It also has to be a hard and fixed limit as if we allow a few places to go beyond the limit we end up on a slippery slope and code slowly grow wider over time – I’ve seen it happen in many projects with “soft enforcement” on code column limits.

Enforced by a tool

In curl, we have ‘checksrc’ which will yell errors at any user trying to build code with a too long line present. This is good because then we don’t have to “waste” human efforts to point this out to contributors who offer pull requests. The tool will point out such mistakes with ruthless accuracy.

Credits

Image by piotr kurpaska from Pixabay

The curl web infrastructure

The purpose of the curl web site is to inform the world about what curl and libcurl are and provide as much information as possible about the project, the products and everything related to that.

The web site has existed in some form for as long as the project has, but it has of course developed and changed over time.

Independent

The curl project is completely independent and stands free from influence from any parent or umbrella organization or company. It is not even a legal entity, just a bunch of random people cooperating over the Internet. And a bunch of awesome sponsors to help us.

This means that we have no one that provides the infrastructure or marketing for us. We need to provide, run and care for our own servers and anything else we think we should offer our users.

I still do a lot of the work in curl and the curl web site and I work full time on curl, for wolfSSL. This might of course “taint” my opinions and views on matters, but doesn’t imply ownership or control. I’m sure we’re all colored by where we work and where we are in our lives right now.

Contents

Most of the web site is static content: generated HTML pages. They are served super-fast and very lightweight by any web server software.

The web site source exists in the curl-www repository (hosted on GitHub) and the web site syncs itself with the latest repository changes several times per hour. The parts of the site that aren’t static are mostly consisting of smaller scripts that run either on demand at the time of a request or on an interval in a cronjob in the background. That is part of the reason why pushing an update to the web site’s repository can take a little while until it shows up on the live site.

There’s a deliberate effort at not duplicating information so a lot of the web pages you can find on the web site are files that are converted and “HTMLified” from the source code git repository.

“Design”

Some people say the curl web site is “retro”, others that it is plain ugly. My main focus with the site is to provide and offer all the info, and have it be accurate and accessible. The look and the design of the web site is a constant battle, as nobody who’s involved in editing or polishing the web site is really interested in or particularly good at design, looks or UX. I personally have done most of the editing of it, including CSS etc and I can tell you that I’m not good at it and I don’t enjoy it. I do it because I feel I have to.

I get occasional offers to “redesign” the web site, but the general problem is that those offers almost always involve rebuilding the entire thing using some current web framework, not just fixing the looks, layout or hierarchy. By replacing everything like that we’d get a lot of problems to get the existing information in there – and again, the information is more important than the looks.

The curl logo is designed by a proper designer however (Adrian Burcea).

If you want to help out designing and improving the web site, you’d be most welcome!

Who

I’ve already touched on it: the web site is mostly available in git so “anyone” can submit issues and pull-requests to improve it, and we are around twenty persons who have push rights that can then make a change on the live site. In reality of course we are not that many who work on the site any ordinary month, or even year. During the last twelve month period, 10 persons authored commits in the web repository and I did 90% of those.

How

Technically, we build the site with traditional makefiles and we generate the web contents mostly by preprocessing files using a C-like preprocessor called fcpp. This is an old and rather crude setup that we’ve used for over twenty years but it’s functional and it allows us to have a mostly static web site that is also fairly easy to build locally so that we can work out and check improvements before we push them to the git repository and then out to the world.

The web site is of course only available over HTTPS.

Hosting

The curl web site is hosted on an origin VPS server in Sweden. The machine is maintained by primarily by me and is paid for by Haxx. The exact hosting is not terribly important because users don’t really interact with our server directly… (Also, as they’re not sponsors we’re just ordinary customers so I won’t mention their name here.)

CDN

A few years ago we experienced repeated server outages simply because our own infrastructure did not handle the load very well, and in particular not the traffic spikes that could occur when I would post a blog post that would suddenly reach a wide audience.

Enter Fastly. Now, when you go to curl.se (or daniel.haxx.se) you don’t actually reach the origin server we admin, you will instead reach one of Fastly’s servers that are distributed across the world. They then fetch the web contents from our origin, cache it on their edge servers and send it to you when you browse the site. This way, your client speaks to a server that is likely (much) closer to you than the origin server is and you’ll get the content faster and experience a “snappier” web site. And our server only gets a tiny fraction of the load.

Technically, this is achieved by the name curl.se resolving to a number of IP addresses that are anycasted. Right now, that’s 4 IPv4 addresses and 4 IPv6 addresses.

The fact that the CDN servers cache content “a while” is another explanation to why updated contents take a little while to “take effect” for all visitors.

DNS

When we just recently switched the site over to curl.se, we also adjusted how we handle DNS.

I run our own main DNS server where I control and admin the zone and the contents of it. We then have four secondary servers to help us really up our reliability. Out of those four secondaries, three are sponsored by Kirei and are anycasted. They should be both fast and reliable for most of the world.

With the help of fabulous friends like Fastly and Kirei, we hope that the curl web site and services shall remain stable and available.

DNS enthusiasts have remarked that we don’t do DNSSEC or registry-lock on the curl.se domain. I think we have reason to consider and possibly remedy that going forward.

Traffic

The curl web site is just the home of our little open source project. Most users out there in the world who run and use curl or libcurl will not download it from us. Most curl users get their software installation from their Linux distribution or operating system provider. The git repository and all issues and pull-requests are done on GitHub.

Relevant here is that we have no logging and we run no ads or any analytics. We do this for maximum user privacy and partly because of laziness, since handling logging from the CDN system is work. Therefore, I only have aggregated statistics.

In this autumn of 2020, over a normal 30 day period, the web site serves almost 11 TB of data to 360 million HTTP requests. The traffic volume is up from 3.5 TB the same time last year. 11 terabytes per 30 days equals about 4 megabytes per second on average.

Without logs we cannot know what people are downloading – but we can guess! We know that the CA cert bundle is popular and we also know that in today’s world of containers and CI systems, a lot of things out there will download the same packages repeatedly. Otherwise the web site is mostly consisting of text and very small images.

One interesting specific pattern on the server load that’s been going on for months: every morning at 05:30 UTC, the site gets over 50,000 requests within that single minute, during which 10 gigabytes of data is downloaded. The clients are distributed world wide as I see the same pattern on access points all over. The minute before and the minute after, the average traffic rate remains at 200MB/minute. It makes for a fun graph:

An eight hour zoomed in window of bytes transferred from the web site. UTC times.

Our servers suffer somewhat from being the target of weird clients like qqgamehall that continuously “hammer” the site with requests at a high frequency many months after we started always returning error to them. An effect they have is that they make the admin dashboard to constantly show a very high error rate.

Software

The origin server runs Debian Linux and Apache httpd. It has a reverse proxy based on nginx. The DNS server is bind. The entire web site is built with free and open source. Primarily: fcpp, make, roffit, perl, curl, hypermail and enscript.

If you curl the curl site, you can see in response headers that Fastly uses Varnish.

This is how I git

Every now and then I get questions on how to work with git in a smooth way when developing, bug-fixing or extending curl – or how I do it. After all, I work on open source full time which means I have very frequent interactions with git (and GitHub). Simply put, I work with git all day long. Ordinary days, I issue git commands several hundred times.

I have a very simple approach and way of working with git in curl. This is how it works.

command line

I use git almost exclusively from the command line in a terminal. To help me see which branch I’m working in, I have this little bash helper script.

brname () {
  a=$(git rev-parse --abbrev-ref HEAD 2>/dev/null)
  if [ -n "$a" ]; then
    echo " [$a]"
  else
    echo ""
  fi
}
PS1="\u@\h:\w\$(brname)$ "

That gives me a prompt that shows username, host name, the current working directory and the current checked out git branch.

In addition: I use Debian’s bash command line completion for git which is also really handy. It allows me to use tab to complete things like git commands and branch names.

git config

I of course also have my customized ~/.gitconfig file to provide me with some convenient aliases and settings. My most commonly used git aliases are:

st = status --short -uno
ci = commit
ca = commit --amend
caa = commit -a --amend
br = branch
co = checkout
df = diff
lg = log -p --pretty=fuller --abbrev-commit
lgg = log --pretty=fuller --abbrev-commit --stat
up = pull --rebase
latest = log @^{/RELEASE-NOTES:.synced}..

The ‘latest’ one is for listing all changes done to curl since the most recent RELEASE-NOTES “sync”. The others should hopefully be rather self-explanatory.

The config also sets gpgsign = true, enables mailmap and a few other things.

master is clean and working

The main curl development is done in the single curl/curl git repository (primarily hosted on GitHub). We keep the master branch the bleeding edge development tree and we work hard to always keep that working and functional. We do our releases off the master branch when that day comes (every eight weeks) and we provide “daily snapshots” from that branch, put together – yeah – daily.

When merging fixes and features into master, we avoid merge commits and use rebases and fast-forward as much as possible. This makes the branch very easy to browse, understand and work with – as it is 100% linear.

Work on a fix or feature

When I start something new, like work on a bug or trying out someone’s patch or similar, I first create a local branch off master and work in that. That is, I don’t work directly in the master branch. Branches are easy and quick to do and there’s no reason to shy away from having loads of them!

I typically name the branch prefixed with my GitHub user name, so that when I push them to the server it is noticeable who is the creator (and I can use the same branch name locally as I do remotely).

$ git checkout -b bagder/my-new-stuff-or-bugfix

Once I’ve reached somewhere, I commit to the branch. It can then end up one or more commits before I consider myself “done for now” with what I was set out to do.

I try not to leave the tree with any uncommitted changes – like if I take off for the day or even just leave for food or an extended break. This puts the repository in a state that allows me to easily switch over to another branch when I get back – should I feel the need to. Plus, it’s better to commit and explain the change before the break rather than having to recall the details again when coming back.

Never stash

“git stash” is therefore not a command I ever use. I rather create a new branch and commit the (temporary?) work in there as a potential new line of work.

Show it off and get reviews

Yes I am the lead developer of the project but I still maintain the same work flow as everyone else. All changes, except the most minuscule ones, are done as pull requests on GitHub.

When I’m happy with the functionality in my local branch. When the bug seems to be fixed or the feature seems to be doing what it’s supposed to do and the test suite runs fine locally.

I then clean up the commit series with “git rebase -i” (or if it is a single commit I can instead use just “git commit --amend“).

The commit series should be a set of logical changes that are related to this change and not any more than necessary, but kept separate if they are separate. Each commit also gets its own proper commit message. Unrelated changes should be split out into its own separate branch and subsequent separate pull request.

git push origin bagder/my-new-stuff-or-bugfix

Make the push a pull request

On GitHub, I then make the newly pushed branch into a pull request (aka “a PR”). It will then become visible in the list of pull requests on the site for the curl source repository, it will be announced in the #curl IRC channel and everyone who follows the repository on GitHub will be notified accordingly.

Perhaps most importantly, a pull request kicks of a flood of CI jobs that will build and test the code in numerous different combinations and on several platforms, and the results of those tests will trickle in over the coming hours. When I write this, we have around 90 different CI jobs – per pull request – and something like 8 different code analyzers will scrutinize the change to see if there’s any obvious flaws in there.

CI jobs per platform over time. Graph snapped on November 5, 2020

A branch in the actual curl/curl repo

Most contributors who would work on curl would not do like me and make the branch in the curl repository itself, but would rather do them in their own forked version instead. The difference isn’t that big and I could of course also do it that way.

After push, switch branch

As it will take some time to get the full CI results from the PR to come in (generally a few hours), I switch over to the next branch with work on my agenda. On a normal work-day I can easily move over ten different branches, polish them and submit updates in their respective pull-requests.

I can go back to the master branch again with ‘git checkout master‘ and there I can “git pull” to get everything from upstream – like when my fellow developers have pushed stuff in the mean time.

PR comments or CI alerts

If a reviewer or a CI job find a mistake in one of my PRs, that becomes visible on GitHub and I get to work to handle it. To either fix the bug or discuss with the reviewer what the better approach might be.

Unfortunately, flaky CI jobs is a part of life so very often there ends up one or two red markers in the list of CI jobs that can be ignored as the test failures in them are there due to problems in the setup and not because of actual mistakes in the PR…

To get back to my branch for that PR again, I “git checkout bagder/my-new-stuff-or-bugfix“, and fix the issues.

I normally start out by doing follow-up commits that repair the immediate mistake and push them on the branch:

git push origin bagder/my-new-stuff-or-bugfix

If the number of fixup commits gets large, or if the follow-up fixes aren’t small, I usually end up doing a squash to reduce the number of commits into a smaller, simpler set, and then force-push them to the branch.

The reason for that is to make the patch series easy to review, read and understand. When a commit series has too many commits that changes the previous commits, it becomes hard to review.

Ripe to merge?

When the pull request is ripe for merging (independently of who authored it), I switch over to the master branch again and I merge the pull request’s commits into it. In special cases I cherry-pick specific commits from the branch instead. When all the stuff has been yanked into master properly that should be there, I push the changes to the remote.

Usually, and especially if the pull request wasn’t done by me, I also go over the commit messages and polish them somewhat before I push everything. Commit messages should follow our style and mention not only which PR that it closes but also which issue it fixes and properly give credit to the bug reporter and all the helpers – using the right syntax so that our automatic tools can pick them up correctly!

As already mentioned above, I merge fast-forward or rebased into master. No merge commits.

Never merge with GitHub!

There’s a button GitHub that says “rebase and merge” that could theoretically be used for merging pull requests. I never use that (and if I could, I’d disable/hide it). The reasons are simply:

  1. I don’t feel that I have the proper control of the commit message(s)
  2. I can’t select to squash a subset of the commits, only all or nothing
  3. I often want to cleanup the author parts too before push, which the UI doesn’t allow

The downside with not using the merge button is that the message in the PR says “closed by [hash]” instead of “merged in…” which causes confusion to a fair amount of users who don’t realize it means that it actually means the same thing! I consider this is a (long-standing) GitHub UX flaw.

Post merge

If the branch has nothing to be kept around more, I delete the local branch again with “git branch -d [name]” and I remove it remotely too since it was completely merged there’s no reason to keep the work version left.

At any given point in time, I have some 20-30 different local branches alive using this approach so things I work on over time all live in their own branches and also submissions from various people that haven’t been merged into master yet exist in branches of various maturity levels. Out of those local branches, the number of concurrent pull requests I have in progress can be somewhere between just a few up to ten, twelve something.

RELEASE-NOTES

Not strictly related, but in order to keep interested people informed about what’s happening in the tree, we sync the RELEASE-NOTES file every once in a while. Maybe every 5-7 days or so. It thus becomes a file that explains what we’ve worked on since the previous release and it makes it well-maintained and ready by the time the release day comes.

To sync it, all I need to do is:

$ ./scripts/release-notes.pl

Which makes the script add suggested updates to it, so I then load the file into my editor, remove the separation marker and all entries that don’t actually belong there (as the script adds all commits as entries as it can’t judge the importance).

When it looks okay, I run a cleanup round to make it sort it and remove unused references from the file…

$ ./scripts/release-notes.pl cleanup

Then I make sure to get a fresh list of contributors…

$ ./scripts/contributors.sh

… and paste that updated list into the RELEASE-NOTES. Finally, I get refreshed counters for the numbers at the top of the file by running

$ ./scripts/delta

Then I commit the update (which needs to have the commit message RELEASE-NOTES: synced“) and push it to master. Done!

The most up-to-date version of RELEASE-NOTES is then always made available on https://curl.se/dev/release-notes.html

Credits

Picture by me, taken from the passenger seat on a helicopter tour in 2011.

HSTS your curl

HTTP Strict Transport Security (HSTS) is a standard HTTP response header for sites to tell the client that for a specified period of time into the future, that host is not to be accessed with plain HTTP but only using HTTPS. Documented in RFC 6797 from 2012.

The idea is of course to reduce the risk for man-in-the-middle attacks when the server resources might be accessible via both HTTP and HTTPS, perhaps due to legacy or just as an upgrade path. Every access to the HTTP version is then a risk that you get back tampered content.

Browsers preload

These headers have been supported by the popular browsers for years already, and they also have a system setup for preloading a set of sites. Sites that exist in their preload list then never get accessed over HTTP since they know of their HSTS state already when the browser is fired up for the first time.

The entire .dev top-level domain is even in that preload list so you can in fact never access a web site on that top-level domain over HTTP with the major browsers.

With the curl tool

Starting in curl 7.74.0, curl has experimental support for HSTS. Experimental means it isn’t enabled by default and we discourage use of it in production. (Scheduled to be released in December 2020.)

You instruct curl to understand HSTS and to load/save a cache with HSTS information using --hsts <filename>. The HSTS cache saved into that file is then updated on exit and if you do repeated invokes with the same cache file, it will effectively avoid clear text HTTP accesses for as long as the HSTS headers tell it.

I envision that users will simply use a small hsts cache file for specific use cases rather than anyone ever really want to have or use a “complete” preload list of domains such as the one the browsers use, as that’s a huge list of sites and for most use cases just completely unnecessary to load and handle.

With libcurl

Possibly, this feature is more useful and appreciated by applications that use libcurl for HTTP(S) transfers. With libcurl the application can set a file name to use for loading and saving the cache but it also gets some added options for more flexibility and powers. Here’s a quick overview:

CURLOPT_HSTS – lets you set a file name to read/write the HSTS cache from/to.

CURLOPT_HSTS_CTRL – enable HSTS functionality for this transfer

CURLOPT_HSTSREADFUNCTION – this callback gets called by libcurl when it is about to start a transfer and lets the application preload HSTS entries – as if they had been read over the wire and been added to the cache.

CURLOPT_HSTSWRITEFUNCTION – this callback gets called repeatedly when libcurl flushes its in-memory cache and allows the application to save the cache somewhere and similar things.

Feedback?

I trust you understand that I’m very very keen on getting feedback on how this works, on the API and your use cases. Both negative and positive. Whatever your thoughts are really!