Category Archives: cURL and libcurl

curl and/or libcurl related

curl’s TLS fingerprint

Every human has a unique fingerprint. With only an impression of a person’s fingertip, it is possible to follow the lead back to the single specific individual wearing that unique pattern.

TLS fingerprints

The phrase TLS fingerprint is of course in this spirit. A pattern in a TLS handshake that allows an involved party to tell or at least guess with a certain level of accuracy what client software that performed it – purely based on how exactly the TLS magic is done. There are numerous different ways and variations a client can perform a TLS handshake and still be standards compliant. There is a long list of extensions that can vary in content, the order of the list of extensions, the ciphers to accept, the allowed TLS versions, steps performed, the order and sequence of those steps and more.

When a network client connects to a remote site and makes a TLS handshake with the server, the server can basically add up all those details and make an educated guess exactly which client that connects to it. One method to do it is called JA3 and produces a 32 digit hexadecimal number as output. (The three creators of this algorithm all have JA as their initials!)

In use out there

I have recently talked with customers and users who have faced servers that refused them access when they connected to sites using curl, but allowed them access to the site when they instead use one of the popular browsers – or if curl was tweaked to look like one of those browsers. It might be a trend in the making. There might be more sites out there now that reject clients that produce the wrong fingerprint then there used to be.

Why

Presumably there are many reasons why servers want to limit access to a subset of clients, but I think the general idea is that they want to prevent “illegitimate” user agents from accessing their sites.

For example, I have seen online market sites use this method in an what I have perceived as an attempt to block bots and web scrapers. Or they do it to block malware or other hostile clients that scour their website.

How

There’s this JA3 page that shows lots of implementations for many services that can figure out clients’ TLS fingerprints and act on them. And there’s nothing that says you have to do it with JA3. There’s likely to be numerous other ways and algorithms as well.

There are also companies that offer commercial services to filter off mismatching clients from your site. This is real business.

A TLS Client hello message has lots of info.

Other fingerprinting

In the earlier days of the web, web sites used more basic ways to detect and filter out bots and non-browser user clients. The original and much simpler way is to check the User-Agent: field that HTTP clients pass on, but has also sometimes been extended to check the order of the sent HTTP headers and in some cases, servers have used elaborate JavaScript schemes in order to try to “smoke out” the clients that don’t seem to act like full-fledged browsers.

If the clients use HTTP/2, that too allows for more details to fingerprint.

As the web has transitioned over to almost exclusively use HTTPS, it has severely increased the ways a server can fingerprint clients, and at the same time made it harder for non-browser clients to look exactly like browsers.

Allow list or block list

Sites that use TLS fingerprints to allow access, of course do not want too many false positives. They want to allow all “normal” browser-based visitors, even if they use a little older versions and also if they use somewhat older or less common operating systems.

This means that they either have to work hard to get an extensive list of acceptable hashes in an accept list or they add known non-desired clients in a block list. I would imagine that if you go the accept list route, that’s how companies can sell this services as that is maintenance intensive work.

Users of alternative and niche browsers are sometimes also victims in this scheme if they stand out enough.

Altering the fingerprint

The TLS fingerprints have the interesting feature compared to human fingertip prints, that they are the result of a set of deliberate actions and not just a pattern you are born to wear. They are therefore a lot easier to change.

With curl version C using TLS library T of version V, the TLS fingerprint is a function that involves C, T and V. And the options O set by curl. By changing one or more of those variables, you are likely to alter the TLS fingerprint.

Match a browser exactly

To be most certain that no site will reject your curl request because of its TLS fingerprint, you would alter the print to look exactly like the one of a popular browser. You can suspect that most sites want their regular human browser-using visitors to be able to access them.

To make curl look exactly like a browser you also likely need to do more than just change C, O, T and V from the section above. You also need to make sure that the TLS library you use produces its lists of extensions and ciphers in exactly the same order etc. This may require that you alter options and maybe even source code.

curl-impersonate

This is a custom build of curl that can impersonate the four major browsers: Chrome, Edge, Safari & Firefox. curl-impersonate performs TLS and HTTP handshakes that are identical to that of a real browser.

curl-impersonate is a modified curl build and the project also provides docker images and more to help users to use it easily.

I cannot say right now if any of the changes done for curl-impersonate will get merged into the upstream curl project, but it will also depend on what users want and how the use of TLS fingerprinting spread or changes going forward.

Program a browser

Another popular way to work around this kind of blocking is to simply program a browser to do the job. Either a headless browser or with tools like Selenium. Since these methods make the TLS handshake using a browser “engine”, they are unlikely to get blocked by these filters.

Cat and mouse

Servers add more hurdles to attempt to block unwanted clients.

Clients change to keep up with the servers and to still access the sites in spite of what the server admins want.

Future

As early as only a few years ago I had never heard of any site that blocked clients because of their TLS handshake. Through recent years I have seen it happen and the use of it seems to have increased. I don’t know of any way to measure if this is actually true or just my feeling.

I cannot rule out that we are going to see this more going forward, even if I also believe that the work on circumventing these fingerprinting filters is just getting started. If the circumvention grows and becomes easy enough, maybe it will stifle servers from adding these filters as they will not be effective anyway?

Let us come back to this topic in a few years and see where it went.

Credits

Fingerprint image by Hebi B. from Pixabay

curl 7.85.0 for you

Welcome to a new curl release, the result of a slightly extend release cycle this time.

Release presentation

Numbers

the 210th release
3 changes
65 days (total: 8,930)

165 bug-fixes (total: 8,145)
230 commits (total: 29,017)
0 new public libcurl function (total: 88)
2 new curl_easy_setopt() option (total: 299)

0 new curl command line option (total: 248)
79 contributors, 38 new (total: 2,690)
44 authors, 22 new (total: 1,065)
1 security fixes (total: 126)
Bug Bounties total: 40,900 USD

Security

We have yet another CVE to disclose.

control code in cookie denial of service

CVE-2022-35252 allows a server to send cookies to curl that contain ASCII control codes. When such cookies subsequently are sent back to a server, they will cause 400 responses from servers that downright refuse such requests. Severity: low. Reward: 480 USD.

Changes

This release counts three changes. They are:

schannel backend supports TLS 1.3

For everyone who uses this backend (which include everyone who uses the curl that Microsoft bundles with Windows) this is great news: now you too can finally use TLS 1.3 with curl. Assuming that you use a new enough version of Windows 10/11 that has the feature present. Let’s hope Microsoft updates the bundled version soon.

CURLOPT_PROTOCOLS_STR and CURLOPT_REDIR_PROTOCOLS_STR

These are two new options meant to replace and be used instead of the options with the same names without the “_STR” extension.

While working on support for new future protocols for libcurl to deal with, we realized that the old options were filled up and there was no way we could safely extend them with additional entries. These new functions instead work on text input and have no limit in number of protocols they can be made to support.

QUIC support via wolfSSL

The ngtcp2 backend can now also be built to use wolfSSL for the TLS parts.

Bugfixes

This was yet again a cycle packed with bugfixes. Here are some of my favorites:

asyn-thread: fix socket leak on OOM

Doing proper and complete memory cleanup even when we exist due to out of memory is sometimes difficult. I found and fixed this very old bug.

cmdline-opts/gen.pl: improve performance

The script that generates the curl.1 man page from all its sub components was improved and now typically executes several times faster then before. curl developers all over rejoice.

configure: if asked to use TLS, fail if no TLS lib was detected

Previously, the configure would instead just silently switch off TLS support which was not always easy to spot and would lead to users going further before they eventually realize this.

configure: introduce CURL_SIZEOF

The configure macro that checks for size of variable types was rewritten. It was the only piece left in the source tree that had the mention of GPL left. The license did not affect the product source code or the built outputs, but it caused questions and therefore some friction we could easily avoid by me completely writing away the need for the license mention.

close the happy eyeballs loser connection when using QUIC

A silly memory-leak when doing HTTP/3 connections on dual-stack machines.

treat a blank domain in Set-Cookie: as non-existing

Another one of those rarely used and tiny little details about following what the spec says.

configure: check whether atomics can link

This, and several other smaller fixes together improved the atomics support in curl quite a lot since the previous version. We conditionally use this C11 feature if present to make the library initialization function thread-safe without requiring a separate library for it.

digest: fix memory leak, fix not quoted ‘opaque’

There were several fixes and cleanups done in the digest department this time around.

remove examples/curlx.c

Another “victim” of the new license awareness in the project. This example was the only file present in the repository using this special license, and since it was also a bit convoluted example we decided it did not really have to be included.

resolve *.localhost to 127.0.0.1/::1

curl is now slightly more compliant with RFC 6761, follows in the browsers’ footsteps and resolves all host names in the “.localhost” domain to the fixed localhost addresses.

enable obs-folded multiline headers for hyper

curl built with hyper now also supports “folded” HTTP/1 headers.

libssh2:+libssh make atime/mtime date overflow return error

Coverity had an update in August and immediately pointed out these two long-standing bugs – in two separate SSH backends – related to time stamps and 32 bits.

curl_multi_remove_handle closes CONNECT_ONLY transfer

When an applications sets the CONNECT_ONLY option for a transfer within a multi stack, that connection was not properly closed until the whole multi handle was closed even if the associated easy handle was terminated. This lead to connections being kept around unnecessarily long (and wasting resources).

use pipe instead of socketpair on apple platforms

Apparently those platform likes to close socketpairs when the application is pushed into the background, while pipes survive the same happening… This is a change that might be preferred for other platforms as well going forward.

use larger dns hash table for multi interface

The hash table used for the DNS cache is now made larger for the multi interface than when created to be used by the easy interface, as it simply is more likely to be used by many host names then and then it performs better like this.

reject URLs with host names longer than 65535 bytes

URLs actually have no actual maximum size in any spec and neither does the host name within one, but the maximum length of a DNS name is 253 characters. Assuming you can resolve the name without DNS, another length limit is the SNI field in TLS that is an unsigned 16 bit number: 65535 bytes. This implies that clients cannot connect to any SNI-using TLS protocol with a longer name. Instead of checking for that limit in many places, it is now done early.

reduce size of several struct fields

As part of the repeated iterative work of making sure the structs are kept as small as possible, we have again reduced the size of numerous struct fields and rearranged the order somewhat to save memory.

Next

The next release is planned to ship on October 25, 2022.

The Travis separation a year later

A little over a year ago, we ditched Travis CI as a service to use for the curl project.

Up until that day, it had been our preferred and favored CI service for many years. At most, we ran 34 CI jobs on it, for every pull-request and commit. It was the service that we leaned on when we transitioned the curl project into a CI-heavy user. Our use of CI really took off 2017 and has been increasing ever since.

A clean cut

We abruptly cut off over 30 jobs from the service just one day. At the time, that was a third of all our CI. The CI jobs that we rely on to verify our work and to keep things working and stable in the project.

More CI services

At the time of the amputation, we run 99 CI jobs distributed over 5 services, so even with one of them cut off we still ran jobs on AppVeyor, Azure Pipelines, GitHub Actions and Cirrus CI. We were not completely stranded.

New friends

Luckily for us, when one solution goes sour there are often alternatives out there that we can move over to and continue our never-ending path forward.

In our case, friendly people helped moving over almost all ex-Travis jobs to the (for us) new service Zuul CI. In July 2021, we had 29 jobs on it. We also added Circle CI to the mix and started running jobs there.

In July 2021 – a month after the cut – we counted 96 running jobs (a few old jobs were just dropped as we reconsidered their value). While the work involved a lot of adjusting scripts, pulling hair over yaml files and more, it did not cause any significant service loss over an extended period. We managed pretty good.

There was no noticeable glitch in quality or backed up “guilt” in the project because of the transition and small period of lesser CI either. Thanks to the other services still running, we were still in a good shape.

Why all the services

In the end of August 2022 we still use 6 different CI services and we now run 113 CI jobs on them, for every push to master and to pull-requests.

There are primarily three reasons why we still use a variety of services.

  1. load balancing: we get more parallelism by running jobs on many services as they all have a limited parallelism per service.
  2. We also get less problems when one of the services has some glitch or downtime, as then we still work with the others. The not all eggs in the same basket thing.
  3. The various services also have different features, offer different platforms and work slightly differently which for several jobs make them necessary to run on a specific service, or rather they cannot run on most of the other services.

It does not end

Over the year since the amputation, we have learned that our new friend Zuul CI has turned out to not work quite as reliable and convenient as we would like it. Since a few months back, we are now gradually moving away from this service. Slowly moving over jobs from there to run on one the other five instead.

Over time, our new most preferred CI service has turned out to be GitHub Actions. At the latest count, it now run 44 CI jobs for us. We still have 12 jobs on Zuul targeted for transition.

Our use of different CI services over time in the curl project

Services come and go. We have different ideas and our requirements and ambitions change. I am sure we will continue to service-jump when needed. It is just a natural development. A part of a software development life.

On flakiness

A big challenge and hurdle with our CI setup remains: to maintain the builds and keep them stable and functioning. With over a hundred jobs running on six services and our code and test suite being portable and things being networked and running on many platforms, it is job that we quite often fail at. It has turned out mighty difficult to avoid that at least a few of the jobs are constantly red, “permafailing”, at any single moment.

If this is stuff you like to tinker with, we could use your help!

curl up 2022 take 2

In June of 2022 we intended to run the curl up 2022 curl conference in person, in California.

Unfortunately, I had the bad taste of catching covid exactly when I was about to use my new US visa for the first time, so I had to remain at home and because of that we cancelled the whole event.

Try again

Now we try again. The curl up 2022 take 2 will be an all-virtual event that is going to be a long Zoom-session with a number of presentations and discussion slots. Feel free to join in for what you want and stay away from the sessions that don’t interest you. We will make our best to keep the schedule, and the agenda will be available ahead of time for you to plan your attendance around.

On September 15, 2022 we will start the show at 15:00 CEST and we have a program that is a full day. Join when you want, leave, come back. You decide.

Sign up!

Agenda

The Agenda is almost complete. The presentations will be done live or be provided prerecorded. The prerecorded talks will still be discussed and have Q&As live afterwards.

Speak!

There is still room left to add some speakers. If you use curl/libcurl somewhere and want to tell us about things you’ve learned and things libcurl devs should learn, come do it! Or anything else that is related to curl or Internet transfers. Big or small – even just 5 minutes works!

This is a day for sharing info, spreading knowledge and having fun.

Attend!

Yeah, sign up and come hang out with us and watch a range of curl related talks. I think you will learn things and I think it will be a lot of fun.

The curl up hours spelled out in different time zones:

  • PDT 6:00 – 15:00 (US west coast)
  • EDT 9:00 – 18:00 (US east coast)
  • UTC 13:00 – 22:00
  • CEST 15:00 – 00:00 (central Europe)
  • IST 18:30 – 03:30 (India)
  • CST 21:00 – 06:00 (China)
  • JST 22:00 – 07:00 (Japan)
  • AEST 23:00 – 08:00 (Sidney, Autralia)
  • NZST 01:00 – 09:00 (New Zealand) (This is on the 16th)

What if I vanished?

I get this question fairly often. How would the projects I run manage if I took off? And really, the primary project people think of then is of course curl.

How would the curl project manage if I took a forever vacation starting now?

Of course I don’t know that. We can’t really know for sure until the day comes (in the distant future) when I actually do this. Then you can come back to this post and see how well I anticipated what would happen.

Let me be clear: I do not have any plans to leave the curl project or in any way stop my work on it, neither in the short nor long term. I hope to play this game for a long time still. I am living the dream after all.

Busfactor

Traditionally we talk about the busfactor in various projects as a way to see how many key people a particular project has. The idea being that if there is only one, the project would effectively die if that single person goes away. A busfactor of one is considered bad.

The less morbid name sometimes used for this number is the holiday factor.

There is also the eternal difficult religious question of how to calculate the busfactor, but let’s ignore that today.

I would maybe argue differently

The theory of the busfactor is the idea that the main contributors of a project are irreplaceable and that the lack of other more active involvement are because of a lack of knowledge or ability or something.

It very well might be. A challenge is that there is no way to tell. It could also be that because one or more of the top active developers do a good enough job, the others do not see a need or a reason for them to step up, join in and work. They can very well keep lingering in the background because everything seems to work well enough from their perspective. And if things work well enough in project X, surely lots of people can then rather find interesting work elsewhere. Perhaps in project Y that may not work as well.

I prefer to think and I actually believe that curl is a project in this second category. It runs pretty well, even if I do a really large chunk of the work in the project. I am not sure curl actually counts as a busfactor one project, but the exact number is not important me. At least I regularly do more than 50% of the commits on an average year.

Graph for the curl source code repository

A safety net

I think an important part of the job for me in the project is to make sure that everything is documented. The code of course, the APIs, the way we build things, how to run tests, how to write tests etc. But also some of the softer things: how we do releases, what the different roles are the project, how we decide merging features, how we work on security issues, how did the project start, how do we want contributions and much much more.

This is the safety net for the day I vanish.

There are no secrets, no hidden handshakes and no surprises. No procedures or ways of working that is not written down and shared with the team.

If you know me and my work in curl, you know that I work fiercely on documentation.

Maybe there is only a single bus driver, but there are wannabe potential drivers following along that can take over the wheel should the seat become empty.

Contingency

There is no designated crown prince/princess or heir who inherits the throne after me. I don’t think that is a valuable thing to even discuss. I am convinced that when the day comes, someone will step up to do what is necessary. If the project is still relevant and in use. Heck, it could even be a great chance to change the way the project is run…

Unless of course the project has already ran out of its use by then, and then it doesn’t matter if someone steps up or not.

Most (95%) of the copyrights in the project are mine. But I have decided to ship the code under a liberal license, so while successors cannot change the copyright situation easily, there should be very little or no reason for doing that.

The keys

If I vanish, the only vital thing you (the successors) will really miss to run and manage the curl project exactly as I have done, is the keys to the building. The passwords and the logins to the various machines and services we use that I can login to, to manage whatever I need to for the project.

If I vanish willingly, I will of course properly hand these over to someone who is willing to step up and take the responsibility. If I instead suddenly get abducted by aliens with no chance for a smooth transition, I have systems setup so that people left behind know what to do and can get access to them.

Increased CVE activity in curl?

Recently I have received curious questions from users, customers and bystanders.

Can you explain the seemingly increased CVE activity in curl over like the last year or so?

(data and stats for this post comes mostly from the curl dashboard)

Pointless but related poll I ran on Twitter

Frequency

In 2022 we have already had 14 CVEs reported so far, and we will announce the 15th when we release curl 7.85.0 at the end of August. Going into September 2022, there have been a total of 18 reported CVEs in the last 12 months.

During the whole of 2021 we had 13 CVEs reported – and already that was a large amount and the most CVEs in a single year since 2016.

There has clearly been an increased CVE issue rate in curl as of late.

Finding and fixing problems is good

While every reported security problem stings my ego and makes my soul hurt since it was yet another mistake I feel I should have found or not made in the first place, the key take away is still that it is good that they are found and reported so that they can be fixed properly. Ideally, we also learn something from each such report and make it less likely that we ever introduce that (kind of) problem again.

That might be the absolutely hardest task around each CVE. To figure out what went wrong, detect a pattern and then lock it down. It’s almost amusing how all bugs look a like a one-off mistake with nothing to learn from…

Ironically, the only way we know people are looking really hard at curl security is when someone reports security problems. If there was no reports at all, can we really be sure that people are scrutinizing the code the way we want?

Who is counting?

Counting the amount of CVEs and giving that a meaning, or even comparing the number between projects, is futile and a bad idea. The number does not say much and comparing two projects this way is impossible and will not tell you anything. Every project is unique.

Just counting CVEs disregards the fact that they all have severity levels. Are they a dozen “severity low” or are they a handful of critical ones? Even if you take severity into account, they might have gotten entirely different severity for virtually the same error, or vice versa.

Further, some projects attract more scrutinize and investigation because they are considered more worthwhile targets. Or perhaps they just pay researchers more for their findings. Projects that don’t get the same amount of focus will naturally get fewer security problems reported for them, which does not necessarily mean that they have fewer problems.

Incentives

The curl bug-bounty really works as an incentive as we do reward security researchers a sizable amount of money for every confirmed security flaw they report. Recently, we have handed over 2,400 USD for each Medium-severity security problem.

In addition to that, finding and getting credited for finding a flaw in a widespread product such as curl is also seen as an extra “feather in the hat” for a lot of security-minded bug hunters.

Over the last year alone, we have paid about 30,000 USD in bug bounty rewards to security researchers, summing up a total of over 40,000 USD since the program started.

Accumulated bug-bounty rewards for curl CVEs over time

Who’s looking?

We have been fortunate to have received the attention of some very skilled, patient and knowledgeable individuals. To find security problems in modern curl, the best bug hunters both know the ins and outs of the curl project source code itself while at the same time they know the protocols curl speaks to a deep level. That’s how you can find mismatches that shouldn’t be there and that could lead to security problems.

The 15 reports we have received in 2022 so far (including the pending one) have been reported by just four individuals. Two of them did one each, the other two did 87% of the reporting. Harry Sintonen alone reported 60% of them.

Hardly any curl security problems are found with source code analyzers or even fuzzers these days. Those low hanging fruits have already been picked.

We care, we act

Our average response time for security reports sent to the curl project has been less than two hours during 2022, for the 56 reports received so far.

We give each report a thorough investigation and we spend a serious amount of time and effort to really make sure we understand all the angles of the claim, that it really is a security problem, that we produce the best possible fix for it and not the least: that we produce a mighty fine advisory for the issue that explains it to the world with detail and accuracy.

Less than 8% of the submissions we get are eventually confirmed actual security problems.

As a general rule all security problems we confirm, are fixed in the pending next release. The only acceptable exception would be if the report arrives just a day or two from the next release date.

We work hard to make curl more secure and to use more ways of writing secure code and tools to detect mistakes than ever, to minimize the risk for introducing security flaws.

Judge a project on how it acts

Since you cannot judge a project by the number of CVEs that come out of it, what you should instead pay more focus on when you assess the health of a software project is how it acts when security problems are reported.

Most problems are still very old

In the curl project we make a habit of tracing back and figuring out exactly in which release each and every security problem was once introduced. Often the exact commit. (Usually that commit was authored by me, but let’s not linger on that fact now.)

One fun thing this allows us to do, is to see how long time the offending code has been present in releases. The period during which all the eyeballs that presumably glanced over the code missed the fact that there was a security bug in there.

On average, curl security problems have been present an extended period of time before there are found and reported.

On average for all CVEs: 2,867 days

The average time bugs were present for CVEs reported during the last 12 months is a whopping 3,245 days. This is very close to nine years.

How many people read the code in those nine years?

The people who find security bugs do not know nor do they care about the age of a source code line when they dig up the problems. The fact that the bugs are usually old could be an indication that we introduced more security bugs in the past than we do now.

Finding vs introducing

Enough about finding issues for a moment. Let’s talk about introducing security problems. I already mentioned we track down exactly when security flaws were introduced. We know.

All CVEs in curl, green are found, red are introduced

With this, we can look at the trend and see if we are improving over time.

The project has existed for almost 25 years, which means that if we introduce problems spread out evenly over time, we would have added 4% of them every year. About 5 CVE problems are introduced per year on average. So, being above or below 5 introduced makes us above or below an average year.

Bugs are probably not introduced as a product of time, but more as a product of number of lines of code or perhaps as a ratio of the commits.

75% of the CVE errors were introduced before March 2014 – and yet the code base “only” had 102,000 lines of code at the end of that period. At that time, we had done 61% of the git commits (17684). One CVE per 189 commits.

15% of the security problems were introduced during the last five years and now we are at 148,000 lines of code. Finding needles in a growing haystack. With more code than ever before, we introduce bugs at a lower-than average rate. One CVE per 352 commits. This probably is also related to the ever-growing number of tests and CI jobs that help us detect more problems before we merge them.

Number of lines of code. Includes comments, excludes blank lines

The commit rate has remained between 1,100 and 1,700 commits per year since 2007 with ups and downs but no obvious growing or declining trend.

Number of commits per year

Harry Sintonen

Harry reported a large amount of the recent curl CVEs as mentioned above, and is credited for a total of 17 reported curl CVEs – no one else is even close to that track record. I figured it is apt to ask him about the curl situation of today, to make sure this is not just me hallucinating things up. What do you think is the reason for the increased number of CVEs in curl in 2022?

Harry replied:

  1. The news of good bounties being paid likely has been attracting more researchers to look at curl.
  2. I did put considerable effort in doing code reviews. I’m sure some other people put in a lot of effort too.
  3. When a certain before unseen type of vulnerability is found, it will attract people to look at the code as a whole for similar or surrounding issues. This quite often results in new, similar CVEs bundling up. This is kind of a clustering effect.

It’s worth mentioning, I think, that even though there has been more CVEs found recently, they have been found (well most of them at least) as part of the bug bounty program and get handled in a controlled manner. While it would be even better to have no vulnerabilities at all, finding and handling them in controlled manner is the 2nd best option.

This might be escaping a random observer who just looks at the recent amount of CVEs and goes “oh this is really bad – something must have gone wrong!”. Some of the issues are rather old and were only found now due to the increased attention.

Conclusion

We introduce CVE problems at a slower rate now than we did in the past even though we have gotten problems reported at a higher than usual frequency recently.

The way I see it, we are good. I suppose the future will tell if I am right.

100,000 words

With just a month left until its seventh birthday, everything curl has now surpassed this amazing milestone. The book now contains more than 100,000 words. Distributed over 883 sections. All written in glorious markdown.

Two years ago when we celebrated its 5th birthday, it was still this measly thin “pamphlet” of 72,000 words. It has grown by almost 40% over the last two years.

The average word length in the book is now 5.25 characters and all this is spread out over 14,900 lines (in the source markdowns).

63 individuals have had their commits merged. I have great help from people to polish off weird language and wrong English.

My ambition with this book remains the same: to document everything there is to tell about curl and libcurl from every aspect. Code, use, development, project, background, future, philosophy and more.

CI

News from the last year for everything curl is that we have several CI jobs now that verify new contributions to make sure we don’t degrade too much. They check that:

  • all markdown links work
  • proselint has no complaints
  • we avoid some words (contractions and some other things) – basically my most common mistakes
  • spellcheck
  • markdown heading level sanity check

The spellcheck part can of course be a bit tedious for such a technical document but I realized that since I am such a sloppy writer I need that check. This has really reduced the inflow of PRs with spelling fixes.

These CI jobs makes the quality of the book much better even though it is a highly moving target.

Keeping up

As curl is a constantly evolving project that adds new features and changes things every now and then, there is also a constant stream of new things to add or update in the book.

Since I also want the book to work for readers that may very well run curl versions from several years ago, we need to keep that in mind and make sure to keep “old behavior” and details around for a while.

Miss anything?

The curl release cycle

In the curl project we do timed releases and we try to do them planned and scheduled long in advance. The dates are planned. The content is not.

I have been the release manager for every single curl release done. 209 releases at current count. Having this role means I make sure things are in decent shape for releases and I do the actual mechanic act of running the release scripts etc on the release days.

Since 2014, we make releases on Wednesdays, every eight weeks. We sometimes adjust the date slightly because of personal events (meaning: if I have a vacation when the release is about to happen, we can move it), and we have done several patch releases within a shorter time when the previous release proved to have a serious enough problem to warrant an out-of-schedule release. Even more specifically, I make the releases available at or around 8 am (central euro time) on the release days.

The main objective is to stick to the 56 day interval.

It probably goes without saying but let me be clear: curl is a software project that quite evidently will never be done or complete. It will keep getting fixes, improvements and features for as long as it lives, and as long as it lives we keep making new releases.

Timed releases

We decided to go with a fixed eight weeks release frequency quite arbitrarily (based on past release history and what felt “right”) but it has over time proved to work well. It is short enough for everyone to never have to wait very long for the next release, and yet it is long enough to give us time for both merging new features and having a period of stabilization.

Timed releases means that we ship releases on the predetermined release date with all the existing features and bugfixes that have landed in the master branch in time. If a change is not done in time and it is not merged before the release, it will simply not be included in the release but will get a new opportunity to get included and shipped in the next release. Forever and ever.

I am a proponent of timed releases compared to feature based ones for projects like curl. For the simplicity of managing the releases, for long term planning, for user communication and more.

The cycle has three windows

Within the eight week release cycle, there are three distinct and different windows or phases.

1 – release margin

The cycle can be said to start at the day of a release. The release is created straight from the master git branch. It gets signed, uploaded, blogged about and the news of it is shouted across the globe.

This day also starts the “release margin window”. In this window we still accept and merge bugfixes into the master branch, but we do not merge changes or new features.

The five day margin this speaks of, is that this window gives us a few calendar days to assess and get a feel for how the previous release is being received. Is there a serious bug reported? Did we somehow royally mess up? If we did an important enough snafu, we may decide to do a follow-up patch release soon and if we do, we are in a better position if we have not merged any changes yet into the release branch.

If we decide on doing a patch release, we skip the feature window and go directly to feature freeze.

2 – feature window

If we survived the margin window without anything alarming happening, we open the feature window. On the Monday following the preceding release.

This is the phase during which we merge changes and new features in addition to the regular normal bugfixes (assuming there are any of course). Things that warrant the minor version number for the next release to get bumped.

The feature window is open 23 days. Ideally, people have already been working and polishing on their pull-requests for a while before this window opens and then the work can get merged fairly quickly – and presumably painless.

What exactly can be considered a change and what is a bugfix can of course become a matter of opinion and discussion, but we tend to take the safer approach when in doubt.

3 – feature freeze

In the 28 days before the pending release we only merge bugfixes into the master branch. All features and changes are queued up and will have to wait until the feature window opens again. (Exceptions might be made for experimental and off-by-default features.)

This phase is of course intended for things to calm down, to smooth out rough edges, to fix any leftover mistakes done in the previous feature merges. With the mindset that the pending next release will be the best, most stable, shiniest, glorious release we ever did. With even fewer bugs than before. The next release is always our best release yet.

The release cycle as an image

(Yes, if you squint and roll your chair away from the screen a bit, it looks like the Chrome logo!)

Release frequency graph

The graph below shows the number of days between all curl releases ever done. You can see the eight week release cycle introduced in 2014 visible in the graph, and you can also easily spot that we have done much quicker releases than eight weeks a number of a times since then – all of them actually a sign of some level of failure since that means we felt urged to do a patch release sooner than we had previously anticipated.

QUIC and HTTP/3 with wolfSSL

Disclaimer: I work for wolfSSL but I don’t speak for wolfSSL. I state my own opinions and I try to be as honest and transparent as possible. As always.

QUIC API

Back in the summer of 2020 I blogged about QUIC support coming in wolfSSL. That work never actually took off, primarily I believe because the team kept busy with other projects and tasks that had more customer focus and interest and yeah, there was not really any noticeable customer demand for QUIC with wolfSSL.

Time passed.

On July 21 2022, Stefan Eissing submitted his work on introducing a QUIC API and after reviews and updates, it was merged into the wolfSSL master branch on August 9th.

The QUIC API is planned to appear “for real” in a coming wolfSSL release version. Until then, we can play with what is available in git.

Let me be clear here: the good people at wolfSSL has not decided to write a full QUIC implementation, because that would be insane when so many good alternatives are already being worked on. This is just a set of new functions to allow wolfSSL to be used as TLS component when a QUIC stack is created.

Having QUIC support in wolfSSL is just one (but important) step along the way as it makes it possible to use wolfSSL to build a QUIC implementation but there are some more steps needed to turn this baby into full HTTP/3.

ngtcp2

Luckily, ngtcp2 exists and it is an established QUIC implementation that was written to be TLS agnostic from the beginning. This “only” needs adaptions provided to make sure it can be built and used with wolSSL as the TLS provider.

Stefan brought wolfSSL support to ngtcp2 in this PR. Merged on August 13th.

nghttp3

nghttp3 is the HTTP/3 library that uses ngtcp2 for QUIC, so once ngtcp2 supports wolfSSL we can use nghttp3 to do HTTP/3.

curl

curl can (as one of the available options) get built to use nghttp3 for HTTP/3, and if we just make sure we use an underlying ngtcp2 built to use a wolfSSL version with QUIC support, we can now do proper curl HTTP/3 transfers powered by wolfSSL.

Stefan made it possible to build curl with the wolfSSL+ngtcp2 combo in this PR. Merged on August 15th.

Available HTTP/3 components

With this new ecosystem addition, the chart of HTTP/3 components for curl did not get any easier to parse!

If you start by selecting which HTTP/3 library (or maybe I should call it HTTP/3 vertical) to use when building, there are three available options to go with: quiche, msh3 or nghttp3. Depending on that choice, the QUIC library is given. quiche does QUIC as well, but the two other HTTP/3 libraries use dedicated QUIC libraries (msquic and ngtcp2 respectively).

Depending on which QUIC solution you use, there is a limited selection of TLS libraries to use. The image above shows TLS libraries that curl also supports for other protocols, meaning that if you pick one of those you can still use that curl build to for example do HTTPS for HTTP version 1 or 2.

TLS options

If you instead rather pick TLS library first, only quictls and BoringSSL are supported by all QUIC libraries (quictls is an OpenSSL fork with a BoringSSL-like QUIC API patched in). If you rather build curl to use Schannel (that’s the native Windows TLS API), GnuTLS or wolfSSL you have also indirectly chosen which QUIC and HTTP/3 libraries to use.

Picotls

ngtcp2 supports Picotls shown in orange in the image above because that is a TLS 1.3-only library that is not supported for other TLS operations within curl. If you build curl and opt to go with a ngtcp2 build using Picotls for QUIC, you would need to have use an second TLS library for other TLS-using protocols. This is possible, but is rarely what users prefer.

No OpenSSL option

It should probably be especially highlighted that the plain vanilla OpenSSL is not an available option. Primarily because they decided that the already created API was not good enough for them so they will instead work on implementing their own QUIC library to be released at some point in the future. That also implies that if we want to build curl to do HTTP/3 with OpenSSL in the future, we probably need to add support for a forth QUIC library – and someone would also have to write a HTTP/3 library to use OpenSSL for QUIC.

Why wolfSSL adding QUIC is good for HTTP/3

People in general want to build applications and infrastructure using released, official and supported libraries and the sad truth is that there is a clear shortage in such TLS libraries with QUIC support.

In your typical current Linux distribution, quictls and BoringSSL are usually not viable options. The first since it is an OpenSSL fork not many even ship as a package and the second because it is done by Google for Google and they don’t do releases and generally care little for outside-Google users.

For the situations where those two TLS options are out of the game, the image above shows you the grim reality: your HTTP/3 options are limited. On Windows you can go with msh3 since it can use Schannel there, but on non-Windows you can only use ngtcp2/nghttp3 and before this wolfSSL support the only TLS option was GnuTLS.

For many embedded solutions, or even FIPS requirements, wolfSSL is now the only viable option for doing HTTP/3 with curl.

The dream of auto-detecting proxies

curl, along with every other Internet tool with aspirations, supports proxies. A proxy is a (usually known) middle-man in a network operation; instead of going directly to the remote end server, the client goes via a proxy.

curl has supported proxies since the day it was born.

Which proxy?

Applications that do Internet transfers often would like to automatically be able to do their transfers even when users are trapped in an environment where they use a proxy. To figure out the proxy situation automatically.

Many proxy users have to use their proxy to do Internet transfers.

A library to detect which proxy?

The challenge to figure out the proxy situation is of course even bigger if your applications can run on multiple platforms. macOS, Windows and Linux have completely different ways of storing and accessing the necessary information.

libproxy

libproxy is a well-known library used for exactly this purpose. The first feature request to add support for this into libcurl that I can find, was filed in December 2007 (I also blogged about it) and it has been popping up occasionally over the years since then.

In August 2016, David Woodhouse submitted a patch for curl that implemented support for libproxy (the PR version is here). I was skeptical then primarily because of the lack of tests and docs in libproxy (and that the project seemed totally unresponsive to bug reports). Subsequently we did not merge that pull request.

Almost six years later, in June 2022, Jan Brummer revived David’s previous work and submitted a fresh pull request to add libproxy support in curl. Another try.

The proxy library dream is clearly still very much alive. There are also a fair amount of applications and systems today that are built to use libproxy to figure out the proxy and then tell curl about it.

What is unfortunately also still present, is the unsatisfying state of libproxy. It seems to have changed and improved somewhat since the last time I looked at it (6 years ago), but there several warning signs remaining that make me hesitate.

This is not a dependency I want to encourage curl users to lean and depend upon.

I greatly appreciate the idea of a libproxy but I do not like the (state of the) implementation.

My responsibility

Or rather, the responsibility I think we have as curl maintainers. We ship a product that is used and depended upon by an almost unfathomable amount of users, tools, products and devices.

Our job is to help guide our users so that the entire product, including third party dependencies, become an as safe and secure solution as possible. I am not saying that we can take full responsibility for the security of code outside of our own domain, but I think we should recognize that what we condone and recommend will be used. People read our support for library X as some level of approval.

We should only add support for third party libraries that meets a certain quality threshold. This threshold or bar maybe is (unfortunately) not written down anywhere but is still mostly a soft “gut feeling” based on human reviews of the situation.

What to do

I think the sensible thing to do first, before trying to get curl to use libproxy, is to make sure that there is a library for proxies that we can and want to lean on. That library can be the libproxy of today but it could also be something else.

Some areas in need of attention in libproxy that I recently also highlighted in the curl PR, that would take it closer to meeting the requirements we can ask of a dependency:

  1. Improve the project with docs. We cannot safely rely on a library and its APIs if we don’t know exactly what to expect.
  2. There needs to be at least basic tests that verify its functionality. We cannot fix and improve the library safely if we cannot check that it still works and behaves as expected. Tests is the only reliable way to make sure of this.
  3. Add CI jobs that build the project and run those tests. To help the project better verify that things don’t break along the way.
  4. Consider improving the API. To be fair, it is extremely simple now, but it’s also so simple that that it becomes ineffective and quirky in some use cases.
  5. Allow for external URL parser and URL retrieving etc to avoid blocking, double-parsing and to reuse caches properly. It would be ridiculous for curl to use a library that has its own separate (semi-broken) HTTP transfer and its own (synchronous) name resolving process.

A solid proxy library?

The current libproxy is not even a lot of code, it could perhaps make more sense to just plainly write a new library based on how you would want a library like this to work – and use all the knowledge, magic and experience from libproxy to get the technical parts done correctly. A libproxy-next-generation.

But

This would require that someone really wanted to see this development take place and happen. It would take someone to take lead in the project and push for changes (like perhaps the 5 bullets I listed above). The best would be if someone who would like to use this kind of setup could sponsor a developer half/full time for a while to get a good head start on this.

Based on history, seeing we have had this known use case and this library around for well over a decade and this is the best we have accomplished so far, I am not optimistic that we can turn this ship around. Until then, we simply cannot allow curl to use this dependency.