89 operating systems

I occasionally do talks about curl. In these talks I often include a few slides that say something about curl’s coverage and presence on different platforms. Mostly to boast of course, but also to help explain to the audience how curl has managed to reach its ten billion installations.

This is the current incarnation of those seven slides, as of November 2022. I am of course also eager to get your feedback on the specific contents, especially if you find details missing that I should add so that my future curl presentations include more accurate data.

curl runs in all your devices

curl is used in (almost) every Internet-connected device out there, and I try to visualize that with this packed slide. Cars, servers, game consoles, medical devices, games, apps, operating systems, watches, robots, TVs, speakers, light-bulbs, freezers, printers, motorcycles, musical instruments and more.

The intent is to show with images that it runs in quite a few devices.

28 transfer protocols

More strictly speaking these are the 28 URL schemes curl supports right now, including in experimental builds. The image tries to arrange them in a sort of hierarchy so that you can see which underlying protocol is used for each of them: TCP, SSH, TLS, QUIC etc.

60 bindings

Volunteers have created and maintain libcurl bindings for at least these 60 different environments, making it possible for developers to access and use libcurl powers from virtually every programming language you can think of.

37 third party dependencies

curl’s modular design lets the developer who builds curl mix and match features and choose which particular third party dependencies to use, which makes it possible to craft exactly the curl builds you want. Device manufacturers get the combination they like for exactly their purposes and needs.

89 operating systems

This list has been worked on and bounced around several times between friends and it always brings out a few questions; people like to argue with me about a few entries I include and about a few entries I do not include. The problem is that there is no clear line between what counts as an operating system and what is a distribution or just a different running environment, and sometimes it is only branding that separates X from Y. With a “flexible” attitude to the definition of operating systems, this is the current collection of no less than 89 individual ones on which curl runs or has run:

24 CPU architectures

Older versions of this slide used to have x86-64 as a separate entry, but I think we have concluded that a large number of the architectures have separate 64 bit versions, so if I were to keep x86-64 I should also include a lot of other 64 bit versions. Therefore I removed x86-64 from the slide. Maybe I should rather go the other way and add all the 64 bit versions as separate architectures?

Anyway, curl has been made to run on virtually all modern or semi-modern 32 bit or larger CPU architectures we can find:

2 planets

I admit it. I include this slide in my presentation and in this blog post because it feels like the ultimate show-off. curl was used in the Mars 2020 helicopter mission.

Considering C99 for curl

tldr: we stick to C89 for now.

The curl project builds on foundations that started in late 1996 with the tool named httpget.

ANSI C became known as C89

In 1996 there were not too many good alternatives for making a small and efficient command line tool for doing Internet transfers. I am not saying that C was the only available language, but for me the choice was easy and frankly I did not even think about any other languages when this journey started. We called the C flavor “ANSI C” back then, as compared to the K&R “old style” C. The ANSI C version would later be renamed to C89 (confusingly enough it is also sometimes known as C90).

In the year 2000 we introduced libcurl, the library that provides Internet transfer super powers to whoever wants it. This made the choice of using C even better. C made it possible for us to provide a stable API/ABI without problems – something not even C++ could offer at the time. It was also a reasonably portable language that made it possible for us to bring curl and libcurl to virtually all modern operating systems.

As I wanted curl and libcurl to be system level options and I aimed for the widest possible adoption, they could not be written in any of the higher level languages like Perl, Python or similar. That would make them too big and require too much “extra baggage”.

I am convinced that the use of (conservative) C for curl is a key factor to its success and its ability to get used “everywhere”.

C99

C99 was published in (surprise!) 1999, but the adoption in compilers took a long time and that remained a blocker for us. We want curl available “everywhere”, so as long as some of the major compilers did not support C99 we did not even consider switching C flavor, as it would risk hampering curl adoption.

The slowest of the “big compilers” to adopt C99 was the Microsoft Visual C++ compiler, which did not adopt it properly until 2015 and added more compliance in 2019. A large number of our users/developers are still stuck on older MSVC versions so not even all users of this compiler suite can build C99 programs even today, in late 2022.

C11, C17 and beyond

Meanwhile, the ISO C Working Group continues to crank out updates to the C language. C11 shipped, C17 came, and now they are working on the pending C2x version, presumed to end up being called C23.

Bump the requirement for curl?

We are aware that other widely popular C projects are moving forward and have raised their requirements to C99 or beyond. Like the Linux kernel, the git project and more.

The discussion about bumping the C flavor has been brought up on the libcurl mailing list as well, in particular as we are already planning a version 8 release to happen in the spring of 2023, so in theory that could be a good moment to make a change like this.

What C99 features would improve a project like curl? The most interesting parts of C99 that could impact curl code are, as far as I can think of, these (a small illustrative sketch follows the list):

  • // comments
  • __func__ predefined identifier
  • boolean type in <stdbool.h>
  • designated struct initializers
  • empty macro arguments
  • extended integer types in <inttypes.h> and <stdint.h>
  • flexible array members (zero size arrays)
  • inline functions
  • integer constant type rules
  • mixed declarations and code
  • the long long type and library functions
  • the snprintf() family of functions
  • trailing comma allowed in enum declaration
  • vararg macros
  • variable-length arrays
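
To make the list a bit more concrete, here is a small contrived sketch (not curl code, just an illustration) that uses several of these C99 features at once:

#include <stdio.h>
#include <stdbool.h>   /* C99: the bool type */
#include <inttypes.h>  /* C99: extended integer types and PRId64 */

struct options {
  const char *name;
  bool verbose;
  int64_t max_size;
};

static void show(void)
{
  // C99: line comments
  struct options o = {
    .name = "example",        /* C99: designated struct initializers */
    .verbose = true,
    .max_size = 5000000000LL  /* C99: the long long type */
  };
  printf("in %s\n", __func__);  /* C99: the __func__ identifier */
  char buf[64];                 /* C99: mixed declarations and code */
  snprintf(buf, sizeof(buf), "%s max=%" PRId64, o.name, o.max_size); /* C99: snprintf */
  puts(buf);
}

int main(void)
{
  show();
  return 0;
}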

So sure, there are lots of cool things we could use. But do we need them?

For several of the features above, we already have decent and functional replacements. Several of the features don’t matter. The rest risk becoming distractions.

Opening up for C99 without conditions in curl code would risk opening the flood gates for people rewriting things, so we would have to go gently and allow new C99 features slowly. That is also how the git project does it. A challenge with that approach is that it is hard to verify which features are allowed versus which are actually used, as existing tooling normally does not have that resolution.

The question has also been asked: if we consider bumping the requirement, should we not bump it to C11 at once instead of stopping at C99?

Not now

Ultimately, not a single person has yet been able to clearly articulate what benefits such a C flavor requirement bump would provide for the curl project. We mostly see a risk that we all get caught in rather irrelevant discussions and changes that perhaps will not actually bring the project forward very much. Neither in features nor in quality/security.

I think there are still much better things to do and much more worthwhile efforts to spend our energy on that could actually improve the project and bring it forward.

Like improving the test suite, increasing test coverage and making sure more code is exercised by the fuzzers.

A minor requirement change

We have decided that starting with curl 8, we will require that the compiler supports a 64 bit data type. This is not something that existed in the original C89 version but was introduced in C99. However, there is no longer any modern compiler around that does not support this.

This is a way to allow us to stop caring about those odd platforms and to stop writing code and checks for the cases where the large types are not actually large. Such code is hard to verify nowadays since virtually nobody actually uses those compilers/systems.
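
As an illustration, here is a hedged sketch (not curl’s actual configure logic, the type name is made up) of how a build can pick a 64 bit type and still verify it at compile time while staying within C89:

#if defined(_MSC_VER)
typedef __int64 my_int64;   /* the MSVC spelling of a 64 bit type */
#else
typedef long long my_int64; /* a common compiler extension in C89, standard in C99 */
#endif

/* C89-compatible compile-time check: the array gets a negative size, and the
   build therefore fails, if the type is not 8 bytes (64 bits) wide */
typedef char my_int64_must_be_8_bytes[sizeof(my_int64) == 8 ? 1 : -1];

int main(void)
{
  return 0;
}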

Maybe this is how we can continue to adopt specific post-C89 features going forward: by cherry-picking them one by one and adapting to them slowly over time.

It is not a no to C99 forever

I am sure we will bring up this topic for discussion again in the future. We have not closed the door forever or written anything in stone. We have only decided that for the moment we have not been persuaded to switch. Maybe we will be in the future.

Other languages

We do not consider switching or rewriting curl into any other language.

Discussion

See reddit and hacker news.

connection filters in curl

In the curl project, one of the holiest and most sacred rules is:

we do not break the API or ABI

Everything else is a matter of discussion.

More features all the time

We keep adding features and making improvements at a rather high pace. So much so that we rarely do a release without introducing something new.

To be able to add features and keep changing curl, making sure that it keeps up with the world around it and provides the features and abilities that a world of Internet transfers needs, we need to make sure that the internals are written correctly. And by correctly, I mean in a way that allows us to extend and change curl when we want to, without breaking either the ABI or the tests.

Refactors

curl is old and choices sometimes need to be reconsidered. Over the years we have refactored and changed the curl internals and design quite drastically several times. Thanks to an extensive test suite and a library API that was designed from the start to hide most internal choices, this has been possible to do without being visible to users. The upside has been that the internals have become easier to maintain and to extend with more features.

Refactoring again

This time, we are again on a mission to extend the curl feature set as I blogged about recently, and this time we have Stefan Eissing on board to do it.

So, without changing any API or breaking the ABI and having the large set of test cases remain working in the many CI jobs we have, Stefan introduced a new internal concept for curl: connection filters.

Filters

We call them filters but they could also be seen as layers or maybe even domino pieces. Each filter is a piece of network logic and the idea is that we can chain them together at run-time to create protocol cakes (my word). curl can connect to an HTTP proxy, do TLS and speak HTTP/2 over that. That makes three separate filters put together.

Adding for example TLS to the proxy would just be a matter of inserting a filter in the right place in the chain, while using the filter chain works the same way no matter how long it is and independently of which exact filters it consists of.
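
To give a rough idea of the concept, here is a simplified sketch of the general idea only; it is not curl’s actual internal API or data structures:

#include <stdio.h>

struct filter;

struct filter_ops {
  const char *name;  /* "http/2", "tls", "http-proxy", "tcp" ... */
  int (*do_send)(struct filter *f, const char *buf, int len);
};

struct filter {
  const struct filter_ops *ops;
  struct filter *next;  /* the layer below this one in the chain */
};

/* A trivial filter that just passes data down the chain. Real filters would
   wrap the data in TLS records, HTTP/2 frames, proxy framing and so on
   before handing it to the layer below. */
static int passthrough_send(struct filter *f, const char *buf, int len)
{
  printf("[%s] handling %d bytes\n", f->ops->name, len);
  return f->next ? f->next->ops->do_send(f->next, buf, len) : len;
}

int main(void)
{
  static const struct filter_ops h2  = { "http/2", passthrough_send };
  static const struct filter_ops tls = { "tls", passthrough_send };
  static const struct filter_ops tcp = { "tcp", passthrough_send };

  /* chain the filters at run-time: HTTP/2 over TLS over TCP; inserting a
     proxy filter would just mean adding another link to this list */
  struct filter f_tcp = { &tcp, NULL };
  struct filter f_tls = { &tls, &f_tcp };
  struct filter f_h2  = { &h2,  &f_tls };

  return passthrough_send(&f_h2, "hello", 5) == 5 ? 0 : 1;
}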

The previous logic, before the filters, was more like a vast number of conditional flag checks done in the right order. This new system reduces the amount of conditional checks and it also moves the code for handling the different network filters into more localized and compartmentalized functions.

More protocol combos

In addition to the more localized code for specific features, this new concept more notably makes it easier to build new protocol layer combinations. Adding support for HTTP/2 to the proxy, for example, should now ideally be a matter of adding a filter the right way, and the transfer pipeline should otherwise “just work”.

Not everything internally is yet converted to filters even if we have merged the first large pull request. Stefan now works on getting more curl code to use this concept before he can get into the actual protocol changes lined up for him.

Performance

The filters do not impact transfer performance; I/O works the same as before.

Details

If you long for more technical details and explanations about this, maybe to be able to dig into the curl source code yourself, then an excellent starting point is the document in the curl source written for this exact purpose: CONNECTION-FILTERS.md.

curl’s new CA store cache

When setting up a TLS or QUIC connection, a client like curl needs a CA store in order to verify the certificate(s) the server provides in the TLS handshake.

CA store

A CA store is a fancy name for a collection of certificates: certificates for the Certificate Authorities (CAs) that a TLS client trusts. On the curl website, we offer a PEM version of the CA store that Mozilla maintains, for download. This set currently contains 142 certificates and while the exact number varies a little over time, it has been more than a hundred for many years. A fair number. And there is nothing in the pipe that will bring the number down significantly anytime soon, to my knowledge. These 142 certificates make up a file that is exactly 225,403 bytes. That is 1587 bytes per certificate on average.

Load and parse

When setting up a TLS connection, the 142 certificates need to be loaded from the external file into memory and parsed so that the server’s certificate can be verified. So that curl knows that the server it has connected to is indeed the correct server and not a man in the middle, an impostor.

This procedure is a rather costly one, in terms of CPU cycles needed.

Another cache

A classic approach to avoiding heavy work is to cache the results from a previous use so that they can be reused. Starting in version 7.87.0, curl introduces a CA store cache.

Now, curl can keep the loaded and parsed CA store in memory associated with the handle and then subsequent requests can avoid re-loading and re-parsing the CA data when new connections are created – if they use the same CA store of course. The performance gain in doing this shortcut can be enormous. After all, most transfers are done using the same single CA store.
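
To illustrate the usage pattern that benefits, here is a minimal libcurl sketch (error handling omitted, the URLs are just placeholders) where the second transfer on the same easy handle can reuse the CA store that was loaded, parsed and cached during the first one:

#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    /* first transfer: the CA store is loaded from file, parsed and cached */
    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/first");
    curl_easy_perform(curl);

    /* second transfer on the same handle: the cached CA store is reused */
    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/second");
    curl_easy_perform(curl);

    curl_easy_cleanup(curl);
  }
  return 0;
}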

To quote the numbers Michael Drake presented in the pull request for this new feature: he measured the number of instructions needed to load and render a particular BBC web page with the NetSurf browser (which obviously uses libcurl for its HTTPS transfers), with and without this cache.

CA store cache      Total instruction fetch cost
None                5,168,090,712
Enabled             1,020,984,411

I think a reduction to one fifth of the original cost is significant.

Converted into a little graph they compare like this (smaller is better):

But even in simpler applications and curl command lines this caching should have a measurable impact as soon as multiple TLS connections are done using the same handle. An extremely common usage pattern.

Life-time

Keeping the data around after use potentially changes the behavior a little, but the huge performance gain made us decide to still do this by default. We compensate for this a little by setting the default life-time to 24 hours, so applications that keep handles alive for a very long time will still get the cache flushed and re-read from file every day.

The CA store is typically not updated more frequently than once every few months or weeks.

CURLOPT_CA_CACHE_TIMEOUT

This is a new option for libcurl that allows applications to tweak the life-time and behavior of the CA cache for when the default described above is not enough.
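
For example, assuming curl is an easy handle created with curl_easy_init(), a sketch like this should extend the cache life-time to a week (the value is given in seconds):

/* keep the cached CA store for up to a week instead of the default 24 hours */
curl_easy_setopt(curl, CURLOPT_CA_CACHE_TIMEOUT, 604800L);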

Details

This CA cache system is so far only supported when curl is built to use OpenSSL or one of its forks. I hope others will get inspired and bring this support to other TLS backends as well, as we go forward.

CA cache support for curl was authored by Michael Drake. Thanks!

Append data to the URL query

A new curl option was born: --url-query.

How it started

curl offered the -d / --data option already in its first release back in 1998. curl 4.0. A trusted old friend.

curl also has some companion versions of this option that work slightly differently, but they all have the common feature that they append data to the request body. Put simply: with these options users construct the body contents to POST. Very useful and powerful. Still today one of the most commonly used curl options, for apparent reasons.

curl -d name=mrsmith -d color=blue https://example.com

Convert to query

A few years into curl’s lifetime, in 2001, we introduced the -G / --get option. This option lets you use -d to create a data set, but the data is not sent as a POST body anymore; it is instead converted to a query string and used in a GET request.

curl -G -d name=mrsmith -d color=blue https://example.com

This would make curl send a GET request to this URL: https://example.com/?name=mrsmith&color=blue

The “query” is the part of the URL that sits to the right of the question mark (but before the fragment which, if it exists, starts with the first # following the question mark).

URL-encode

In 2008 we added --data-urlencode, which made it even easier for users and scripts to use these options correctly, as curl itself can now URL-encode the given data instead of relying on the user to do it. Previously, script authors would have to do that encoding before passing the data to curl, which was tedious and error-prone. This feature also works in combination with -G of course.
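
For example, a command line like this (with a placeholder URL):

curl -G --data-urlencode "comment=hello there" https://example.com/

should make curl send a GET request to https://example.com/?comment=hello%20there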

How about both?

The -d options family make a POST. The -G converts it to a GET.

If you wanted convenient curl command line options to both create content to send in the POST body and to add query parameters to the URL, you were however out of luck. You would then have to go back to using -d but handcraft and encode the query parameters “manually”.

Until curl 7.87.0. Due to ship on December 21, 2022. (this commit)

--url-query is your new friend

This is curl’s 249th command line option and it lets you append parameters to the query part of the given URL, using the same syntax as --data-urlencode uses.

Using this, a user can now conveniently create a POST request body and at the same time add a set of query parameters for the URL which the request uses.

A basic example that sends the same data both in the POST body and in the URL query:

curl -d name=mrsmith -d color=blue --url-query name=mrsmith --url-query color=blue https://example.com

Syntax

I told you it uses the --data-urlencode syntax, but let me remind you how that works. You use --url-query [data] where [data] can be provided in these different ways:

  • content: This will make curl URL-encode the content and pass it on. Just be careful so that the content does not contain any = or @ symbols, as that will then make the syntax match one of the other cases below!
  • =content: This will make curl URL-encode the content and pass it on. The preceding = symbol is not included in the data.
  • name=content: This will make curl URL-encode the content part and pass it on. Note that the name part is expected to be URL-encoded already.
  • @filename: This will make curl load data from the given file (including any newlines), URL-encode that data and append it to the query.
  • name@filename: This will make curl load data from the given file (including any newlines), URL-encode that data and append it to the query. The name part gets an equal sign appended, resulting in name=urlencoded-file-content. Note that the name is expected to be URL-encoded already.
  • +content: The data is provided as-is, unencoded.

For each new --url-query, curl will insert an ampersand (&) between the parts it adds to the query.
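
As an illustration of how the encoding and the ampersand insertion work together, a command line like this (again with a placeholder URL):

curl --url-query "name=John Doe" --url-query tool=curl https://example.com/

should make curl use the URL https://example.com/?name=John%20Doe&tool=curl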

Replaces -G

This new friend we call --url-query makes -G rather pointless, as this is a more powerful option that does everything -G ever did and a lot more. We will of course still keep -G supported and working. Because that is how we work.

A boring fact of life is that new versions of curl trickle out into the world rather slowly to ordinary users. Because of this, we can be certain that scripts and users all over will need to keep using -G for yet another undefined period of time.

Trace

Finally: remember that if you want curl to show you what it sends in a POST request, the normal -v / --verbose does not suffice as it will not show you the request body. You then rather need to use --trace or --trace-ascii.
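
For example, this should show the complete outgoing request, body included, written to stdout:

curl --trace-ascii - -d name=mrsmith https://example.com/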

HTTP Workshop 2022 – day 3

The last day of this edition of the HTTP workshop. Thursday November 3, 2022. A half day only. Many participants at the Workshop are going to continue their UK adventure and attend the IETF 115 in London next week.

We started off the day with a deep dive into connection details. How to make connections for HTTP – in particular on mobile devices. How to decide which IP to use, racing connections, timeouts, when to consider a connection attempt “done” (i.e. after the TCP SYN-ACK or after the TLS handshake is complete). QUIC vs TCP vs TLS and early data. IPv4 vs IPv6.

ECH. On testing, how it might work, concerns. Statistics are lies. What is the success expectancy for this and what might be the explanations for failures? What tests should be done and what questions about ECH in the wild would we like to see answered going forward?

Dan Stahr: Making an HTTP client good. Discussions around what to expose, what not to expose and how HTTP client APIs have been written or should be written. Adobe has its own version of fetch for server use.

Mike Bishop brought up an old favorite subject: a way for a server to provide hints to a client on how to get its content from another host. Is it time to revive this idea? Blind caching was what the idea and concept was called in an old IETF 95 presentation by Martin Thomson. A host that might be closer to the client, faster, or simply one that already has the content in question could be used for providing content.

Mark Nottingham addressed the topic of HTTP (interop) testing. Ideas were raised around some kind of generic infrastructure for doing tests for HTTP implementations. I think people were generally positive but I figure time will tell what, if anything, will actually happen with this.

What’s next for HTTP? Questions were asked to the room, mostly of the hypothetical kind, but I don’t think anyone actually had the nerve to suggest they have any clear ideas for that at this moment. At least not in this way. HTTP/4 has been mentioned several times during these days, but then almost always in a joke context.

The End

This is how the workshop ends for this time. Super fun, super informative, packed with information and awesome people. One of these events that give me motivation and a warm fuzzy feeling inside for an extended period of time into the future.

Thanks everyone for your excellence. Thank you organizers for a fine setup!

Workshop season 5 episode 2

Day one was awesome. Now we take the next step.

The missing people

During the discussions today it was again noticeable that, apart from some specific individuals, we also lack people in the room from prominent “players” in this area, such as Chrome developers or people with closer knowledge of how larger content deliverers such as Netflix work. Or what about a search engine person?

Years ago Chrome was represented well and the Apple side of the world was weak, but the situation seems to have been totally reversed by now.

The day

Mike Bishop talked about the complexities of the current Internet. Redirects before, during, after. HTTPS records. Alt-Svc. Alt-SvcB. Use the HTTPS record for that alternative name.

This presentation triggered a long discussion on how to do things, how things could be done in the future and how the different TTLs in this scenario should or could interact. How to do multi-CDN, how to interact with DNS and what happens if a CDN wants to disable QUIC?

A very long discussion that mostly took us all back to square one in the end. The alt-svcb proposal as is.

Alan Frindell talked about HTTP priorities. What it is (it only applies within connections, there has to be more than one thing, and if something goes faster something else needs to go slower; re-prioritization takes RTT/2 to take effect for a client), and when to prioritize (as late as possible, for h2 just before committing to TCP).

Prioritizing images in Meta’s apps. On screen, close, not close. Guess first, then change priority once it knows better. An experiment going on is doing progressive images. Change priority of image transfers once they have received a certain amount. Results and conclusions are pending.

Priority within a single video. An important clue to successful priority handling seems to be more content awareness in the server side. By knowing what is being delivered, the server can make better decisions without the client necessarily having to say anything.

Lunch

I had Thai food. It was good.

Afternoon

Lucas Pardue talked about tunneling in HTTP vs tunneling over HTTP. Layers and layers of tunneling and packets within packets. Then Oblivious HTTP, another way of tunneling data and HTTP over HTTP. Related document: RFC 9297 – HTTP Datagrams and the Capsule Protocol. Example case on the Cloudflare blog.

Mark Nottingham talked about Structured Field Values for HTTP and the Retrofit Structured Fields for HTTP.

Who is doing SF libraries and APIs? Do we need an SF schema?

SF compression? Mark’s experiment makes it pretty much on par with HPACK.

Binary SF. Takes ~40% less time parsing in Mark’s experiment. ~10% larger in size (for now).

There was interest expressed in continuing this experimentation going forward. Reducing the CPU time for header parsing is considered valuable.

Sebastian Poreba talked “Networking in your pocket” about the challenges and architectures of mobile phone and smartwatch networking. Connection migration is an important property of QUIC that is attractive in the mobile world. H3 is good.

A challenge is to keep the modem activity off as much as possible and do network activities when the modem is already on.

Beers

My very important duties during these days also involved spending time in pubs and drinking beers with this awesome group of people. This not only delayed my blog post publish times a little, but it might also have introduced an ever so slight level of “haziness” into the process and maybe, just maybe, my recollection of the many details from the day is not exactly as detailed as it could otherwise have been.

That’s just the inevitable result of me sacrificing myself for the team. I did it for us all. You’re welcome.

Another full day

Another day fully packed with HTTP details from early to late. From 9am in the morning until 11pm in the evening. I’m having a great time. Tomorrow is the last day. I’ll let you know what happens then.

HTTP Workshop 2022 – day 1

The HTTP Workshop is an occasional gathering of HTTP experts and other interested parties to discuss the Web’s foundational protocol.

The fifth HTTP Workshop is a three day event that takes place in Oxford, UK. I’m happy to say that I am attending this one as well, as I have attended all the previous ones. It is now more than seven years since the first one.

Attendees

A lot of the people attending this year have attended previous workshops, but in many cases we are now employed by other companies than when we attended our first workshop. Almost thirty HTTP stack implementers, experts and spec authors in a room.

There were a few newcomers and some regulars are absent (and missed). Unfortunately, we also maintain the lack of diversity from previous years. We are of course aware of this, but have still failed to improve things in this area very much.

Setup

All the people gather in the same room. A person talks briefly on a specific topic and then we have a free-form discussion about it. When I write this, the slides from today’s presentations have not yet been made available so I cannot link them here. I will add those later.

Discussions

Mark’s introduction talk.

Lucas Pardue started the morning with a presentation and discussion around logging, tooling and the difficulty of debugging modern HTTP setups. With binary protocols doing streams and QUIC, qlog and qvis are in many ways the only available options but they are not supported by everything and there are gaps in what they offer. Room for improvements.

Anne van Kesteren showed a proposal from Domenic Denicola for a No-Vary-Search response header. An attempt to specify a way for HTTP servers to tell clients (and proxies) that parts of the query string produce the same response/resource and should be cached and treated as the same.

An issue that is more complicated than it might first seem. The proposal has some fans in the room but I think the general consensus was that it is a bad name for the header…

Martin Thomson talked about new stuff in HTTP.

HTTP versions. HTTP/3 is used in over a third of the requests, both as seen by Cloudflare server-side and as measured in Firefox telemetry client-side. Extensible and functional protocols. Nobody is talking about, or seeing a point in discussing, an HTTP/4.

WebSocket over h2/h3. There does not seem to be any particular usage and nobody mentioned any strong reason or desire to change the status. WebTransport is probably what is going to be the future instead.

Frames. Discussions around the use and non-use of the origin frame. Not widely used. Could help to avoid extra “SNI leakage” and extra connections.

Anne then took a second round in front of the room and questioned us on the topic of cookies. Or perhaps more specifically about details in the spec and how to possibly change the spec going forward. At least one person in the room insisted fairly strongly that any such restructuring of said documents should be done after the ongoing 6265bis work is done.

Dinner

A company very generously sponsored a group dinner for us in the evening and I had a great time. I was asked to not reveal the name of said company, but I can tell you that a lot of the conversations at the table, at least in the area where I was parked, kept up the theme from the day and were HTTP oriented. Including Oblivious HTTP, IPv4 formatting allowed in URLs and why IP addresses should not be put in the SNI field. Like any good conversation among friends.