curl localhost as a local host

When you use the name localhost in a URL, what does it mean? Where does the network traffic go when you ask curl to download http://localhost ?

Is “localhost” just a name like any other or do you think it infers speaking to your local host on a loopback address?

Previously

curl http://localhost

The name was “resolved” using the standard resolver mechanism into one or more IP addresses and then curl connected to the first one that works and gets the data from there.

The (default) resolving phase there involves asking the getaddrinfo() function about the name. In many systems, it will return the IP address(es) specified in /etc/hosts for the name. In some systems things are a bit more unusually setup and causes a DNS query get sent out over the network to answer the question.

In other words: localhost was not really special and using this name in a URL worked just like any other name in curl. In most cases in most systems it would resolve to 127.0.0.1 and ::1 just fine, but in some cases it would mean something completely different. Often as a complete surprise to the user…

Starting now

curl http://localhost

Starting in commit 1a0ebf6632f8, to be released in curl 7.78.0, curl now treats the host name “localhost” specially and will use an internal “hard-coded” set of addresses for it – the ones we typically use for the loopback device: 127.0.0.1 and ::1. It cannot be modified by /etc/hosts and it cannot be accidentally or deliberately tricked by DNS resolves. localhost will now always resolve to a local address!

Does that kind of mistakes or modifications really happen? Yes they do. We’ve seen it and you can find other projects report it as well.

Who knows, it might even be a few microseconds faster than doing the “full” resolve call.

(You can still build curl without IPv6 support at will and on systems without support, for which the ::1 address of course will not be provided for localhost.)

Specs say we can

The RFC 6761 is titled Special-Use Domain Names and in its section 6.3 it especially allows or even encourages this:

Users are free to use localhost names as they would any other domain names.  Users may assume that IPv4 and IPv6 address queries for localhost names will always resolve to the respective IP loopback address.

Followed by

Name resolution APIs and libraries SHOULD recognize localhost names as special and SHOULD always return the IP loopback address for address queries and negative responses for all other query types. Name resolution APIs SHOULD NOT send queries for localhost names to their configured caching DNS server(s).

Mike West at Google also once filed an I-D with even stronger wording suggesting we should always let localhost be local. That wasn’t ever turned into an RFC though but shows a mindset.

(Some) Browsers do it

Chrome has been special-casing localhost this way since 2017, as can be seen in this commit and I think we can safely assume that the other browsers built on their foundation also do this.

Firefox landed their corresponding change during the fall of 2020, as recorded in this bugzilla entry.

Safari (on macOS at least) does however not do this. It rather follows what /etc/hosts says (and presumably DNS of not present in there). I’ve not found any official position on the matter, but I found this source code comment indicating that localhost resolving might change at some point:

// FIXME: Ensure that localhost resolves to the loopback address.

Windows (kind of) does it

Since some time back, Windows already resolves “localhost” internally and it is not present in their /etc/hosts alternative. I believe it is more of a hybrid solution though as I believe you can put localhost into that file and then have that custom address get used for the name.

Secure over http://localhost

When we know for sure that http://localhost is indeed a secure context (that’s a browser term I’m borrowing, sorry), we can follow the example of the browsers and for example curl should be able to start considering cookies with the “secure” property to be dealt with over this host even when done over plain HTTP. Previously, secure in that regard has always just meant HTTPS.

This change in cookie handling has not happened in curl yet, but with localhost being truly local, it seems like an improvement we can proceed with.

Can you still trick curl?

When I mentioned this change proposal on twitter two of the most common questions in response were

  1. can’t you still trick curl by routing 127.0.0.1 somewhere else
  2. can you still use --resolve to “move” localhost?

The answers to both questions are yes.

You can of course commit the most hideous hacks to your system and reroute traffic to 127.0.0.1 somewhere else if you really wanted to. But I’ve never seen or heard of anyone doing it, and it certainly will not be done by mistake. But then you can also just rebuild your curl/libcurl and insert another address than the default as “hardcoded” and it’ll behave even weirder. It’s all just software, we can make it do anything.

The --resolve option is this magic thing to redirect curl operations from the given host to another custom address. It also works for localhost, since curl will check the cache before the internal resolve and --resolve populates the DNS cache with the given entries. (Provided to applications via the CURLOPT_RESOLVE option.)

What will break?

With enough number of users, every single little modification or even improvement is likely to trigger something unexpected and undesired on at least one system somewhere. I don’t think this change is an exception. I fully expect this to cause someone to shake their fist in the sky.

However, I believe there are fairly good ways to make to restore even the most complicated use cases even after this change, even if it might take some hands on to update the script or application. I still believe this change is a general improvement for the vast majority of use cases and users. That’s also why I haven’t provided any knob or option to toggle off this behavior.

Credits

The top photo was taken by me (the symbolism being that there’s a path to take somewhere but we don’t really know where it leads or which one is the right to take…). This curl change was written by me. Mike West provided me the Chrome localhost change URL. Valentin Gosu gave me the Firefox bugzilla link.

Taking hyper-curl further

Thanks to funding by ISRG (via Google), we merged the hyper powered HTTP back-end into curl earlier this year as an alternative HTTP/1 and HTTP/2 implementation. Previously, there was only one way to do HTTP/1 and 2 in curl.

Backends

Core libcurl functionality can be powered by optional and alternative backends in a way that doesn’t change the API or directly affect the application. This is done by featuring internal APIs that can be implemented by independent components. See the illustration below (click for higher resolution).

This is a slide from Daniel’s libcurl under the hood presentation.

curl 7.75.0 became the first curl release that could be built with hyper. The support for it was labeled “experimental” as while most of all common and basic use cases were supported, we still couldn’t run the full test suite when built with it and some edge cases even crashed.

We’ve subsequently fixed a few of the worst flaws so the Hyper powered curl has gradually and slowly improved since then.

Going further

Our best friends at ISRG has now once again put up funding and I’ll spend more work hours on making sure that more (preferably all) tests can run with hyper.

I’ve already started. Right now I’m sitting and staring at test case 154 which is doing a HTTP PUT using Digest authentication and an Expect: 100-continue header and this test case currently doesn’t work correctly when built to use Hyper. I’ll report back in a few weeks and let you know how it goes – and then I don’t mean with just test 154!

Consider yourself invited to join the #curl IRC channel and chat if you want live reports or want to help out!

Fund

You too can fund me to do curl work. Get in touch!

Giving away an insane amount of curl stickers

Part 1. The beginning. (There will be at least one more part later on following up the progress.)

On May 18, 2021 I posted a tweet that I was giving away curl stickers for free to anyone who’d submit their address to me. It looked like this:

Everyone once in a while when I post a photo that involves curl stickers, a few people ask me where they can get hold of such. I figured it was about time I properly offered “the world” some. I expected maybe 50 or a 100 people would take me up on this offer.

The response was totally overwhelming and immediate. Within the first hour 270 persons had already requested stickers. After 24 hours when I closed the form again, 1003 addresses had been submitted. To countries all around the globe. Quite the avalanche.

Assessing the damage

This level of interest put up some challenges I hadn’t planned for. Do I have stickers enough? Now suddenly doing 3 or 5 stickers per parcel will have a major impact. Getting envelops and addresses onto them for a thousand deliveries is quite a job! Not to mention the cost. A “standard mail” to outside Sweden using the regular postal service is 24 SEK. That’s ~2.9 USD. Per parcel. Add the extra expenses and we’re at an adventure north of 3,000 USD.

For this kind of volume, I can get a better rate by registering as a “company customer”. It adds some extra work for me though but I haven’t worked out the details around this yet.

Let me be clear: I already from the beginning planned to ask for reimbursement from the curl fund for my expenses for this stunt. I would mostly add my work on this for free. Maybe “hire” my daughter for an extra set of hands.

Donations

During the time the form was up, we also received 51 donations to Open Collective (as the form mentioned that, and I also mentioned it on Twitter several times). The donated total was 943 USD. The average donation was 18 USD, the largest ones (2) were at 100 USD and the smallest was 2 USD.

Of course some donations might not be related to this and some donations may very well arrive after this form was closed again.

Cleaning up

If I had thought this through better at the beginning, I would not have asked for the address using a free text field like this. People clearly don’t have the same idea of how to do this as I do.

I had to manually go through the addresses to insert newlines, add country names and remove obviously broken addresses. For example, a common pattern was addresses added with only a 6-8 digit number? I think over 20 addresses were specified like that!

Clearly there’s a lesson to be had there.

After removing obviously bad and broken addresses there were 978 addresses left.

Countries

I got postal addresses to 65 different countries. A surprisingly diverse collection I think. The top 10 countries were:

USA174
Sweden103
Germany93
India92
UK64
France56
Spain31
Brazil31
The Netherlands24
Switzerland20

Countries that were only entered once: Dubai, Iran, Japan, Latvia, Morocco, Nicaragua, Philippines, Romania, Serbia, Thailand, Tunisia, UAE, Ukraine, Uruguay, Zimbabwe

Figuring out the process

While I explicitly said I wouldn’t guarantee that everyone gets stickers, I want to do my best in delivering a few to every single one who asked for them.

Volunteers

I have the best community. Without me saying a word or asking for it, several people raised their hands and volunteered to offload the sending to their countries. I could send one big batch to them and they redistribute within their countries. They would handle US, Czechia, Denmark and Switzerland for me.

But why stop at those four? In my next step I put up a public plea for more volunteers on Twitter and man, I got myself a busy evening and after a few hours I had friends signed up from over 20 countries offering to redistributed stickers within the respective countries. This way, we share the expenses and the work load, and mailing out many smaller parcels within countries is also a lot cheaper than me sending them all individually from Sweden.

After a lot of communications I had an army of helpers lined up.

28 distributors will help me do 724 sticker deliveries to 24 countries. Leaving me to do just the remaining 282 packages to the other 41 countries.

Stickers inventory

I’ve offered “a few” stickers and I decided that means 4.

978 * 4 = 3912

Plus I want to add 10 extra stickers to each distributor, and there are 28 distributors.

3912 + 28 * 10 = 4192

Do I have 4200 curl stickers? I emptied my sticker drawer and put them all on the table and took this photo. All of these curl stickers you see on the photo have been donated to us/me by sponsors. Most of the from Sticker Mule.

I think I might be a little “thin”. Luckily, I have friends that can help me stock up

(There are some Haxx and wolfSSL stickers on the photo as well, because I figured I should spice up some packages with some of those as well.)

Schedule

The stickers still haven’t shipped from my place but the plan is to get the bulk of them shipped from me within days. Stay tuned. There will of course be more delays on the route to their destinations, but rest assured that we intend to deliver to all who asked for them!

Will I give away more curl stickers?

Not now, and I don’t have any plans on doing this stunt again very soon. It was already way more than I expected. More attention, more desire and definitely a lot more work!

But at the first opportunity where you meet me physically I will of course give away stickers.

Buy curl stickers?

I’ve started looking into offering stickers for purchase but I’m not ready to make anything public or official yet. Stay tuned and I promise you’ll learn and be told when the sticker shop opens.

If it happens, the stickers will not be very cheap but you should rather see each such sticker as a mini-sponsorship.

Follow up

Stay tuned. I will be back with updates. See Part 2.

QUIC is RFC 9000

The official publication date of the relevant QUIC specifications is: May 27, 2021.

I’ve done many presentations about HTTP and related technologies over the years. HTTP/2 had only just shipped when the QUIC working group had been formed in the IETF and I started to mention and describe what was being done there.

I’ve explained HTTP/3

I started writing the document HTTP/3 explained in February 2018 before the protocol was even called HTTP/3 (and yeah the document itself was also called something else at first). The HTTP protocol for QUIC was just called “HTTP over QUIC” in the beginning and it took until November 2018 before it got the name HTTP/3. I did my first presentation using HTTP/3 in the title and on slides in early December 2018, My first recorded HTTP/3 presentation was in January 2019 (in Stockholm, Sweden).

In that talk I mentioned that the protocol would be “live” by the summer of 2019, which was an optimistic estimate based on the then current milestones set out by the IETF working group.

I think my optimism regarding the release schedule has kept up but as time progressed I’ve updated that estimation many times…

HTTP/3 – not yet

The first four RFC documentations to be ratified and published only concern QUIC, the transport protocol, and not the HTTP/3 parts. The two HTTP/3 documents are also in queue but are slightly delayed as they await some other prerequisite (“generic” HTTP update) documents to ship first, then the HTTP/3 ones can ship and refer to those other documents.

QUIC

QUIC is a new transport protocol. It is done over UDP and can be described as being something of a TCP + TLS replacement, merged into a single protocol.

Okay, the title of this blog is misleading. QUIC is actually documented in four different RFCs:

RFC 8999 – Version-Independent Properties of QUIC

RFC 9000 – QUIC: A UDP-Based Multiplexed and Secure Transport

RFC 9001 – Using TLS to Secure QUIC

RFC 9002 – QUIC Loss Detection and Congestion Control

My role: I’m just a bystander

I initially wanted to keep up closely with the working group and follow what happened and participate on the meetings and interims etc. It turned out to be too difficult for me to do that so I had to lower my ambitions and I’ve mostly had a casual observing role. I just couldn’t muster the energy and spend the time necessary to do it properly.

I’ve participated in many of the meetings, I’ve been present in the QUIC implementers slack, I’ve followed lots of design and architectural discussions on the mailing list and in GitHub issues. I’ve worked on implementing support for QUIC and h3 in curl and thanks to that helped out iron issues and glitches in various implementations, but the now published RFCs have virtually no traces of me or my feedback in them.

curl 7.77.0 – 200 OK

Welcome to the 200th curl release. We call it 200 OK. It coincides with us counting more than 900 commit authors and surpassing 2,400 credited contributors in the project. This is also the first release ever in which we thank more than 80 persons in the RELEASE-NOTES for having helped out making it and we’ve set two new record in the bug-bounty program: the largest single payout ever for a single bug (2,000 USD) and the largest total payout during a single release cycle: 3,800 USD.

This release cycle was 42 days only, two weeks shorter than normal due to the previous 7.76.1 patch release.

Release Presentation

Numbers

the 200th release
5 changes
42 days (total: 8,468)

133 bug-fixes (total: 6,966)
192 commits (total: 27,202)
0 new public libcurl function (total: 85)
2 new curl_easy_setopt() option (total: 290)

2 new curl command line option (total: 242)
82 contributors, 44 new (total: 2,410)
47 authors, 23 new (total: 901)
3 security fixes (total: 103)
3,800 USD paid in Bug Bounties (total: 9,000 USD)

Security

We set two new records in the curl bug-bounty program this time as mentioned above. These are the issues that made them happen.

CVE-2021-22901: TLS session caching disaster

This is a Use-After-Free in the OpenSSL backend code that in the absolutely worst case can lead to an RCE, a Remote Code Execution. The flaw is reasonably recently added and it’s very hard to exploit but you should upgrade or patch immediately.

The issue occurs when TLS session related info is sent from the TLS server when the transfer that previously used it is already done and gone.

The reporter was awarded 2,000 USD for this finding.

CVE-2021-22898: TELNET stack contents disclosure

When libcurl accepts custom TELNET options to send to the server, it the input parser was flawed which could be exploited to have libcurl instead send contents from the stack.

The reporter was awarded 1,000 USD for this finding.

CVE-2021-22897: schannel cipher selection surprise

In the Schannel backend code, the selected cipher for a transfer done with was stored in a static variable. This caused one transfer’s choice to weaken the choice for a single set transfer could unknowingly affect other connections to a lower security grade than intended.

The reporter was awarded 800 USD for this finding.

Changes

In this release we introduce 5 new changes that might be interesting to take a look at!

Make TLS flavor explicit

As explained separately, the curl configure script no longer defaults to selecting a particular TLS library. When you build curl with configure now, you need to select which library to use. No special treatment for any of them!

No more SSL

curl now has no more traces of support for SSLv2 or SSLv3. Those ancient and insecure SSL versions were already disabled by default by TLS libraries everywhere, but now it’s also impossible to activate them even in special build. Stripped out from both the curl tool and the library (thus counted as two changes).

HSTS in the build

We brought HSTS support a while ago, but now we finally remove the experimental label and ship it enabled in the build by default for everyone to use it more easily.

In-memory cert API

We introduce API options for libcurl that allow users to specify certificates in-memory instead of using files in the file system. See CURLOPT_CAINFO_BLOB.

Favorite bug-fixes

Again we manage to perform a large amount of fixes in this release, so I’m highlighting a few of the ones I find most interesting!

Version output

The first line of curl -V output got updated: libcurl now includes OpenLDAP and its version of that was used in the build, and then the curl tool can add libmetalink and its version of that was used in the build!

curl_mprintf: add description

We’ve provided the *printf() clone functions in the API since forever, but we’ve tried to discourage users from using them. Still, now we have a first shot at a man page that clearly describes how they work.

This is important as they’re not quite POSIX compliant and users who against our advice decide to rely on them need to be able to know how they work!

CURLOPT_IPRESOLVE: preventing wrong IP version from being used

This option was made a little stricter than before. Previously, it would be lax about existing connections and prefer reuse instead of resolving again, but starting now this option makes sure to only use a connection with the request IP version.

This allows applications to explicitly create two separate connections to the same host using different IP versions when desired, which previously libcurl wouldn’t easily let you do.

Ignore SIGPIPE in curl_easy_send

libcurl makes its best at ignoring SIGPIPE everywhere and here we identified a spot where we had missed it… We also made sure to enable the ignoring logic when built to use wolfSSL.

Several HTTP/2-fixes

There are no less than 6 separate fixes mentioned in the HTTP/2 module in this release. Some potential memory leaks but also some more behavior improving things. Possibly the most important one was the move of the transfer-related error code from the connection struct to the transfers struct since it was vulnerable to a race condition that could make it wrong. Another related fix is that libcurl no longer forcibly disconnects a connection over which a transfer gets HTTP_1_1_REQUIRED returned.

Partial CONNECT requests

When the CONNECT HTTP request sent to a proxy wasn’t all sent in a single send() call, curl would fail. It is baffling that this bug hasn’t been found or reported earlier but was detected this time when the reporter issued a CONNECT request that was larger than 16 kilobytes…

TLS: add USE_HTTP2 define

There was several remaining bad assumptions that HTTP/2 support in curl relies purely on nghttp2. This is no longer true as HTTP/2 support can also be provide by hyper.

normalize numerical IPv4 hosts

The URL parser now knows about the special IPv4 numerical formats and parses and normalizes URLs with numerical IPv4 addresses.

Timeout, timed out libssh2 disconnects too

When libcurl (built with libssh2 support) stopped an SFTP transfer because a timeout was triggered, the following SFTP disconnect procedure was subsequently also stopped because of the same timeout and therefore wasn’t allowed to properly clean up everything, leading to a memory-leak!

IRC network switch

We moved the #curl IRC channel to the new network libera.chat. Come join us there!

Next release

On Jul 21, 2021 we plan to ship the next release. The version number for that is not yet decided but we have changes in the pipeline, making a minor version number bump very likely.

Credits

7.77.0 release image by Filip Dimitrovski.

The curl user survey 2021

For the eighth consecutive year we run the annual curl user survey again in 2021. The form just went up and I would love to have you spend 10 minutes of your busy life to tell us how you think curl works, what doesn’t work and what we should do next.

We have no tracking on the website and we have no metrics or usage measurements of the curl tool or the libcurl library. The only proper way we have left to learn how users and people in general think of us and how curl works, is to ask. So this is what we do, and we limit the asking to once per year.

You can also view this from your own “selfish” angle: this is a way for you to submit your input, your opinions and we will listen.

The survey will be up two weeks during which I hope to get as many people as possible to respond. If you have friends you know use curl or libcurl, please have them help us out too!

Take the survey

Yes really, please take the survey!

Bonus: see the extensive analysis of the 2020 user survey. There’s a lot of user feedback to learn from it.

“I could rewrite curl”

Collected quotes and snippets from people publicly sneezing off or belittling what curl is, explaining how easy it would be to make a replacement in no time with no effort or generally not being very helpful.

These are statements made seriously. For all I know, they were not ironic. If you find others to add here, please let me know!

Listen. I’ve been young too once and I’ve probably thought similar things myself in the past. But there’s a huge difference between thinking and saying. Quotes included here are mentioned for our collective amusement.

I can do it in less than a 100 lines

[source]

I can do it in a three day weekend

(The yellow marking in the picture was added by me.)

[source]

No reason to be written in C

Maybe not exactly in the same category as the two ones above, but still a significant “I know this” vibe:

[source]

We sold a curl exploit

Some people deliberately decides to play for the other team.

[source]

This isn’t a big deal

It’s easy to say things on Twitter…

This tweet was removed by its author after I and others replied to it so I cannot link it. The name has been blurred on purpose because of this.

100 lines of Python

“I think you could replace 99% of the uses of Curl … with like 100 lines of Python or Rust or Go”

[source]

I should just fix curl

I reported a vulnerable old curl installation being hosted by nuget, only to get this user tell me… to fix the vulnerabilities we already fixed long ago.

Discussions

Hacker news, Reddit

200 OK

One day in March 1998 I released a little file transfer tool I called curl. The first ever curl release. That was good.

10

By the end of July the same year, I released the 10th curl release. I’ve always believed in release early release often as a service to users and developers alike.

20

In December 1998 I released the 20th curl release. I started to get a hang of this.

50

In January 2001, not even three years in, we shipped the 50th curl release (version 7.5.2). We were really cramming them out back then!

200

Next week. 23 years, two months and six days after the first release, we will ship the 200th curl release. We call it curl 7.77.0.

Yes, there are exactly 200 stickers used in the photo. But the visual comparison with 50 is also apt: it isn’t that big difference seen from a distance.

I’ve personally done every release to date, but there’s nothing in the curl release procedure that says it has to be me, as long as the uploader has access to put the new packages on the correct server.

The fact that 200 is an HTTP status code that is indicating success is an awesome combination.

Release cadence

In 2014 we formally switched to an eight week release cycle. It was more or less what we already used at the time, but from then on we’ve had it documented and we’ve tried harder to stick to it.

Assuming no alarmingly bad bugs are found, we let 56 days pass until we ship the next release. We occasionally slip up and fail on this goal, and then we usually do a patch release and cut the next cycle short. We never let the cycle go longer than those eight weeks. This makes us typically manage somewhere between 6 and 10 releases per year.

Lessons learned

  • Make a release checklist, and stick to that when making releases
  • Update the checklist when needed
  • Script as much as possible of the procedure
  • Verify the release tarballs/builds too in CI
  • People never test your code properly until you actually release
  • No matter how hard you try, some releases will need quick follow-up patch releases
  • There is always another release
  • Time-based release scheduling beats feature-based

curl -G vs curl -X GET

(This is a repost of a stackoverflow answer I once wrote on this topic. Slightly edited. Copied here to make sure I own and store my own content properly.)

curl knows the HTTP method

You normally use curl without explicitly saying which request method to use.

If you just pass in a HTTP URL like curl http://example.com, curl will use GET. If you use -d or -F curl will use POST, -I will cause a HEAD and -T will make it a PUT.

If for whatever reason you’re not happy with these default choices that curl does for you, you can override those request methods by specifying -X [WHATEVER]. This way you can for example send a DELETE by doing curl -X DELETE [URL].

It is thus pointless to do curl -X GET [URL] as GET would be used anyway. In the same vein it is pointless to do curl -X POST -d data [URL]... But you can make a fun and somewhat rare request that sends a request-body in a GET request with something like curl -X GET -d data [URL].

Digging deeper

curl -GET (using a single dash) is just wrong for this purpose. That’s the equivalent of specifying the -G, -E and -T options and that will do something completely different.

There’s also a curl option called --get to not confuse matters with either. It is the long form of -G, which is used to convert data specified with -d into a GET request instead of a POST.

(I subsequently used this answer to populate the curl FAQ to cover this.)

Warnings

Modern versions of curl will inform users about this unnecessary and potentially harmful use of -X when verbose mode is enabled (-v) – to make users aware. Further explained and motivated here.

-G converts a POST + body to a GET + query

You can ask curl to convert a set of -d options and instead of sending them in the request body with POST, put them at the end of the URL’s query string and issue a GET, with the use of `-G. Like this:

curl -d name=daniel -d grumpy=yes -G https://example.com/

… which does the exact same thing as this command:

curl https://example.com/?name=daniel&grumpy=yes

The libcurl transfer state machine

I’ve worked hard on making the presentation I ended up calling libcurl under the hood. A part of that presentation is spent on explaining the main libcurl transfer state machine and here I’ll try to document some of what, in a written form. Understanding the main transfer state machine in libcurl could be valuable and interesting for anyone who wants to work on libcurl internals and maybe improve it.

Background

The state is kept in easy handle in the struct field called mstate. The source file for this state machine is called multi.c.

An easy handle is always in exactly one of these states for as long as it exists.

This transfer state machine is designed to work for all protocols libcurl supports, but basically no protocol will transition through all states. As you can see in the drawing, there are many different possible transitions from a lot of the states.

libcurl transfer state machine

(click the image for a larger version)

Start

A transfer starts up there above the surface in the INIT state. That’s a yellow box next to the little start button. Basically the boat shows how it goes from INIT to the right over to MSGSENT with it’s finish flag, but the real path is all done under the surface.

The yellow boxes (states) are the ones that exist before or when a connection is setup. The striped background is for all states that has a single and specific connectdata struct associated with the transfer.

CONNECT

If there’s a connection limit, either in total or per host etc, the transfer can get sent to the PENDING state to wait for conditions to change. If not, the state probably moves on to one of the blue ones to resolve host name and connect to the server etc. If a connection could be reused, it can shortcut immediately over to the green DO state.

The green states are all about setting up the connection to a state of fully connected, authenticated and logged in. Ready to send the first request.

DO

The green DO states are all about sending the request with one or more commands so that the file transfer can begin. There are several such states to properly support all protocols but also for historical reasons. We could probably remove a state there by some clever reorgs if we wanted.

PERFORMING

When a request has been issued and the transfer starts, it transitions over to PERFORMING. In the white states data is flowing. Potentially a lot. Potentially in both or either direction. If during the transfer curl finds out that the transfer is faster than allowed, it will move into RATELIMITING until it has cooled down a bit.

DONE

All the post-transfer states are red in the picture. The DONE is the first of them and after having done what it needs to round up the transfer, it disassociates with the connection and moves to COMPLETED. There’s no stripes behind that state. Disassociate here means that the connection is returned back to the connection pool for later reuse, or in the worst case if deemed that it can’t be reused or if the application has instructed it so, closed.

As you’ll note, there’s no disconnect anywhere in the state machine. This is simply because the disconnect is not really a part of the transfer at all.

COMPLETED

This is the end of the road. In this state a message will be created and put in the outgoing queue for the application to read, and then as a final last step it moves over to MSGSENT where nothing more happens.

A typical handle remains in this state until the transfer is reused and restarted, in which it will be set back to the INIT state again and the journey begins again. Possibly with other transfer parameters and URL this time. Or perhaps not.

State machines within each state

What this state diagram and explanation doesn’t show is of course that in each of these states, there can be protocol specific handling and each of those functions might in themselves of course have their own state machines to control what to do and how to handle the protocol details.

Each protocol in libcurl has its own “protocol handler” and most of the protocol specific stuff in libcurl is then done by calls from the generic parts to the protocol specific parts with calls like protocol_handler->proto_connect() that calls the protocol specific connection procedure.

This allows the generic state machine described in this blog post to not really know the protocol specifics and yet all the currently support 26 transfer protocols can be supported.

libcurl under the hood – the video

Here’s the full video of libcurl under the hood.

If you want to skip directly to the state machine diagram and the following explanation, go here.

Credits

Image by doria150 from Pixabay