The libcurl transfer state machine

I’ve worked hard on making the presentation I ended up calling libcurl under the hood. A part of that presentation is spent on explaining the main libcurl transfer state machine and here I’ll try to document some of what, in a written form. Understanding the main transfer state machine in libcurl could be valuable and interesting for anyone who wants to work on libcurl internals and maybe improve it.

Background

The state is kept in easy handle in the struct field called mstate. The source file for this state machine is called multi.c.

An easy handle is always in exactly one of these states for as long as it exists.

This transfer state machine is designed to work for all protocols libcurl supports, but basically no protocol will transition through all states. As you can see in the drawing, there are many different possible transitions from a lot of the states.

libcurl transfer state machine

(click the image for a larger version)

Start

A transfer starts up there above the surface in the INIT state. That’s a yellow box next to the little start button. Basically the boat shows how it goes from INIT to the right over to MSGSENT with it’s finish flag, but the real path is all done under the surface.

The yellow boxes (states) are the ones that exist before or when a connection is setup. The striped background is for all states that has a single and specific connectdata struct associated with the transfer.

CONNECT

If there’s a connection limit, either in total or per host etc, the transfer can get sent to the PENDING state to wait for conditions to change. If not, the state probably moves on to one of the blue ones to resolve host name and connect to the server etc. If a connection could be reused, it can shortcut immediately over to the green DO state.

The green states are all about setting up the connection to a state of fully connected, authenticated and logged in. Ready to send the first request.

DO

The green DO states are all about sending the request with one or more commands so that the file transfer can begin. There are several such states to properly support all protocols but also for historical reasons. We could probably remove a state there by some clever reorgs if we wanted.

PERFORMING

When a request has been issued and the transfer starts, it transitions over to PERFORMING. In the white states data is flowing. Potentially a lot. Potentially in both or either direction. If during the transfer curl finds out that the transfer is faster than allowed, it will move into RATELIMITING until it has cooled down a bit.

DONE

All the post-transfer states are red in the picture. The DONE is the first of them and after having done what it needs to round up the transfer, it disassociates with the connection and moves to COMPLETED. There’s no stripes behind that state. Disassociate here means that the connection is returned back to the connection pool for later reuse, or in the worst case if deemed that it can’t be reused or if the application has instructed it so, closed.

As you’ll note, there’s no disconnect anywhere in the state machine. This is simply because the disconnect is not really a part of the transfer at all.

COMPLETED

This is the end of the road. In this state a message will be created and put in the outgoing queue for the application to read, and then as a final last step it moves over to MSGSENT where nothing more happens.

A typical handle remains in this state until the transfer is reused and restarted, in which it will be set back to the INIT state again and the journey begins again. Possibly with other transfer parameters and URL this time. Or perhaps not.

State machines within each state

What this state diagram and explanation doesn’t show is of course that in each of these states, there can be protocol specific handling and each of those functions might in themselves of course have their own state machines to control what to do and how to handle the protocol details.

Each protocol in libcurl has its own “protocol handler” and most of the protocol specific stuff in libcurl is then done by calls from the generic parts to the protocol specific parts with calls like protocol_handler->proto_connect() that calls the protocol specific connection procedure.

This allows the generic state machine described in this blog post to not really know the protocol specifics and yet all the currently support 26 transfer protocols can be supported.

libcurl under the hood – the video

Here’s the full video of libcurl under the hood.

If you want to skip directly to the state machine diagram and the following explanation, go here.

Credits

Image by doria150 from Pixabay