{"id":16613,"date":"2021-05-10T09:13:47","date_gmt":"2021-05-10T07:13:47","guid":{"rendered":"https:\/\/daniel.haxx.se\/blog\/?p=16613"},"modified":"2021-05-10T09:13:47","modified_gmt":"2021-05-10T07:13:47","slug":"the-libcurl-transfer-state-machine","status":"publish","type":"post","link":"https:\/\/daniel.haxx.se\/blog\/2021\/05\/10\/the-libcurl-transfer-state-machine\/","title":{"rendered":"The libcurl transfer state machine"},"content":{"rendered":"\n<p>I&#8217;ve worked hard on making the presentation I ended up calling <em>libcurl under the hood<\/em>. A part of that presentation is spent on explaining the main libcurl transfer state machine and here I&#8217;ll try to document some of that in written form. Understanding the main transfer state machine in libcurl could be valuable and interesting for anyone who wants to work on libcurl internals and maybe improve them.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Background<\/h2>\n\n\n\n<p>The state is kept in the easy handle, in the struct field called <em>mstate<\/em>. The source file for this state machine is called <a href=\"https:\/\/github.com\/curl\/curl\/blob\/master\/lib\/multi.c\">multi.c<\/a>.<\/p>\n\n\n\n<p>An easy handle is always in exactly one of these states for as long as it exists.<\/p>\n\n\n\n<p>This transfer state machine is designed to work for all protocols libcurl supports, but basically no protocol will transition through all states. 
As you can see in the drawing, there are many different possible transitions from a lot of the states.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">libcurl transfer state machine<\/h2>\n\n\n\n<p>(click the image for a larger version)<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><a href=\"https:\/\/daniel.haxx.se\/blog\/wp-content\/uploads\/2021\/05\/slide-transfer-state-machine.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"2000\" height=\"1125\" src=\"https:\/\/daniel.haxx.se\/blog\/wp-content\/uploads\/2021\/05\/slide-transfer-state-machine.jpg\" alt=\"\" class=\"wp-image-16616\"\/><\/a><\/figure><\/div>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Start<\/h3>\n\n\n\n<p>A transfer starts up there above the surface in the <strong>INIT<\/strong> state. That&#8217;s a yellow box next to the little start button. Basically the boat shows how it goes from <strong>INIT<\/strong> to the right over to <strong>MSGSENT<\/strong> with its finish flag, but the real path is all done under the surface.<\/p>\n\n\n\n<p>The yellow boxes (states) are the ones that exist before or when a connection is set up. The striped background is for all states that have a single and specific <code>connectdata<\/code> struct associated with the transfer.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">CONNECT<\/h3>\n\n\n\n<p>If there&#8217;s a connection limit, either in total or per host etc, the transfer can get sent to the <strong>PENDING<\/strong> state to wait for conditions to change. If not, the state probably moves on to one of the blue ones to resolve the host name, connect to the server etc. If a connection could be reused, it can shortcut immediately over to the green <strong>DO<\/strong> state.<\/p>\n\n\n\n<p>The blue states are all about setting up the connection to a state of fully connected, authenticated and logged in. 
Ready to send the first request.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">DO<\/h3>\n\n\n\n<p>The green <strong>DO<\/strong> states are all about <em>sending<\/em> the request with one or more commands so that the file transfer can begin. There are several such states, partly to properly support all protocols and partly for historical reasons. We could probably remove a state there by some clever reorganization if we wanted.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">PERFORMING<\/h3>\n\n\n\n<p>When a request has been issued and the transfer starts, it transitions over to <strong>PERFORMING<\/strong>. In the white states data is flowing. Potentially a lot. Potentially in both or either direction. If during the transfer curl finds out that the transfer is faster than allowed, it will move into <strong>RATELIMITING<\/strong> until it has cooled down a bit.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">DONE<\/h3>\n\n\n\n<p>All the post-transfer states are red in the picture. <strong>DONE<\/strong> is the first of them and, after having done what it needs to wrap up the transfer, it disassociates from the connection and moves to <strong>COMPLETED<\/strong>. There are no stripes behind that state. Disassociate here means that the connection is returned to the connection pool for later reuse or, in the worst case, closed if it is deemed that it can&#8217;t be reused or if the application has instructed so.<\/p>\n\n\n\n<p>As you&#8217;ll note, there&#8217;s no disconnect anywhere in the state machine. This is simply because the disconnect is not really a part of the transfer at all.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">COMPLETED<\/h3>\n\n\n\n<p>This is the end of the road. 
In this state a message will be created and put in the outgoing queue for the application to read, and then as a final step it moves over to <strong>MSGSENT<\/strong> where nothing more happens.<\/p>\n\n\n\n<p>A typical handle remains in this state until the transfer is reused and restarted, in which case it will be set back to the <strong>INIT<\/strong> state and the journey begins again. Possibly with other transfer parameters and URL this time. Or perhaps not.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">State machines within each state<\/h2>\n\n\n\n<p>What this state diagram and explanation doesn&#8217;t show is that in each of these states there can be protocol specific handling, and each of those functions might in turn have its own state machine to control what to do and how to handle the protocol details.<\/p>\n\n\n\n<p>Each protocol in libcurl has its own &#8220;protocol handler&#8221; and most of the protocol specific stuff in libcurl is then done by calls from the generic parts to the protocol specific parts, with calls like <code><strong>protocol_handler->proto_connect()<\/strong><\/code> that invokes the protocol specific connection procedure.<\/p>\n\n\n\n<p>This allows the generic state machine described in this blog post to not really know the protocol specifics and yet all 26 currently supported transfer protocols can be handled.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">libcurl under the hood &#8211; the video<\/h2>\n\n\n\n<p>Here&#8217;s the full video of libcurl under the hood.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"libcurl under the hood - Daniel Stenberg\" width=\"474\" height=\"267\" src=\"https:\/\/www.youtube.com\/embed\/T7Pv5lQ1dAc?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; 
gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>If you want to skip directly to the state machine diagram and the following explanation, <a href=\"https:\/\/youtu.be\/T7Pv5lQ1dAc?t=1522\">go here<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Credits<\/h2>\n\n\n\n<p>Image by <a href=\"https:\/\/pixabay.com\/users\/doria150-7337031\/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=3011368\">doria150<\/a> from <a href=\"https:\/\/pixabay.com\/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=3011368\">Pixabay<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve worked hard on making the presentation I ended up calling libcurl under the hood. A part of that presentation is spent on explaining the main libcurl transfer state machine and here I&#8217;ll try to document some of that in written form. 
Understanding the main transfer state machine in libcurl could be valuable and &hellip; <a href=\"https:\/\/daniel.haxx.se\/blog\/2021\/05\/10\/the-libcurl-transfer-state-machine\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">The libcurl transfer state machine<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":5,"featured_media":16629,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[33,455],"class_list":["post-16613","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-curl","tag-curl-and-libcurl","tag-documentation"],"_links":{"self":[{"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/posts\/16613","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/comments?post=16613"}],"version-history":[{"count":15,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/posts\/16613\/revisions"}],"predecessor-version":[{"id":16634,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/posts\/16613\/revisions\/16634"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/media\/16629"}],"wp:attachment":[{"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/media?parent=16613"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/categories?post=16613"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/daniel.haxx.se\/blog\/wp-json\/wp\/v2\/tags?post=16613"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]
}}