Transfers vs connections spring cleanup

Warning: this post is full of libcurl internal architectural details and not much else.

Within libcurl there are two primary objects being handled; transfers and connections. The transfers objects are struct Curl_easy and the connection counterparts are struct connectdata.

This is a separation and architecture as old as libcurl, even if the internal struct names have changed a little through the years. A transfer is associated with none or one connection object and there’s a pool with potentially several previously used, live, connections stored for possible future reuse.

A simplified schematic picture could look something like this:

One transfer, one connection and three idle connections in the pool.

Transfers to connections

These objects are protocol agnostic so they work like this no matter which scheme was used for the URL you’re transferring with curl.

Before the introduction of HTTP/2 into curl, which landed for the first time in September 2013 there was also a fixed relationship that one transfer always used (none or) one connection and that connection then also was used by a single transfer. libcurl then stored the association in the objects both ways. The transfer object got a pointer to the current connection and the connection object got a pointer to the current transfer.

Multiplexing shook things up

Lots of code in libcurl passed around the connection pointer (conn) because well, it was convenient. We could find the transfer object from that (conn->data) just fine.

When multiplexing arrived with HTTP/2, we then could start doing multiple transfers that share a single connection. Since we passed around the conn pointer as input to so many functions internally, we had to make sure we updated the conn->data pointer in lots of places to make sure it pointed to the current driving transfer.

This was always awkward and the source for agony and bugs over the years. At least twice I started to work on cleaning this up from my end but it quickly become a really large work that was hard to do in a single big blow and I abandoned the work. Both times.

Third time’s the charm

This architectural “wart” kept bugging me and on January 8, 2021 I emailed the curl-library list to start a more organized effort to clean this up:

Conclusion: we should stop using ‘conn->data’ in libcurl

Status: there are 939 current uses of this pointer

Mission: reduce the use of this pointer, aiming to reach a point in the future when we can remove it from the connection struct.

Little by little

With the help of fellow curl maintainer Patrick Monnerat I started to submit pull requests that would remove the use of this pointer.

Little by little we changed functions and logic to rather be anchored on the transfer rather than the connection (as data->conn is still fine as that can only ever be NULL or a single connection). I made a wiki page to keep an updated count of the number of references. After the first ten pull requests we were down to just over a hundred from the initial 919 – yeah the mail quote says 939 but it turned out the grep pattern was slightly wrong!

We decided to hold off a bit when we got closer to the 7.75.0 release so that we wouldn’t risk doing something close to the ship date that would jeopardize it. Once the release had been pushed out the door we could continue the journey.

Gone!

As of today, February 16 2021, the internal pointer formerly known as conn->data doesn’t exist anymore in libcurl and therefore it can’t be used and this refactor is completed. It took at least 20 separate commits to get the job done.

I hope this new order will help us do less mistakes as we don’t have to update this pointer anymore.

I’m very happy we could do this revamp without it affecting the API or ABI in any way. These are all just internal artifacts that are not visible to the outside.

One of a thousand little things

This is just a tiny detail but the internals of a project like curl consists of a thousand little details and this is one way we make sure the code remains in a good shape. We identify improvements and we perform them. One by one. We never stop and we’re never done. Together we take this project into the future and help the world do Internet transfers properly.