chart: which host, which protocol

This is a flow chart describing some of the steps and decisions made within curl when an HTTP URL is provided, covering hostnames, protocols and port numbers.

This flow chart ignores proxies, authentication considerations and use of Unix domain sockets to keep things simpler.

URL

An initial step is of course to extract the hostname part from the URL. The hostname in a URL can be provided as a plain IP address or as a name. If a numerical IPv4 or IPv6 address is not provided in the URL, curl checks if the hostname is provided as an IDN (Internationalized Domain Name) and if so, it converts the name into punycode that it can then continue with.
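As a rough illustration of that first step, here is a minimal Python sketch that pulls the host out of a URL, detects IP literals and converts a non-ASCII name to punycode. The function name `host_for_lookup` is made up for this example, and Python's built-in `idna` codec follows the older IDNA 2003 rules, so this is not how curl itself does it.

```python
import ipaddress
from typing import Tuple
from urllib.parse import urlsplit


def host_for_lookup(url: str) -> Tuple[str, bool]:
    """Return (hostname, is_ip_literal) for the host part of a URL.

    Hypothetical helper for illustration only.
    """
    host = urlsplit(url).hostname or ""  # lowercased, IPv6 brackets removed
    try:
        ipaddress.ip_address(host)
        return host, True  # numerical IPv4 or IPv6 address, nothing to convert
    except ValueError:
        pass
    if host.isascii():
        return host, False  # plain ASCII name, use as-is
    # Non-ASCII name: treat it as an IDN and convert it to punycode.
    return host.encode("idna").decode("ascii"), False
```

For example, `host_for_lookup("http://bücher.example/")` gives back the punycode form of the name, while an IP literal is passed through untouched.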

Existing connection

Given the protocol, the hostname and the port number, curl checks if it has an existing live connection suitable for use. Reusing an existing connection is preferred as it is the fastest way to start the new transfer. Connection reuse is based on the provided name and not the IP address, so curl can skip resolving the name if a connection is already available.
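The idea of keying reuse on the name rather than the address can be sketched like this. The `ConnectionPool` class is purely illustrative and ignores everything else a real reuse check would have to compare, such as TLS settings and credentials.

```python
from typing import Dict, Optional, Tuple

ConnKey = Tuple[str, str, int]  # (scheme, hostname as given, port)


class ConnectionPool:
    """Hypothetical pool of idle connections, keyed by name, not IP."""

    def __init__(self) -> None:
        self._idle: Dict[ConnKey, object] = {}

    def put(self, scheme: str, host: str, port: int, conn: object) -> None:
        # Park a connection so a later transfer can pick it up.
        self._idle[(scheme, host.lower(), port)] = conn

    def take(self, scheme: str, host: str, port: int) -> Optional[object]:
        # No DNS lookup is needed here: the key is the name from the URL.
        return self._idle.pop((scheme, host.lower(), port), None)
```

A lookup for the same scheme, name and port returns the parked connection. A lookup using the IP address the name happens to resolve to does not match, since the pool never looks at addresses.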

--connect-to

When trying to connect to a host, curl first checks if there are any tricks selected, like this option that makes curl actually resolve hostname B even when asked to connect to host A.
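The option takes its argument as `HOST1:PORT1:HOST2:PORT2`. A minimal sketch of applying such rules could look like the following. The function name `apply_connect_to` is invented for this example, it assumes that the first matching rule wins and that an empty field means "any" on the left side and "keep the original" on the right side, and it does not handle bracketed IPv6 literals.

```python
from typing import Iterable, Tuple


def apply_connect_to(host: str, port: int, rules: Iterable[str]) -> Tuple[str, int]:
    """Return the (host, port) to actually resolve and connect to.

    Hypothetical helper; each rule is a "HOST1:PORT1:HOST2:PORT2" string.
    """
    for rule in rules:
        from_host, from_port, to_host, to_port = rule.split(":")
        if from_host and from_host.lower() != host.lower():
            continue  # rule is for another host
        if from_port and int(from_port) != port:
            continue  # rule is for another port
        # Empty HOST2/PORT2 keeps the value from the request.
        return (to_host or host, int(to_port) if to_port else port)
    return host, port  # no rule matched, connect as asked
```

With the rule `example.com:443:backend.example:8443`, a request for `example.com` port 443 ends up resolving `backend.example` and connecting to port 8443, while requests for any other host or port are left alone.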

alt-svc

curl might have a populated alt-svc cache from previous transfers. It is basically a mapping for specific HTTP versions and hostnames over to another HTTP version and hostname for a certain amount of time. This can change hostname A into hostname B.
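As a data structure, such a cache can be pictured as a map with expiry times. The sketch below uses made-up names (`AltSvcCache`, `remember`, `lookup`) and takes the current time as a parameter to stay deterministic. It only shows the shape of the mapping, not curl's implementation.

```python
from typing import Dict, Optional, Tuple

Origin = Tuple[str, str, int]  # (ALPN id such as "h2" or "h3", hostname, port)


class AltSvcCache:
    """Hypothetical alt-svc cache: source origin -> alternative origin."""

    def __init__(self) -> None:
        self._map: Dict[Origin, Tuple[Origin, float]] = {}

    def remember(self, src: Origin, dst: Origin, max_age: float, now: float) -> None:
        # Store the alternative together with the time it stops being valid.
        self._map[src] = (dst, now + max_age)

    def lookup(self, src: Origin, now: float) -> Optional[Origin]:
        entry = self._map.get(src)
        if entry is None:
            return None
        dst, expires = entry
        if now >= expires:
            del self._map[src]  # entry has aged out
            return None
        return dst
```

A hit here is what turns "HTTP/2 to hostname A" into, say, "HTTP/3 to hostname B" for as long as the entry remains valid.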

--resolve

This is an option that populates the DNS cache with one or more user provided IP addresses for a given hostname.

DNS cache

Before curl resolves a hostname into a set of IP addresses, it checks if it already has the information in its DNS cache, as that is usually much faster than having to ask for that data again. Entries are typically kept in this cache for only a minute before they are evicted.
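Here is a small sketch of how a cache like this and the `--resolve` option fit together. The `DnsCache` class and its method names are hypothetical, and keying on host plus port and letting `--resolve` entries stay without a timeout are assumptions made for the example. The `HOST:PORT:ADDR[,ADDR]` format is the one the option takes.

```python
from typing import Dict, List, Optional, Tuple


class DnsCache:
    """Hypothetical DNS cache with a time-to-live for resolved entries."""

    def __init__(self, ttl: float = 60.0) -> None:
        self.ttl = ttl
        # (host, port) -> (addresses, expiry time or None for "no timeout")
        self._entries: Dict[Tuple[str, int], Tuple[List[str], Optional[float]]] = {}

    def add_resolve_option(self, spec: str) -> None:
        # "HOST:PORT:ADDR[,ADDR]..." as given to --resolve.
        host, port, addrs = spec.split(":", 2)
        self._entries[(host.lower(), int(port))] = (addrs.split(","), None)

    def store(self, host: str, port: int, addresses: List[str], now: float) -> None:
        self._entries[(host.lower(), port)] = (list(addresses), now + self.ttl)

    def lookup(self, host: str, port: int, now: float) -> Optional[List[str]]:
        key = (host.lower(), port)
        entry = self._entries.get(key)
        if entry is None:
            return None  # cache miss: a real resolve is needed
        addresses, expires = entry
        if expires is not None and now >= expires:
            del self._entries[key]  # too old, evict
            return None
        return addresses
```

An entry added from a `--resolve` argument answers lookups without any name resolution happening, while an entry stored after a real resolve disappears once its minute is up.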

Resolving

When curl resolves a hostname, it wants the A, AAAA and HTTPS DNS record data. The A and AAAA records provide a list of IP addresses to try to connect to, and the HTTPS record provides HTTP version information, port number, ECH config and possibly more.

HSTS

curl might also have an HSTS cache, which is another map for when plain HTTP accesses should rather be internally upgraded to instead use HTTPS. This changes the protocol to use and the default port number.
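The upgrade decision can be sketched as below. The function name `hsts_upgrade` and the cache layout are invented for the example. The handling of subdomains and of port 80 follows my reading of RFC 6797 and is not taken from curl's source.

```python
from typing import Dict, Tuple

# hostname -> (expiry time, include_subdomains); hypothetical layout
HstsCache = Dict[str, Tuple[float, bool]]


def hsts_upgrade(scheme: str, host: str, port: int,
                 cache: HstsCache, now: float) -> Tuple[str, int]:
    """Return the (scheme, port) to use after consulting the HSTS cache."""
    if scheme != "http":
        return scheme, port  # only plain HTTP gets upgraded
    labels = host.lower().rstrip(".").split(".")
    # Check the host itself first, then each parent domain.
    for i in range(len(labels)):
        entry = cache.get(".".join(labels[i:]))
        if entry is None:
            continue
        expires, include_subdomains = entry
        if now >= expires:
            continue  # stale entry, ignore it
        if i == 0 or include_subdomains:
            # Default port 80 becomes 443; other explicit ports are kept.
            return "https", 443 if port == 80 else port
    return scheme, port
```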

Racing

Depending on what IP versions and HTTP versions the above steps have determined curl should try to use, curl starts a connection race with potentially quite a few parallel connection attempts, each started with a small delay after the previous one.

  1. QUIC connect attempt over IPv6 starts first
  2. QUIC connect attempt over IPv4 runs as number two
  3. TCP connect attempt over IPv6 is third in line
  4. TCP connect attempt over IPv4 is the fourth

Of course, if any of them can’t be done or fails, it is immediately skipped and the next one in line starts. Each of them also possibly starts a new one if the previous one has not connected within a certain time.

The first contender to successfully connect to the host wins and the other attempts are quickly discarded.
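The general shape of such a staggered race can be sketched with `asyncio`. This is a simplified illustration and not curl's algorithm: the `race` function, its `stagger` parameter and the attempt names are all made up here, and each attempt is just a coroutine function that either returns a connection or raises.

```python
import asyncio


async def race(attempts, stagger=0.2):
    """Run connect attempts in order, each delayed after the previous one.

    `attempts` is an ordered list of (name, coroutine_function) pairs.
    Returns (name, result) for the first attempt that succeeds.
    """
    queue = list(attempts)
    pending = set()
    names = {}
    try:
        while queue or pending:
            if queue:
                # Start the next contender in line.
                name, connect = queue.pop(0)
                task = asyncio.ensure_future(connect())
                names[task] = name
                pending.add(task)
            # Wait up to the stagger delay. A failure or success ends
            # the wait early, so a failed attempt is skipped at once.
            done, pending = await asyncio.wait(
                pending,
                timeout=stagger if queue else None,
                return_when=asyncio.FIRST_COMPLETED,
            )
            for task in done:
                if task.exception() is None:
                    return names[task], task.result()  # we have a winner
        raise ConnectionError("all connection attempts failed")
    finally:
        for task in pending:
            task.cancel()  # discard the losing attempts
```

Called with four attempts ordered as in the list above (QUIC over IPv6, QUIC over IPv4, TCP over IPv6, TCP over IPv4), the first one to complete successfully is returned and the ones still in flight are cancelled.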

TLS handshake

If the protocol is HTTPS (which it always is if HTTP/3 is selected), the TLS handshake is performed after the TCP connection is established. For HTTP/3, the TLS handshake is integrated into the QUIC connection setup.

The TLS handshake can make curl reuse an existing session, decide ALPN, use ECH and send early data.

The session ID/ticket handling is also a cache curl holds, which allows for faster reconnects to hosts it has connected to before.

Connection

Once curl has an established connection to use, it starts by sending off the HTTP request, which begins the transfer.

The chart
