curl speaks etag

The ETag HTTP response header is an identifier for a specific version of a resource. It lets caches be more efficient and save bandwidth, as a web server does not need to resend a full response if the content has not changed. Additionally, etags help prevent simultaneous updates of a resource from overwriting each other (“mid-air collisions”).

That’s a quote from the mozilla ETag documentation. The header is defined in RFC 7232.

In short, a server can include this header when it responds with a resource, and in subsequent requests when a client wants to get an updated version of that document it sends back the same ETag and says “please give me a new version if it doesn’t match this ETag anymore”. The server will then respond with a 304 if there’s nothing new to return.

It is a better way than modification time stamp to identify a specific resource version on the server.

ETag options

Starting in version 7.68.0 (due to ship on January 8th, 2020), curl will feature two new command line options that makes it easier for users to take advantage of these features. You can of course try it out already now if you build from git or get a daily snapshot.

The ETag options are perfect for situations like when you run a curl command in a cron job to update a file if it has indeed changed on the server.

--etag-save <filename>

Issue the curl command and if there’s an ETag coming back in the response, it gets saved in the given file.

--etag-compare <filename>

Load a previously stored ETag from the given file and include that in the outgoing request (the file should only consist of the specific ETag “value” and nothing else). If the server deems that the resource hasn’t changed, this will result in a 304 response code. If it has changed, the new content will be returned.

Update the file if newer than previously stored ETag:

curl --etag-compare etag.txt --etag-save etag.txt https://example.com -o saved-file

If-Modified-Since options

The other method to accomplish something similar is the -z (or --time-cond) option. It has been supported by curl since the early days. Using this, you give curl a specific time that will be used in a conditional request. “only respond with content if the document is newer than this”:

curl -z "5 dec 2019" https:/example.com

You can also do the inversion of the condition and ask for the file to get delivered only if it is older than a certain time (by prefixing the date with a dash):

curl -z "-1 jan 2003" https://example.com

Perhaps this is most conveniently used when you let curl get the time from a file. Typically you’d use the same file that you’ve saved from a previous invocation and now you want to update if the file is newer on the remote site:

curl -z saved-file -o saved-file https://example.com

Tool, not libcurl

It could be noted that these new features are built entirely in the curl tool by using libcurl correctly with the already provided API, so this change is done entirely outside of the library.

Credits

The idea for the ETag goodness came from Paul Hoffman. The implementation was brought by Maros Priputen – as his first contribution to curl! Thanks!