The other day someone filed a bug against curl saying that we don’t support redirects done with the Refresh header. This took me down a rabbit hole of Refresh header research, and I’ve returned to share with you what I learned down there.
tl;dr Refresh is not a standard HTTP header.
As you know, an HTTP redirect is specified to use a 3xx response code and a Location: header to point out the new URL (I use the term URL here but you know what I mean). This has been the case since RFC 1945 (HTTP/1.0). According to an old mail from Roy T. Fielding (dated June 1996), Refresh “didn’t make it” into that spec, which was the first “real” HTTP specification. (And the HTTP we used before 1.0 didn’t even have headers!)
The little detail that it never made it into the 1.0 spec, or any later one, doesn’t seem to have affected the browsers. To this day, browsers keep supporting the Refresh header as a sort of Location: replacement, even though it seems never to have been present in an HTTP spec.
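To make the distinction concrete, here is a minimal Python sketch of how a client could classify a response. It is a simplified model, not curl’s or any browser’s actual logic, and the function name `redirect_kind` and the plain dict of headers are illustrative assumptions: a 3xx status plus Location is the standard redirect, while Refresh is the extra that browsers honor.

```python
from typing import Mapping


def redirect_kind(status: int, headers: Mapping[str, str]) -> str:
    """Classify which redirect mechanism a response uses (simplified model)."""
    # HTTP header names are case-insensitive, so compare lowercased names.
    names = {name.lower() for name in headers}
    if 300 <= status < 400 and "location" in names:
        return "location"  # the standard mechanism, since RFC 1945
    if "refresh" in names:
        return "refresh"  # non-standard: honored by browsers, not by curl
    return "none"


print(redirect_kind(301, {"Location": "https://example.com/"}))  # location
print(redirect_kind(200, {"refresh": "0; url=https://example.com/"}))  # refresh
print(redirect_kind(200, {"Content-Type": "text/html"}))  # none
```

In this model, curl’s -L/--location option only acts on responses classified as "location".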
In good company
curl is not the only HTTP library that doesn’t support this non-standard header. The popular Python library requests apparently doesn’t either, according to this bug from 2017, and another bug about it was filed as far back as 2011, only to be closed as “old” in 2014.
I’ve found no support for this header in wget or wget2 either.
I haven’t done any more extensive search for support in other toolkits, but it seems that the browsers are fairly alone in supporting this header.
How common is the Refresh header?
I decided to try to find out, and for this venture I used the Rapid7 data trove. The method used to collect that data may not be ideal: it scans the IPv4 address range and sends an HTTP request to TCP port 80 on each address, setting the IP address in the Host: header. The result of that scan is 52+ million HTTP responses from different and current HTTP origins (exactly 52254873 responses in my 59GB data dump, dated end of February 2019).
Results from my scans
- Location is used in 18.49% of the responses
- Refresh is used in 0.01738% of the responses (exactly 9080 responses featured them)
- Location is thus used 1064 times more often than Refresh
- In 35% of the cases when Refresh is used, Location is also used
- curl thus handles 99.9939% of the redirects in this test
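Counting these is straightforward once the responses are parsed. The sketch below is an illustrative reconstruction, not the script behind the numbers above; it assumes each response has already been turned into a status code and a dict of headers (reading the Rapid7 files is not handled), and `tally`/`Tally` are hypothetical names. It also redoes part of the arithmetic on the published totals.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, Mapping, Tuple


@dataclass
class Tally:
    total: int = 0
    location: int = 0  # responses with a Location header
    refresh: int = 0  # responses with a Refresh header
    both: int = 0  # responses with both headers
    refresh_only_status: Counter = field(default_factory=Counter)


def tally(responses: Iterable[Tuple[int, Mapping[str, str]]]) -> Tally:
    """Count Location/Refresh usage over (status, headers) pairs."""
    t = Tally()
    for status, headers in responses:
        names = {name.lower() for name in headers}
        has_location = "location" in names
        has_refresh = "refresh" in names
        t.total += 1
        t.location += int(has_location)
        t.refresh += int(has_refresh)
        if has_location and has_refresh:
            t.both += 1
        elif has_refresh:
            # Refresh is the only redirect header: record the status code
            t.refresh_only_status[status] += 1
    return t


sample = [
    (301, {"Location": "https://example.com/"}),
    (200, {"Refresh": "0; url=/start"}),
    (302, {"Location": "/a", "Refresh": "0"}),
    (200, {"Content-Type": "text/html"}),
]
t = tally(sample)
print(t.location / t.total, t.refresh / t.total, t.both / t.refresh)  # 0.5 0.5 0.5
print(t.refresh_only_status)  # Counter({200: 1})

# The same arithmetic applied to the published totals from the scan:
refresh_pct = 9080 / 52254873 * 100
print(round(refresh_pct, 5))  # 0.01738
print(round(18.49 / refresh_pct))  # 1064
```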
Additional notes
- When Refresh is the only redirect header, the response code is usually 200 (with 404 being the second most common)
- When both headers are used, the response code is almost always 3xx
- When both are used, it is common to redirect to the same target and it is also common for the Refresh header value to only contain a number (for the number of seconds until “refresh”).
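For reference, a Refresh value is either a bare number of seconds or a number followed by a target, as in `0; url=https://example.com/`. A simplified Python parser for those two shapes might look like this; `parse_refresh` is a hypothetical name, and it deliberately skips many edge cases that the WHATWG parsing steps handle, so treat it as a sketch only.

```python
import re
from typing import Optional, Tuple

# Leading delay (fractional part ignored), then an optional ';' or ','
# followed by the target.
_REFRESH = re.compile(r"\s*(\d+)(?:\.[\d.]*)?\s*(?:[;,]\s*(.*))?$", re.DOTALL)


def parse_refresh(value: str) -> Optional[Tuple[int, Optional[str]]]:
    """Split a Refresh value into (seconds, url); url is None if absent.

    Simplified: not the full WHATWG algorithm.
    """
    m = _REFRESH.match(value)
    if m is None:
        return None  # no leading delay: treat as unparsable
    seconds = int(m.group(1))
    rest = (m.group(2) or "").strip()
    # An optional "url=" prefix (any case) may precede the target.
    rest = re.sub(r"(?i)^url\s*=\s*", "", rest)
    # Drop one pair of matching quotes around the target, if present.
    if len(rest) >= 2 and rest[0] == rest[-1] and rest[0] in "'\"":
        rest = rest[1:-1]
    return seconds, (rest or None)


print(parse_refresh("5"))  # (5, None)
print(parse_refresh("0; url=https://example.com/"))  # (0, 'https://example.com/')
```

A value like the bare number case above carries no target at all, which is part of why the header is only “almost” a redirect.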
Refresh from HTML content
Redirects can also be done with <meta> tags in HTML, sending the refresh that way, but I have not investigated how common that is, since it isn’t strictly speaking HTTP and is thus outside of my research (and interest) here.
In use, not documented, not in the spec
Just another undocumented corner of the web.
When I posted about these findings on the HTTPbis mailing list, it was pointed out that WHATWG mentions this header on their IANA page. I say “mentions” because calling that documentation would be a stretch…
It is not at all clear exactly what the header is supposed to do and it is not documented anywhere. It’s not exactly a redirect, but almost?
Will/should curl support it?
No decision has been made yet. With such low usage, and since we’ve managed fine without support for it for so long, maybe we can just keep things as they are and instead argue that use of this header should be deprecated from the web entirely?
Updates
After this post first went live, I got some further feedback and data that are relevant and interesting.
- Yoav Weiss created a patch for Chrome to count how often they see this header used in real life.
- Eric Lawrence pointed out that IE had several incompatibilities in its Refresh parser back in the day.
- Boris pointed out (in the comments below) the steps WHATWG documents for handling the header.
- The use of <meta> tag refresh in page content is fairly high: the Chrome counter says almost 4% of page loads!