no more global dns cache in curl

In January 2002, we added support for a global DNS cache in libcurl. All transfers set to use it would share and use the same global cache.

We rather quickly realized that having a global cache without locking was error-prone and not really advisable, so already in March 2004 we added comments in the header file suggesting that users should not use this option.

It remained in the code and time passed.

In the autumn of 2018, another fourteen years later, we finally addressed the issue when we announced a plan for this options deprecation. We announced a date for when it would become deprecated and disabled in code (7.62.0), and then six months later if no major incidents or outcries would occur, we said we would delete the code completely.

That time has now arrived. All code supporting a global DNS cache in curl has been removed. Any libcurl-using program that sets this option from now on will simply not get a global cache and instead proceed with the default handle-oriented cache, and the documentation is updated to clearly indicate that this is the case. This change will ship in curl 7.65.0 due to be released in May 2019 (merged in this commit).

If a program still uses this option, the only really noticeable effect should be a slightly worse name resolving performance, assuming the global cache had any point previously.

Programs that want to continue to have a DNS cache shared between multiple handles should use the share interface, which allows shared DNS cache and more – with locking. This API has been offered by libcurl since 2003.


curl says bye bye to pipelining

HTTP/1.1 Pipelining is the protocol feature where the client sends off a second HTTP/1.1 request already before the answer to the previous request has arrived (completely) from the server. It is defined in the original HTTP/1.1 spec and is a way to avoid waiting times. To reduce latency.

HTTP/1.1 Pipelining was badly supported by curl for a long time in the sense that we had a series of known bugs and it was a fragile feature without enough tests. Also, pipelining is fairly tricky to debug due to the timing sensitivity so very often enabling debug outputs or similar completely changes the nature of the behavior and things are not reproducing anymore!

HTTP pipelining was never enabled by default by the large desktop browsers due to all the issues with it, like broken server implementations and the likes. Both Firefox and Chrome dropped pipelining support entirely since a long time back now. curl did in fact over time become more and more lonely in supporting pipelining.

The bad state of HTTP pipelining was a primary driving factor behind HTTP/2 and its multiplexing feature. HTTP/2 multiplexing is truly and really “pipelining done right”. It is way more solid, practical and solves the use case in a better way with better performance and fewer downsides and problems. (curl enables multiplexing by default since 7.62.0.)

In 2019, pipelining should be abandoned and HTTP/2 should be used instead.

Starting with this commit, to be shipped in release 7.65.0, curl no longer has any code that supports HTTP/1.1 pipelining. It has been disabled in the code since 7.62.0 already so applications and users that use a recent version already should not notice any difference.

Pipelining was always offered on a best-effort basis and there was never any guarantee that requests would actually be pipelined, so we can remove this feature entirely without breaking API or ABI promises. Applications that ask libcurl to use pipelining can still do that, it just won’t have any effect.

Workshop Season 4 Finale

The 2019 HTTP Workshop ended today. In total over the years, we have now done 12 workshop days up to now. This day was not a full day and we spent it on only two major topics that both triggered long discussions involving large parts of the room.

Cookies

Mike West kicked off the morning with his cookies are bad presentation.

One out of every thousand cookie header values is 10K or larger in size and even at the 50% percentile, the size is 480 bytes. They’re a disaster on so many levels. The additional features that have been added during the last decade are still mostly unused. Mike suggests that maybe the only way forward is to introduce a replacement that avoids the issues, and over longer remove cookies from the web: HTTP state tokens.

A lot of people in the room had opinions and thoughts on this. I don’t think people in general have a strong love for cookies and the way they currently work, but the how-to-replace-them question still triggered lots of concerns about issues from routing performance on the server side to the changed nature of the mechanisms that won’t encourage web developers to move over. Just adding a new mechanism without seeing the old one actually getting removed might not be a win.

We should possibly “worsen” the cookie experience over time to encourage switch over. To cap allowed sizes, limit use to only over HTTPS, reduce lifetimes etc, but even just that will take effort and require that the primary cookie consumers (browsers) have a strong will to hurt some amount of existing users/sites.

(Related: Mike is also one of the authors of the RFC6265bis draft in progress – a future refreshed cookie spec.)

HTTP/3

Mike Bishop did an excellent presentation of HTTP/3 for HTTP people that possibly haven’t kept up fully with the developments in the QUIC working group. From a plain HTTP view, HTTP/3 is very similar feature-wise to HTTP/2 but of course sent over a completely different transport layer. (The HTTP/3 draft.)

Most of the questions and discussions that followed were rather related to the transport, to QUIC. Its encryption, it being UDP, DOS prevention, it being “CPU hungry” etc. Deploying HTTP/3 might be a challenge for successful client side implementation, but that’s just nothing compared the totally new thing that will be necessary server-side. Web developers should largely not even have to care…

One tidbit that was mentioned is that in current Firefox telemetry, it shows about 0.84% of all requests negotiates TLS 1.3 early data (with about 12.9% using TLS 1.3)

Thought-worthy quote of the day comes from Willy: “everything is a buffer”

Future Workshops

There’s no next workshop planned but there might still very well be another one arranged in the future. The most suitable interval for this series isn’t really determined and there might be reasons to try tweaking the format to maybe change who will attend etc.

The fact that almost half the attendees this time were newcomers was certainly good for the community but that not a single attendee traveled here from Asia was less good.

Thanks

Thanks to the organizers, the program committee who set this up so nicely and the awesome sponsors!

More Amsterdamned Workshop

Yesterday we plowed through a large and varied selection of HTTP topics in the Workshop. Today we continued. At 9:30 we were all in that room again. Day two.

Martin Thomson talked about his “hx” proposal and how to refer to future responses in HTTP APIs. He ended up basically concluding that “This is too complicated, I think I’m going to abandon this” and instead threw in a follow-up proposal he called “Reverse Javascript” that would be a way for a client to pass on a script for the server to execute! The room exploded in questions, objections and “improvements” to this idea. There are also apparently a pile of prior art in similar vein to draw inspiration from.

With the audience warmed up like this, Anne van Kasteren took us back to reality with an old favorite topic in the HTTP Workshop: websockets. Not a lot of love for websockets in the room… but this was the first of several discussions during the day where a desire or quest for bidirectional HTTP streams was made obvious.

Woo Xie did a presentation with help from Alan Frindell about Extending h2 for Bidirectional Messaging and how they propose a HTTP/2 extension that adds a new frame to create a bidirectional stream that lets them do messaging over HTTP/2 fine. The following discussion was slightly positive but also contained alternative suggestions and references to some of the many similar drafts for bidirectional and p2p connections over http2 that have been done in the past.

Lucas Pardue and Nick Jones did a presentation about HTTP/2 Priorities, based a lot of research previously done and reported by Pat Meenan. Lucas took us through the history of how the priorities ended up like this, their current state and numbers and also the chaos and something about a possible future, the h3 way of doing prio and mr Meenan’s proposed HTTP/3 prio.

Nick’s second half of the presentation then took us through Cloudflare’s Edge Driven HTTP/2 Prioritisation work/experiments and he showed how they could really improve how prioritization works in nginx by making sure the data is written to the socket as late as possible. This was backed up by audience references to the TAPS guidelines on the topic and a general recollection that reducing the number connections is still a good idea and should be a goal! Server buffering is hard.

Asbjørn Ulsberg presented his case for a new request header: prefer-push. When used, the server can respond to the request with a series of pushed resources and thus save several round-rips. This triggered sympathy in the room but also suggestions of alternative approaches.

Alan Frindell presented Partial POST Replay. It’s a rather elaborate scheme that makes their loadbalancers detect when a POST to one of their servers can’t be fulfilled and they instead replay that POST to another backend server. While Alan promised to deliver a draft for this, the general discussion was brought up again about POST and its “replayability”.

Willy Tarreau followed up with a very similar topic: Retrying failed POSTs. In this this context RFC 2310 – The Safe Response Header Field was mentioned and that perhaps something like this could be considered for requests? The discussion certainly had similarities and overlaps with the SEARCH/POST discussion of yesterday.

Mike West talked about Fetch Metadata Request Headers which is a set of request headers explaining for servers where and what for what purpose requests are made by browsers. He also took us through a brief explained of Origin Policy, meant to become a central “resource” for a manifest that describes properties of the origin.

Mark Nottingham presented Structured Headers (draft). This is a new way of specifying and parsing HTTP headers that will make the lives of most HTTP implementers easier in the future. (Parts of the presentation was also spent debugging/triaging the most weird symptoms seen when his Keynote installation was acting up!) It also triggered a smaller side discussion on what kind of approaches that could be taken for HPACK and QPACK to improve the compression ratio for headers.

Anne van Kesteren talked Web-compatible header value parsers, standardizing on how to parse headers not covered by structured headers.

Yoav Weiss described the current status of client hints (draft). This is shipped by Chrome already and he wanted more implementers to use it and tell how its working.

Roberto Peon presented an idea for doing “Partialy-Reliable HTTP” and after his talk and a discussion he concluded they will implement it, play around and come back and tell us what they’ve learned.

Mark Nottingham talked about HTTP for CDNs. He has this fancy-looking test suite in progress that checks how things are working and what is being supported and there are two drafts in progress: the cache response header and the proxy status header field.

Willy Tarreau talked about a race problem he ran into with closing HTTP/2 streams and he explained how he worked around it with a trailing ping frame and suggested that maybe more users might suffer from this problem.

The oxygen level in the room was certainly not on an optimal level at this point but that didn’t stop us. We knew we had a few more topics to get through and we all wanted to get to the boat ride of the evening on time. So…

Hooman Beheshti polled the room to get a feel for what people think about Early hints. Are people still on board? Turns out it is mostly appreciated but not supported by any browser and a discussion and explainer session followed as to why this is and what general problems there are in supporting 1xx headers in browsers. It is striking that most of us HTTP people in the room don’t know how browsers work! Here I could mention that Cory said something about the craziness of this, but I forget his exact words and I blame the fact that they were expressed to me on a boat. Or perhaps that the time is already approaching 1am the night after this fully packed day.

Good follow-up reads from that discussion is Yoav’s blog post A Tale of Four Caches and Jake Archibalds’s HTTP/2 Push is tougher than I thought.

As the final conversation of the day, Anne van Kesteren talked about Response Sources and the different ways a browser can do requests and get responses.

Boat!

HAproxy had the excellent taste of sponsoring this awesome boat ride on the Amsterdam canals for us at the end of the day

Boating on the Amsterdam canals, sponsored by HAproxy!

Thanks again to Cory Benfield for feeding me his notes of the day to help me keep things straight. All mistakes are mine. But if you tell me about them, I will try to correct the text!

The HTTP Workshop 2019 begins

The forth season of my favorite HTTP series is back! The HTTP Workshop skipped over last year but is back now with a three day event organized by the very best: Mark, Martin, Julian and Roy. This time we’re in Amsterdam, the Netherlands.

35 persons from all over the world walked in the room and sat down around the O-shaped table setup. Lots of known faces and representatives from a large variety of HTTP implementations, client-side or server-side – but happily enough also a few new friends that attend their first HTTP Workshop here. The companies with the most employees present in the room include Apple, Facebook, Mozilla, Fastly, Cloudflare and Google – having three or four each in the room.

Patrick Mcmanus started off the morning with his presentation on HTTP conventional wisdoms trying to identify what have turned out as successes or not in HTTP land in recent times. It triggered a few discussions on the specific points and how to judge them. I believe the general consensus ended up mostly agreeing with the slides. The topic of unshipping HTTP/0.9 support came up but is said to not be possible due to its existing use. As a bonus, Anne van Kesteren posted a new bug on Firefox to remove it.

Mark Nottingham continued and did a brief presentation about the recent discussions in HTTPbis sessions during the IETF meetings in Prague last week.

Martin Thomson did a presentation about HTTP authority. Basically how a client decides where and who to ask for a resource identified by a URI. This triggered an intense discussion that involved a lot of UI and UX but also trust, certificates and subjectAltNames, DNS and various secure DNS efforts, connection coalescing, DNSSEC, DANE, ORIGIN frame, alternative certificates and more.

Mike West explained for the room about the concept for Signed Exchanges that Chrome now supports. A way for server A to host contents for server B and yet have the client able to verify that it is fine.

Tommy Pauly then talked to his slides with the title of Website Fingerprinting. He covered different areas of a browser’s activities that are current possible to monitor and use for fingerprinting and what counter-measures that exist to work against furthering that development. By looking at the full activity, including TCP flows and IP addresses even lots of our encrypted connections still allow for pretty accurate and extensive “Page Load Fingerprinting”. We need to be aware and the discussion went on discussing what can or should be done to help out.

The meeting is going on somewhere behind that red door.

Lucas Pardue discussed and showed how we can do TLS interception with Wireshark (since the release of version 3) of Firefox, Chrome or curl and in the end make sure that the resulting PCAP file can get the necessary key bundled in the same file. This is really convenient when you want to send that PCAP over to your protocol debugging friends.

Roberto Peon presented his new idea for “Generic overlay networks”, a suggested way for clients to get resources from one out of several alternatives. A neighboring idea to Signed Exchanges, but still different. There was an interested to further and deepen this discussion and Roberto ended up saying he’d at write up a draft for it.

Max Hils talked about Intercepting QUIC and how the ability to do this kind of thing is very useful in many situations. During development, for debugging and for checking what potentially bad stuff applications are actually doing on your own devices. Intercepting QUIC and HTTP/3 can thus also be valuable but at least for now presents some challenges. (Max also happened to mention that the project he works on, mitmproxy, has more stars on github than curl, but I’ll just let it slide…)

Poul-Henning Kamp showed us vtest – a tool and framework for testing HTTP implementations that both Varnish and HAproxy are now using. Massaged the right way, this could develop into a generic HTTP test/conformance tool that could be valuable for and appreciated by even more users going forward.

Asbjørn Ulsberg showed us several current frameworks that are doing GET, POST or SEARCH with request bodies and discussed how this works with caching and proposed that SEARCH should be defined as cacheable. The room mostly acknowledged the problem – that has been discussed before and that probably the time is ripe to finally do something about it. Lots of users are already doing similar things and cached POST contents is in use, just not defined generically. SEARCH is a already registered method but could get polished to work for this. It was also suggested that possibly POST could be modified to also allow for caching in an opt-in way and Mark volunteered to author a first draft elaborating how it could work.

Indonesian and Tibetan food for dinner rounded off a fully packed day.

Thanks Cory Benfield for sharing your notes from the day, helping me get the details straight!

Diversity

We’re a very homogeneous group of humans. Most of us are old white men, basically all clones and practically indistinguishable from each other. This is not diverse enough!

A big thank you to the HTTP Workshop 2019 sponsors!


curl up 2019 is over

(I will update this blog post with more links to videos and PDFs to presentations as they get published, so come back later in case your favorite isn’t linked already.)

The third curl developers conference, curl up 2019, is how history. We gathered in the lovely Charles University in central Prague where we sat down in an excellent class room. After the HTTP symposium on the Friday, we spent the weekend to dive in deeper in protocols and curl details.

I started off the Saturday by The state of the curl project (youtube). An overview of how we’re doing right now in terms of stats, graphs and numbers from different aspects and then something about what we’ve done the last year and a quick look at what’s not do good and what we could work on going forward.

James Fuller took the next session and his Newbie guide to contributing to libcurl presentation. Things to consider and general best practices to that could make your first steps into the project more likely to be pleasant!

Long term curl hacker Dan Fandrich (also known as “Daniel two” out of the three Daniels we have among our top committers) followed up with Writing an effective curl test where the detailed what different tests we have in curl, what they’re for and a little about how to write such tests.

Sign seen at the curl up dinner reception Friday night

After that I was back behind the desk in the classroom that we used for this event and I talked The Deprecation of legacy crap (Youtube). How and why we are removing things, some things we are removing and will soon remove and finally a little explainer on our new concept and handling of “experimental” features.

Igor Chubin then explained his new protect for us: curlator: a framework for console services (Youtube). It’s a way and tooling that makes it easier to provide access to shell and console oriented services over the web, using curl.

Me again. Governance, money in the curl project and someone offering commercial support (Youtube) was a presentation about how we intend for the project to join a legal entity SFC, and a little about money we have, what to spend it on and how I feel it is good to keep the project separate from any commercial support ventures any of us might do!

While the list above might seems like more than enough, the day wasn’t over. Christian Schmitz also did his presentation on Using SSL root certificate from Mac/Windows.

Our local hero organizer James Fuller then spoiled us completely when we got around to have dinner at a monastery with beer brewing monks and excellent food. Good food, good company and curl related dinner subjects. That’s almost heaven defined!

Sunday

Daylight saving time morning and you could tell. I’m sure it was not at all related to the beers from the night before…

James Fuller fired off the day by talking to us about Curlpipe (github), a DSL for building http execution pipelines.

The class room we used for the curl up presentations and discussions during Saturday and Sunday.

Robin Marx then put in the next gear and entertained us another hour with a protocol deep dive titled HTTP/3 (QUIC): the details (slides). For me personally this was a exactly what I needed as Robin clearly has kept up with more details and specifics in the QUIC and HTTP/3 protocols specifications than I’ve managed and his talk help the rest of the room get at least little bit more in sync with current development.

Jakub Nesetril and Lukáš Linhart from Apiary then talked us through what they’re doing and thinking around web based APIs and how they and their customers use curl: Real World curl usage at Apiary.

Then I was up again and I got to explain to my fellow curl hackers about HTTP/3 in curl. Internal architecture, 3rd party libs and APIs.

Jakub Klímek explained to us in very clear terms about current and existing problems in his talk IRIs and IDNs: Problems of non-ASCII countries. Some of the problems involve curl and while most of them have their clear explanations, I think we have to lessons to learn from this: URLs are still as messy and undocumented as ever before and that we might have some issues to fix in this area in curl.

To bring my fellow up to speed on the details of the new API introduced the last year I then made a presentation called The new URL API.

Clearly overdoing it for a single weekend, I then got the honors of doing the last presentation of curl up 2019 and for an audience that were about to die from exhaustion I talked Internals. A walk-through of the architecture and what libcurl does when doing a transfer.

Summary

I ended up doing seven presentations during this single weekend. Not all of them stellar or delivered with elegance but I hope they were still valuable to some. I did not steal someone else’s time slot as I would gladly have given up time if we had other speakers wanted to say something. Let’s aim for more non-Daniel talkers next time!

A weekend like this is such a boost for inspiration, for morale and for my ego. All the friendly faces with the encouraging and appreciating comments will keep me going for a long time after this.

Thank you to our awesome and lovely event sponsors – shown in the curl up logo below! Without you, this sort of happening would not happen.

curl up 2020

I will of course want to see another curl up next year. There are no plans yet and we don’t know where to host. I think it is valuable to move it around but I think it is even more valuable that we have a friend on the ground in that particular city to help us out. Once this year’s event has sunken in properly and a month or two has passed, the case for and organization of next year’s conference will commence. Stay tuned, and if you want to help hosting us do let me know!


The future of HTTP Symposium

This year’s version of curl up started a little differently: With an afternoon of HTTP presentations. The event took place the same week the IETF meeting has just ended here in Prague so we got the opportunity to invite people who possibly otherwise wouldn’t have been here… Of course this was only possible thanks to our awesome sponsors, visible in the image above!

Lukáš Linhart from Apiary started out with “Web APIs: The Past, The Present and The Future”. A journey trough XML-RPC, SOAP and more. One final conclusion might be that we’re not quite done yet…

James Fuller from MarkLogic talked about “The Defenestration of Hypermedia in HTTP”. How HTTP web technologies have changed over time while the HTTP paradigms have survived since a very long time.

I talked about DNS-over-HTTPS. A presentation similar to the one I did before at FOSDEM, but in a shorter time so I had to talk a little faster!

Mike Bishop from Akamai (editor of the HTTP/3 spec and a long time participant in the HTTPbis work) talked about “The evolution of HTTP (from HTTP/1 to HTTP/3)” from HTTP/0.9 to HTTP/3 and beyond.

Robin Marx then rounded off the series of presentations with his tongue in cheek “HTTP/3 (QUIC): too big to fail?!” where we provided a long list of challenges for QUIC and HTTP/3 to get deployed and become successful.

We ended this afternoon session with a casual Q&A session with all the presenters discussing various aspects of HTTP, the web, REST, APIs and the benefits and deployment challenges of QUIC.

I think most of us learned things this afternoon and we could leave the very elegant Charles University room enriched and with more food for thoughts about these technologies.

We ended the evening with snacks and drinks kindly provided by Apiary.

(This event was not streamed and not recorded on video, you had to be there in person to enjoy it.)


curl goes 180

The 180th public curl release is a patch release: 7.64.1. There’s been 49 days since 7.64.0 shipped. The first release since our 21st birthday last week. (Full changelog.)

Numbers

the 180th release
2 changes
49 days (total: 7,677)

116 bug fixes (total: 5,029)
184 commits (total: 24,111)
0 new public libcurl functions (total: 80)
2 new curl_easy_setopt() options (total: 267)

1 new curl command line option (total: 221)
49 contributors, 25 new (total: 1,929)
25 authors, 10 new (total: 669)
0 security fixes (total: 87)

News!

This is a patch release but we still managed to introduce some fun news in this version. We ship brand new alt-svc support which we encourage keen and curious users to enable in their builds and test out. We strongly discourage anyone from using that feature in production as we reserve ourselves the right to change it before removing the EXPERIMENTAL label. As mentioned in the blog post linked above, alt-svc is the official way to bootstrap into HTTP/3 so this is a fundamental stepping stone for supporting that protocol version in a future curl.

We also introduced brand new support for the Amiga-specific TLS backend AmiSSL, which is a port of OpenSSL to that platform.

Bug-fixes

With over a hundred bug-fixes landed in this period there are a lot to choose from, but some of the most most fun and important ones from my point of view include the following.

connection check crash

This was a rather bad regression that occasionally caused crashes when libcurl would scan its connection cache for a live connection to reuse. Most likely to trigger with the Schannel backend.

connection sharing crash

The example source code that uses a shared connection cache among many threads was another crash regression. It turned out a thread could accidentally get hold of a connection already in private use by another thread…

“Expire in…” logs removed

Having the harmless but annoying text there was a mistake to begin with. It was a debug-only line that accidentally was pushed and not discovered in time. It’s history now.

curl -M manual removed

The tutorial-like manual piece that was previously included in the -M (or –manual) built-in command documentation, is no longer included. The output shown is now just the curl.1 man page. The reason for this is that the tutorial has gone a bit stale and there is now better updated and better explained documentation elsewhere. Primarily perhaps in everything curl. The online version of that document will eventually also be removed.

TLS terminology cleanups

We now refer to the Windows TLS backend as “Schannel” and the Apple macOS one as “Secure Transport” in all curl code and documentation. Those are the official names and those are the names people in general know them as. No more use of the former names that sometimes made people confused.

Shaving off bytes and mallocs

We rearranged the layout of a few structs and changed to using bitfields instead of booleans and more. This way, we managed to shrink two of the primary internal structs by 5% and 11% with no functionality change or loss.

Similarly, we removed a few mallocs, even in the common code path, so now the number of allocs for my regular test download of 4GB data over a localhost HTTP server claims fewer allocs than ever before.

Next?

We estimate that there will be a 7.65.0 release to ship 56 days from now. Then we will remove some deprecated features, perhaps add something new and quite surely fix a whole bunch of more bugs. Who know what fun we will come up with at curl up this coming weekend?

Keep reporting. Keep posting pull-requests. We love them and you!

Brand new sticker shipment for curl up from our beloved sticker sponsor!


Happy 21st, curl!

Another year has passed. The curl project is now 21 years old.

I think we can now say that it is a grown-up in most aspects. What have we accomplished in the project in these 21 years?

We’ve done 179 releases. Number 180 is just a week away.

We estimate that there are now roughly 6 billion curl installations world-wide. In phones, computers, TVs, cars, video games etc. With 4 billion internet users, that’s like 1.5 curl installation per Internet connected human on earth

669 persons have authored patches that was merged.

The curl source code now consists of 160,000 lines of code made in over 24,000 commits.

1,927 persons have helped out so far. With code, bug reports, advice, help and more.

The curl repository also hosts 429 man pages with a total of 36,900 lines of documentation. That count doesn’t even include the separate project Everything curl which is a dedicated book on curl with an additional 10,165 lines.

In this time we have logged more than 4,900 bug-fixes, out of which 87 were security related problems.

We keep doing more and more CI builds, auto-builds, fuzzing and static code analyzing on our code day-to-day and non-stop. Each commit is now built and tested in over 50 different builds and environments and are checked by at least four different static code analyzers, spending upwards 20-25 CPU hours per commit.

We have had 2 curl developer conferences, with the third curl up about to happen this coming weekend in Prague, Czech Republic.

The curl project was created by me and I’m still the lead developer. Up until today, almost 60% of the commits in the project have my name on them. I have done most commits per month in the project every single month since August 2015, and in 186 months out of the 232 months for which we have logged data.

Looking for the Refresh header

The other day someone filed a bug on curl that we don’t support redirects with the Refresh header. This took me down a rabbit hole of Refresh header research and I’ve returned to share with you what I learned down there.

tl;dr Refresh is not a standard HTTP header.

As you know, an HTTP redirect is specified to use a 3xx response code and a Location: header to point out the new URL (I use the term URL here but you know what I mean). This has been the case since RFC 1945 (HTTP/1.0). According to an old mail from Roy T Fielding (dated June 1996), Refresh “didn’t make it” into that spec. That was the first “real” HTTP specification. (And the HTTP we used before 1.0 didn’t even have headers!)

The little detail that it never made it into the 1.0 spec or any later one, doesn’t seem to have affected the browsers. Still today, browsers keep supporting the Refresh header as a sort of Location: replacement even though it seems to never have been present in a HTTP spec.

In good company

curl is not the only HTTP library that doesn’t support this non-standard header. The popular python library requests apparently doesn’t according to this bug from 2017, and another bug was filed about it already back in 2011 but it was just closed as “old” in 2014.

I’ve found no support in wget or wget2 either for this header.

I didn’t do any further extensive search for other toolkits’ support, but it seems that the browsers are fairly alone in supporting this header.

How common is the the Refresh header?

I decided to make an attempt to figure out, and for this venture I used the Rapid7 data trove. The method that data is collected with may not be the best – it scans the IPv4 address range and sends a HTTP request to each TCP port 80, setting the IP address in the Host: header. The result of that scan is 52+ million HTTP responses from different and current HTTP origins. (Exactly 52254873 responses in my 59GB data dump, dated end of February 2019).

Results from my scans

  • Location is used in 18.49% of the responses
  • Refresh is used in 0.01738% of the responses (exactly 9080 responses featured them)
  • Location is thus used 1064 times more often than Refresh
  • In 35% of the cases when Refresh is used, Location is also used
  • curl thus handles 99.9939% of the redirects in this test

Additional notes

  • When Refresh is the only redirect header, the response code is usually 200 (with 404 being the second most)
  • When both headers are used, the response code is almost always 30x
  • When both are used, it is common to redirect to the same target and it is also common for the Refresh header value to only contain a number (for the number of seconds until “refresh”).

Refresh from HTML content

Redirects can also be done by meta tags in HTML and sending the refresh that way, but I have not investigated how common as that isn’t strictly speaking HTTP so it is outside of my research (and interest) here.

In use, not documented, not in the spec

Just another undocumented corner of the web.

When I posted about these findings on the HTTPbis mailing list, it was pointed out that WHATWG mentions this header in their iana page. I say mention because calling that documenting would be a stretch…

It is not at all clear exactly what the header is supposed to do and it is not documented anywhere. It’s not exactly a redirect, but almost?

Will/should curl support it?

A decision hasn’t been made about it yet. With such a very low use frequency and since we’ve managed fine without support for it so long, maybe we can just maintain the situation and instead argue that we should just completely deprecate this header use from the web?

Updates

After this post first went live, I got some further feedback and data that are relevant and interesting.

  • Yoav Wiess created a patch for Chrome to count how often they see this header used in real life.
  • Eric Lawrence pointed out that IE had several incompatibilities in its Refresh parser back in the day.
  • Boris pointed out (in the comments below) the WHATWG documented steps for handling the header.
  • The use of <meta> tag refresh in contents is fairly high. The Chrome counter says almost 4% of page loads!

curl, open source and networking