Tag Archives: HTTP

curl ootw: --range

--range, or -r for short. As the name implies, this option is for doing “range requests”. This flag was available already in the first curl release ever: version 4.0. The option requires an extra argument specifying the requested range. Read on to learn how!

What exactly is a range request?

Get a part of the remote resource

Maybe you have downloaded only a part of a remote file, or maybe you’re only interested in getting a fraction of a huge remote resource. Those are two situations in which you might want your internet transfer client to ask the server to only transfer parts of the remote resource back to you.

Let’s say you’ve tried to download a 32GB file (let’s call it a huge file) from a slow server on the other side of the world and when you only had 794 bytes left to transfer, the connection broke and the transfer was aborted. The transfer took a very long time and you prefer not to just restart it from the beginning and yet, with many file formats those final 794 bytes are critical and the content cannot be handled without them.

We need those final 794 bytes! Enter range requests.

With range requests, you can tell curl exactly what byte range to ask for from the server. “Give us bytes 12345-12567” or “give us the last 794 bytes”. Like this:

curl --range 12345-12567 https://example.com/

and:

curl --range -794 https://example.com/

This works with curl with several different protocols: HTTP(S), FTP(S) and SFTP. With HTTP(S), you can even get fancier and ask for multiple ranges in the same request. Maybe you want three separate sections of the resource?

curl --range 0-1000,2000-3000,4000-5000 https://example.com/

Let me again emphasize that this multi-range feature only exists for HTTP(S) with curl and not with the other protocols, and the reason is quite simply that HTTP provides this by itself and we haven’t felt motivated enough to implement it for the other protocols.

Not always that easy

The description above is for when everything is fine and easy. But as you know, life is rarely as easy and straightforward as we want it to be, and neither is the --range option. Primarily because of this very important detail:

Range support in HTTP is optional.

It means that when curl asks for a particular byte range to be returned, the server might not obey or care and instead deliver the whole thing anyway. As a client we can detect this refusal, since a range response has a special HTTP response code (206) which won’t be used if the entire thing is sent back – but that’s of little help if you really only wanted the remaining bytes of a larger resource you have already mostly downloaded.
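
A quick way to check whether the server actually honored the range is to look at the response code and the number of bytes that arrived. Here is a minimal sketch using --write-out (the URL and file name are just placeholders):

# ask for the first 100 bytes and show what we actually got back
curl --silent --range 0-99 --output /dev/null \
  --write-out 'code: %{response_code} bytes: %{size_download}\n' \
  https://example.com/file.bin
# 206 and 100 bytes means the range was honored; 200 and the full size means it was ignored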

One reason it is optional for HTTP, and why many sites and pages in the wild refuse range requests, is that those sites and pages generate content on demand, dynamically. If we ask for a byte range of a static file on disk on the server, offering that range is easy. But if the document is instead the result of lots of scripts and dynamic content generated uniquely on the server side at the time of each request, it isn’t.

HTTP 416 Range Not Satisfiable

If you ask for a range that is outside of what the server can provide, it will respond with a 416 response code. Let’s say for example you download a complete 200-byte resource and then you ask that server for the range 200-202 – you’ll get a 416 back, because 200 bytes occupy byte indexes 0-199 so there’s nothing available at byte index 200 and beyond.
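
If you want to see that behavior for yourself, a sketch like this could work – assuming the resource at the (placeholder) URL really is 200 bytes and the server supports ranges at all:

# request bytes past the end of a 200-byte resource and print the response code
curl --silent --range 200-202 --output /dev/null \
  --write-out '%{response_code}\n' https://example.com/small-file
# a range-capable server should answer 416 here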

HTTP other ranges

--range for HTTP content implies “byte ranges”. There is theoretical support for other range units in HTTP, but that is not supported by curl and in fact not widely used over the Internet. Byte ranges are complicated enough!

Related command line options

curl also offers the --continue-at (-C) option, which is perhaps a more user-friendly way to resume transfers without the user having to specify the exact byte range and handle data concatenation etc.
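
A hedged example of that approach: let curl look at the size of the partially downloaded local file and continue from that offset (file name and URL are placeholders):

# -C - (--continue-at -) makes curl figure out the resume offset from the output file
curl --continue-at - --output huge-file.bin https://example.com/huge-file.bin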

Expect: tweaks in curl

One of the persistent myths about HTTP is that it is “a simple protocol”.

Expect: – not always expected

One of the dusty spec corners of HTTP/1.1 (Section 5.1.1 of RFC 7231) explains how the Expect: header works. Still today, in 2020, this is one of the HTTP request headers that is very commonly ignored by servers and intermediaries.

Background

HTTP/1.1 is designed for being sent over TCP (and possibly also TLS) in a serial manner. Setting up a new connection is costly, both in terms of CPU and especially in time, since it requires a number of round-trips. (I’ll detail further down how HTTP/2 fixes all these issues in a much better way.)

HTTP/1.1 provides a number of ways to allow it to perform all its duties without having to shut down the connection. One such example is the ability to tell a client early on that it needs to provide authentication credentials before the client sends off a large payload. In order to maintain the TCP connection, a client can’t stop sending an HTTP payload prematurely! Once the request body has started to get transmitted, the only way to stop it before the end of the data is to cut off the connection and create a new one – wasting time and CPU…

“We want a 100 to continue”

A client can include a header in its outgoing request to ask the server to first acknowledge that everything is fine and that it can continue to send the “payload” – or the server can return status codes that inform the client that there are other things it needs to fulfill in order to have the request succeed. Most such cases typically involve authentication.

This “first tell me it’s OK to send it before I send it” request header looks like this:

Expect: 100-continue

Servers

Since this mandatory header is widely either not understood or simply ignored by HTTP/1.1 servers, clients that issue it use a rather short timeout, and if no response has been received within that period they proceed and send the data even without a 100.

This timeout kicks in so often that removing the Expect: header from curl requests is a very common answer to questions about how to improve POST or PUT requests with curl when they are issued against such non-compliant servers.
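
The workaround is to pass an Expect: header with an empty value, which makes curl leave out the header it would otherwise add by itself. A sketch with placeholder file and URL:

# an empty value removes the header curl would otherwise add for a large request body
curl --header 'Expect:' --data-binary @bigfile.bin https://example.com/upload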

Popular browsers

Browsers are hugely popular HTTP clients but none of the popular ones ever use this. In fact, very few clients do. This is of course a chicken and egg problem, because servers don’t support it very well because clients don’t use it, and clients don’t use it because servers don’t support it very well…

curl sends Expect:

When we implemented support for HTTP/1.1 in curl back in 2001, we wanted it done properly. We gave it a short timeout, 1000 milliseconds, waiting for the 100 response. We also made curl automatically include the Expect: header in outgoing requests if we know that the body is larger than NNN or we don’t know the size at all beforehand (and obviously, we don’t do it if we send the request chunked-encoded).

The logic being that if the data amount is smaller than NNN, the waste is not very big and we might as well just send it (even if we risk sending it twice), since waiting for a response is just going to be more time consuming.

That NNN margin value (known in the curl sources as EXPECT_100_THRESHOLD) was set to 1024 bytes already back in the early 2000s.
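
If you are curious whether curl adds the header for a particular request, the verbose output shows the outgoing request headers. A rough way to peek (file and URL are placeholders):

# request headers are prefixed with '> ' in verbose output; look for the Expect line
curl --verbose --data-binary @payload.bin https://example.com/upload 2>&1 | grep -i '^> expect'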

Bumping EXPECT_100_THRESHOLD

Starting in curl 7.69.0 (due to ship on March 4, 2020) we catch up a little with the world around us and now libcurl will only automatically set the Expect: header if the amount of data to send in the body is larger than 1 megabyte. Yes, we raise the limit by 1024 times.

The reasoning is of course that for most Internet users these days, data amounts below a certain size aren’t even noticeable when transferred, so we adapt. It also reduces the number of problems for the very small data amounts, where waiting for the 100-continue response is a waste of time anyway.

Credits: landed in this commit. (sorry but my WordPress stupidly can’t show the proper Asian name of the author!)

417 Expectation Failed

One of the ways an HTTP/1.1 server can deal with an Expect: 100-continue header in a request is to respond with a 417 code, which should tell the client to retry the same request again, only without the Expect: header.

While this response is fairly uncommon among servers, curl users who ran into 417 responses have previously had to resort to removing the Expect: header “manually” from such requests. This was of course often far from intuitive or easy to figure out. A source of grief and agony.

Until this commit, also targeted for inclusion in curl 7.69.0 (March 4, 2020). Starting now, curl will automatically act on a 417 response if it arrives as a consequence of using Expect: and then retry the request again without the header!

Credits: I wrote the patch.

HTTP/2 avoids this altogether

With HTTP/2 (and HTTP/3) this entire thing is a non-issue because with these modern protocol versions we can abort a request (stop a stream) prematurely without having to sacrifice the connection. There’s therefore no need for this crazy dance anymore!

HTTP/3 for everyone

FOSDEM 2020 is over for this time and I had an awesome time in Brussels once again.

Stickers

I brought a huge collection of stickers this year and I kept going back to the wolfSSL stand to refill the stash and it kept being emptied almost as fast. Hundreds of curl stickers were given away! The photo on the right shows my “sticker bag” as it looked before I left Sweden.

Lesson for next year: bring a larger amount of stickers! If you missed out on curl stickers, get in touch and I’ll do my best to satisfy your needs.

The talk

“HTTP/3 for everyone” was my single talk this FOSDEM. Just two days before the talk, I landed updated commits in curl’s git master branch for doing HTTP/3 up-to-date with the latest draft (-25). Very timely and I got to update the slide mentioning this.

As I talked about HTTP/3 in the Mozilla devroom already last year, I also made sure to go through the slides I used then, to compare and make sure I wouldn’t do too much of the same talk. But lots of things have changed and most of the content is updated and different this time around. Last year, literally hundreds of people were lining up outside wanting to get into the room when the doors were closed. This year, I talked in the room Janson, which features 1415 seats. The biggest one on campus. It was packed full!

It is kind of an adrenaline rush to stand in front of such a wall of people. At one point in my talk I paused for a brief moment and felt I could almost hear the complete silence as a huge number of attentive faces took in what I had to say.

The audience, photographed by Sidsel Jensen who had to sit on the stairs…
Photo by Mirza Krak
Photo by Wolfgang Gassler

I got a lot of positive feedback on the presentation. I also think my decision not to even try to take questions in the big room was correct, and I ended up talking and discussing details behind the scenes for a good while after my talk was done. Really fun!

The video

The video is also available from the FOSDEM site in webm and mp4 formats.

The slides

If you want the slides only, run over to slideshare and view them.

curl speaks etag

The ETag HTTP response header is an identifier for a specific version of a resource. It lets caches be more efficient and save bandwidth, as a web server does not need to resend a full response if the content has not changed. Additionally, etags help prevent simultaneous updates of a resource from overwriting each other (“mid-air collisions”).

That’s a quote from the mozilla ETag documentation. The header is defined in RFC 7232.

In short, a server can include this header when it responds with a resource, and in a subsequent request, when a client wants an updated version of that document, it sends back the same ETag and says “please give me a new version if it doesn’t match this ETag anymore”. The server will then respond with a 304 if there’s nothing new to return.

It is a better way than a modification time stamp to identify a specific version of a resource on the server.
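
The options described below basically automate an exchange you could already do by hand with a custom header. A minimal sketch of what goes on underneath, with a made-up ETag value:

# send the previously received ETag back; a 304 response code means the resource is unchanged
curl --header 'If-None-Match: "deadbeef"' --silent --output /dev/null \
  --write-out '%{response_code}\n' https://example.com/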

ETag options

Starting in version 7.68.0 (due to ship on January 8th, 2020), curl will feature two new command line options that make it easier for users to take advantage of these features. You can of course try them out already now if you build from git or get a daily snapshot.

The ETag options are perfect for situations like when you run a curl command in a cron job to update a file if it has indeed changed on the server.

--etag-save <filename>

Issue the curl command and if there’s an ETag coming back in the response, it gets saved in the given file.

--etag-compare <filename>

Load a previously stored ETag from the given file and include that in the outgoing request (the file should only consist of the specific ETag “value” and nothing else). If the server deems that the resource hasn’t changed, this will result in a 304 response code. If it has changed, the new content will be returned.

Update the file if it has changed compared to the previously stored ETag:

curl --etag-compare etag.txt --etag-save etag.txt https://example.com -o saved-file

If-Modified-Since options

The other method to accomplish something similar is the -z (or --time-cond) option. It has been supported by curl since the early days. Using this, you give curl a specific time that will be used in a conditional request. “only respond with content if the document is newer than this”:

curl -z "5 dec 2019" https:/example.com

You can also do the inversion of the condition and ask for the file to get delivered only if it is older than a certain time (by prefixing the date with a dash):

curl -z "-1 jan 2003" https://example.com

Perhaps this is most conveniently used when you let curl get the time from a file. Typically you’d use the same file that you’ve saved from a previous invocation and now you want to update if the file is newer on the remote site:

curl -z saved-file -o saved-file https://example.com

Tool, not libcurl

It could be noted that these new features are built entirely in the curl tool by using libcurl correctly with the already provided API, so this change is done entirely outside of the library.

Credits

The idea for the ETag goodness came from Paul Hoffman. The implementation was brought by Maros Priputen – as his first contribution to curl! Thanks!

curl says bye bye to pipelining

HTTP/1.1 Pipelining is the protocol feature where the client sends off a second HTTP/1.1 request already before the answer to the previous request has arrived (completely) from the server. It is defined in the original HTTP/1.1 spec and is a way to avoid waiting times. To reduce latency.

HTTP/1.1 Pipelining was badly supported by curl for a long time, in the sense that we had a series of known bugs and it was a fragile feature without enough tests. Also, pipelining is fairly tricky to debug due to its timing sensitivity, so very often enabling debug output or similar completely changes the nature of the behavior and the problems no longer reproduce!

HTTP pipelining was never enabled by default by the large desktop browsers due to all the issues with it, like broken server implementations and the like. Both Firefox and Chrome dropped pipelining support entirely a long time ago. curl did in fact over time become more and more lonely in supporting pipelining.

The bad state of HTTP pipelining was a primary driving factor behind HTTP/2 and its multiplexing feature. HTTP/2 multiplexing is truly and really “pipelining done right”. It is way more solid, practical and solves the use case in a better way with better performance and fewer downsides and problems. (curl enables multiplexing by default since 7.62.0.)

In 2019, pipelining should be abandoned and HTTP/2 should be used instead.

Starting with this commit, to be shipped in release 7.65.0, curl no longer has any code that supports HTTP/1.1 pipelining. It has been disabled in the code since 7.62.0 already so applications and users that use a recent version already should not notice any difference.

Pipelining was always offered on a best-effort basis and there was never any guarantee that requests would actually be pipelined, so we can remove this feature entirely without breaking API or ABI promises. Applications that ask libcurl to use pipelining can still do that, it just won’t have any effect.

Workshop Season 4 Finale

The 2019 HTTP Workshop ended today. In total over the years, we have now done 12 workshop days. This day was not a full day and we spent it on only two major topics that both triggered long discussions involving large parts of the room.

Cookies

Mike West kicked off the morning with his cookies are bad presentation.

One out of every thousand cookie header values is 10K or larger in size, and even at the 50th percentile the size is 480 bytes. They’re a disaster on so many levels. The additional features that have been added during the last decade are still mostly unused. Mike suggests that maybe the only way forward is to introduce a replacement that avoids the issues, and over time remove cookies from the web: HTTP state tokens.

A lot of people in the room had opinions and thoughts on this. I don’t think people in general have a strong love for cookies and the way they currently work, but the how-to-replace-them question still triggered lots of concerns, about everything from routing performance on the server side to the changed nature of the mechanisms not encouraging web developers to move over. Just adding a new mechanism without seeing the old one actually getting removed might not be a win.

We should possibly “worsen” the cookie experience over time to encourage switch over. To cap allowed sizes, limit use to only over HTTPS, reduce lifetimes etc, but even just that will take effort and require that the primary cookie consumers (browsers) have a strong will to hurt some amount of existing users/sites.

(Related: Mike is also one of the authors of the RFC6265bis draft in progress – a future refreshed cookie spec.)

HTTP/3

Mike Bishop did an excellent presentation of HTTP/3 for HTTP people that possibly haven’t kept up fully with the developments in the QUIC working group. From a plain HTTP view, HTTP/3 is very similar feature-wise to HTTP/2 but of course sent over a completely different transport layer. (The HTTP/3 draft.)

Most of the questions and discussions that followed were rather related to the transport, to QUIC. Its encryption, it being UDP, DoS prevention, it being “CPU hungry” etc. Deploying HTTP/3 might be a challenge for client side implementations to get right, but that’s nothing compared to the totally new things that will be necessary server-side. Web developers should largely not even have to care…

One tidbit that was mentioned is that current Firefox telemetry shows that about 0.84% of all requests negotiate TLS 1.3 early data (with about 12.9% using TLS 1.3).

Thought-worthy quote of the day comes from Willy: “everything is a buffer”

Future Workshops

There’s no next workshop planned but there might still very well be another one arranged in the future. The most suitable interval for this series isn’t really determined and there might be reasons to try tweaking the format to maybe change who will attend etc.

The fact that almost half the attendees this time were newcomers was certainly good for the community, but the fact that not a single attendee traveled here from Asia was less good.

Thanks

Thanks to the organizers, the program committee who set this up so nicely and the awesome sponsors!

More Amsterdamned Workshop

Yesterday we plowed through a large and varied selection of HTTP topics in the Workshop. Today we continued. At 9:30 we were all in that room again. Day two.

Martin Thomson talked about his “hx” proposal and how to refer to future responses in HTTP APIs. He ended up basically concluding that “This is too complicated, I think I’m going to abandon this” and instead threw in a follow-up proposal he called “Reverse Javascript” that would be a way for a client to pass on a script for the server to execute! The room exploded in questions, objections and “improvements” to this idea. There is also apparently a pile of prior art in a similar vein to draw inspiration from.

With the audience warmed up like this, Anne van Kesteren took us back to reality with an old favorite topic in the HTTP Workshop: websockets. Not a lot of love for websockets in the room… but this was the first of several discussions during the day where a desire or quest for bidirectional HTTP streams was made obvious.

Woo Xie did a presentation, with help from Alan Frindell, about Extending h2 for Bidirectional Messaging and how they propose an HTTP/2 extension that adds a new frame to create a bidirectional stream that lets them do messaging over HTTP/2 just fine. The following discussion was slightly positive but also contained alternative suggestions and references to some of the many similar drafts for bidirectional and p2p connections over http2 that have been done in the past.

Lucas Pardue and Nick Jones did a presentation about HTTP/2 Priorities, based largely on research previously done and reported by Pat Meenan. Lucas took us through the history of how the priorities ended up like this, their current state and numbers, and also the chaos, plus something about a possible future: the h3 way of doing prio and Mr Meenan’s proposed HTTP/3 prio.

Nick’s second half of the presentation then took us through Cloudflare’s Edge Driven HTTP/2 Prioritisation work/experiments, and he showed how they could really improve how prioritization works in nginx by making sure the data is written to the socket as late as possible. This was backed up by audience references to the TAPS guidelines on the topic and a general recollection that reducing the number of connections is still a good idea and should be a goal! Server buffering is hard.

Asbjørn Ulsberg presented his case for a new request header: prefer-push. When used, the server can respond to the request with a series of pushed resources and thus save several round-trips. This triggered sympathy in the room but also suggestions of alternative approaches.

Alan Frindell presented Partial POST Replay. It’s a rather elaborate scheme that makes their loadbalancers detect when a POST to one of their servers can’t be fulfilled and instead replay that POST to another backend server. While Alan promised to deliver a draft for this, the general discussion about POST and its “replayability” came up again.

Willy Tarreau followed up with a very similar topic: retrying failed POSTs. In this context RFC 2310 – The Safe Response Header Field was mentioned, and perhaps something like this could be considered for requests? The discussion certainly had similarities and overlaps with the SEARCH/POST discussion of yesterday.

Mike West talked about Fetch Metadata Request Headers, a set of request headers that explain to servers from where and for what purpose browsers make requests. He also took us through a brief explanation of Origin Policy, meant to become a central “resource” for a manifest that describes properties of the origin.

Mark Nottingham presented Structured Headers (draft). This is a new way of specifying and parsing HTTP headers that will make the lives of most HTTP implementers easier in the future. (Parts of the presentation were also spent debugging/triaging the most weird symptoms seen when his Keynote installation was acting up!) It also triggered a smaller side discussion on what approaches could be taken for HPACK and QPACK to improve the compression ratio for headers.

Anne van Kesteren talked about Web-compatible header value parsers, standardizing how to parse headers not covered by structured headers.

Yoav Weiss described the current status of client hints (draft). This is shipped by Chrome already and he wanted more implementers to use it and report back on how it’s working.

Roberto Peon presented an idea for doing “Partially-Reliable HTTP”, and after his talk and a discussion he concluded they will implement it, play around and come back and tell us what they’ve learned.

Mark Nottingham talked about HTTP for CDNs. He has this fancy-looking test suite in progress that checks how things are working and what is being supported and there are two drafts in progress: the cache response header and the proxy status header field.

Willy Tarreau talked about a race problem he ran into with closing HTTP/2 streams and he explained how he worked around it with a trailing ping frame and suggested that maybe more users might suffer from this problem.

The oxygen level in the room was certainly not on an optimal level at this point but that didn’t stop us. We knew we had a few more topics to get through and we all wanted to get to the boat ride of the evening on time. So…

Hooman Beheshti polled the room to get a feel for what people think about Early hints. Are people still on board? Turns out it is mostly appreciated but not supported by any browser and a discussion and explainer session followed as to why this is and what general problems there are in supporting 1xx headers in browsers. It is striking that most of us HTTP people in the room don’t know how browsers work! Here I could mention that Cory said something about the craziness of this, but I forget his exact words and I blame the fact that they were expressed to me on a boat. Or perhaps that the time is already approaching 1am the night after this fully packed day.

Good follow-up reads from that discussion are Yoav’s blog post A Tale of Four Caches and Jake Archibald’s HTTP/2 Push is tougher than I thought.

As the final conversation of the day, Anne van Kesteren talked about Response Sources and the different ways a browser can do requests and get responses.

Boat!

HAproxy had the excellent taste of sponsoring this awesome boat ride on the Amsterdam canals for us at the end of the day.

Boating on the Amsterdam canals, sponsored by HAproxy!

Thanks again to Cory Benfield for feeding me his notes of the day to help me keep things straight. All mistakes are mine. But if you tell me about them, I will try to correct the text!

The HTTP Workshop 2019 begins

The fourth season of my favorite HTTP series is back! The HTTP Workshop skipped last year but is back now with a three-day event organized by the very best: Mark, Martin, Julian and Roy. This time we’re in Amsterdam, the Netherlands.

35 persons from all over the world walked into the room and sat down around the O-shaped table setup. Lots of known faces and representatives from a large variety of HTTP implementations, client-side or server-side – but happily enough also a few new friends attending their first HTTP Workshop here. The companies with the most employees present in the room include Apple, Facebook, Mozilla, Fastly, Cloudflare and Google – each having three or four people in the room.

Patrick McManus started off the morning with his presentation on HTTP conventional wisdoms, trying to identify what has turned out to be a success or not in HTTP land in recent times. It triggered a few discussions on the specific points and how to judge them. I believe the general consensus ended up mostly agreeing with the slides. The topic of unshipping HTTP/0.9 support came up but is said to not be possible due to its existing use. As a bonus, Anne van Kesteren posted a new bug on Firefox to remove it.

Mark Nottingham continued and did a brief presentation about the recent discussions in HTTPbis sessions during the IETF meetings in Prague last week.

Martin Thomson did a presentation about HTTP authority. Basically how a client decides where and who to ask for a resource identified by a URI. This triggered an intense discussion that involved a lot of UI and UX but also trust, certificates and subjectAltNames, DNS and various secure DNS efforts, connection coalescing, DNSSEC, DANE, ORIGIN frame, alternative certificates and more.

Mike West explained for the room about the concept for Signed Exchanges that Chrome now supports. A way for server A to host contents for server B and yet have the client able to verify that it is fine.

Tommy Pauly then talked to his slides with the title Website Fingerprinting. He covered different areas of a browser’s activities that are currently possible to monitor and use for fingerprinting, and what counter-measures exist to work against furthering that development. By looking at the full activity, including TCP flows and IP addresses, even lots of our encrypted connections still allow for pretty accurate and extensive “Page Load Fingerprinting”. We need to be aware, and the discussion went on to what can or should be done to help out.

The meeting is going on somewhere behind that red door.

Lucas Pardue discussed and showed how we can do TLS interception with Wireshark (since the release of version 3) on traffic from Firefox, Chrome or curl, and in the end make sure that the resulting PCAP file gets the necessary keys bundled in the same file. This is really convenient when you want to send that PCAP over to your protocol debugging friends.

Roberto Peon presented his new idea for “Generic overlay networks”, a suggested way for clients to get resources from one out of several alternatives. A neighboring idea to Signed Exchanges, but still different. There was interest in furthering and deepening this discussion, and Roberto ended up saying he’d write up a draft for it.

Max Hils talked about Intercepting QUIC and how the ability to do this kind of thing is very useful in many situations. During development, for debugging and for checking what potentially bad stuff applications are actually doing on your own devices. Intercepting QUIC and HTTP/3 can thus also be valuable but at least for now presents some challenges. (Max also happened to mention that the project he works on, mitmproxy, has more stars on github than curl, but I’ll just let it slide…)

Poul-Henning Kamp showed us vtest – a tool and framework for testing HTTP implementations that both Varnish and HAproxy are now using. Massaged the right way, this could develop into a generic HTTP test/conformance tool that could be valuable for and appreciated by even more users going forward.

Asbjørn Ulsberg showed us several current frameworks that are doing GET, POST or SEARCH with request bodies and discussed how this works with caching, and proposed that SEARCH should be defined as cacheable. The room mostly acknowledged the problem – it has been discussed before and the time is probably ripe to finally do something about it. Lots of users are already doing similar things and caching of POST contents is in use, just not defined generically. SEARCH is an already registered method but could get polished to work for this. It was also suggested that possibly POST could be modified to also allow for caching in an opt-in way, and Mark volunteered to author a first draft elaborating how it could work.

Indonesian and Tibetan food for dinner rounded off a fully packed day.

Thanks Cory Benfield for sharing your notes from the day, helping me get the details straight!

Diversity

We’re a very homogeneous group of humans. Most of us are old white men, basically all clones and practically indistinguishable from each other. This is not diverse enough!

A big thank you to the HTTP Workshop 2019 sponsors!


The future of HTTP Symposium

This year’s version of curl up started a little differently: with an afternoon of HTTP presentations. The event took place the same week an IETF meeting had just ended here in Prague, so we got the opportunity to invite people who possibly otherwise wouldn’t have been here… Of course this was only possible thanks to our awesome sponsors, visible in the image above!

Lukáš Linhart from Apiary started out with “Web APIs: The Past, The Present and The Future”. A journey through XML-RPC, SOAP and more. One final conclusion might be that we’re not quite done yet…

James Fuller from MarkLogic talked about “The Defenestration of Hypermedia in HTTP”. How HTTP web technologies have changed over time while the HTTP paradigms have survived for a very long time.

I talked about DNS-over-HTTPS. A presentation similar to the one I did before at FOSDEM, but in a shorter time so I had to talk a little faster!

Mike Bishop from Akamai (editor of the HTTP/3 spec and a long time participant in the HTTPbis work) talked about “The evolution of HTTP (from HTTP/1 to HTTP/3)” from HTTP/0.9 to HTTP/3 and beyond.

Robin Marx then rounded off the series of presentations with his tongue-in-cheek “HTTP/3 (QUIC): too big to fail?!” where he provided a long list of challenges for QUIC and HTTP/3 to get deployed and become successful.

We ended this afternoon session with a casual Q&A session with all the presenters discussing various aspects of HTTP, the web, REST, APIs and the benefits and deployment challenges of QUIC.

I think most of us learned things this afternoon and we could leave the very elegant Charles University room enriched and with more food for thought about these technologies.

We ended the evening with snacks and drinks kindly provided by Apiary.

(This event was not streamed and not recorded on video, you had to be there in person to enjoy it.)


Looking for the Refresh header

The other day someone filed a bug on curl that we don’t support redirects with the Refresh header. This took me down a rabbit hole of Refresh header research and I’ve returned to share with you what I learned down there.

tl;dr Refresh is not a standard HTTP header.

As you know, an HTTP redirect is specified to use a 3xx response code and a Location: header to point out the new URL (I use the term URL here but you know what I mean). This has been the case since RFC 1945 (HTTP/1.0). According to an old mail from Roy T Fielding (dated June 1996), Refresh “didn’t make it” into that spec. That was the first “real” HTTP specification. (And the HTTP we used before 1.0 didn’t even have headers!)

The little detail that it never made it into the 1.0 spec or any later one doesn’t seem to have affected the browsers. Still today, browsers keep supporting the Refresh header as a sort of Location: replacement even though it seems never to have been present in an HTTP spec.
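
For reference – and this is my understanding rather than anything pinned down in a spec – the header value typically mirrors the <meta> refresh syntax, so peeking at response headers for it could look something like this:

# look for the non-standard header; a "redirect" done this way typically looks like
#   Refresh: 0; url=https://example.com/new-location
curl --silent --head https://example.com/ | grep -i '^refresh:'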

In good company

curl is not the only HTTP library that doesn’t support this non-standard header. The popular Python library requests apparently doesn’t either, according to this bug from 2017; another bug was filed about it already back in 2011 but it was just closed as “old” in 2014.

I’ve found no support in wget or wget2 either for this header.

I didn’t do any further extensive search for other toolkits’ support, but it seems that the browsers are fairly alone in supporting this header.

How common is the Refresh header?

I decided to make an attempt to figure that out, and for this venture I used the Rapid7 data trove. The method that data is collected with may not be the best – it scans the IPv4 address range and sends an HTTP request to TCP port 80 of each address, setting the IP address in the Host: header. The result of that scan is 52+ million HTTP responses from different and current HTTP origins. (Exactly 52254873 responses in my 59GB data dump, dated end of February 2019.)

Results from my scans

  • Location is used in 18.49% of the responses
  • Refresh is used in 0.01738% of the responses (exactly 9080 responses featured them)
  • Location is thus used 1064 times more often than Refresh
  • In 35% of the cases when Refresh is used, Location is also used
  • curl thus handles 99.9939% of the redirects in this test

Additional notes

  • When Refresh is the only redirect header, the response code is usually 200 (with 404 being the second most)
  • When both headers are used, the response code is almost always 30x
  • When both are used, it is common to redirect to the same target and it is also common for the Refresh header value to only contain a number (for the number of seconds until “refresh”).

Refresh from HTML content

Redirects can also be done by meta tags in HTML, sending the refresh that way, but I have not investigated how common that is, as it isn’t strictly speaking HTTP so it is outside of my research (and interest) here.

In use, not documented, not in the spec

Just another undocumented corner of the web.

When I posted about these findings on the HTTPbis mailing list, it was pointed out that WHATWG mentions this header in their iana page. I say mention because calling that documenting would be a stretch…

It is not at all clear exactly what the header is supposed to do and it is not documented anywhere. It’s not exactly a redirect, but almost?

Will/should curl support it?

A decision hasn’t been made about it yet. With such a very low use frequency, and since we’ve managed fine without support for it for so long, maybe we can just keep things as they are and instead argue that this header use should be completely deprecated from the web?

Updates

After this post first went live, I got some further feedback and data that are relevant and interesting.

  • Yoav Weiss created a patch for Chrome to count how often they see this header used in real life.
  • Eric Lawrence pointed out that IE had several incompatibilities in its Refresh parser back in the day.
  • Boris pointed out (in the comments below) the WHATWG documented steps for handling the header.
  • The use of <meta> tag refresh in content is fairly high. The Chrome counter says almost 4% of page loads!