Tag Archives: HTTP

Workshop season six, episode three

One positive thing among many others at this version of the HTTP Workshop (day one, day two) is the fact that there have been several new faces showing up here. People who have not previously attended any HTTP Workshops. Getting fresh blood into the mix is great. A chance to maybe lower the average age of the attendees also feels welcome.

This half day was the final session for this time. Three topics were dealt with.

Do you speak HTTP? Getting your HTTP implementation to do right according to the specification can be a challenge. There is a whole range of existing tests for various areas of HTTP but there might still be a place to add HTTP semantic tests in particular for servers. Discussions brought about reflections around testing, doing tests, test formats, other tests, test infrastructure and more. I think the general sense was that yes it would be great. At least if someone else makes it happen…

Every HTTP stack is an intermediary – HTTP semantics is the (requirement for low-level) API. Yes. Lots of nodding around the huge table.

Workshop feedback and thoughts. What is a good cadence for future events, how long should the events be etc. This is probably the maximum amount of attendees we can handle using the same setup. This event was clearly better than several of the past ones in terms of diversity, but I will second our “workshop maestro” in that it could improve further still. We also discussed whether do-arranging together with IETF is good or bad, should it then be before or after IETF?

I think the consensus said that making it biannual event is good. The reasoning for keeping the event in Europe has been because a larger share of the European attendees come from smaller companies compared to the non-Europeans which to a larger degree come from larger companies that might have it easier to pay for longer trips.

My personal take

The HTTP Workshop is a one-of-a-kind event. At these events everything is about and around HTTP with an information density level that is super high. We get to learn how things actually work for people or that do not work. And that we are not alone in whatever struggles or HTTP challenges we have.

Networking with other doers here and absorbing every protocol detail being expressed, gives food for thoughts and lessons to take advantage from in years to come when we for sure are going to take HTTP transfers further. This is in many ways a kind of brain fertilizer event.

Did I mention I enjoyed it? I will certainly try to attend the next one.

The 2024 Workshop, day two

The fun continues. See day one.

In an office building close to the Waterloo station in London, around 40 persons again sat down at this giant table forming a big square that made it possible for us all to see each other. One by one there were brief presentations done with follow-up discussions. The discussions often reiterated old truths, brought up related topics and sometimes went deep down into the weeds about teeny weeny details of the involved protocol specs. The way we love it.

The people around the table represent Ericsson, Google, Microsoft, Apple, Meta, Akamai, Cloudflare, Fastly, Mozilla, Varnish. Caddy, Nginx, Haproxy, Tomcat, Adobe and curl and probably a few more I forget now. One could say with some level of certainty that a large portion of every day HTTP traffic in the world is managed by things managed by people present here.

This morning we all actually understood that the south entrance is actually the east one (yeah, that’s a so called internal joke) and most of us were sitting down, eager and prepared when the day started at 9:30 am.

Capsule. The capsule protocol (RFC 9297) is a way to, put simply, send UDP packets/datagrams over old style HTTP/1 or HTTP/2 proxies.

Cookies. With the 6265bis effort well on its way to ship as an updated RFC, there is an effort and intent to take yet another stab at improving and refreshing the cookie spec situation. In particular to better split off browser management and API related stuff from the more network-oriented over-the-wire details. You know yours truly never ceases an opportunity to voice his opinion on cookies… I approve of this attempt as well, as I think increasing clarity and improving the specification situation can’t but to help improve things.

Declarative web push. There’s an ongoing effort to improve web push – not to be confused with server push, so that it can be done easier and without needing JavaScript to manage it in the client side.

Reverse HTTP. There are origins who want to contact their CDNs without having to listening on any ports/sockets and still be able to provide content. That’s one of the use cases for Reverse HTTP and we got to learn details from internet drafts done on the topics for the last fifteen years and why it might still be a worthwhile effort and why the use cases still exist. But is there an enough demand to put it into HTTP?

Server Stack Detection. A discussion around how someone can detect the origin of the server stack of any given HTTP server implementation. Should there be a better way? What is the downside of introducing what would basically be the server version of the user-agent header field? Lots of productive discussions on how to avoid recreating problems of the past but in a reversed way.

MoQ: What is it and why is it not just HTTP/3? Was an educational session about the ongoing work done in this working group that is wrongly named and would appreciate more input from the general protocol community.

New HTTP stack. A description of the journey of a full HTTP stack rewrite: how components can be chained together and in which order and a follow-up discussion about if this should be included in documentation and if so in which way etc. Lessons included that the spec is one thing, the Internet is another. Maybe not an entirely new revelation.

Multiplexing in the year 2024. There are details in HTTP/2 multiplexing that does not really work, there are assumptions that are now hard to change. To introduce new protocols and features in the modern HTTP stack, things need to be done for both HTTP/2 and HTTP/3 that are similar but still different and it forces additional work and pain.

What if we create a way to do multiplexing over TCP, so called “over streams”, so that the HTTP/3 fallback over TCP could still be done using HTTP/3 framing. This would allow future new protocols to remain HTTP/3-only and just make the transport be either QUIC+UDP or Streams-over TCP+TLS. This triggered a lot of discussions, mostly positive and forward-looking but also a lot of concerns raised about additional work and yet another protocols component to write and implement that then needs to be supported until the end of times because things never truly go away completely.

I think this sounds like a fun challenge! Count me in.

End of day two. I need a beer or two to digest this.

The 2024 HTTP Workshop

Day one.

For the sixth time, this informal group of HTTP implementers and related “interested parties” unite in a room over a couple of days doing a HTTP Workshop. Nine years since that first event in Münster, Germany.

If you are someone like me, obsessed with networking and HTTP in particular this is certainly the place to be. Talking and discussing past lessons, coming changes and protocol dreams in several days with like-minded people is a blast. The people on these events are friends that I don’t always get to hang out with too often. Many of the attendees here have been involved in this community for a long time and have attended all or most of the past workshops as well. I have fortunately been able to attend all of them so far.

This time we are in London.

Let me tell you a little about the topics of day one – without spilling the beans about exactly who said what or what the company they came from. (I will add links to the presentations later once I know where to link to.)

Make Cleartext HTTP harder. The first discussion point of the day. While we have made HTTPS easier over recent years it can only take us so far. What if we considered means to make HTTP harder to use as the next level efforts to further reduce its use over the internet. The HSTS preload list is only growing. Should we instead convert it into a HSTS exclude list that can shrink over time? This triggered a looong a discussion in the group which brought back a lot of old arguments and reasoning from days I thought we had left behind long ago.

HTTP 2/3 abuses. A prominent implementer of a HTTP proxy/load balancer walked us through a whole series of different HTTP/2 and HTTP/3 protocols details that attackers can have been exploiting in recent years, with details about what can be done to mitigate such attacks. It made several other implementers mention how they take similar precautions and some other general discussions around the topics of what can be considered normal use of the protocol and what is not.

Idle connections & mobile: beneficial or harmful. 0-RTT instead of idle connections. There is a non-negligible cost associate with keeping connections alive for clients running on mobile phones. Would it be possible to instead move forward into a world when they are not kept alive but instead closed and 0-RTT opened again next time they are needed? Again the room woke up to a long discussion about the benefits and problems with doing this – which if it would be possible probably would save a lot of battery time on the average mobile phones.

QUIC pacing. We learned that it is very important for servers to implement decent QUIC pacing as it can increase performance up to 20 times compared to no pacing at all. What about using the flow control properly? What about changing the default buffer sizes for UDP sockets in the Linux kernel to something similar to TCP sockets in order to help the default case to perform better?

HTTP prioritization for product performance. The HTTP/2 way of doing prioritization was deemed a failure and too complicated a long time ago but in this presentation we were taught that there are definitely use cases and scenarios where the regular HTTP/3 priority setup is helpful and improves performs. Examples and descriptions for a popular and well used client were shown.

Allowing HTTP clients to use stale DNS data. What if HTTP clients would use stale data instead of having to wait for the DNS resolver response as a means to avoid having to wait a whole RTT to get the date that in a fair amount of the cases is the same as the stale data. Again a long discussion around TTLs for DNS queries and the fact that some clients are already doing this, in more or less explicit ways.

QUIC cache DSR. As the last talk of the the afternoon we got in the details of Direct Server Response for QUIC and how this can improve performance and problems and challenges involved with this. It then indirectly took us into a long sub-thread talking about HTTP caching, Vary headers and what could and should be done to improve things going forward. There seems to be an understanding that it would be good to improve the current situation but it is not entirely clear to this author exactly what that would entail.

When we then took a walk through the streets of London, only to have an awesome dinner during which we could all conclude that HTTP still is not ready. There is still work to be done. There are challenges left to overcome.

See you tomorrow for day two.

thehttpworkshop2022-day3.txt

The last day of this edition of the HTTP workshop. Thursday November 3, 2022. A half day only. Many participants at the Workshop are going to continue their UK adventure and attend the IETF 115 in London next week.

We started off the day with a deep dive into connection details. How to make connections for HTTP – in particular on mobile devices. How to decide which IP to use, racing connections, timeouts, when to consider a connection attempt “done” (ie after the TCP SYNACK or after the TLS handshake is complete). QUIC vs TCP vs TLS and early data. IPv4 vs IPv6.

ECH. On testing, how it might work, concerns. Statistics are lies. What is the success expectancy for this and what might be the explanations for failures. What tests should be done and what answers about ECH in the wild would we like to get answered going forward?

Dan Stahr: Making a HTTP client good. Discussions around what to expose, not to expose and how HTTP client APIs have been written or should be written. Adobe has its own version of fetch for server use.

Mike Bishop brought up an old favorite subject: a way for a server to provide hints to a client on how to get its content from another host. Is it time to revive this idea? Blind caching the idea and concept was called in an old IETF 95 presentation by Martin Thomson. A host that might be closer the client or faster etc or simply that has the content in question could be used for providing content.

Mark Nottingham addressed the topic of HTTP (interop) testing. Ideas were raised around some kind of generic infrastructure for doing tests for HTTP implementations. I think people were generally positive but I figure time will tell what if anything will actually happen with this.

What’s next for HTTP? Questions were asked to the room, mostly of the hypothetical kind, but I don’t think anyone actually had the nerve to suggest they actually have any clear ideas for that at this moment. At least not in this way. HTTP/4 has been mentioned several times during these days, but then almost always in a joke context.

The End

This is how the workshop ends for this time. Super fun, super informative, packed with information with awesome people. One of these events that give me motivation and a warm fuzzy feeling inside for an extended period of time into the future.

Thanks everyone for your excellence. Thank you organizers for a fine setup!

Workshop season 5 episode 2

Day one was awesome. Now we take the next step.

The missing people

During the discussions today it was again noticeable that apart from some specific individuals we also lack people in the room from prominent “players” in this area such as Chrome developers or humans with closer knowledge of how larger content deliverer work such as Netflix. Or what about a search engine person?

Years ago Chrome was represented well and the Apple side of the world was weak, but the situation seems to have been totally reversed by now.

The day

Mike Bishop talked about the complexities of current internet. Redirects before, during, after. HTTPS records. Alt-svc. Alt-SvcB. Use the HTTPS record for that alternative name.

This presentation triggered a long discussion on how to do things, how things could be done in a future and how the different TTLs in this scenario should or could interact. How to do multi-CDN, how to interact with DNS and what happens if a CDN wants to disable QUIC?

A very long discussion that mostly took us all back to square one in the end. The alt-svcb proposal as is.

Alan Frindell talked about HTTP priorities. What it is (only within connections, it has to be more than one thing and it something goes faster something else needs to go slower, re-prioritization takes a RTT/2 for a client to take effect), when to prioritize (as late as possible, for h2 just before committing to TCP).

Prioritizing images in Meta’s apps. On screen, close, not close. Guess first, then change priority once it knows better. An experiment going on is doing progressive images. Change priority of image transfers once they have received a certain amount. Results and conclusions are pending.

Priority within a single video. An important clue to successful priority handling seems to be more content awareness in the server side. By knowing what is being delivered, the server can make better decisions without the client necessarily having to say anything.

Lunch

I had Thai food. It was good.

Afternoon

Lucas Pardue talked about in HTTP vs tunneling over HTTP. Layers and layers of tunneling and packets within packets. Then Oblivious HTTP, another way of tunneling data and HTTP over HTTP. Related document: RFC 9297 – HTTP Datagrams and the Capsule Protocol. Example case on the Cloudflare blog.

Mark Nottingham talked on Structured Field Values for HTTP and the Retrofit Structured Fields for HTTP.

Who is doing SF libraries and APIs? Do we need an SF schema?

SF compression? Mark’s experiment makes it pretty much on par with HPACK.

Binary SF. Takes ~40% less time parsing in Mark’s experiment. ~10% larger in size (for now).

The was expressed interest in continuing this experimenting going forward. Reducing the CPU time for header parsing is considered valuable.

Sebastian Poreba talked “Networking in your pocket” about the challenges and architectures of mobile phone and smartwatch networking. Connection migration is an important property of QUIC that is attractive in the mobile world. H3 is good.

A challenge is to keep the modem activity off as much as possible and do network activities when the modem is already on.

Beers

My very important duties during these days also involved spending time in pubs and drinking beers with this awesome group of people. This not only delayed my blog post publish times a little, but it might also have introduced an ever so slight level of “haziness” into the process and maybe, just maybe, my recollection of the many details from the day is not exactly as detailed as it could otherwise have been.

That’s just the inevitable result of me sacrificing myself for the team. I did it for us all. You’re welcome.

Another full day

Another day fully packed with HTTP details from early to late. From 9am in the morning until 11pm in the evening. I’m having a great time. Tomorrow is the last day. I’ll let you know what happens then.

HTTP Workshop 2022 – day 1

The HTTP Workshop is an occasional gathering of HTTP experts and other interested parties to discuss the Web’s foundational protocol.

The fifth HTTP Workshop is a three day event that takes place in Oxford, UK. I’m happy to say that I am attending this one as well, as I have all the previous occasions. This is now more than seven years since the first one.

Attendees

A lot of the people attending this year have attended previous workshops, but in a lot of cases we are now employed by other companies then when we attended our first workshops. Almost thirty HTTP stack implementers, experts and spec authors in a room.

There was a few newcomers and some regulars are absent (and missed). Unfortunately, we also maintain the lack of diversity from previous years. We are of course aware of this, but have still failed to improve things in this area very much.

Setup

All the people gather in the same room. A person talks briefly on a specific topic and then we have a free-form discussion about it. When I write this, the slides from today’s presentations have not yet been made available so I cannot link them here. I will add those later.

Discussions

Mark’s introduction talk.

Lucas Pardue started the morning with a presentation and discussion around logging, tooling and the difficulty of debugging modern HTTP setups. With binary protocols doing streams and QUIC, qlog and qvis are in many ways the only available options but they are not supported by everything and there are gaps in what they offer. Room for improvements.

Anne van Kesteren showed a proposal from Domenic Denicola for a No-Vary-Search response header. An attempt to specify a way for HTTP servers to tell clients (and proxies) that parts of the query string produces the same response/resource and should be cached and treated as the same.

An issue that is more complicated than what if first might seem. The proposal has some fans in the room but I think the general consensus was that it is a bad name for the header…

Martin Thomson talked about new stuff in HTTP.

HTTP versions. HTTP/3 is used in over a third of the requests both as seen by Cloudflare server-side and measured in Firefox telemetry client-side. Extensible and functional protocols. Nobody is talking about or seeing a point in discussing about a HTTP/4.

WebSocket over h2/h3. There does not seem to be any particular usage and nobody mentioned any strong reason or desired to change the status. Web Transport is probably what instead is going to be the future.

Frames. Discussions around the use and non-use of the origin frame. Note widely used. Could help to avoid extra “SNI leakage” and extra connections.

Anne then took a second round in front of the room and questioned us on the topic of cookies. Or perhaps more specifically about details in the spec and how to possible change the spec going forward. At least one person in the room insisted fairly strongly than any such restructures of said documents should be done after the ongoing 6265bis work is done.

Dinner

A company very generously sponsored a group dinner for us in the evening and I had a great time. I was asked to not reveal the name of said company, but I can tell you that a lot of the conversations at the table, at least in the area where I was parked, kept up the theme from the day and were HTTP oriented. Including oblivious HTTP, IPv4 formatting allowed in URLs and why IP addresses should not be put in the SNI field. Like any good conversion among friends.

A bug that was 23 years old or not

This is a tale of cookies, Internet code and a CVE. It goes back a long time so please take a seat, lean back and follow along.

The scene is of course curl, the internet transfer tool and library I work on.

1998

In October 1998 we shipped curl 4.9. In 1998. Few people had heard of curl or used it back then. This was a few months before the curl website would announce that curl achieved 300 downloads of a new release. curl was still small in every meaning of the word at that time.

curl 4.9 was the first release that shipped with the “cookie engine”. curl could then receive HTTP cookies, parse them, understand them and send back cookies properly in subsequent requests. Like the browsers did. I wrote the bigger part of the curl code for managing cookies.

In 1998, the only specification that existed and described how cookies worked was a very brief document that Netscape used to host called cookie_spec. I keep a copy of that document around for curious readers. It really does not document things very well and it leaves out enormous amounts information that you had to figure out by inspecting other clients.

The cookie code I implemented than was based on that documentation and what the browsers seemed to do at the time. It seemed to work with numerous server implementations. People found good use for the feature.

2000s

This decade passed with a few separate efforts in the IETF to create cookie specifications but they all failed. The authors of these early cookie specs probably thought they could create standards and the world would magically adapt to them, but this did not work. Cookies are somewhat special in the regard that they are implemented by so many different authors, code bases and websites that fundamentally changing the way they work in a “decree from above” like that is difficult if not downright impossible.

RFC 6265

Finally, in 2011 there was a cookie rfc published! This time with the reversed approach: it primarily documented and clarified how cookies were actually already being used.

I was there and I helped it get made by proving my views and opinions. I did not agree to everything that the spec includes (you can find blog posts about some of those details), but finally having a proper spec was still a huge improvement to the previous state of the world.

Double syntax

What did not bother me much at the time, but has been giving me a bad rash ever since, is the peculiar way the spec is written: it provides one field syntax for how servers should send cookies, and a different one for what syntax clients should accept for cookies.

Two syntax for the same cookies.

This has at least two immediate downsides:

  1. It is hard to read the spec as it is very easy to to fall over one of those and assume that syntax is valid for your use case and accidentally get the wrong role’s description.
  2. The syntax defining how to send cookie is not really relevant as the clients are the ones that decide if they should receive and handle the cookies. The existing large cookie parsers (== browsers) are all fairly liberal in what they accept so nobody notices nor cares about if the servers don’t follow the stricter syntax in the spec.

RFC 6265bis

Since a few years back, there is ongoing work in IETF on revising and updating the cookie spec of 2011. Things have evolved and some extensions to cookies have been put into use in the world and deserves to be included in the spec. If you would to implement code today that manage cookies, the old RFC is certainly not enough anymore. This cookie spec update work is called 6265bis.

curl is up to date and compliant with what the draft versions of RFC 6265bis say.

The issue about the double syntax from above is still to be resolved in the document, but I faced unexpectedly tough resistance when I recently shared my options and thoughts about that spec peculiarity.

It can be noted that fundamentally, cookies still work the same way as they did back in 1998. There are added nuances and knobs sure, but the basic principles have remained. And will so even in the cookie spec update.

One of oddities of cookies is that they don’t work on origins like most other web features do.

HTTP Request tunneling

While cookies have evolved slowly over time, the HTTP specs have also been updated and refreshed a few times over the decades, but perhaps even more importantly the HTTP server implementations have implemented stricter parsing policies as they have (together with the rest of the world) that being liberal in what you accept (Postel’s law) easily lead to disasters. Like the dreaded and repeated HTTP request tunneling/smuggling attacks have showed us.

To combat this kind of attack, and probably to reduce the risk of other issues as well, HTTP servers started to reject incoming HTTP requests early if they appear “illegal” or malformed. Block them already at the door and not letting obvious crap in. In particular this goes for control codes in requests. If you try to send a request to a reasonably new HTTP server today that contains a control code, chances are very high that the server will reject the request and just return a 400 response code.

With control code I mean a byte value between 1 and 31 (excluding 9 which is TAB)

The well known HTTP server Apache httpd has this behavior enabled by default since 2.4.25, shipped in December 2016. Modern nginx versions seem to do this as well, but I have not investigated since exactly when.

Cookies for other hosts

If cookies were designed today for the first time, they certainly would be made different.

A website that sets cookies sends cookies to the client. For each cookie it sends, it sets a number of properties for the cookie. In particular it sets matching parameters for when the cookie should be sent back again by the client.

One of these cookie parameters set for a cookie is the domain that need to match for the client to send it. A server that is called www.example.com can set a cookie for the entire example.com domain, meaning that the cookie will then be sent by the client also when visiting second.example.com. Servers can set cookies for “sibling sites!

Eventually the two paths merged

The cookie code added to curl in 1998 was quite liberal in what content it accepted and while it was of course adjusted and polished over the years, it was working and it was compatible with real world websites.

The main driver for changes in that area of the code has always been to make sure that curl works like and interoperates with other cookie-using agents out in the wild.

CVE-2022-35252

In the end of June 2022 we received a report of a suspected security problem in curl, that would later result in our publication of CVE-2022-35252.

As it turned out, the old cookie code from 1998 accepted cookies that contained control codes. The control codes could be part of the name or the the content just fine, and if the user enabled the “cookie engine” curl would store those cookies and send them back in subsequent requests.

Example of a cookie curl would happily accept:

Set-Cookie: name^a=content^b; domain=.example.com

The ^a and ^b represent control codes, byte code one and two. Since the domain can mark the cookie for another host, as mentioned above, this cookie would get included for requests to all hosts within that domain.

When curl sends a cookie like that to a HTTP server, it would include a header field like this in its outgoing request:

Cookie: name^a=content^b

400

… to which a default configure Apache httpd and other servers will respond 400. For a script or an application that received theses cookies, further requests will be denied for as long as the cookies keep getting sent. A denial of service.

What does the spec say?

The client side part of RFC 6265, section 5.2 is not easy to decipher and figuring out that a client should discard cookies with control cookies requires deep studies of the document. There is in fact no mention of “control codes” or this byte range in the spec. I suppose I am just a bad spec reader.

Browsers

It is actually easier to spot what the popular browsers do since their source codes are easily available, and it turns out of course that both Chrome and Firefox already ignore incoming cookies that contain any of the bytes

%01-%08 / %0b-%0c / %0e-%1f / %7f

The range does not include %09, which is TAB and %0a / %0d which are line endings.

The fix

The curl fix was not too surprisingly and quite simply to refuse cookie fields that contain one or more of those banned byte values. As they are not accepted by the browser’s already, the risk that any legitimate site are using them for any benign purpose is very slim and I deem this change to be nearly risk-free.

The age of the bug

The vulnerable code has been in curl versions since version 4.9 which makes it exactly 8,729 days (23.9 years) until the shipped version 7.85.0 that fixed it. It also means that we introduced the bug on project day 201 and fixed it on day 8,930.

The code was not problematic when it shipped and it was not problematic during a huge portion of the time it has been used by a large amount of users.

It become problematic when HTTP servers started to refuse HTTP requests they suspected could be malicious. The way this code turned into a denial of service was therefore more or less just collateral damage. An unfortunate side effect.

Maybe the bug was born first when RFC 6265 was published. Maybe it was born when the first widely used HTTP server started to reject these requests.

Project record

8,729 days is a new project record age for a CVE to have been present in the code until found. It is still the forth CVE that were lingering around for over 8,000 days until found.

Credits

Thanks to Stefan Eissing for digging up historic Apache details.

Axel Chong submitted the CVE-2022-35252 report.

Campfire image by Martin Winkler from Pixabay

New HTTP core specs

Before this, the latest refreshed specification of HTTP/1.1 was done in the RFC 7230 series, published in June 2014. After that, HTTP/2 was done in the spring of 2015 and recently the HTTP/3 spec has been a work in progress.

To better reflect this new world of multiple HTTP versions and an HTTP protocol ecosystem that has some parts that are common for all versions and some other parts that are specific for each particular version, the team behind this refresh has been working on this updated series.

My favorite documents in this “cluster” are:

HTTP Semantics

RFC 9110 basically describes how HTTP works independently of and across versions.

HTTP/1.1

RFC 9112 replaces 7230.

HTTP/2

RFC 9113 replaces 7540.

HTTP/3

RFC 9114 is finally the version three of the protocol in a published specification.

Credits

Top image by Gerhard G. from Pixabay. The HTTP stack image is done by me, Daniel.

Easier header-picking with curl

Okay you might ask, what’s the news here? We’ve been able to get HTTP response headers with curl since virtually the stone age. Yes we have. Get the page and also show the headers:

curl -i https://example.com/

Make a HEAD request and see what headers we get back:

curl -I https://example.com/

Save the response headers in a separate file:

curl -D headers.txt https://example.com/

Get a specific header

This gets a little more complicated but you can always do

curl -I https://example.com/ | grep Date:

Which of course will fail if the casing is different, you need to check for it case insensitively. There might also be another header ending with “date:” that matches so you need to make sure that this an exact match

curl -I https://example.com/ | grep -i ^Date:

Now this shows the entire header, but for most cases you only want the value. So get it with cut:

curl -I https://example.com/ | grep -i ^Date: | cut -d: -f2-

You have the header value extracted now, but the leading and trailing white spaces in the content are probably not what you want in there so let’s strip them as well:

curl -I https://example.com/ | grep -i ^Date: | cut -d: -f2- | sed 's/^ *\(.*\).*/\1/'

There are of course many different ways you can do this operation and some of them are more clever than the methods I’ve used here. They are still often more or less convoluted and error-prone.

If we imagine that this is a fairly common use case for curl users in the world, then this kind of operation is found duplicated in quite a few scripts, applications and devices in the world.

Maybe we could make this easier for curl users?

A headers API

The other day we introduced a new experimental headers API to libcurl. Using this API, an application using libcurl gets an easy to use API to extract individual or several headers and their content.

As curl is such a libcurl-using application, we have expanded it to make use of this new API and this brings some new fun features to the curl tool.

Let me emphasize that since this API is labeled experimental it is not enabled in a default build. You need to explicitly enable it!

Get a single header, the new way

I decided to extend the -w output feature for this.

To extract a single header, get the value with leading and trailing spaces trimmed, use %header{name}. To repeat the operation from above and get the Date: header

curl -I -w '%header{date}' https://example.com/

‘date’ in this example is a case insensitive header name without the trailing colon and you can of course use any header name you please there. If the given header did not actually arrive in the response, it outputs nothing.

If you want more headers output, just repeat the %header{name} construct as many times as you like. If the -w output string gets unwieldy and hard to manage on the command line, then make it into a text file instead and tell -w about it with -w @filename.

curl -I -w @filename https://example.com/

Which headers?

There are several different kinds of headers and there can be multiple requests used for a transfer, but this option outputs the “normal” server response headers from the most recent request done. The option only works for HTTP(S) responses.

All headers – as JSON

As dealing with formatted data in the form of JSON has become very popular, I want to help fertilize this by making curl able to output all response headers as a JSON object.

This way, you can move the header handling, parsing and perhaps filtering to your JSON aware tool.

Tell curl to output the received HTTP headers as a JSON object:

curl -o save -w "%{header_json}" https://example.com/

curl itself does not pretty-print this, but if you pass the JSON from curl to a beautifier such as jq, the output ends up looking like this:

{
  "age": [
    "269578"
  ],
  "cache-control": [
    "max-age=604800"
  ],
  "content-type": [
    "text/html; charset=UTF-8"
  ],
  "date": [
    "Tue, 22 Mar 2022 08:35:21 GMT"
  ],
  "etag": [
    "\"3147526947+ident\""
  ],
  "expires": [
    "Tue, 29 Mar 2022 08:35:21 GMT"
  ],
  "last-modified": [
    "Thu, 17 Oct 2019 07:18:26 GMT"
  ],
  "server": [
    "ECS (nyb/1D2E)"
  ],
  "vary": [
    "Accept-Encoding"
  ],
  "x-cache": [
    "HIT"
  ],
  "content-length": [
    "1256"
  ]
}

JSON details

The headers are presented in the same order as received over the wire. Except if there are duplicated header names, as then they are grouped on the first occurrence and all values are provided there as a JSON array.

All headers are arrays just because there can be multiple headers using the same name .

The casing for the header names are kept unmodified from what was received, but for duplicated headers the casing used for the first occurrence will be used in the output.

Update: we lowercase all header names in the JSON output.

The “status line” of HTTP 1.x response, that first line that says “HTTP1.1 200 OK” when everything is fine, is not counted as a header by this function and will therefor not be included in this output.

Ships in 7.83.0

This feature is present in source code that will ship in curl 7.83.0, scheduled to happen late April 2022. Run your own build with it enabled, or ask your packager to provide an experimental build for you.

With enough positive feedback we should be able to move this out of experimental state fairly quickly.

A headers API for libcurl

For many years we’ve had this outstanding idea to add a new API to libcurl that would offer applications easy access to HTTP response headers.

Applications could already retrieve the headers using existing methods but that requires them to write a callback and to a certain amount of parsing and “understanding” HTTP that we always felt was a little unfortunate, a bit error-prone on the behalf of the applications and perhaps also a thing that forced a lot of applications out there having to write the same kind of extra function logic.

If libcurl provides this functionality, it would remove a lot of (duplicated) code from a lot of applications.

Designing the API

We started this process a while ago when I first wrote down a basic approach to an API for this and sent it off to the curl-library mailing list for feedback and critique.

/* first take */
char *curl_easy_header(CURL *easy,
                       const char *name);

The conversation that followed that first plea for help, made me realize that my first proposal had been far too basic and it wouldn’t at all work to satisfy the needs and use cases we could think of for this API.

Try again

I went back to mull over what I’ve learned and update my design proposal, trying to take the feedback into account in the best possible way. A few weeks later, I returned with a “proposal v2” and again I asked for comments and opinions on what I had put together.

/* second shot */
CURLHcode curl_easy_header(CURL *easy,
                           const char *name,
                           size_t index,
                           struct curl_header **h);

As I had already adjusted the API from feedback the first time around, the feedback this time was perhaps not calling for as big changes or radical differences as they did the first time around. I could adapt my proposal to what people asked and suggested. We arrived at something that seemed like a pretty solid API for offering HTTP headers to applications.

Let’s do this

As the API proposal feedback settled down and the interface felt good and sensible, I decided it was time for me to write up a first implementation so that we can offer code to people to give everyone a chance to try out the API in real life as well. There’s one thing to give feedback on a “paper product”, actually being able to use it and try it in an application is way better. I dove in.

The final take

When the code worked to the level that I started to be able to extract the first headers with the API, it proved to that we needed to adjust the API a little more, so I did. I then ran into more questions and thoughts about specifics that we hadn’t yet dealt with or nailed proper in the discussions up to that point and I took some questions back to the curl community. This became an iterative process and we smoothed out questions about how access different header “sources” as well as how to deal with multiple headers and “request sequences”. All supported now.

/* final version */
CURLHcode curl_easy_header(CURL *easy,
                           const char *name,
                           size_t index,
                           unsigned int origin,
                           int request,
                           struct curl_header **h);

Multiple headers

This API allows applications to extract all headers from a previous transfer. It can get one or many headers when there are duplicated ones, like Set-Cookie: commonly arrive as.

Sources

The application can ask for “normal” headers, for trailers (that arrive after the body), headers associated with the CONNECT request (if such a one was performed), pseudo headers (that might arrive when HTTP/2 and HTTP/3 is used) or headers associated with a HTTP 1xx “intermediate” response.

Multiple responses

The libcurl APIs typically work on transfers, which means that a single transfer may end up doing multiple transfers, multiple HTTP requests. Primarily when redirects are followed but it can also be due to other reasons. This header API therefore allows the caller to extract headers from the entire “chain” of requests a previous transfer was made with.

EXPERIMENTAL

This API is initially merged (in this commit) labeled “experimental” to be included in the upcoming 7.83.0 release. The experimental label means a few different things to us:

  • The API is disabled by default in the build and you need to explicitly ask for it with --enable-headers-api when you run configure
  • There are no ABI and API promises for these functions yet. We might change the functions based on feedback before we remove the label.
  • We strongly discourage anyone from shipping experimentally labeled functions in production.
  • We rely on people to enable and test this and provide feedback, to give us confidence enough to remove the experimental label as soon as possible.

We use the experimental “route” to lower the bar for merging new stuff, so that we get some extra chances to fix up mistakes before the rules and API are carved in stone and we are set to support that for a life time.

This setup relies on users actually trying out the experimental stuff as otherwise it isn’t method for improving the API, it will only delay the introduction of it to the general public. And it risks becoming be less good.

Documentation

The two new functions have detailed man pages: curl_easy_header and curl_easy_nextheader. If there is anything missing on unclear in there, let us know!

I have also created an initial example source snippet showing header API use. See headerapi.c.

This API deserves its own little section in the everything curl book, but I think I will wait for it to get landed “for real” before I work on adding that.