Tag Archives: HTTP

The HTTP Workshop 2019 begins

The forth season of my favorite HTTP series is back! The HTTP Workshop skipped over last year but is back now with a three day event organized by the very best: Mark, Martin, Julian and Roy. This time we’re in Amsterdam, the Netherlands.

35 persons from all over the world walked in the room and sat down around the O-shaped table setup. Lots of known faces and representatives from a large variety of HTTP implementations, client-side or server-side – but happily enough also a few new friends that attend their first HTTP Workshop here. The companies with the most employees present in the room include Apple, Facebook, Mozilla, Fastly, Cloudflare and Google – having three or four each in the room.

Patrick Mcmanus started off the morning with his presentation on HTTP conventional wisdoms trying to identify what have turned out as successes or not in HTTP land in recent times. It triggered a few discussions on the specific points and how to judge them. I believe the general consensus ended up mostly agreeing with the slides. The topic of unshipping HTTP/0.9 support came up but is said to not be possible due to its existing use. As a bonus, Anne van Kesteren posted a new bug on Firefox to remove it.

Mark Nottingham continued and did a brief presentation about the recent discussions in HTTPbis sessions during the IETF meetings in Prague last week.

Martin Thomson did a presentation about HTTP authority. Basically how a client decides where and who to ask for a resource identified by a URI. This triggered an intense discussion that involved a lot of UI and UX but also trust, certificates and subjectAltNames, DNS and various secure DNS efforts, connection coalescing, DNSSEC, DANE, ORIGIN frame, alternative certificates and more.

Mike West explained for the room about the concept for Signed Exchanges that Chrome now supports. A way for server A to host contents for server B and yet have the client able to verify that it is fine.

Tommy Pauly then talked to his slides with the title of Website Fingerprinting. He covered different areas of a browser’s activities that are current possible to monitor and use for fingerprinting and what counter-measures that exist to work against furthering that development. By looking at the full activity, including TCP flows and IP addresses even lots of our encrypted connections still allow for pretty accurate and extensive “Page Load Fingerprinting”. We need to be aware and the discussion went on discussing what can or should be done to help out.

The meeting is going on somewhere behind that red door.

Lucas Pardue discussed and showed how we can do TLS interception with Wireshark (since the release of version 3) of Firefox, Chrome or curl and in the end make sure that the resulting PCAP file can get the necessary key bundled in the same file. This is really convenient when you want to send that PCAP over to your protocol debugging friends.

Roberto Peon presented his new idea for “Generic overlay networks”, a suggested way for clients to get resources from one out of several alternatives. A neighboring idea to Signed Exchanges, but still different. There was an interested to further and deepen this discussion and Roberto ended up saying he’d at write up a draft for it.

Max Hils talked about Intercepting QUIC and how the ability to do this kind of thing is very useful in many situations. During development, for debugging and for checking what potentially bad stuff applications are actually doing on your own devices. Intercepting QUIC and HTTP/3 can thus also be valuable but at least for now presents some challenges. (Max also happened to mention that the project he works on, mitmproxy, has more stars on github than curl, but I’ll just let it slide…)

Poul-Henning Kamp showed us vtest – a tool and framework for testing HTTP implementations that both Varnish and HAproxy are now using. Massaged the right way, this could develop into a generic HTTP test/conformance tool that could be valuable for and appreciated by even more users going forward.

Asbjørn Ulsberg showed us several current frameworks that are doing GET, POST or SEARCH with request bodies and discussed how this works with caching and proposed that SEARCH should be defined as cacheable. The room mostly acknowledged the problem – that has been discussed before and that probably the time is ripe to finally do something about it. Lots of users are already doing similar things and cached POST contents is in use, just not defined generically. SEARCH is a already registered method but could get polished to work for this. It was also suggested that possibly POST could be modified to also allow for caching in an opt-in way and Mark volunteered to author a first draft elaborating how it could work.

Indonesian and Tibetan food for dinner rounded off a fully packed day.

Thanks Cory Benfield for sharing your notes from the day, helping me get the details straight!

Diversity

We’re a very homogeneous group of humans. Most of us are old white men, basically all clones and practically indistinguishable from each other. This is not diverse enough!

A big thank you to the HTTP Workshop 2019 sponsors!


The future of HTTP Symposium

This year’s version of curl up started a little differently: With an afternoon of HTTP presentations. The event took place the same week the IETF meeting has just ended here in Prague so we got the opportunity to invite people who possibly otherwise wouldn’t have been here… Of course this was only possible thanks to our awesome sponsors, visible in the image above!

Lukáš Linhart from Apiary started out with “Web APIs: The Past, The Present and The Future”. A journey trough XML-RPC, SOAP and more. One final conclusion might be that we’re not quite done yet…

James Fuller from MarkLogic talked about “The Defenestration of Hypermedia in HTTP”. How HTTP web technologies have changed over time while the HTTP paradigms have survived since a very long time.

I talked about DNS-over-HTTPS. A presentation similar to the one I did before at FOSDEM, but in a shorter time so I had to talk a little faster!

Mike Bishop from Akamai (editor of the HTTP/3 spec and a long time participant in the HTTPbis work) talked about “The evolution of HTTP (from HTTP/1 to HTTP/3)” from HTTP/0.9 to HTTP/3 and beyond.

Robin Marx then rounded off the series of presentations with his tongue in cheek “HTTP/3 (QUIC): too big to fail?!” where we provided a long list of challenges for QUIC and HTTP/3 to get deployed and become successful.

We ended this afternoon session with a casual Q&A session with all the presenters discussing various aspects of HTTP, the web, REST, APIs and the benefits and deployment challenges of QUIC.

I think most of us learned things this afternoon and we could leave the very elegant Charles University room enriched and with more food for thoughts about these technologies.

We ended the evening with snacks and drinks kindly provided by Apiary.

(This event was not streamed and not recorded on video, you had to be there in person to enjoy it.)


Looking for the Refresh header

The other day someone filed a bug on curl that we don’t support redirects with the Refresh header. This took me down a rabbit hole of Refresh header research and I’ve returned to share with you what I learned down there.

tl;dr Refresh is not a standard HTTP header.

As you know, an HTTP redirect is specified to use a 3xx response code and a Location: header to point out the new URL (I use the term URL here but you know what I mean). This has been the case since RFC 1945 (HTTP/1.0). According to an old mail from Roy T Fielding (dated June 1996), Refresh “didn’t make it” into that spec. That was the first “real” HTTP specification. (And the HTTP we used before 1.0 didn’t even have headers!)

The little detail that it never made it into the 1.0 spec or any later one, doesn’t seem to have affected the browsers. Still today, browsers keep supporting the Refresh header as a sort of Location: replacement even though it seems to never have been present in a HTTP spec.

In good company

curl is not the only HTTP library that doesn’t support this non-standard header. The popular python library requests apparently doesn’t according to this bug from 2017, and another bug was filed about it already back in 2011 but it was just closed as “old” in 2014.

I’ve found no support in wget or wget2 either for this header.

I didn’t do any further extensive search for other toolkits’ support, but it seems that the browsers are fairly alone in supporting this header.

How common is the the Refresh header?

I decided to make an attempt to figure out, and for this venture I used the Rapid7 data trove. The method that data is collected with may not be the best – it scans the IPv4 address range and sends a HTTP request to each TCP port 80, setting the IP address in the Host: header. The result of that scan is 52+ million HTTP responses from different and current HTTP origins. (Exactly 52254873 responses in my 59GB data dump, dated end of February 2019).

Results from my scans

  • Location is used in 18.49% of the responses
  • Refresh is used in 0.01738% of the responses (exactly 9080 responses featured them)
  • Location is thus used 1064 times more often than Refresh
  • In 35% of the cases when Refresh is used, Location is also used
  • curl thus handles 99.9939% of the redirects in this test

Additional notes

  • When Refresh is the only redirect header, the response code is usually 200 (with 404 being the second most)
  • When both headers are used, the response code is almost always 30x
  • When both are used, it is common to redirect to the same target and it is also common for the Refresh header value to only contain a number (for the number of seconds until “refresh”).

Refresh from HTML content

Redirects can also be done by meta tags in HTML and sending the refresh that way, but I have not investigated how common as that isn’t strictly speaking HTTP so it is outside of my research (and interest) here.

In use, not documented, not in the spec

Just another undocumented corner of the web.

When I posted about these findings on the HTTPbis mailing list, it was pointed out that WHATWG mentions this header in their iana page. I say mention because calling that documenting would be a stretch…

It is not at all clear exactly what the header is supposed to do and it is not documented anywhere. It’s not exactly a redirect, but almost?

Will/should curl support it?

A decision hasn’t been made about it yet. With such a very low use frequency and since we’ve managed fine without support for it so long, maybe we can just maintain the situation and instead argue that we should just completely deprecate this header use from the web?

Updates

After this post first went live, I got some further feedback and data that are relevant and interesting.

  • Yoav Wiess created a patch for Chrome to count how often they see this header used in real life.
  • Eric Lawrence pointed out that IE had several incompatibilities in its Refresh parser back in the day.
  • Boris pointed out (in the comments below) the WHATWG documented steps for handling the header.
  • The use of <meta> tag refresh in contents is fairly high. The Chrome counter says almost 4% of page loads!

alt-svc in curl

The RFC 7838 was published already in April 2016. It describes the new HTTP header Alt-Svc, or as the title of the document says HTTP Alternative Services.

HTTP Alternative Services

An alternative service in HTTP lingo is a quite simply another server instance that can provide the same service and act as the same origin as the original one. The alternative service can run on another port, on another host name, on another IP address, or over another HTTP version.

An HTTP server can inform a client about the existence of such alternatives by returning this Alt-Svc header. The header, which has an expiry time, tells the client that there’s an optional alternative to this service that is hosted on that host name, that port number using that protocol. If that client is a browser, it can connect to the alternative in the background and if that works out fine, continue to use that host for the rest of the time that alternative is said to work.

In reality, this header becomes a little similar to the DNS records SRV or URI: it points out a different route to the server than what the A/AAAA records for it say.

The Alt-Svc header came into life as an attempt to help out with HTTP/2 load balancing, since with the introduction of HTTP/2 clients would suddenly use much more persistent and long-living connections instead of the very short ones used for traditional HTTP/1 web browsing which changed the nature of how connections are done. This way, a system that is about to go down can hint the clients on how to continue using the service, elsewhere.

Alt-Svc: h2="backup.example.com:443"; ma=2592000;

HTTP upgrades

Once that header was published, the by then already existing and deployed Google QUIC protocol switched to using the Alt-Svc header to hint clients (read “Chrome users”) that “hey, this service is also available over gQUIC“. (Prior to that, they used their own custom alternative header that basically had the same meaning.)

This is important because QUIC is not TCP. Resources on the web that are pointed out using the traditional HTTPS:// URLs, still imply that you connect to them using TCP on port 443 and you negotiate TLS over that connection. Upgrading from HTTP/1 to HTTP/2 on the same connection was “easy” since they were both still TCP and TLS. All we needed then was to use the ALPN extension and voila: a nice and clean version negotiation.

To upgrade a client and server communication into a post-TCP protocol, the only official way to it is to first connect using the lowest common denominator that the HTTPS URL implies: TLS over TCP, and only once the server tells the client what more there is to try, the client can go on and try out the new toys.

For HTTP/3, this is the official way for HTTP servers to tell users about the availability of an HTTP/3 upgrade option.

curl

I want curl to support HTTP/3 as soon as possible and then as I’ve mentioned above, understanding Alt-Svc is a key prerequisite to have a working “bootstrap”. curl needs to support Alt-Svc. When we’re implementing support for it, we can just as well support the whole concept and other protocol versions and not just limit it to HTTP/3 purposes.

curl will only consider received Alt-Svc headers when talking HTTPS since only then can it know that it actually speaks with the right host that has the authority enough to point to other places.

Experimental

This is the first feature and code that we merge into curl under a new concept we do for “experimental” code. It is a way for us to mark this code as: we’re not quite sure exactly how everything should work so we allow users in to test and help us smooth out the quirks but as a consequence of this we might actually change how it works, both behavior and API wise, before we make the support official.

We strongly discourage anyone from shipping code marked experimental in production. You need to explicitly enable this in the build to get the feature. (./configure –enable-alt-svc)

But at the same time we urge and encourage interested users to test it out, try how it works and bring back your feedback, criticism, praise, bug reports and help us make it work the way we’d like it to work so that we can make it land as a “normal” feature as soon as possible.

Ship

The experimental alt-svc code has been merged into curl as of commit 98441f3586 (merged March 3rd 2019) and will be present in the curl code starting in the public release 7.64.1 that is planned to ship on March 27, 2019. I don’t have any time schedule for when to remove the experimental tag but ideally it should happen within just a few release cycles.

alt-svc cache

The curl implementation of alt-svc has an in-memory cache of known alternatives. It can also both save that cache to a text file and load that file back into memory. Saving the alt-svc cache to disk allows it to survive curl invokes and to truly work the way it was intended. The cache file stores the expire timestamp per entry so it doesn’t matter if you try to use a stale file.

curl –alt-svc

Caveat: I now talk about how a feature works that I’ve just above said might change before it ships. With the curl tool you ask for alt-svc support by pointing out the alt-svc cache file to use. Or pass a “” (empty name) to make it not load or save any file. It makes curl load an existing cache from that file and at the end, also save the cache to that file.

curl also already since a long time features fancy connection options such as –resolve and –connect-to, which both let a user control where curl connects to, which in many cases work a little like a static poor man’s alt-svc. Learn more about those in my curl another host post.

libcurl options for alt-svc

We start out the alt-svc support for libcurl with two separate options. One sets the file name to the alt-svc cache on disk (CURLOPT_ALTSVC), and the other control various aspects of how libcurl should behave in regards to alt-svc specifics (CURLOPT_ALTSVC_CTRL).

I’m quite sure that we will have reason to slightly adjust these when the HTTP/3 support comes closer to actually merging.

HTTP/3 talk in Stockholm on January 22

HTTP/3 – the coming HTTP version

This time TCP is replaced by the new transport protocol QUIC and things are different yet again! This is a presentation by Daniel Stenberg about HTTP/3 and QUIC with a following Q&A about everything HTTP.

The presentation will be done in English. It will be recorded and possibly live-streamed. Organized by me, together with our friends at goto10. It is free of charge, but you need to register.

When

17:30 – 19:00
January 22, 2019

Goto 10: Hörsalen, Hammarby Kaj 10D plan 5

Register here!

Fancy map to goto 10


I’m leaving Mozilla

It’s been five great years, but now it is time for me to move on and try something else.

During these five years I’ve met and interacted with a large number of awesome people at Mozilla, lots of new friends! I got the chance to work from home and yet work with a global team on a widely used product, all done with open source. I have worked on internet protocols during work-hours (in addition to my regular spare-time working with them) and its been great! Heck, lots of the HTTP/2 development and the publication of that was made while I was employed by Mozilla and I fondly participated in that. I shall forever have this time ingrained in my memory as a very good period of my life.

I had already before I joined the Firefox development understood some of the challenges of making a browser in the modern era, but that understanding has now been properly enriched with lots of hands-on and code-digging in sometimes decades-old messy C++, a spaghetti armada of threads and the wild wild west of users on the Internet.

A very big thank you and a warm bye bye go to everyone of my friends at Mozilla. I won’t be far off and I’m sure I will have reasons to see many of you again.

My last day as officially employed by Mozilla is December 11 2018, but I plan to spend some of my remaining saved up vacation days before then so I’ll hand over most of my responsibilities way before.

The future is bright but unknown!

I don’t yet know what to do next.

I have some ideas and communications with friends and companies, but nothing is firmly decided yet. I will certainly entertain you with a totally separate post on this blog once I have that figured out! Don’t worry.

Will it affect curl or other open source I do?

I had worked on curl for a very long time already before joining Mozilla and I expect to keep doing curl and other open source things even going forward. I don’t think my choice of future employer should have to affect that negatively too much, except of course in periods.

With me leaving Mozilla, we’re also losing Mozilla as a primary sponsor of the curl project, since that was made up of them allowing me to spend some of my work days on curl and that’s now over.

Short-term at least, this move might increase my curl activities since I don’t have any new job yet and I need to fill my days with something…

What about toying with HTTP?

I was involved in the IETF HTTPbis working group for many years before I joined Mozilla (for over ten years now!) and I hope to be involved for many years still. I still have a lot of things I want to do with curl and to keep curl the champion of its class I need to stay on top of the game.

I will continue to follow and work with HTTP and other internet protocols very closely. After all curl remains the world’s most widely used HTTP client.

Can I enter the US now?

No. That’s unfortunately not related, and I’m not leaving Mozilla because of this problem and I unfortunately don’t expect my visa situation to change because of this change. My visa counter is now showing more than 214 days since I applied.

HTTP/3

The protocol that’s been called HTTP-over-QUIC for quite some time has now changed name and will officially become HTTP/3. This was triggered by this original suggestion by Mark Nottingham.

The QUIC Working Group in the IETF works on creating the QUIC transport protocol. QUIC is a TCP replacement done over UDP. Originally, QUIC was started as an effort by Google and then more of a “HTTP/2-encrypted-over-UDP” protocol.

When the work took off in the IETF to standardize the protocol, it was split up in two layers: the transport and the HTTP parts. The idea being that this transport protocol can be used to transfer other data too and its not just done explicitly for HTTP or HTTP-like protocols. But the name was still QUIC.

People in the community has referred to these different versions of the protocol using informal names such as iQUIC and gQUIC to separate the QUIC protocols from IETF and Google (since they differed quite a lot in the details). The protocol that sends HTTP over “iQUIC” was called “hq” (HTTP-over-QUIC) for a long time.

Mike Bishop scared the room at the QUIC working group meeting in IETF 103 when he presented this slide with what could be thought of almost a logo…

On November 7, 2018 Dmitri of Litespeed announced that they and Facebook had successfully done the first interop ever between two HTTP/3 implementations. Mike Bihop’s follow-up presentation in the HTTPbis session on the topic can be seen here. The consensus in the end of that meeting said the new name is HTTP/3!

No more confusion. HTTP/3 is the coming new HTTP version that uses QUIC for transport!

curl up 2019 will happen in Prague

The curl project is happy to invite you to the city of Prague, the Czech Republic, where curl up 2019 will take place.

curl up is our annual curl developers conference where we gather and talk Internet protocols, curl’s past, current situation and how to design its future. A weekend of curl.

Previous years we’ve gathered twenty-something people for an intimate meetup in a very friendly atmosphere. The way we like it!

In a spirit to move the meeting around to give different people easier travel, we have settled on the city of Prague for 2019, and we’ll be there March 29-31.

Sign up now!

Symposium on the Future of HTTP

This year, we’re starting off the Friday afternoon with a Symposium dedicated to “the future of HTTP” which is aimed to be less about curl and more about where HTTP is and where it will go next. Suitable for a slightly wider audience than just curl fans.

That’s Friday the 29th of March, 2019.

Program and talks

We are open for registrations and we would love to hear what you would like to come and present for us – on the topics of HTTP, of curl or related matters. I’m sure I will present something too, but it becomes a much better and more fun event if we distribute the talking as much as possible.

The final program for these days is not likely to get set until much later and rather close in time to the actual event.

The curl up 2019 wiki page is where you’ll find more specific details appear over time. Just go back there and see.

Helping out and planning?

If you want to follow the planning, help out, offer improvements or you have questions on any of this? Then join the curl-meet mailing list, which is dedicated for this!

Free or charge thanks to sponsors

We’re happy to call our event free, or “almost free” of charge and we can do this only due to the greatness and generosity of our awesome sponsors. This year we say thanks to Mullvad, Sticker Mule, Apiary and Charles University.

There’s still a chance for your company to help out too! Just get in touch.

curl up 2019 with logos

Would you like some bold with those headers?

Displaying HTTP headers for a URL on the screen is one of those things people commonly use curl for.

curl -I example.com

To help your eyes separate header names from the corresponding values, I’ve been experimenting with a change that makes the header names get shown using a bold type face and the header values to the right of the colons to use the standard font.

Sending a HEAD request to the curl site could look like this:

This seemingly small change required an unexpectedly large surgery.

Now I want to turn this into a discussion if this is good enough, if we need more customization, how to make the code act on windows and perhaps how an option to explicitly enable/disable this should be named.

If you have ideas for any of that or other things around this feature, do comment in the PR.

The feature window for the next curl release is already closed so this change will not be considered for real until curl 7.61.0 at the earliest. Due for release in July 2018. So lots of time left to really “bike shed” all the details!

Update: the PR was merged into master on May 21st.

curl another host

Sometimes you want to issue a curl command against a server, but you don’t really want curl to resolve the host name in the given URL and use that, you want to tell it to go elsewhere. To the “wrong” host, which in this case of course happens to be the right host. Because you know better.

Don’t worry. curl covers this as well, in several different ways…

Fake the host header

The classic and and easy to understand way to send a request to the wrong HTTP host is to simply send a different Host: header so that the server will provide a response for that given server.

If you run your “example.com” HTTP test site on localhost and want to verify that it works:

curl --header "Host: example.com" http://127.0.0.1/

curl will also make cookies work for example.com in this case, but it will fail miserably if the page redirects to another host and you enable redirect-following (--location) since curl will send the fake Host: header in all further requests too.

The --header option cleverly cancels the built-in provided Host: header when a custom one is provided so only the one passed in from the user gets sent in the request.

Fake the host header better

We’re using HTTPS everywhere these days and just faking the Host: header is not enough then. An HTTPS server also needs to get the server name provided already in the TLS handshake so that it knows which cert etc to use. The name is provided in the SNI field. curl also needs to know the correct host name to verify the server certificate against (server certificates are rarely registered for an IP address). curl extracts the name to use in both those case from the provided URL.

As we can’t just put the IP address in the URL for this to work, we reverse the approach and instead give curl the proper URL but with a custom IP address to use for the host name we set. The --resolve command line option is our friend:

curl --resolve example.com:443:127.0.0.1 https://example.com/

Under the hood this option populates curl’s DNS cache with a custom entry for “example.com” port 443 with the address 127.0.0.1, so when curl wants to connect to this host name, it finds your crafted address and connects to that instead of the IP address a “real” name resolve would otherwise return.

This method also works perfectly when following redirects since any further use of the same host name will still resolve to the same IP address and redirecting to another host name will then resolve properly. You can even use this option multiple times on the command line to add custom addresses for several names. You can also add multiple IP addresses for each name if you want to.

Connect to another host by name

As shown above, --resolve is awesome if you want to point curl to a specific known IP address. But sometimes that’s not exactly what you want either.

Imagine you have a host name that resolves to a number of different host names, possibly a number of front end servers for the same site/service. Not completely unheard of. Now imagine you want to issue your curl command to one specific server out of the front end servers. It’s a server that serves “example.com” but the individual server is called “host-47.example.com”.

You could resolve the host name in a first step before curl is used and use --resolve as shown above.

Or you can use --connect-to, which instead works on a host name basis. Using this, you can make curl replace a specific host name + port number pair with another host name + port number pair before the name is resolved!

curl --connect-to example.com:443:host-47.example.com:443 https://example.com/

Crazy combos

Most options in curl are individually controlled which means that there’s rarely logic that prevents you from using them in the awesome combinations that you can think of.

--resolve, --connect-to and --header can all be used in the same command line!

Connect to a HTTPS host running on localhost, use the correct name for SNI and certificate verification, but then still ask for a separate host in the Host: header? Sure, no problem:

curl --resolve example.com:443:127.0.0.1 https://example.com/ --header "Host: diff.example.com"

All the above with libcurl?

When you’re done playing with the curl options as described above and want to convert your command lines to libcurl code instead, your best friend is called --libcurl.

Just append --libcurl example.c to your command line, and curl will generate the C code template for you in that given file name. Based on that template, making use of  that code correctly is usually straight-forward and you’ll get all the options to read up in a convenient way.

Good luck!

Update: thanks to @Manawyrm, I fixed the ndash issues this post originally had.