Category Archives: Web

web stuff

curl localhost as a local host

When you use the name localhost in a URL, what does it mean? Where does the network traffic go when you ask curl to download http://localhost ?

Is “localhost” just a name like any other or do you think it infers speaking to your local host on a loopback address?

Previously

curl http://localhost

The name was “resolved” using the standard resolver mechanism into one or more IP addresses and then curl connected to the first one that works and gets the data from there.

The (default) resolving phase there involves asking the getaddrinfo() function about the name. In many systems, it will return the IP address(es) specified in /etc/hosts for the name. In some systems things are a bit more unusually setup and causes a DNS query get sent out over the network to answer the question.

In other words: localhost was not really special and using this name in a URL worked just like any other name in curl. In most cases in most systems it would resolve to 127.0.0.1 and ::1 just fine, but in some cases it would mean something completely different. Often as a complete surprise to the user…

Starting now

curl http://localhost

Starting in commit 1a0ebf6632f8, to be released in curl 7.78.0, curl now treats the host name “localhost” specially and will use an internal “hard-coded” set of addresses for it – the ones we typically use for the loopback device: 127.0.0.1 and ::1. It cannot be modified by /etc/hosts and it cannot be accidentally or deliberately tricked by DNS resolves. localhost will now always resolve to a local address!

Does that kind of mistakes or modifications really happen? Yes they do. We’ve seen it and you can find other projects report it as well.

Who knows, it might even be a few microseconds faster than doing the “full” resolve call.

(You can still build curl without IPv6 support at will and on systems without support, for which the ::1 address of course will not be provided for localhost.)

Specs say we can

The RFC 6761 is titled Special-Use Domain Names and in its section 6.3 it especially allows or even encourages this:

Users are free to use localhost names as they would any other domain names.  Users may assume that IPv4 and IPv6 address queries for localhost names will always resolve to the respective IP loopback address.

Followed by

Name resolution APIs and libraries SHOULD recognize localhost names as special and SHOULD always return the IP loopback address for address queries and negative responses for all other query types. Name resolution APIs SHOULD NOT send queries for localhost names to their configured caching DNS server(s).

Mike West at Google also once filed an I-D with even stronger wording suggesting we should always let localhost be local. That wasn’t ever turned into an RFC though but shows a mindset.

(Some) Browsers do it

Chrome has been special-casing localhost this way since 2017, as can be seen in this commit and I think we can safely assume that the other browsers built on their foundation also do this.

Firefox landed their corresponding change during the fall of 2020, as recorded in this bugzilla entry.

Safari (on macOS at least) does however not do this. It rather follows what /etc/hosts says (and presumably DNS of not present in there). I’ve not found any official position on the matter, but I found this source code comment indicating that localhost resolving might change at some point:

// FIXME: Ensure that localhost resolves to the loopback address.

Windows (kind of) does it

Since some time back, Windows already resolves “localhost” internally and it is not present in their /etc/hosts alternative. I believe it is more of a hybrid solution though as I believe you can put localhost into that file and then have that custom address get used for the name.

Secure over http://localhost

When we know for sure that http://localhost is indeed a secure context (that’s a browser term I’m borrowing, sorry), we can follow the example of the browsers and for example curl should be able to start considering cookies with the “secure” property to be dealt with over this host even when done over plain HTTP. Previously, secure in that regard has always just meant HTTPS.

This change in cookie handling has not happened in curl yet, but with localhost being truly local, it seems like an improvement we can proceed with.

Can you still trick curl?

When I mentioned this change proposal on twitter two of the most common questions in response were

  1. can’t you still trick curl by routing 127.0.0.1 somewhere else
  2. can you still use --resolve to “move” localhost?

The answers to both questions are yes.

You can of course commit the most hideous hacks to your system and reroute traffic to 127.0.0.1 somewhere else if you really wanted to. But I’ve never seen or heard of anyone doing it, and it certainly will not be done by mistake. But then you can also just rebuild your curl/libcurl and insert another address than the default as “hardcoded” and it’ll behave even weirder. It’s all just software, we can make it do anything.

The --resolve option is this magic thing to redirect curl operations from the given host to another custom address. It also works for localhost, since curl will check the cache before the internal resolve and --resolve populates the DNS cache with the given entries. (Provided to applications via the CURLOPT_RESOLVE option.)

What will break?

With enough number of users, every single little modification or even improvement is likely to trigger something unexpected and undesired on at least one system somewhere. I don’t think this change is an exception. I fully expect this to cause someone to shake their fist in the sky.

However, I believe there are fairly good ways to make to restore even the most complicated use cases even after this change, even if it might take some hands on to update the script or application. I still believe this change is a general improvement for the vast majority of use cases and users. That’s also why I haven’t provided any knob or option to toggle off this behavior.

Credits

The top photo was taken by me (the symbolism being that there’s a path to take somewhere but we don’t really know where it leads or which one is the right to take…). This curl change was written by me. Mike West provided me the Chrome localhost change URL. Valentin Gosu gave me the Firefox bugzilla link.

Where is HTTP/3 right now?

tldr: the level of HTTP/3 support in servers is surprisingly high.

The specs

The specifications are all done. They’re now waiting in queues to get their final edits and approvals before they will get assigned RFC numbers and get published as such – they will not change any further. That’s a set of RFCs (six I believe) for various aspects of this new stack. The HTTP/3 spec is just one of those. Remember: HTTP/3 is the application protocol done over the new transport QUIC. (See http3 explained for a high-level description.)

The HTTP/3 spec was written to refer to, and thus depend on, two other HTTP specs that are in the works: httpbis-cache and https-semantics. Those two are mostly clarifications and cleanups of older HTTP specs, but this forces the HTTP/3 spec to have to get published after the other two, which might introduce a small delay compared to the other QUIC documents.

The working group has started to take on work on new specifications for extensions and improvements beyond QUIC version 1.

HTTP/3 Usage

In early April 2021, the usage of QUIC and HTTP/3 in the world is measured by a few different companies.

QUIC support

netray.io scans the IPv4 address space weekly and checks how many hosts that speak QUIC. Their latest scan found 2.1 million such hosts.

Arguably, the netray number doesn’t say much. Those two million hosts could be very well used or barely used machines.

HTTP/3 by w3techs

w3techs.com has been in the game of scanning web sites for stats purposes for a long time. They scan the top ten million sites and count how large share that runs/supports what technologies and they also check for HTTP/3. In their data they call the old Google QUIC for just “QUIC” which is confusing but that should be seen as the precursor to HTTP/3.

What stands out to me in this data except that the HTTP/3 usage seems very high: the top one-million sites are claimed to have a higher share of HTTP/3 support (16.4%) than the top one-thousand (11.9%)! That’s the reversed for HTTP/2 and not how stats like this tend to look.

It has been suggested that the growth starting at Feb 2021 might be explained by Cloudflare’s enabling of HTTP/3 for users also in their free plan.

HTTP/3 by Cloudflare

On radar.cloudflare.com we can see Cloudflare’s view of a lot of Internet and protocol trends over the world.

The last 30 days according to radar.cloudflare.com

This HTTP/3 number is significantly lower than w3techs’. Presumably because of the differences in how they measure.

Clients

The browsers

All the major browsers have HTTP/3 implementations and most of them allow you to manually enable it if it isn’t already done so. Chrome and Edge have it enabled by default and Firefox will so very soon. The caniuse.com site shows it like this (updated on April 4):

(Earlier versions of this blog post showed the previous and inaccurate data from caniuse.com. Not anymore.)

curl

curl supports HTTP/3 since a while back, but you need to explicitly enable it at build-time. It needs to use third party libraries for the HTTP/3 layer and it needs a QUIC capable TLS library. The QUIC/h3 libraries are still beta versions. See below for the TLS library situation.

curl’s HTTP/3 support is not even complete. There are still unsupported areas and it’s not considered stable yet.

Other clients

Facebook has previously talked about how they use HTTP/3 in their app, and presumably others do as well. There are of course also other implementations available.

TLS libraries

curl supports 14 different TLS libraries at this time. Two of them have QUIC support landed: BoringSSL and GnuTLS. And a third would be the quictls OpenSSL fork. (There are also a few other smaller TLS libraries that support QUIC.)

OpenSSL

The by far most popular TLS library to use with curl, OpenSSL, has postponed their QUIC work:

“It is our expectation that once the 3.0 release is done, QUIC will become a significant focus of our effort.”

At the same time they have delayed the OpenSSL 3.0 release significantly. Their release schedule page still today speaks of a planned release of 3.0.0 in “early Q4 2020”. That plan expects a few months from the beta to final release and we have not yet seen a beta release, only alphas.

Realistically, this makes QUIC in OpenSSL many months off until it can appear even in a first alpha. Maybe even 2022 material?

BoringSSL

The Google powered OpenSSL fork BoringSSL has supported QUIC for a long time and provides the OpenSSL API, but they don’t do releases and mostly focus on getting a library done for Google. People outside the company are generally reluctant to use and depend on this library for those reasons.

The quiche QUIC/h3 library from Cloudflare uses BoringSSL and curl can be built to use quiche (as well as BoringSSL).

quictls

Microsoft and Akamai have made a fork of OpenSSL available that is based on OpenSSL 1.1.1 and has the QUIC pull-request applied in order to offer a QUIC capable OpenSSL flavor to the world before the official OpenSSL gets their act together. This fork is called quictls. This should be compatible with OpenSSL in all other regards and provide QUIC with an API that is similar to BoringSSL’s.

The ngtcp2 QUIC library uses quictls. curl can be built to use ngtcp2 as well as with quictls,

Is HTTP/3 faster?

I realize I can’t blog about this topic without at least touching this question. The main reason for adding support for HTTP/3 on your site is probably that it makes it faster for users, so does it?

According to cloudflare’s tests, it does, but the difference is not huge.

We’ve seen other numbers say h3 is faster shown before but it’s hard to find up-to-date performance measurements published for the current version of HTTP/3 vs HTTP/2 in real world scenarios. Partly of course because people have hesitated to compare before there are proper implementations to compare with, and not just development versions not really made and tweaked to perform optimally.

I think there are reasons to expect h3 to be faster in several situations, but for people with high bandwidth low latency connections in the western world, maybe the difference won’t be noticeable?

Future

I’ve previously shown the slide below to illustrate what needs to be done for curl to ship with HTTP/3 support enabled in distros and “widely” and I think the same works for a lot of other projects and clients who don’t control their TLS implementation and don’t write their own QUIC/h3 layer code.

This house of cards of h3 is slowly getting some stable components, but there are still too many moving parts for most of us to ship.

I assume that the rest of the browsers will also enable HTTP/3 by default soon, and the specs will be released not too long into the future. That will make HTTP/3 traffic on the web increase significantly.

The QUIC and h3 libraries will ship their first non-beta versions once the specs are out.

The TLS library situation will continue to hamper wider adoption among non-browsers and smaller players.

The big players already deploy HTTP/3.

Updates

I’ve updated this post after the initial publication, and the biggest corrections are in the Chrome/Edge details. Thanks to immediate feedback from Eric Lawrence. Remaining errors are still all mine! Thanks also to Barry Pollard who filed the PR to update the previously flawed caniuse.com data.

everything.curl.dev

The online version of the curl book “everything curl” has been moved to the address shown in the title:

everything.curl.dev

This, after I did a very unscientific and highly self-selective poll on twitter on January 18 2020

The old name (which you can see what the least selected in the poll) will now redirect to the new host name and so will everything.curl.se .

curl.dev

I am the owner of this domain since a little while back but we haven’t yet figured out what to do with the domain – this is the first use of curl.dev for real content.

If you have ideas of how we can improve curl’s web presence with this domain, please let me know! I do not want to move the official curl web site again from its new home at curl.se, that’s not what I would call a productive idea.

The curl web infrastructure

The purpose of the curl web site is to inform the world about what curl and libcurl are and provide as much information as possible about the project, the products and everything related to that.

The web site has existed in some form for as long as the project has, but it has of course developed and changed over time.

Independent

The curl project is completely independent and stands free from influence from any parent or umbrella organization or company. It is not even a legal entity, just a bunch of random people cooperating over the Internet. And a bunch of awesome sponsors to help us.

This means that we have no one that provides the infrastructure or marketing for us. We need to provide, run and care for our own servers and anything else we think we should offer our users.

I still do a lot of the work in curl and the curl web site and I work full time on curl, for wolfSSL. This might of course “taint” my opinions and views on matters, but doesn’t imply ownership or control. I’m sure we’re all colored by where we work and where we are in our lives right now.

Contents

Most of the web site is static content: generated HTML pages. They are served super-fast and very lightweight by any web server software.

The web site source exists in the curl-www repository (hosted on GitHub) and the web site syncs itself with the latest repository changes several times per hour. The parts of the site that aren’t static are mostly consisting of smaller scripts that run either on demand at the time of a request or on an interval in a cronjob in the background. That is part of the reason why pushing an update to the web site’s repository can take a little while until it shows up on the live site.

There’s a deliberate effort at not duplicating information so a lot of the web pages you can find on the web site are files that are converted and “HTMLified” from the source code git repository.

“Design”

Some people say the curl web site is “retro”, others that it is plain ugly. My main focus with the site is to provide and offer all the info, and have it be accurate and accessible. The look and the design of the web site is a constant battle, as nobody who’s involved in editing or polishing the web site is really interested in or particularly good at design, looks or UX. I personally have done most of the editing of it, including CSS etc and I can tell you that I’m not good at it and I don’t enjoy it. I do it because I feel I have to.

I get occasional offers to “redesign” the web site, but the general problem is that those offers almost always involve rebuilding the entire thing using some current web framework, not just fixing the looks, layout or hierarchy. By replacing everything like that we’d get a lot of problems to get the existing information in there – and again, the information is more important than the looks.

The curl logo is designed by a proper designer however (Adrian Burcea).

If you want to help out designing and improving the web site, you’d be most welcome!

Who

I’ve already touched on it: the web site is mostly available in git so “anyone” can submit issues and pull-requests to improve it, and we are around twenty persons who have push rights that can then make a change on the live site. In reality of course we are not that many who work on the site any ordinary month, or even year. During the last twelve month period, 10 persons authored commits in the web repository and I did 90% of those.

How

Technically, we build the site with traditional makefiles and we generate the web contents mostly by preprocessing files using a C-like preprocessor called fcpp. This is an old and rather crude setup that we’ve used for over twenty years but it’s functional and it allows us to have a mostly static web site that is also fairly easy to build locally so that we can work out and check improvements before we push them to the git repository and then out to the world.

The web site is of course only available over HTTPS.

Hosting

The curl web site is hosted on an origin VPS server in Sweden. The machine is maintained by primarily by me and is paid for by Haxx. The exact hosting is not terribly important because users don’t really interact with our server directly… (Also, as they’re not sponsors we’re just ordinary customers so I won’t mention their name here.)

CDN

A few years ago we experienced repeated server outages simply because our own infrastructure did not handle the load very well, and in particular not the traffic spikes that could occur when I would post a blog post that would suddenly reach a wide audience.

Enter Fastly. Now, when you go to curl.se (or daniel.haxx.se) you don’t actually reach the origin server we admin, you will instead reach one of Fastly’s servers that are distributed across the world. They then fetch the web contents from our origin, cache it on their edge servers and send it to you when you browse the site. This way, your client speaks to a server that is likely (much) closer to you than the origin server is and you’ll get the content faster and experience a “snappier” web site. And our server only gets a tiny fraction of the load.

Technically, this is achieved by the name curl.se resolving to a number of IP addresses that are anycasted. Right now, that’s 4 IPv4 addresses and 4 IPv6 addresses.

The fact that the CDN servers cache content “a while” is another explanation to why updated contents take a little while to “take effect” for all visitors.

DNS

When we just recently switched the site over to curl.se, we also adjusted how we handle DNS.

I run our own main DNS server where I control and admin the zone and the contents of it. We then have four secondary servers to help us really up our reliability. Out of those four secondaries, three are sponsored by Kirei and are anycasted. They should be both fast and reliable for most of the world.

With the help of fabulous friends like Fastly and Kirei, we hope that the curl web site and services shall remain stable and available.

DNS enthusiasts have remarked that we don’t do DNSSEC or registry-lock on the curl.se domain. I think we have reason to consider and possibly remedy that going forward.

Traffic

The curl web site is just the home of our little open source project. Most users out there in the world who run and use curl or libcurl will not download it from us. Most curl users get their software installation from their Linux distribution or operating system provider. The git repository and all issues and pull-requests are done on GitHub.

Relevant here is that we have no logging and we run no ads or any analytics. We do this for maximum user privacy and partly because of laziness, since handling logging from the CDN system is work. Therefore, I only have aggregated statistics.

In this autumn of 2020, over a normal 30 day period, the web site serves almost 11 TB of data to 360 million HTTP requests. The traffic volume is up from 3.5 TB the same time last year. 11 terabytes per 30 days equals about 4 megabytes per second on average.

Without logs we cannot know what people are downloading – but we can guess! We know that the CA cert bundle is popular and we also know that in today’s world of containers and CI systems, a lot of things out there will download the same packages repeatedly. Otherwise the web site is mostly consisting of text and very small images.

One interesting specific pattern on the server load that’s been going on for months: every morning at 05:30 UTC, the site gets over 50,000 requests within that single minute, during which 10 gigabytes of data is downloaded. The clients are distributed world wide as I see the same pattern on access points all over. The minute before and the minute after, the average traffic rate remains at 200MB/minute. It makes for a fun graph:

An eight hour zoomed in window of bytes transferred from the web site. UTC times.

Our servers suffer somewhat from being the target of weird clients like qqgamehall that continuously “hammer” the site with requests at a high frequency many months after we started always returning error to them. An effect they have is that they make the admin dashboard to constantly show a very high error rate.

Software

The origin server runs Debian Linux and Apache httpd. It has a reverse proxy based on nginx. The DNS server is bind. The entire web site is built with free and open source. Primarily: fcpp, make, roffit, perl, curl, hypermail and enscript.

If you curl the curl site, you can see in response headers that Fastly uses Varnish.

HSTS your curl

HTTP Strict Transport Security (HSTS) is a standard HTTP response header for sites to tell the client that for a specified period of time into the future, that host is not to be accessed with plain HTTP but only using HTTPS. Documented in RFC 6797 from 2012.

The idea is of course to reduce the risk for man-in-the-middle attacks when the server resources might be accessible via both HTTP and HTTPS, perhaps due to legacy or just as an upgrade path. Every access to the HTTP version is then a risk that you get back tampered content.

Browsers preload

These headers have been supported by the popular browsers for years already, and they also have a system setup for preloading a set of sites. Sites that exist in their preload list then never get accessed over HTTP since they know of their HSTS state already when the browser is fired up for the first time.

The entire .dev top-level domain is even in that preload list so you can in fact never access a web site on that top-level domain over HTTP with the major browsers.

With the curl tool

Starting in curl 7.74.0, curl has experimental support for HSTS. Experimental means it isn’t enabled by default and we discourage use of it in production. (Scheduled to be released in December 2020.)

You instruct curl to understand HSTS and to load/save a cache with HSTS information using --hsts <filename>. The HSTS cache saved into that file is then updated on exit and if you do repeated invokes with the same cache file, it will effectively avoid clear text HTTP accesses for as long as the HSTS headers tell it.

I envision that users will simply use a small hsts cache file for specific use cases rather than anyone ever really want to have or use a “complete” preload list of domains such as the one the browsers use, as that’s a huge list of sites and for most use cases just completely unnecessary to load and handle.

With libcurl

Possibly, this feature is more useful and appreciated by applications that use libcurl for HTTP(S) transfers. With libcurl the application can set a file name to use for loading and saving the cache but it also gets some added options for more flexibility and powers. Here’s a quick overview:

CURLOPT_HSTS – lets you set a file name to read/write the HSTS cache from/to.

CURLOPT_HSTS_CTRL – enable HSTS functionality for this transfer

CURLOPT_HSTSREADFUNCTION – this callback gets called by libcurl when it is about to start a transfer and lets the application preload HSTS entries – as if they had been read over the wire and been added to the cache.

CURLOPT_HSTSWRITEFUNCTION – this callback gets called repeatedly when libcurl flushes its in-memory cache and allows the application to save the cache somewhere and similar things.

Feedback?

I trust you understand that I’m very very keen on getting feedback on how this works, on the API and your use cases. Both negative and positive. Whatever your thoughts are really!

A server transition

The main physical server (we call it giant) we’ve been using at Haxx for a very long time to host sites and services for 20+ domains and even more mailing lists. The machine – a physical one – has been colocated in an ISP server room for over a decade and has served us very well. It has started to show its age.

Some of the more known sites and services it hosts are perhaps curl, c-ares, libssh2 and this blog (my entire daniel.haxx.se site). Some of these services are however primarily accessed via fronting CDN servers.

giant is a physical Dell PowerEdge 1850 server from 2005, which has undergone upgrades of CPU, disks and memory through the years.

giant featured an Intel X3440 Xeon CPU at 2.53GHz with 8GB of ram when decommissioned.

New host

The new host is of course entirely virtual and we’ve finally taken the step into the modern world of VPSes. The new machine is hosted by the same provider as before but as an entirely new instance.

We’ve upgraded the OS, all packages and we’ve remodeled how we run the web services and all our jobs and services from before have been moved into this new fresh server in an attempt to leave some of the worst legacies behind.

The former server will not be used anymore and will be powered down and sent for recycling.

Glitches in this new world

We’ve tried really hard to make this transition transparent and ideally not many users will notice anything or have a reason to bother about this, but of course we also realize that we probably have not managed this to 100% perfection. If you detect something on any of the services we run that used to work or exist but isn’t anymore, do let us know so that become aware of it and can work on a fix!

This site (daniel.haxx.se) already moved weeks ago and nobody noticed. The curl site changed on October 23 and are much more likely to get glitches because of all the many more scripts and automatic things setup for it. Both sites are served via Fastly so ordinary users will not detect or spot that there’s a new host in the back end.

A QQGameHall storm

Mar 31 2020, 11:13:38: I get a message from Frank in the #curl IRC channel over on Freenode. I’m always “hanging out” on IRC and Frank is a long time friend and fellow frequent IRCer in that channel. This time, Frank informs me that the curl web site is acting up:

“I’m getting 403s for some mailing list archive pages. They go away when I reload”

That’s weird and unexpected. An important detail here is that the curl web site is “CDNed” by Fastly. This means that every visitor of the web site is actually going to one of Fastly’s servers and in most cases they get cached content from those servers, and only infrequently do these servers come back to my “origin” server and ask for an updated file to send out to a web site visitor.

A 403 error for a valid page is not a good thing. I started checking out some of my logs – which then only are for the origin as I don’t do any logging at all at CDN level (more about that later) – and I could verify the 403 errors. So they’re in my log meaning it isn’t caused by (a misconfiguration of) the CDN. Why would a perfectly legitimate URL suddenly return 403 to have it go away again after a reload?

Why does he get a 403?

I took a look at Fastly’s management web interface and I spotted that the curl web site was sending out data at an unusual high speed at the moment. An average speed of around 50mbps, while we typically average at below 20. Hm… something is going on.

While I continued to look for the answers to these things I noted that my logs were growing really rapidly. There were POSTs being sent to the same single URL at a high frequency (10-20 reqs/second) and each of those would get some 225Kbytes of data returned. And they all used the same User-agent: QQGameHall. It seems this started within the last 24 hours or so. They’re POSTs so Fastly basically always pass them through to my server.

Before I could figure out Franks’s 403s, I decided to slow down this madness by temporarily forbidding this user-agent access so that the bot or program or whatever would notice it starts to fail, and it would of course then stop bombarding the site.

Deny

Ok, a quick deny of the user-agent made my server start responding with 403s to all those requests and instead of a 225K response it now sent back 465 bytes per request. The average bandwidth on the site immediately dropped down to below 20Mbps again. Back to looking for Frank’s 403-problem

First the 403s seen due to the ratelimiting, then I removed the ratelmiting and finally I added a block of the user-agent. Screenshotted error rates from Fastly’s admin interface. This is errors per minute.

The answer was pretty simple and I didn’t have to search a lot. The clues existed in the error logs and it turned out we had “mod_evasive” enabled since another heavy bot load “attack” a while back. It is a module for “rate limiting” incoming requests and since a lot of requests to our server now comes from Fastly’s limited set of IP addresses and we had this crazy QQ thing hitting us, my server would return a 403 every now and then when it considered the rate too high.

I whitelisted Fastly’s requests and Frank’s 403 problems were solved.

Deny a level up

The bot traffic showed no sign of slowing down. Easily 20 requests per second, to the same URL and they all get an error back and obviously they don’t care. I decided to up my game a little so with help, I moved my blocking of this service to Fastly. I now block their user-agent already there so the traffic doesn’t ever reach my server. Phew, my server was finally back to its regular calm state. They way it should be.

It doesn’t stop there. Here’s a follow-up graph I just grabbed, a little over a week since I started the blocking. 16.5 million blocked requests (and counting). This graph here shows number of requests/hour on the Y axis, peeking at almost 190k; around 50 requests/second. The load is of course not actually a problem, just a nuisance now. QQGameHall keeps on going.

Errors per hour over the period of several days.

QQGameHall

What we know about this.

Friends on Twitter and googling for this name informs us that this is a “game launcher” done by Tencent. I’ve tried to contact them via Twitter (as I have no means of contacting them otherwise that seems even remotely likely to work).

I have not checked what these user-agent POSTs, because I didn’t log that. I suspect it was just a zero byte POST.

The URL they post to is the CA cert bundle file with provide on the curl CA extract web page. The one we convert from the Mozilla version into a PEM for users of the world to enjoy. (Someone seems to enjoy this maybe just a little too much.)

The user-agents seemed to come (mostly) from China which seems to add up. Also, the look of the graph when it goes up and down could indicate an eastern time zone.

This program uses libcurl. Harry in the #curl channel found files in Virus Total and had a look. It is, I think, therefore highly likely that this “storm” is caused by an application using curl!

My theory: this is some sort of service that was deployed, or an upgrade shipped, that wants to get an updated CA store and they get that from our site with this request. Either they get it far too often or maybe there are just a very large amount them or similar. I cannot understand why they issue a POST though. If they would just have done a GET I would never have noticed and they would’ve fetched perfectly fine cached versions from the CDN…

Feel free to speculate further!

Logging, privacy, analytics

I don’t have any logging of the CDN traffic to the curl site. Primarily because I haven’t had to, but also because I appreciate the privacy gain for our users and finally because handling logs at this volume pretty much requires a separate service and they all seem to be fairly pricey – for something I really don’t want. So therefore I don’t see the source IP addresses these things. (But yes, I can ask Fastly to check and tell me if I really really wanted to know.)

Also: I don’t run any analytics (Google or otherwise) on the site, primarily for privacy reasons. So that won’t give me that data or other clues either.

Update: it has been proposed I could see the IP address in the X-Forwarded-For: headers and it seems accurate. Of course I didn’t log that header during this period but I will consider starting doing it for better control and info in the future.

Update 2: As of May 18 2020, this flood has not diminished. Logs show that we still block about 5 million requests/day from this service, peaking at over 100 requests/minute.

Credits

Top image by Elias Sch. from Pixabay

HTTP/3 for everyone

FOSDEM 2020 is over for this time and I had an awesome time in Brussels once again.

Stickers

I brought a huge collection of stickers this year and I kept going back to the wolfSSL stand to refill the stash and it kept being emptied almost as fast. Hundreds of curl stickers were given away! The photo on the right shows my “sticker bag” as it looked before I left Sweden.

Lesson for next year: bring a larger amount of stickers! If you missed out on curl stickers, get in touch and I’ll do my best to satisfy your needs.

The talk

“HTTP/3 for everyone” was my single talk this FOSDEM. Just two days before the talk, I landed updated commits in curl’s git master branch for doing HTTP/3 up-to-date with the latest draft (-25). Very timely and I got to update the slide mentioning this.

As I talked HTTP/3 already last year in the Mozilla devroom, I also made sure to go through the slides I used then to compare and make sure I wouldn’t do too much of the same talk. But lots of things have changed and most of the content is updated and different this time around. Last year, literally hundreds of people were lining up outside wanting to get into room when the doors were closed. This year, I talked in the room Janson, which features 1415 seats. The biggest one on campus. It was pack full!

It is kind of an adrenaline rush to stand in front of such a wall of people. At one time in my talk I paused for a brief moment and then I felt I could almost hear the complete silence when a huge amount of attentive faces captured what I had to say.

The audience, photographed by Sidsel Jensen who had to sit in the stairs…
Photo by Mirza Krak
Photo by Wolfgang Gassler

I got a lot of positive feedback on the presentation. I also thought that my decision to not even try to take question in the big room was a correct and I ended up talking and discussing details behind the scene for a good while after my talk was done. Really fun!

The video

The video is also available from the FOSDEM site in webm and mp4 formats.

The slides

If you want the slides only, run over to slideshare and view them.

Summing up My 2019

2019 is special in my heart. 2019 was different than many other years to me in several ways. It was a great year! This is what 2019 was to me.

curl and wolfSSL

I quit Mozilla last year and in the beginning of the year I could announce that I joined wolfSSL. For the first time in my life I could actually work with curl on my day job. As the project turned 21 I had spent somewhere in the neighborhood of 15,000 unpaid spare time hours on it and now I could finally do it “for real”. It’s huge.

Still working from home of course. My commute is still decent.

HTTP/3

Just in November 2018 the name HTTP/3 was set and this year has been all about getting it ready. I was proud to land and promote HTTP/3 in curl just before the first browser (Chrome) announced their support. The standard is still in progress and we hope to see it ship not too long into next year.

curl

Focusing on curl full time allows a different kind of focus. I’ve landed more commits in curl during 2019 than any other year going back all the way to 2005. We also reached 25,000 commits and 3,000 forks on github.

We’ve added HTTP/3, alt-svc, parallel transfers in the curl tool, tiny-curl, fixed hundreds of bugs and much, much more. Ten days before the end of the year, I’ve authored 57% (over 700) of all the commits done in curl during 2019.

We ran our curl up conference in Prague and it was awesome.

We also (re)started our own curl Bug Bounty in 2019 together with Hackerone and paid over 1000 USD in rewards through-out the year. It was so successful we’re determined to raise the amounts significantly going into 2020.

Public speaking

I’ve done 28 talks in six countries. A crazy amount in front of a lot of people.

In media

Dagens Nyheter published this awesome article on me. I’m now shown on the internetmuseum. I was interviewed and highlighted in Bloomberg Businessweek’s “Open Source Code Will Survive the Apocalypse in an Arctic Cave” and Owen William’s Medium post The Internet Relies on People Working for Free.

When Github had their Github Universe event in November and talked about their new sponsors program on stage (which I am part of, you can sponsor me) this huge quote of mine was shown on the big screen.

Maybe not media, but in no less than two Mr Robot episodes we could see curl commands in a TV show!

Podcasts

I’ve participated in three podcast episodes this year, all in Swedish. Kompilator episode 5 and episode 8, and Kodsnack episode 331.

Live-streamed

I’ve toyed with live-streamed programming and debugging sessions. That’s been a lot of fun and I hope to continue doing them on and off going forward as well. They also made me consider and get started on my libcurl video tutorial series. We’ll see where that will end…

2020?

I figure it can become another fun year too!

Internetmuseum

The Internet Museum translated to Swedish becomes “internetmuseum“. It is a digital, online-only, museum that collects Internet- and Web related historical information, especially focused on the Swedish angle to all of this. It collects stories from people who did the things. The pioneers, the ground breakers, the leaders, the early visionaries. Most of their documentation is done in the form of video interviews.

I was approached and asked to be part of this – as an Internet Pioneer. Me? Internet Pioneer, really?

Internetmuseum’s page about me.

I’m humbled and honored to be considered and I certainly had a lot of fun doing this interview. To all my friends not (yet) fluent in Swedish: here’s your grand opportunity to practice, because this is done entirely in this language of curl founders and muppet chefs.

Photo from Internetmusuem

Back in the morning of October 18th 2019, two guys showed up as planned at my door and I let them in. One of my guests was a photographer who set up his gear in my living room for the interview, and then me and and guest number two, interviewer J├Ârgen, sat down and talked for almost an hour straight while being recorded.

The result can be seen here below.

The Science museum was first

This is in fact the second Swedish museum to feature me.

I have already been honored with a display about me, at the Tekniska Museet in Stockholm, the “Science museum” which has an exhibition about past Polhem Prize award winners.

Information displayed about me at the Swedish Science museum in Stockholm. I have a private copy of the cardboard posters.

(Top image by just-pics from Pixabay)