Tag Archives: Fastly

A QQGameHall storm

Mar 31 2020, 11:13:38: I get a message from Frank in the #curl IRC channel over on Freenode. I’m always “hanging out” on IRC and Frank is a long-time friend and fellow frequent IRCer in that channel. This time, Frank informs me that the curl web site is acting up:

“I’m getting 403s for some mailing list archive pages. They go away when I reload”

That’s weird and unexpected. An important detail here is that the curl web site is “CDNed” by Fastly. This means that every visitor to the web site actually talks to one of Fastly’s servers and in most cases gets cached content from them; only infrequently do those servers come back to my “origin” server and ask for an updated file to send out to a web site visitor.

A 403 error for a valid page is not a good thing. I started checking out some of my logs – which only cover the origin, as I don’t do any logging at all at the CDN level (more about that later) – and I could verify the 403 errors. They’re in my log, meaning they weren’t caused by (a misconfiguration of) the CDN. Why would a perfectly legitimate URL suddenly return 403, only to have it go away again after a reload?

Why does he get a 403?

I took a look at Fastly’s management web interface and spotted that the curl web site was sending out data at an unusually high speed at the moment: an average of around 50 Mbps, while we typically average below 20. Hm… something is going on.

While I continued to look for the answers, I noted that my logs were growing really rapidly. POSTs were being sent to the same single URL at a high frequency (10-20 requests/second) and each of them would get some 225 Kbytes of data returned. They all used the same User-Agent: QQGameHall. It seems this started within the last 24 hours or so. Since they’re POSTs, Fastly basically always passes them through to my server.

Before I could figure out Frank’s 403s, I decided to slow down this madness by temporarily forbidding this user-agent access, so that the bot or program or whatever would notice it starts to fail and would of course then stop bombarding the site.

Deny

Ok, a quick deny of the user-agent made my server start responding with 403s to all those requests, and instead of a 225K response it now sent back 465 bytes per request. The average bandwidth on the site immediately dropped back down to below 20 Mbps. Back to looking for Frank’s 403 problem.
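
I won’t bore you with the exact configuration, but on an Apache origin a user-agent deny can be as small as this rough sketch (using mod_setenvif and the 2.4 authorization directives; not necessarily the exact rule I used):

    # Sketch: tag requests from the noisy user-agent and refuse them with a 403
    SetEnvIfNoCase User-Agent "QQGameHall" bad_bot
    <Location "/">
        <RequireAll>
            Require all granted
            Require not env bad_bot
        </RequireAll>
    </Location>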

First the 403s seen due to the rate limiting, then I removed the rate limiting and finally I added a block of the user-agent. Screenshotted error rates from Fastly’s admin interface. This is errors per minute.

The answer was pretty simple and I didn’t have to search a lot. The clues were in the error logs: it turned out we had “mod_evasive” enabled since another heavy bot “attack” a while back. It is a module for rate limiting incoming requests, and since a lot of the requests to our server now come from Fastly’s limited set of IP addresses and we had this crazy QQ thing hitting us, my server would return a 403 every now and then when it considered the rate too high.

I whitelisted Fastly’s requests and Frank’s 403 problems were solved.
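
mod_evasive has a whitelist directive for exactly this, so the fix boils down to a few lines like the sketch below (the addresses here are documentation placeholders, not Fastly’s actual ranges – Fastly publishes its real address list separately):

    # Sketch: exempt the CDN's address ranges from mod_evasive's rate limiting.
    # 203.0.113.* and 198.51.100.* are placeholder example ranges only.
    DOSWhitelist 203.0.113.*
    DOSWhitelist 198.51.100.*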

Deny a level up

The bot traffic showed no sign of slowing down. Easily 20 requests per second, all to the same URL, and they all get an error back and obviously they don’t care. I decided to up my game a little, so with help I moved my blocking of this service to Fastly. I now block their user-agent already there, so the traffic doesn’t ever reach my server. Phew, my server was finally back to its regular calm state. The way it should be.
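
I won’t claim this is the exact snippet, but in Fastly’s custom VCL such an edge block is essentially one condition in vcl_recv, roughly like this:

    # Rough sketch: reject the user-agent at the CDN edge, before it reaches the origin
    sub vcl_recv {
      if (req.http.User-Agent ~ "QQGameHall") {
        error 403 "Forbidden";
      }
    }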

It doesn’t stop there. Here’s a follow-up graph I just grabbed, a little over a week since I started the blocking. 16.5 million blocked requests (and counting). This graph shows the number of requests/hour on the Y axis, peaking at almost 190k – around 50 requests/second. The load is of course not actually a problem, just a nuisance now. QQGameHall keeps on going.

Errors per hour over a period of several days.

QQGameHall

What we know about this.

Friends on Twitter, and some googling for the name, inform us that this is a “game launcher” made by Tencent. I’ve tried to contact them via Twitter (as I have no other means of contacting them that seems even remotely likely to work).

I have not checked what these clients actually POST, because I didn’t log request bodies. I suspect it was just a zero-byte POST.

The URL they post to is the CA cert bundle file we provide on the curl CA extract web page. The one we convert from the Mozilla version into a PEM for users of the world to enjoy. (Someone seems to enjoy this maybe just a little too much.)

The user-agents seemed to come (mostly) from China, which seems to add up. Also, the way the graph goes up and down could indicate an eastern time zone.

This program uses libcurl. Harry in the #curl channel found files on VirusTotal and had a look. It is, I think, therefore highly likely that this “storm” is caused by an application using curl!

My theory: this is some sort of service that was deployed, or an upgrade shipped, that wants to get an updated CA store, and it gets that from our site with this request. Either they get it far too often, or maybe there is just a very large number of them, or similar. I cannot understand why they issue a POST though. If they had just done a GET I would never have noticed, and they would’ve fetched perfectly fine cached versions from the CDN…
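
For illustration – and this is just my guess at what a well-behaved updater could do, not what QQGameHall actually does – a plain conditional GET with curl would have let the CDN absorb virtually all of this traffic:

    # Sketch of a polite CA bundle refresh: a GET that only transfers the file
    # if it changed since the copy already on disk (If-Modified-Since)
    curl --remote-name --time-cond cacert.pem https://curl.haxx.se/ca/cacert.pem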

Feel free to speculate further!

Logging, privacy, analytics

I don’t have any logging of the CDN traffic to the curl site. Primarily because I haven’t had to, but also because I appreciate the privacy gain for our users, and finally because handling logs at this volume pretty much requires a separate service and they all seem to be fairly pricey – for something I really don’t want. So I don’t see the source IP addresses of these requests. (But yes, I can ask Fastly to check and tell me if I really really wanted to know.)

Also: I don’t run any analytics (Google or otherwise) on the site, primarily for privacy reasons. So that won’t give me that data or other clues either.

Update: it has been proposed that I could see the IP addresses in the X-Forwarded-For: header, and that seems accurate. Of course I didn’t log that header during this period, but I will consider starting to do so for better control and info in the future.
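
For the record, in Apache that would be little more than a tweak of the log format, along the lines of this sketch (the format name and log path are just examples):

    # Sketch: log the client address Fastly forwards in X-Forwarded-For
    LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{User-Agent}i\"" cdn_combined
    CustomLog logs/access_log cdn_combined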

Update 2: As of May 18 2020, this flood has not diminished. Logs show that we still block about 5 million requests/day from this service, peaking at over 100 requests/second.

Credits

Top image by Elias Sch. from Pixabay

The curl year 2017

I’m about to take an extended vacation for the rest of the year and into the beginning of the next, so I decided to sum up the year from a curl angle now, a few weeks early. (So some numbers will grow a bit more after this post.)

2017

So what did we do this year in the project, how did curl change?

The first curl release of the year was version 7.53.0 and the last one was 7.57.0. In the separate blog posts on 7.55.0, 7.56.0 and 7.57.0 you’ll note that we kept adding new goodies and useful features. We produced a total of 9 releases containing 683 bug fixes. We announced twelve security problems. (Down from 24 last year.)

At least 125 different authors wrote code that was merged into curl this year, in the 1500 commits that were made. We have never had this many different authors during a single year before in the project’s entire lifetime! (The 114 authors during 2016 was the previous all-time high.)

We added more than 160 new names to the THANKS document for their help in improving curl. The total number of contributors is now over 1660.

This year we truly started to use Travis for CI builds and grew from a mere two builds per commit and PR up to nineteen (with additional ones run on AppVeyor and elsewhere). The current build set is a very good verification that most things still compile and work after a PR is merged. (See also the testing curl article.)

Mozilla announced that they too will use colon-slash-slash in their logo. Of course we all know who had it in their logo first… =)

 

In March 2017, we had our first ever curl get-together as we arranged curl up 2017 over a weekend in Nuremberg, Germany. It was very inspiring and meeting parts of the team in real life was truly a blast. This was so good we intend to do it again: curl up 2018 will happen.

curl turned 19 years old in March. In May it surpassed 5,000 stars on GitHub.

Also in May, we moved the official curl site (and my personal site) over to be hosted by Fastly. We were beginning to have problems handling the bandwidth and load, and in one single step all our worries were graciously taken care of!

We got curl entered into the OSS-Fuzz project, and Max Dymond even got a reward from Google for his curl-fuzzing integration work. Thanks to that project throwing heaps of junk at libcurl’s APIs, we’ve found and fixed many issues.

The source code (for the tool and library only) is now at about 143,378 lines of code. It grew around 7,057 lines during the year. The primary reasons for the code growth were:

  1. the new libssh-powered SSH backend (not yet released)
  2. the new mime API (in 7.56.0) and
  3. the new multi-SSL backend support (also in 7.56.0).

Your maintainer’s view

Oh what an eventful year it has been for me personally.

The first interim meeting for QUIC took place in Japan, and I participated remotely. After all, I’m all set on having curl support QUIC and I’ll keep track of where the protocol is going! I’ve participated in more interim meetings after that, all remotely so far.

I talked curl on the main track at FOSDEM in early February (and about HTTP/2 in the Mozilla devroom). I’ve followed that up and have also had the pleasure to talk in front of audiences in Stockholm, Budapest, Jönköping and Prague throughout the year.

 

I went to London and “represented curl” in the third edition of the HTTP workshop, where HTTP protocol details were discussed and disassembled, and new plans for the future of HTTP were laid out.

 

In late June I meant to go to San Francisco for a Mozilla “all hands” conference, but instead I was not allowed to board the flight. That event got a crazy amount of attention and I received massive amounts of love from new and old friends. I have not yet tried to enter the US again, but my plan is to try again in 2018…

I wrote and published my h2c tool, meant to help developers convert a set of HTTP headers into a working curl command line.

The single occasion that overshadows all other events and happenings for me this year by far was, without doubt, when I was awarded the Polhem Prize and got a gold medal from none other than His Majesty the King of Sweden himself. For all my work and years spent on curl, no less.

Not really curl related, but in November I was also glad to be part of the huge Firefox Quantum release. The biggest Firefox release ever, and one that has been received really well.

I’ve managed to commit over 800 changes to curl through the year, which is 54% of the total and more commits than I’ve done in curl during a single year since 2005 (in which I did 855 commits). I attribute this increase mostly to inspiration from curl up and the prize, but I think it also happened thanks to excellent feedback and motivation brought by my fellow curl hackers.

We’re heading towards the end of 2017 with me being the individual who did the most commits in curl every single month for the last 28 months.

2018?

More things to come!

A curl delivery network

I’ve run my own public web sites on hardware I’ve administered myself for over twenty years now. I’ve hosted the curl web site myself since its inception.

The curl web site at curl.haxx.se has recently been delivering roughly 1.5 terabytes of data to the world per month. The CA bundle we convert to PEM from the Mozilla source code is alone downloaded more than 100,000 times per day. Occasional blog entries I’ve posted here on my blog have climbed very fast on popular sites such as Hacker News and Reddit, and have resulted in intense visitor storms hitting this same server – sometimes reaching visitor counts above 200,000 “uniques” – most of them within the first few hours of publication. At times, those visitor spikes have effectively brought the server to its knees.

Yes, my personal web site and the curl web site both share the same physical server. It also hosts more than a dozen other sites and numerous services for our own pleasure and fun, providing services for a handful of different open source projects. So when the server has to cease doing work because it runs out of memory or hits other resource constraints, that causes interruptions all over. Oh yes, and my email doesn’t reach me.

Inconvenient and annoying.

The server

Haxx owns and runs this co-located server that we have a busload of web servers on – for the good of the projects and people that run things on it. This machine’s worst bottleneck is available RAM, and perhaps I/O performance. Every time the server slows to a crawl due to network traffic overload we discuss how we should upgrade it. Installing a new machine and transferring over all the sites and services is work. Work that none of us at Haxx are very happy to volunteer to do. So it hasn’t been done yet, and frankly the server handles the daily load just fine and without even a blink. Which is ninety nine point something percent of the time…

Haxx pays for a certain amount of network traffic so as long as we’re below some threshold we remain paying the same monthly fee. We don’t want to increase the traffic by magnitudes as that would cost more.

The specific machine, which sits deep inside a server room in Stockholm, Sweden, is a five(?) year old Dell PowerEdge E310 with an Intel Xeon X3440 2.53GHz CPU and 8GB of RAM. This model is shown in the image at the top.

Alternatives that haven’t helped

Why not a mirror system? We had a fair number of curl site mirrors a few years ago, but it never worked well because they were always less reliable than the main site and they often turned stale and out of sync with the master site, which eventually just hurt users.

They also tricked visitors into bookmarking or otherwise going back to the mirror site instead of the real one, and there were always the annoying people who couldn’t resist filling the mirror with ads and stuff. Plus, they didn’t help much with the storms to the main site.

Why not a cloud server? Because with the amount of services, servers and various things we do on our server, it would be inconvenient and expensive. But perhaps even more because we started out like this, so we have invested time and energy in the infrastructure as it works right now. And I enjoy rowing my own boat!

The CDN

Fastly reached out and graciously offered to help us handle the load. Both on account of the traffic amounts, but also to save our machine from struggling this hard the next time I write something that tickles people’s curiosity (or rage) to the level where several thousand visitors want to read the same article at the same time.

Starting now, the curl.haxx.se and the daniel.haxx.se web sites are fronted by Fastly. It should give web site visitors from all over the world faster response times and it will make the site more reliable and less likely to have problems due to traffic load going forward.

In case you’re not familiar with what a CDN is, a simplified explanation is that it is a globally distributed network of reverse proxy servers deployed in multiple data centers. These CDN servers front the Internet and will, to the largest extent possible, serve visitors the right content directly from their own caches instead of having them reach the actual lowly backend server I run that hosts the original content. Fastly has lots of servers across the globe for this purpose. Users who are a long way from Sweden will probably be the ones who notice this change the most, as they may suddenly find haxx.se content much closer (network round-trip wise) than before.
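
If you’re curious whether a particular response was served from the cache, the response headers tell the story – Fastly normally adds headers such as X-Served-By and X-Cache – so a quick peek could look like this:

    # Show which Fastly cache node answered and whether it was a cache hit
    curl -sI https://curl.haxx.se/ | grep -iE '^(x-served-by|x-cache|age):'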

Standards

These new servers will host the sites over HTTPS just like before, and they will require TLS 1.2 and SNI. They will work over IPv6 and support HTTP/2. Network standards-wise, there shouldn’t be any step down – and honestly, I haven’t exactly been on the cutting edge of these technologies myself for these sites in the past.
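
All of that is easy to verify with curl itself, for example like this (assuming your curl was built with HTTP/2 support):

    # Check that the site answers over IPv6, negotiates at least TLS 1.2 and speaks HTTP/2
    curl -6 --http2 --tlsv1.2 -sI https://curl.haxx.se/ -o /dev/null \
         -w 'HTTP version: %{http_version}\n'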

Editing the site

We will keep editing and maintaining the site like before. It is made up of an old system with templates and include files that generate mostly static web pages. The site is mostly available on GitHub and, using that, you can build a local version for development and for trying out changes before they land.

Hopefully, this move to Fastly will only make the site faster and more reliable. If you notice any glitches or experience any problems with the site, please let us know!