curl ootw: –append

Previously mentioned command line options of the week.

--append is the long form option, -a is the short. The option has existed since at least May 1998 (present in curl 4.8). I think it is safe to say that if we would’ve created this option just a few years later, we would not have “wasted” a short option letter on it. It is not a very frequently used one.

Append remotely

The append in the option name is a require to the receiver to append to — rather than replace — a destination file. This option only has any effect when uploading using either FTP(S) or SFTP. It is a flag option and you use it together with the --upload option.

When you upload to a remote site with these protocols, the default behavior is to overwrite any file that happens to exist on the server using the name we’re uploading to. If you append this option to the command line, curl will instead instruct the server to append the newly uploaded data to the end of the remote file.

The reason this option is limited to just subset of protocols is of course that they are the only ones for which we can give that instruction to the server.

Example

Append the local file “trailer” to the remote file called “begin”:

curl --append --upload trailer ftp://example.com/path/begin

A QQGameHall storm

Mar 31 2020, 11:13:38: I get a message from Frank in the #curl IRC channel over on Freenode. I’m always “hanging out” on IRC and Frank is a long time friend and fellow frequent IRCer in that channel. This time, Frank informs me that the curl web site is acting up:

“I’m getting 403s for some mailing list archive pages. They go away when I reload”

That’s weird and unexpected. An important detail here is that the curl web site is “CDNed” by Fastly. This means that every visitor of the web site is actually going to one of Fastly’s servers and in most cases they get cached content from those servers, and only infrequently do these servers come back to my “origin” server and ask for an updated file to send out to a web site visitor.

A 403 error for a valid page is not a good thing. I started checking out some of my logs – which then only are for the origin as I don’t do any logging at all at CDN level (more about that later) – and I could verify the 403 errors. So they’re in my log meaning it isn’t caused by (a misconfiguration of) the CDN. Why would a perfectly legitimate URL suddenly return 403 to have it go away again after a reload?

Why does he get a 403?

I took a look at Fastly’s management web interface and I spotted that the curl web site was sending out data at an unusual high speed at the moment. An average speed of around 50mbps, while we typically average at below 20. Hm… something is going on.

While I continued to look for the answers to these things I noted that my logs were growing really rapidly. There were POSTs being sent to the same single URL at a high frequency (10-20 reqs/second) and each of those would get some 225Kbytes of data returned. And they all used the same User-agent: QQGameHall. It seems this started within the last 24 hours or so. They’re POSTs so Fastly basically always pass them through to my server.

Before I could figure out Franks’s 403s, I decided to slow down this madness by temporarily forbidding this user-agent access so that the bot or program or whatever would notice it starts to fail, and it would of course then stop bombarding the site.

Deny

Ok, a quick deny of the user-agent made my server start responding with 403s to all those requests and instead of a 225K response it now sent back 465 bytes per request. The average bandwidth on the site immediately dropped down to below 20Mbps again. Back to looking for Frank’s 403-problem

First the 403s seen due to the ratelimiting, then I removed the ratelmiting and finally I added a block of the user-agent. Screenshotted error rates from Fastly’s admin interface. This is errors per minute.

The answer was pretty simple and I didn’t have to search a lot. The clues existed in the error logs and it turned out we had “mod_evasive” enabled since another heavy bot load “attack” a while back. It is a module for “rate limiting” incoming requests and since a lot of requests to our server now comes from Fastly’s limited set of IP addresses and we had this crazy QQ thing hitting us, my server would return a 403 every now and then when it considered the rate too high.

I whitelisted Fastly’s requests and Frank’s 403 problems were solved.

Deny a level up

The bot traffic showed no sign of slowing down. Easily 20 requests per second, to the same URL and they all get an error back and obviously they don’t care. I decided to up my game a little so with help, I moved my blocking of this service to Fastly. I now block their user-agent already there so the traffic doesn’t ever reach my server. Phew, my server was finally back to its regular calm state. They way it should be.

It doesn’t stop there. Here’s a follow-up graph I just grabbed, a little over a week since I started the blocking. 16.5 million blocked requests (and counting). This graph here shows number of requests/hour on the Y axis, peeking at almost 190k; around 50 requests/second. The load is of course not actually a problem, just a nuisance now. QQGameHall keeps on going.

Errors per hour over the period of several days.

QQGameHall

What we know about this.

Friends on Twitter and googling for this name informs us that this is a “game launcher” done by Tencent. I’ve tried to contact them via Twitter (as I have no means of contacting them otherwise that seems even remotely likely to work).

I have not checked what these user-agent POSTs, because I didn’t log that. I suspect it was just a zero byte POST.

The URL they post to is the CA cert bundle file with provide on the curl CA extract web page. The one we convert from the Mozilla version into a PEM for users of the world to enjoy. (Someone seems to enjoy this maybe just a little too much.)

The user-agents seemed to come (mostly) from China which seems to add up. Also, the look of the graph when it goes up and down could indicate an eastern time zone.

This program uses libcurl. Harry in the #curl channel found files in Virus Total and had a look. It is, I think, therefore highly likely that this “storm” is caused by an application using curl!

My theory: this is some sort of service that was deployed, or an upgrade shipped, that wants to get an updated CA store and they get that from our site with this request. Either they get it far too often or maybe there are just a very large amount them or similar. I cannot understand why they issue a POST though. If they would just have done a GET I would never have noticed and they would’ve fetched perfectly fine cached versions from the CDN…

Feel free to speculate further!

Logging, privacy, analytics

I don’t have any logging of the CDN traffic to the curl site. Primarily because I haven’t had to, but also because I appreciate the privacy gain for our users and finally because handling logs at this volume pretty much requires a separate service and they all seem to be fairly pricey – for something I really don’t want. So therefore I don’t see the source IP addresses these things. (But yes, I can ask Fastly to check and tell me if I really really wanted to know.)

Also: I don’t run any analytics (Google or otherwise) on the site, primarily for privacy reasons. So that won’t give me that data or other clues either.

Update: it has been proposed I could see the IP address in the X-Forwarded-For: headers and it seems accurate. Of course I didn’t log that header during this period but I will consider starting doing it for better control and info in the future.

Update 2: As of May 18 2020, this flood has not diminished. Logs show that we still block about 5 million requests/day from this service, peaking at over 100 requests/minute.

Credits

Top image by Elias Sch. from Pixabay

curl ootw: -v is for verbose

(Previous entries in the curl option of the week series.)

This is one of the original 24 command line options that existed already in the first ever curl release in the spring of 1998. The -v option’s long version is --verbose.

Note that this uses the lowercase ‘v’. The uppercase -V option shows detailed version information.

So use it!

In a blog post series of curl command line options you’d think that this option would basically be unnecessary to include since it seems to basic, so obvious and of course people know of it and use it immediately to understand why curl invokes don’t behave as expected!

Time and time again the first response to users with problems is to please add –verbose to the command line. Many of those times, the problem is then figured out, understood and sorted out without any need of further help.

--verbose should be the first action to try for everyone who runs a curl command that fails unexplainably.

What verbose shows

First: there’s only one verbosity level in curl. There’s normal and there’s verbose. Pure binary; on or off. Adding more -v flags on the same line won’t bring you more details. In fact, adding more won’t change anything at all other than making your command line longer. (And yes, you have my permission to gently taunt anyone you see online who uses more than one -v with curl.)

Verbose mode shows outgoing and incoming headers (or protocol commands/responses) as well as “extra” details that we’ve deemed sensible in the code.

For example you will get to see which IP addresses curl attempts to connect to (that the host name resolved to), it will show details from the server’s TLS certificate and it will tell you what TLS cipher that was negotiated etc.

-v shows details from the protocol engine. Of course you will also see different outputs depending on what protocol that’s being used.

What verbose doesn’t show

This option is meant to help you understand the protocol parts but it doesn’t show you everything that’s going on – for example it doesn’t show you the outgoing protocol data (like the HTTP request body). If -v isn’t enough for you, then the --trace and --trace-ascii options are there for you.

If you are in the rare situation where the trace options aren’t detailed enough, you can go all-in with full SSLKEYLOGFILE mode and inspect curl’s network traffic with Wireshark.

Less verbose?

It’s not exactly “verbose level” but curl does by default for example show a progress meter and some other things. You can silence curl completely by using -s (--silent) or use the recently introduced option --no-progress-meter.

Example

curl -v https://example.com/

Google Open Source Peer Bonus award 2020

I’m honored to – once again – be a recipient of this award Google hands out to open source contributors, annually. I was previously awarded this in 2011.

I don’t get a lot of awards. Getting this token of appreciation feels awesome and I’m humbled and grateful I was not only nominated but also actually selected as recipient. Thank you, Google!

Nine years ago I got 350 USD credits in the Google store and I got my family a set of jackets using them – my kids have grown significantly since then, so to them those black beauties are now just a distant memory, but I still actually wear mine from time to time!

curl beers and curl stickers!

This time, the reward comes with a 250 USD “payout” (that’s the gift mentioned in the mail above), as a real money transfer that can be spent on other things than just Google merchandise!

I’ve decided to accept the reward and the money and I intend to spend it on beer and curl stickers for my friends and fans. As I prefer to view it:

The Google Open Source Beer Bonus.

Thank you Google and thank you Gaspar!

Update: the Google Open Source blog post about it.