On May 7, 2020 I will present common mistakes when using libcurl (and how to fix them) as a webinar over Zoom. The presentation starts at 19:00 Swedish time, meaning 17:00 UTC and 10:00 PDT (US West coast).
libcurl is used in thousands of different applications and devices for client-side Internet transfer and powers a significant part of what flies across the wires of the world a normal day.
Over the years as the lead curl and libcurl developer I’ve answered many questions and I’ve seen every imaginable mistake made. Some of the mistakes seem to happen more frequently than others and some of them seem easier than others to avoid.
I’m going to go over a list of things that users often get wrong with libcurl, perhaps why they do and of course I will talk about how to fix those errors.
Length
It should be done within 30-40 minutes, plus some additional time for questions at the end.
Audience
You’re interested in Internet transfers, you preferably already know what libcurl is and perhaps you have even written code that uses libcurl, either directly in C or using a binding in another language.
Material
The video and slides will of course be made available as well in case you can’t tune in live.
Sign up
If you sign up to attend, you can join, enjoy the talk and of course ask me whatever is unclear or you think needs clarification around this topic. See you next week!
This option only has a long version and it is --remote-name-all.
Shipped for the first time in curl 7.19.0 – September 1, 2008.
History of curl output options
I’m a great fan of the Unix philosophy for command line tools, so for me there were never any deeper thoughts on what curl should do with the contents of the URL it gets: right from the beginning, it should send them to stdout by default. Exactly like the command line tool cat does for files.
Of course I also realized that not everyone likes that so we provided the option to save the contents to a given file. Output to a named file. We selected -o for that option – if I remember correctly I think I picked it up from some other tools that used this letter for the same purpose: instead of sending the response body to stdout, save it to this file name.
Okay, but when you select “save as” in a browser, you don’t actually have to type the full name yourself. It’ll propose a default name to use based on the URL you’re viewing, probably because in many cases that makes sense for the user and is a convenient and quick way to get a sensible file name to save the content as.
It wasn’t hard to come up with the idea that curl could offer something similar. Since the URL you give to curl typically has a file name part when you want to get a file, having a dedicated option pick the name for the local file from the rightmost part of the URL was easy. As a different output option than -o, it felt natural to pick the uppercase O for this slightly different save-the-output option: -O.
Enter more than URL
curl sends everything to stdout, unless you tell it to direct the output somewhere else. Then (this was still before the year 2000, so very early days) we added support for multiple URLs on the command line, and what would the output options mean then?
The default would still be to send data to stdout, and since the -o and -O options were about how to save a single URL we simply decided that they do exactly that: they instruct curl how to save a single URL. If you provide multiple URLs to curl, you subsequently need to provide multiple output flags. Easy!
It has the interesting effect that if you download three files from example.com and you want them all named according to the rightmost part of their URLs, you need to provide multiple -O options:
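Something like this, with made-up file names:

# one -O per URL; each file is saved under the name from its URL
curl -O https://example.com/one.html -O https://example.com/two.html -O https://example.com/three.html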
Back in 2008 at some point, I think I took some critique about this maybe a little too hard and decided that if certain users really wanted to download multiple URLs to local file names in an easier manner, the way perhaps other command line internet download tools do, I would provide an option that lets them do this!
--remote-name-all was born.
Specifying this option will make -O the default behavior for URLs on the command line! Now you can provide as many URLs as you like and you don’t need to provide an extra flag for each URL.
Get five different URLs on the command line and save them all locally using the file part from the URLs:
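For example, along these lines (the URLs are made up):

# with --remote-name-all, every URL is saved as if -O had been given for it
curl --remote-name-all https://example.com/one https://example.com/two https://example.com/three https://example.com/four https://example.com/five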
Then if you don’t want that behavior you need to provide additional -o flags…
.curlrc perhaps?
I think the primary idea was that users who really want -O by default like this would put --remote-name-all in their .curlrc files. I don’t think this ever really materialized. I believe --remote-name-all is one of the more obscure and least used options in curl’s huge selection of options.
On April 22nd 2019, we announced our current, this, incarnation of the curl bug bounty. In association with Hackerone we now run the program ourselves, primarily funded by gracious sponsors. Time to take a closer look at how the first year of bug bounty has been!
Number of reports
We’ve received a total of 112 reports during this period.
On average, we respond with a first comment to reports within the first hour and we triage them on average within the first day.
Out of the 112 reports, 6 were found to be actual security problems.
Bounties
All confirmed security problems were rewarded a bounty. We started out a bit careful with the amounts but we are determined to raise them as we go along and we’ve seen that there’s not really a tsunami coming.
We’ve handed out 1,400 USD so far, which makes it an average of 233 USD per confirmed report. The top earner got two reports rewarded and received 450 USD from us. So far…
But again: our ambition is to significantly raise these amounts going forward.
Trends
The graph above speaks clearly: lots of people submitted reports when we opened up and the submission frequency has dropped significantly over the year.
A vast majority of the 112 reports we’ve received were more or less rubbish and/or more or less automated reports. A large number of users have reported that our wiki can be edited by anyone (which I consider to be a fundamental feature of a wiki) or other things that we’ve expressly said are not covered by the program: specific details about our web hosting, email setup or DNS config.
A rough estimate says that around 80% of the reports were quickly dismissed as “out of policy” – i.e. they reported stuff that we have documented is not covered by the bug bounty (“Sirs, we can figure out what http server that’s running” etc). The curl bug bounty covers the products curl and libcurl, thus their source code and related specifics.
Bounty funds
curl has no ties to any organization. curl is not owned by any corporation. curl is developed by individuals. All the funds we have in the project are graciously provided to us by sponsors and donors. The curl funds are handled by the awesome Open Collective.
Security is of utmost importance to us. It trumps all other areas, goals and tasks. We aim to produce solid and secure products for the world and we act as swiftly and firmly as we can on all reported security problems.
Security vulnerability trends
We have not published a single CVE for curl yet this year. (There was one announced, CVE-2019-15601, but after careful consideration we backpedaled on that; we don’t consider it a flaw anymore and the CVE has been rejected in the records.)
As I write this, it has been exactly 225 days since the latest curl CVE was published and we’re aiming at shipping curl 7.70.0 next week as the 6th release in a row without a security vulnerability to accompany it. We haven’t done 6 “clean” consecutive releases like this since early 2013!
Looking at the number of CVEs reported in the curl project per year, we can of course see that 2016 stands out. That was the year of the security audit, which ended with the release of curl 7.51.0 with no less than eleven security vulnerabilities announced and fixed. Better, of course, is the rightmost bar over the year 2020 label. It is still non-existent!
As you can see in the graph below, the “plateau” in the top right is at 92 published CVEs. The previous record holder for longest period in the project without a CVE ended in February 2013 (with CVE-2013-0249) at 379 days.
2013 was however quite a different era for curl. Less code, much less scrutiny, no bug bounty, fewer tools, no CI jobs etc.
Are we improving?
Is curl getting more secure?
We have more code and support more protocols than ever. We have a constant influx of new authors and contributors. We probably have more users than ever before in history.
At the same time we offer better incentives than ever before for people to report security bugs. We run more CI jobs than ever that run more and more test cases while code analyzers and memory debugging are making it easier to detect problems earlier. There are also more people looking for security bugs in curl than ever before.
Jinx?
I’m under no illusion that there aren’t more flaws to find, report and fix. We’re all humans and curl is still being developed at a fairly high pace.
This option is -4 as the short option and --ipv4 as the long, added in curl 7.10.8.
IP version
So why would anyone ever need this option?
Remember that when you ask curl to do a transfer with a host, that host name will typically be resolved to a list of IP addresses and that list will contain both IPv4 and IPv6 addresses. When curl then connects to the host, it will iterate over that list and it will attempt to connect to both IPv6 and IPv4 addresses, even at the same time in the style we call happy eyeballs.
The first connect attempt to succeed will be the one curl sticks to and will perform the transfer over.
When IPv6 acts up
On rare occasions, or in some setups, you may find yourself in a situation where the name resolve returns problematic IPv6 addresses, where the server’s IPv6 behavior seems erratic, where your local config simply makes IPv6 flaky, or things like that. These are all reasons you may want to ask curl to stick to IPv4 only to avoid a headache.
Then --ipv4 is here to help you.
--ipv4
First, this option will make the name resolving only ask for IPv4 addresses so there will be no IPv6 addresses returned to curl to try to connect to.
Then, due to the disabled IPv6, there won’t be any happy eyeballs procedure when connecting since there are now only addresses from a single family in the list.
It could perhaps be worth stressing that if the host name you target doesn’t have any IPv4 addresses associated with it, the operation will instead fail when this option is used.
Example
curl --ipv4 https://example.org/
Related options
The reverse request, to ask for IPv6 only, is done with the --ipv6 option.
There are also options for specifying other protocol versions, in particular for HTTP with --http1.0, --http1.1, --http2 and --http3 or for TLS with --tlsv1, --tlsv1.2, --tlsv1.3 and more.
Neither a visa nor a rejection yet, exactly two years since I completed my US visa application. There is not a lot more to say that I haven’t already said before on this subject.
Of course I’m not surprised that I won’t get an approval in these travel-restricted Covid-19 times – as it would be a fine irony to get a visa and then not be allowed to travel anyway due to a general travel ban – but it also seems like the US immigration authorities haven’t yet used the pandemic as an excuse to (finally) just deny my application.
I was first prevented from traveling to the US on June 26 2017 (on ESTA) but it wasn’t until the following spring that I applied for a visa in an attempt to rectify the situation.
As so many other events in these mysterious times, the foss-north conference went online-only and on March 30, 2020 I was honored to be included among the champion speakers at this lovely conference and I talked about how to “curl better” there.
The talk is a condensed run-through of how curl works and why, and then a look into how some of the more important HTTP oriented command line options work and how they’re supposed to be used.
As someone pointed out: I don’t do a lot of presentations about the curl tool. Maybe I should do more of these.
curl is widely used but still most users only use a very small subset of options or even just copy their command line from somewhere else. I think more users could learn to curl better. Below is the video of this talk.
Doing a talk to a potentially large audience in front of your laptop in complete silence, without seeing a single audience member, is a challenge. No “contact” with the audience and no feel for whether they’re all going to sleep or seem interested etc. Still I have the feeling that this is the year we all are going to do this many times and hopefully get better at it over time…
Both browsers have paused or conditioned their efforts to not take the final steps during the Covid-19 outbreak, but they will continue and the outcome is given: FTP support in browsers is going away. Soon.
curl
curl supported both uploads and downloads with FTP already in its first release in March 1998. Which of course was many years before either of those browsers mentioned above even existed!
In the curl project, we work super hard and tirelessly to maintain backwards compatibility and not break existing scripts and behaviors.
For these reasons, curl will not drop FTP support. If you have legacy systems running FTP, curl will continue to have your back and perform as snappy and as reliably as ever.
FTP the protocol
FTP is a protocol that is quirky to use over the modern Internet mostly due to its use of two separate TCP connections. It is unencrypted in its default version and the secured version, FTPS, was never supported by browsers. Not to mention that the encrypted version has its own slew of issues when used through NATs etc.
To put it short: FTP has its issues and quirks.
FTP use in general is decreasing and that is also why the browsers feel that they can take this move: it will only negatively affect a very minuscule portion of their users.
Legacy
FTP is however still used in places. In the 2019 curl user survey, more than 29% of the users said they had used curl to transfer FTP within the last two years. There’s clearly a long tail of legacy FTP systems out there. Maybe not so much on the public Internet anymore – but in use nevertheless.
Alternative protocols?
SFTP could have become a viable replacement for FTP in these cases, but in practice we’ve moved into a world where HTTPS replaces everything where browsers are used.
This is the 25th transfer protocol added to curl. The first new addition since we added SMB and SMBS back in November 2014.
Background
Back in early 2019, my brother Björn Stenberg brought a pull request to the curl project that added support for MQTT. I tweeted about it and it seemed people were interested in seeing this happen.
Time passed and Björn unfortunately didn’t manage to push his work forward; instead it grew stale and the PR was eventually closed due to that inactivity later the same year.
Roadmap 2020
In my work trying to go over and figure out what I want to see in curl the coming year and what we (wolfSSL) as a company would like to see being done, MQTT qualified as a contender for the list. See my curl roadmap 2020 video.
It’s happening again
I grabbed Björn’s old pull-request and rebased it onto git master, fixed a few minor conflicts and small cleanups necessary and then brought it further. I documented two of my early sessions on this, live-streamed on twitch. See MQTT in curl and MQTT part two below:
Polish
Björn’s code was an excellent start but didn’t take us all the way.
I wrote an MQTT test server, created a set of test cases, made sure the code worked for those test cases, made it more solid and more. It is still early days and the MQTT support is basic and comes with several caveats, but it’s slowly getting there.
MQTT – really?
When I say that MQTT almost fits the curl concepts and paradigms, I mean that you can consider what an MQTT client does to be “sending” and “receiving” and you can specify that with a URL.
Fetching an MQTT URL with curl means doing SUBSCRIBE on a topic, waiting for a message to arrive and getting its payload sent to the output.
Doing the equivalent of an HTTP POST with curl, like with the command line’s -d option, makes an MQTT PUBLISH that sends a payload to a topic.
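A sketch of what this can look like on the command line (the host name and topic here are made up, and remember that the support is experimental):

# subscribe to a topic and print the next published payload to stdout
curl mqtt://example.com/home/bedroom/temperature

# publish the payload "72" to that same topic
curl -d 72 mqtt://example.com/home/bedroom/temperature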
Rough corners and wrong assumptions
I’m an MQTT rookie. I’m sure there will be mistakes and I will have misunderstood things. The MQTT support will be considered experimental for a time forward, so that people get a chance to verify the functionality and we get a chance to change and correct the worst decisions and fatal mistakes. Remember that for experimental features in curl, we reserve the right to change behavior, API and ABI, so nobody should ship such features enabled anywhere without first thinking it through very carefully!
If you’re a person who thinks MQTT in curl would be useful, good or just fun, and you have use cases or ideas where you’d want to use this: please join in and try it, and let us know how it works and what you think we should polish or fix to make it truly stellar!
The code landed in the master branch when PR 5173 was merged. It will be present in the coming 7.70.0 release, due to ship on April 29, 2020.
TODO
As I write this, the MQTT support is still very basic. I want a first version out to users as early as possible as I want to get feedback and comments to help verify that we’re in the right direction and then work on making the support of the protocol more complete. TLS, authentication, QoS and more will come as we proceed. Of course, if you let me know what we must support for MQTT to make it interesting for you, I’ll listen! Preferably, you do the discussions on the curl-library mailing list.
We’ve only just started.
Credits
The initial MQTT patch that kicked us off was written by Björn Stenberg. I brought it forward from there, bug-fixed it, extended it, added a test server and test cases and landed the lot in the master branch.
--append is the long form option, -a is the short. The option has existed since at least May 1998 (present in curl 4.8). I think it is safe to say that if we had created this option just a few years later, we would not have “wasted” a short option letter on it. It is not a very frequently used one.
Append remotely
The append in the option name is a request to the receiver to append to, rather than replace, a destination file. This option only has any effect when uploading using either FTP(S) or SFTP. It is a flag option and you use it together with the --upload-file (-T) option.
When you upload to a remote site with these protocols, the default behavior is to overwrite any file that happens to exist on the server using the name we’re uploading to. If you append this option to the command line, curl will instead instruct the server to append the newly uploaded data to the end of the remote file.
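To illustrate the default, an upload without this option simply replaces the remote file. A sketch with a made-up host name:

# without --append, any existing remote data.txt is overwritten
curl --upload-file data.txt ftp://example.com/data.txt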
The reason this option is limited to just a subset of protocols is of course that they are the only ones for which we can give that instruction to the server.
Example
Append the local file “trailer” to the remote file called “begin”:
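Something like this (the host name is made up; “trailer” and “begin” are the files from the description above):

# --append (-a) makes the server add the uploaded data to the end of the remote file
curl --append --upload-file trailer ftp://example.com/begin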
Mar 31 2020, 11:13:38: I get a message from Frank in the #curl IRC channel over on Freenode. I’m always “hanging out” on IRC and Frank is a long time friend and fellow frequent IRCer in that channel. This time, Frank informs me that the curl web site is acting up:
“I’m getting 403s for some mailing list archive pages. They go away when I reload”
That’s weird and unexpected. An important detail here is that the curl web site is “CDNed” by Fastly. This means that every visitor of the web site is actually going to one of Fastly’s servers and in most cases they get cached content from those servers, and only infrequently do these servers come back to my “origin” server and ask for an updated file to send out to a web site visitor.
A 403 error for a valid page is not a good thing. I started checking out some of my logs – which then are only for the origin, as I don’t do any logging at all at CDN level (more about that later) – and I could verify the 403 errors. So they’re in my log, meaning they aren’t caused by (a misconfiguration of) the CDN. Why would a perfectly legitimate URL suddenly return 403, only to have it go away again after a reload?
Why does he get a 403?
I took a look at Fastly’s management web interface and I spotted that the curl web site was sending out data at an unusually high speed at the moment: an average of around 50 Mbps, while we typically average below 20. Hm… something is going on.
While I continued to look for the answers to these things I noted that my logs were growing really rapidly. There were POSTs being sent to the same single URL at a high frequency (10-20 reqs/second) and each of those would get some 225 Kbytes of data returned. And they all used the same User-agent: QQGameHall. It seems this started within the last 24 hours or so. They’re POSTs, so Fastly basically always passes them through to my server.
Before I could figure out Frank’s 403s, I decided to slow down this madness by temporarily forbidding this user-agent access, so that the bot or program or whatever would notice it starting to fail, and it would of course then stop bombarding the site.
Deny
Ok, a quick deny of the user-agent made my server start responding with 403s to all those requests, and instead of a 225K response it now sent back 465 bytes per request. The average bandwidth on the site immediately dropped down to below 20 Mbps again. Back to looking for Frank’s 403 problem.
The answer was pretty simple and I didn’t have to search a lot. The clues existed in the error logs and it turned out we had “mod_evasive” enabled since another heavy bot load “attack” a while back. It is a module for “rate limiting” incoming requests and since a lot of requests to our server now come from Fastly’s limited set of IP addresses and we had this crazy QQ thing hitting us, my server would return a 403 every now and then when it considered the rate too high.
I whitelisted Fastly’s requests and Frank’s 403 problems were solved.
Deny a level up
The bot traffic showed no sign of slowing down. Easily 20 requests per second, to the same URL, and they all get an error back and obviously they don’t care. I decided to up my game a little, so with help I moved my blocking of this service to Fastly. I now block their user-agent already there, so the traffic never even reaches my server. Phew, my server was finally back to its regular calm state. The way it should be.
It doesn’t stop there. Here’s a follow-up graph I just grabbed, a little over a week since I started the blocking. 16.5 million blocked requests (and counting). This graph shows the number of requests/hour on the Y axis, peaking at almost 190k; around 50 requests/second. The load is of course not actually a problem, just a nuisance now. QQGameHall keeps on going.
QQGameHall
What we know about this.
Friends on Twitter and googling for this name inform us that this is a “game launcher” made by Tencent. I’ve tried to contact them via Twitter (as I have no other means of contacting them that seems even remotely likely to work).
I have not checked what these user-agents POST, because I didn’t log that. I suspect it was just a zero byte POST.
The URL they post to is the CA cert bundle file we provide on the curl CA extract web page. The one we convert from the Mozilla version into a PEM for users of the world to enjoy. (Someone seems to enjoy this maybe just a little too much.)
The user-agents seemed to come (mostly) from China which seems to add up. Also, the look of the graph when it goes up and down could indicate an eastern time zone.
This program uses libcurl. Harry in the #curl channel found files in Virus Total and had a look. It is, I think, therefore highly likely that this “storm” is caused by an application using curl!
My theory: this is some sort of service that was deployed, or an upgrade shipped, that wants to get an updated CA store, and they get that from our site with this request. Either they get it far too often or maybe there is just a very large number of them, or similar. I cannot understand why they issue a POST though. If they had just done a GET I would never have noticed and they would’ve fetched perfectly fine cached versions from the CDN…
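For what it’s worth, a politer way for such a tool to refresh the bundle would be a plain conditional GET, roughly like this (the URL points at the bundle on our CA extract page; the exact invocation is just a sketch):

# only downloads a new copy when the server's file is newer than the local one
curl --remote-name --time-cond cacert.pem https://curl.haxx.se/ca/cacert.pem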
Feel free to speculate further!
Logging, privacy, analytics
I don’t have any logging of the CDN traffic to the curl site. Primarily because I haven’t had to, but also because I appreciate the privacy gain for our users, and finally because handling logs at this volume pretty much requires a separate service and they all seem to be fairly pricey – for something I really don’t want. So therefore I don’t see the source IP addresses of these things. (But yes, I can ask Fastly to check and tell me if I really really wanted to know.)
Also: I don’t run any analytics (Google or otherwise) on the site, primarily for privacy reasons. So that won’t give me that data or other clues either.
Update: it has been proposed that I could see the IP addresses in the X-Forwarded-For: headers, and that seems accurate. Of course I didn’t log that header during this period but I will consider starting to do it for better control and info in the future.
Update 2: As of May 18 2020, this flood has not diminished. Logs show that we still block about 5 million requests/day from this service, peaking at over 100 requests/minute.