In April 2019 we launched the current curl bug-bounty program under the Hackerone umbrella and from my point of view it has been nothing but a raging success. To date we’ve paid almost 17,000 USD in rewards and the average payment amount has been increasing all the time.
The reward money in this program has been paid to security reporters from our own funds. Funds that have been donated to the curl project by our generous curl sponsors.
Before that day in 2019, when this program started, we made a few attempts to lean on and piggyback on other bug-bounty efforts, but that never worked well enough. It mostly made the process unpredictable and outside of our control and ability to influence, and the researchers never got paid “properly”.
We even started this latest program in association with a known brand company (that I won’t name here) who promised to chip in and contribute money to the rewards whenever a flaw would affect one of their use cases – but that similarly just ended up an empty promise for something that apparently never could happen. It feels much more honest and straightforward not to give anyone such false expectations – so they’re no longer involved here.
The original Internet Bug-Bounty
Another “failed program” from the past, at least as far as bounties for curl issues go, was the Hackerone-driven bounty program known as IBB. It was an umbrella project offering bounties for security problems in a set of “internet programs”, including curl. I won’t bore you with details of why that didn’t work. I think they paid small bounties for two or three curl related issues.
IBB reborn but different now
The experience from all previous attempts and programs we’ve tried for bounties says that we need to be in control of which reported issues are considered security related problems, and I think it is important that we reward all such issues, without discrimination or other conditions. If the issue is indeed a security problem, then we appreciate being told about it and we reward the person who did the job, figured it out and told us.
Therefore, skepticism was my initial response when I was briefed about the re-introduction – rebirth, if you want – of the IBB program. We’ve been there, we tried that.
But after talking to the people involved, I was subsequently convinced that we should give this effort a chance. There are several reasons that made me think this time can be different, to our benefit. They include:
The IBB program will pay the rewards from their funds, and they will do their own fundraising and “pester” big companies to help out, thus entirely or mostly removing the need for us to fund the rewards ourselves, or at least making our spending smaller. Or the rewards larger.
The members of the curl security team will still work with reported issues the exact same way as before and our security team will remain the sole arbiters of which problems reported on curl are in scope and which are not. We’ve established a decent working method for that over the last two-plus years and I feel good about us sticking to it. The IBB program is mostly involved at the end of the process, when the reward amount and payout are handled.
We stick to mostly the same work-flow and site for reporting issues and communicating with reporters while the issues are in the initial non-disclosed state. Namely within the nicely working Hackerone issue tracker, which is designed and made specifically for this purpose.
Evaluation
We have not signed up for this new way of doing things for life. If it turns out to be bad somehow for the curl project or for security researchers filing problems about curl, then we can always backpedal to the previous situation and continue as before.
This should be a fairly harmless change of process to test, and it should turn out to be an improvement for us – otherwise we won’t stick to it!
Welcome to the 200th curl release. We call it 200 OK. It coincides with us counting more than 900 commit authors and surpassing 2,400 credited contributors in the project. This is also the first release ever in which we thank more than 80 persons in the RELEASE-NOTES for having helped out making it, and we’ve set two new records in the bug-bounty program: the largest single payout ever for a single bug (2,000 USD) and the largest total payout during a single release cycle: 3,800 USD.
This release cycle was only 42 days, two weeks shorter than normal, due to the previous 7.76.1 patch release.
Release Presentation
Numbers
the 200th release
5 changes
42 days (total: 8,468)
133 bug-fixes (total: 6,966)
192 commits (total: 27,202)
0 new public libcurl function (total: 85)
2 new curl_easy_setopt() options (total: 290)
2 new curl command line options (total: 242)
82 contributors, 44 new (total: 2,410)
47 authors, 23 new (total: 901)
3 security fixes (total: 103)
3,800 USD paid in Bug Bounties (total: 9,000 USD)
Security
We set two new records in the curl bug-bounty program this time as mentioned above. These are the issues that made them happen.
This is a use-after-free in the OpenSSL backend code that in the absolute worst case can lead to an RCE, a Remote Code Execution. The flaw was added fairly recently and it is very hard to exploit, but you should upgrade or patch immediately.
The issue occurs when TLS session related info is sent from the TLS server when the transfer that previously used it is already done and gone.
The reporter was awarded 2,000 USD for this finding.
When libcurl accepts custom TELNET options to send to the server, the input parser was flawed, which could be exploited to have libcurl instead send contents from the stack.
The reporter was awarded 1,000 USD for this finding.
In the Schannel backend code, the selected cipher for a transfer was stored in a static variable. This meant that one transfer’s cipher choice could unknowingly affect other connections, downgrading them to a lower security grade than intended.
The reporter was awarded 800 USD for this finding.
Changes
In this release we introduce 5 new changes that might be interesting to take a look at!
Make TLS flavor explicit
As explained separately, the curl configure script no longer defaults to selecting a particular TLS library. When you build curl with configure now, you need to select which library to use. No special treatment for any of them!
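For example, picking OpenSSL at configure time can look like the line below. (This is an illustration; other flags such as --with-gnutls or --with-wolfssl select other libraries, and configure --help lists what your version supports.)

./configure --with-openssl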
No more SSL
curl now has no remaining traces of support for SSLv2 or SSLv3. Those ancient and insecure SSL versions were already disabled by default by TLS libraries everywhere, but now it’s also impossible to activate them even in special builds. Stripped out from both the curl tool and the library (and thus counted as two changes).
HSTS in the build
We brought in HSTS support a while ago, but now we finally remove the experimental label and ship it enabled in the build by default, for everyone to use more easily.
In-memory cert API
We introduce API options for libcurl that allow users to specify certificates in-memory instead of using files in the file system. See CURLOPT_CAINFO_BLOB.
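Here’s a minimal sketch of how an application could use the new option – the PEM contents and URL below are of course just placeholders:

#include <string.h>
#include <curl/curl.h>

int main(void)
{
  /* placeholder: a CA bundle already loaded into memory somehow */
  const char *pem =
    "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n";
  CURL *curl = curl_easy_init();
  if(curl) {
    struct curl_blob ca;
    ca.data = (void *)pem;
    ca.len = strlen(pem);
    ca.flags = CURL_BLOB_COPY; /* ask libcurl to keep its own copy */
    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/");
    curl_easy_setopt(curl, CURLOPT_CAINFO_BLOB, &ca);
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
  }
  return 0;
}

With CURL_BLOB_COPY set, libcurl copies the data so the application can free or reuse its own buffer right away.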
Favorite bug-fixes
Again we managed to land a large number of fixes in this release, so I’m highlighting a few of the ones I find most interesting!
Version output
The first line of curl -V output got updated: libcurl now includes OpenLDAP and the version of it used in the build, and the curl tool can add libmetalink and the version of it used in the build!
curl_mprintf: add description
We’ve provided the *printf() clone functions in the API since forever, but we’ve tried to discourage users from using them. Still, we now have a first shot at a man page that clearly describes how they work.
This is important as they’re not quite POSIX compliant and users who against our advice decide to rely on them need to be able to know how they work!
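As a small illustration (the format string and arguments here are made up), the allocating variant works like this:

#include <stdio.h>
#include <curl/curl.h>
#include <curl/mprintf.h>

int main(void)
{
  /* curl_maprintf() returns a freshly allocated, formatted string */
  char *out = curl_maprintf("connecting to %s port %d", "example.com", 443);
  if(out) {
    puts(out);
    curl_free(out); /* allocated by libcurl, released with curl_free() */
  }
  return 0;
}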
CURLOPT_IPRESOLVE: preventing wrong IP version from being used
This option was made a little stricter than before. Previously, it would be lax about existing connections and prefer reuse instead of resolving again, but starting now this option makes sure to only reuse a connection with the requested IP version.
This allows applications to explicitly create two separate connections to the same host using different IP versions when desired, which previously libcurl wouldn’t easily let you do.
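A sketch of what that can look like with a single easy handle (example.com is a placeholder):

#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/");
    curl_easy_setopt(curl, CURLOPT_IPRESOLVE, (long)CURL_IPRESOLVE_V4);
    curl_easy_perform(curl); /* connects over IPv4 */

    curl_easy_setopt(curl, CURLOPT_IPRESOLVE, (long)CURL_IPRESOLVE_V6);
    /* with the stricter check, the IPv4 connection above is not
       considered for reuse here - a new IPv6 connection is made */
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
  }
  return 0;
}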
Ignore SIGPIPE in curl_easy_send
libcurl does its best to ignore SIGPIPE everywhere, and here we identified a spot where we had missed it… We also made sure to enable the ignoring logic when built to use wolfSSL.
Several HTTP/2-fixes
There are no less than 6 separate fixes mentioned in the HTTP/2 module in this release. Some potential memory leaks, but also some behavior-improving things. Possibly the most important one was the move of the transfer-related error code from the connection struct to the transfer struct, since it was vulnerable to a race condition that could make it wrong. Another related fix is that libcurl no longer forcibly disconnects a connection over which a transfer gets HTTP_1_1_REQUIRED returned.
Partial CONNECT requests
When the CONNECT HTTP request sent to a proxy wasn’t all sent in a single send() call, curl would fail. It is baffling that this bug wasn’t found or reported earlier, but it was detected this time when the reporter issued a CONNECT request larger than 16 kilobytes…
TLS: add USE_HTTP2 define
There were several remaining bad assumptions that HTTP/2 support in curl relies purely on nghttp2. This is no longer true, as HTTP/2 support can also be provided by hyper.
When libcurl (built with libssh2 support) stopped an SFTP transfer because a timeout was triggered, the following SFTP disconnect procedure was subsequently also stopped by the same timeout and therefore wasn’t allowed to properly clean everything up, leading to a memory leak!
IRC network switch
We moved the #curl IRC channel to the new network libera.chat. Come join us there!
Next release
On Jul 21, 2021 we plan to ship the next release. The version number for that is not yet decided but we have changes in the pipeline, making a minor version number bump very likely.
In the curl project we make great efforts to store a lot of meta data about each and every vulnerability that we have fixed over the years – and curl is over 23 years old. This data set includes CVE id, first vulnerable version, last vulnerable version, name, announce date, report to the project date, CWE, reward amount, code area and “C mistake kind”.
We also keep detailed data about releases, making it easy to look up for example release dates for specific versions.
Dashboard
All this, combined with my fascination (some would call it obsession) with graphs, is what pushed me into creating the curl project dashboard, with an ever-growing number of daily updated graphs showing various data about the curl project in visual ways. (All scripts for that are of course also freely available.)
What to show is interesting but of course it is sometimes even more important how to show particular data. I don’t want the graphs just to show off the project. I want the graphs to help us view the data and make it possible for us to draw conclusions based on what the data tells us.
Vulnerabilities
The worst bugs possible in a project are the ones that are found to be security vulnerabilities. Those are the kind we want to work really hard to never introduce – but we basically cannot reach that point. This special status makes us focus a lot on these particular flaws and we of course treat them specially.
For a while we’ve had two particular vulnerability graphs in the dashboard. One showed the number of fixed issues over time and another one showed how long each reported vulnerability had existed in released source code until a fix for it shipped.
CVE age in code until report
The CVE age in code until report graph shows that in general, reported vulnerabilities were introduced into the code base many years before they are found and fixed. In fact, the all-time average suggests they are present for more than 2,700 days – more than seven years. Looking at the reports from the last 12 months only, the average is almost 1,000 days more!
It takes a very long time for vulnerabilities to get found and reported.
When were the vulnerabilities introduced
Just the other day it struck me that even though I already had a lot of graphs showing in the dashboard, there was none that actually showed me, in any nice way, at what dates we created the vulnerabilities we spend so much time and effort hunting down, documenting and talking about.
I decided to use the meta data we already have and add a second plot line to the already existing graph. Now we have the previous line (shown in green) that shows the number of fixed vulnerabilities bumped at the date when a fix was released.
Added is the new line (in red) that instead is bumped for every date we know a vulnerability was first shipped in a release. We know the version number from the vulnerability meta data, we know the release date of that version from the release meta data.
This all-new graph helps us see that out of the current 100 reported vulnerabilities, half were introduced into the code before 2010.
Using this graph, it is also very clear to me that the increased CVE reporting we can spot in the green line, which started to accelerate in 2016, was not because the bugs were introduced then. The creation of vulnerabilities rather seems to be fairly evenly distributed over time – with occasional bumps, but I think those are more related to particular releases that introduced a larger amount of features and code.
As the average vulnerability takes 2,700 days to get reported, it could indicate that flaws landed since 2014 are too young to have gotten reported yet. Or it could mean that we’ve improved over time so that new code is better than old and thus when we find flaws, they’re more likely to be in old code paths… I don’t think the red graph suggests any particular notable improvement over time though. Possibly it does if we take into account the massive code growth we’ve also had over this time.
The green “fixed” line at least has a much better trend and growth angle.
Present in which releases
As we have the range of vulnerable releases stored in the meta data file for each CVE, we can add up the number of flaws that are present in every past release.
Together with the release dates of the versions, we can make a graph showing the number of reported vulnerabilities present in each past release over time.
You can see that some labels end up overwriting each other somewhat for the occasions when we’ve done two releases very close in time.
I’ve previously blogged about the possible backdoor threat to curl. This post might be a bit of a repeat, but also a refresh and a renewed take on the subject several years later, in the shadow of the recent PHP backdoor commits of March 28, 2021. Nowadays, “supply chain attacks” is a hot topic.
Since you didn’t read that PHP link: an unknown project outsider managed to push a commit into the PHP master source code repository with a change (made to look as if done by two project regulars) that obviously inserted a backdoor that could execute custom code when a client tickled a modified server the right way.
The commits were apparently detected very quickly. I haven’t seen any proper analysis of exactly how they were performed, but to me that’s not the ultimate question. I would rather talk and think about this threat from a curl perspective.
PHP is extremely widely used and so is curl, but where PHP is (mostly) server-side running code, curl is client-side.
How to get malicious code into curl
I’d like to think about this problem from an attacker’s point of view. There are but two things an attacker needs to do to get a backdoor in, plus a third adjacent step that needs to happen:
Make a backdoor change that is hard to detect and appears innocent to a casual observer, while actually still being able to do its “job”
Get that change landed in the master source code repository branch
The code needs to be included in a curl release that is used by the victim/target
These are not simple steps. The third step, getting into a release, is not strictly always necessary because there are sometimes people and organizations that run code off the bleeding edge master repository (against our advice I should add).
Writing the backdoor code
As was seen in this PHP attack, it failed rather miserably at step 1, making the attack code look innocuous, although we can suspect that was maybe done on purpose. In 2010 there was a lengthy discussion about an alleged backdoor in OpenBSD’s IPSEC stack that presumably had been in place for years, and even though that particular backdoor was never proven to be real, the idea that it can be done certainly is.
Every time we fix a security problem in curl there’s that latent nagging question in the back of our collective minds: was this flaw placed here deliberately? Historically, we’ve not seen any such attacks against curl. I can say this with a high degree of certainty since almost all of the existing security problems detected and reported in curl were found by me…!
The best attack code would probably do something minor that would have a huge impact in a special context for which the attacker has planned to use it. I mean minor as in doing a NULL-pointer dereference or doing a use-after-free or something. This, because doing a full-fledged generic stack based buffer overflow is much harder to land undetected. Maybe going with a single-byte overwrite outside of a malloc could be the way, like it was back in 2016 when such a flaw in c-ares was used as the first step in a multi-flaw exploit sequence to execute remote code as root on ChromeOS…
Ideally, the commit should also include an actual bug-fix that would be the public facing motivation for it.
Get that code landed in the repo
Okay let’s imagine that you have produced code that actually is a useful bug-fix or feature addition but with an added evil twist, and you want that landed in curl. I can imagine several different theoretical ways to do it:
A normal pull-request and land using the normal means
Tricking or forcing a user with push rights to circumvent the review process
Use a weakness somewhere and land the code directly without involving existing curl team members
The Pull Request method
I’ve never seen this attempted. Submit the pull-request to the project by the usual means and argue that the commit fixes a bug – which could be true.
This means the backdoor patch has to go through all testing and reviews with flying colors to get merged. I’m not saying this is impossible, but I will claim that it is very hard and also a very big gamble by an attacker. Presumably it is a fairly big job just to get the code for this attack to work, so maybe going with a less risky way to land the code is preferable? But then which way is likely to have the most reliable outcome?
The tricking a user method
Social engineering is very powerful. I can’t claim that our team is immune to that so maybe there’s a way an outsider could sneak in behind our imaginary personal walls and make us take a shortcut for a made up reason that then would circumvent the project’s review process.
We can even include more forced “convincing”, such as direct threats against persons or their families: “push this code or else…”. This of course cannot be protected against using 2fa, better passwords or things like that. Forcing a user to do it is also likely to eventually become known, which would immediately get the commit reverted.
Tricking a user doesn’t make the commit avoid testing and scrutinizing after the fact. When the code has landed, it will be scanned and tested in a hundred CI jobs that include a handful of static code analyzers and memory/address sanitizers.
Tricking a user could land the code, but it can’t make it stick unless the code is written as the perfect stealth change. It really needs to be that good attack code to work out. Additionally: circumventing the regular pull-request + review procedure is unusual, so I believe it is likely that such a commit will be reviewed and commented on after the fact, and there might then be questions about it and even likely follow-up actions.
The exploiting a weakness method
A weakness in this context could be a security problem in the hosting software or even a rogue admin in the company that hosts the main source code git repo. Something that allows code to get pushed into the code repository without it being the result of one of the existing team members. This seems to be the method that the PHP attack was done through.
This is a hard method as well. Not only does it shortcut reviews, it is also done in the name of someone on the team who knows for sure that they didn’t do the commit, and again, the commit will be tested and poked at anyway.
For all of us who sign our git commits, detecting such a forged commit is easy and quickly done. In the curl project we don’t have mandatory signed commits so the lack of a signature won’t actually block it. And who knows, a weakness somewhere could even possibly find a way to bypass such a requirement.
The skip-git-altogether methods
As I’ve described above, it is really hard even for a skilled developer to write a backdoor and have that landed in the curl git repository and stick there for longer than just a very brief period.
If the attacker instead can just sneak the code directly into a release archive then it won’t appear in git, it won’t get tested and it won’t get easily noticed by team members!
curl release tarballs are made by me, locally on my machine. After I’ve built the tarballs I sign them with my GPG key and upload them to the curl.se origin server for the world to download. (Web users don’t actually hit my server when downloading curl. The user visible web site and downloads are hosted by Fastly servers.)
An attacker that would infect my release scripts (which by the way are also in the git repository) or do something to my machine could get something into the tarball, have me sign it and thus create the “perfect backdoor”: one that isn’t detectable in git and requires someone to diff the release with git in order to detect – which usually isn’t done by anyone that I know of.
But such an attacker would not only have to breach my development machine; such an infection of the release scripts would be awfully hard to pull off. Not impossible, of course. I do my best to maintain proper login sanitation, updated operating systems and use of safe passwords and encrypted communications everywhere. But I’m also a human, so I’m bound to make occasional mistakes.
Another way could be for the attacker to breach the origin download server and replace one of the tarballs there with an infected version, hoping that people skip verifying the signature when they download it and otherwise don’t notice that the tarball has been modified. I do my best at maintaining server security to keep that risk to a minimum. Most people download the latest release, and then it’s enough if just a subset checks the signature for the attack to get revealed sooner rather than later.
The further-down-the-chain method
As an attacker, get into the supply chain somewhere else: find a weaker link in the chain between the curl release tarball and the target system for your attack. If you can trick or social engineer someone else along the way to get your evil curl tarball used there instead of the actual upstream tarball, that might be easier and give you more bang for your buck. Perhaps you target your particular distribution’s or operating system’s release engineers, pretend to be from the curl project, make up a story and send over a tarball to help them out…
Or fake a security advisory and send out a bad patch directly to someone you know builds their own curl/libcurl binaries?
Better ways?
If you can think of other/better ways to get malicious code into a victim’s machine via curl, let me know! If you find a security problem, we will reward you for it!
Similarly, if you can think of ways or practices on how we can improve the project to further increase our security I’ll be very interested. It is an ever-moving process.
Dependencies
Added after the initial post. Lots of people have mentioned that curl can get built with many dependencies and that maybe one of those would be an easier or better target. Maybe they are, but they are products of their own individual projects, and an attack on those projects/products would not be an attack on curl or a backdoor in curl, by my way of looking at it.
In the curl project we ship the source code for curl and libcurl, and the users, the ones who build the binaries from that source code, get the dependencies too.
I spent a lot of time and effort digging up the numbers and facts for this post!
Lots of people keep referring to the awesome summary put together by a friendly pseudonymous “Tim”, which says that “53 out of 95” (55.7%) security flaws in curl could’ve been prevented if curl had been written in Rust. This usually comes up in discussions around how insecure C is and what to do about it. I’ve blogged about this topic before, but things change, the world changes and my own view on these matters keeps getting refined.
I did my own count: how many of the current 98 published security problems in curl are related to it being written in C?
Possibly due to the slightly different question, possibly because I’ve categorized one or two vulnerabilities differently, possibly because I’m biased as heck, but my count ends up at:
51 out of 98 security vulnerabilities are due to C mistakes
That’s still 52%. (You can inspect my analysis and submit issues/pull-requests against the vuln.pm file.) And yes, those are 51 flaws that could’ve been avoided if curl had been written in a memory safe language. This contradicts what I’ve said in the past, but I will also show you below that the numbers have changed and that I still was right back then!
Let me also already now say that if you check out the curl security section, you will find very detailed descriptions of all vulnerabilities. Using those, you can draw your own conclusions and also easily write your own blog posts on this topic!
This post is not meant as a discussion around how we can rewrite C code into other languages to avoid these problems. This is an introspection of the C related vulnerabilities in curl. curl will not be rewritten but will continue to support backends written in other languages.
It seems hard to draw hard or definite conclusions based on the CVEs and C mistakes in curl’s history due to the relatively small amount of data to analyze. I’m not convinced this is enough data to actually spot real trends; what looks like a trend might be mostly random coincidence.
98 flaws out of 6,682
The curl changelog counts a total of 6,682 bug-fixes at the time of this writing. That makes the share of vulnerabilities 1.46% of all known curl bugs fixed through curl’s entire life-time, starting in March 1998.
Looking at recent curl development: the last three years. Since January 1st 2018, we’ve fixed 2,311 bugs and reported 26 vulnerabilities. Out of those 26 vulnerabilities, 18 (69%) were due to C mistakes. 18 out of 2,311 is 0.78% of the bug-fixes.
We’ve not reported a single C-based vulnerability in curl since September 2019, but six others. And fixed over a thousand other bugs. (There’s another vulnerability pending announcement, a 99th one, to become public on March 31, but that is also not a C mistake.)
This is not due to lack of trying. For a few years now we’ve been one of the few small open source projects that pays several hundred dollars for every reported and confirmed security flaw.
The share of C based security issues in curl is an extremely small fraction of the grand total of bugs. The security flaws are however of course the most fatal and serious ones – as all bugs are certainly not equal.
But also: not all vulnerabilities are equal. Very few curl vulnerabilities have had a severity level over medium and none has been marked critical.
Unfortunately we don’t have “severity” noted for very many of the past vulnerabilities, as we only started that practice in 2019. I’ve spent time and effort backtracking and filling them in for the 2018 ones, but it’s a tedious job and I probably will not update the remainder soon, if at all.
51 flaws due to C
Let’s dive in to see how they look.
Here’s a little pie chart with the five different C mistake categories that have caused the 51 vulnerabilities. The categories here are entirely my own. No surprises here really. The two by far most common C mistakes that caused vulnerabilities are reading or writing outside a buffer. (A condensed illustration of that pattern follows after the list below.)
Buffer overread – reading outside the buffer size/boundary. Very often due to a previous integer overflow.
Buffer overflow – code wrote more data into a buffer than it was allocated to hold.
Use after free – code used a memory area that had already been freed.
Double free – freeing a memory pointer that had already been freed.
NULL mistakes – NULL pointer dereference and NUL byte mistake.
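To make the two top categories concrete, here’s a deliberately condensed and broken sketch – an illustration only, not actual curl code – of how a previous integer overflow typically turns into an out-of-bounds buffer access:

#include <stdlib.h>
#include <string.h>

/* deliberately broken illustration - not actual curl code */
void bad(const char *src, size_t a, size_t b)
{
  size_t n = a + b;      /* if a + b wraps around, n gets very small */
  char *buf = malloc(n);
  if(buf) {
    memcpy(buf, src, a); /* ...and this write lands outside buf */
    free(buf);
  }
}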
Addressing the causes
I’ve previously described a bunch of the counter-measures we’ve introduced in the project to combat some of the most common mistakes we’ve made. We continue to enforce those rules in the project.
Two of the main methods we’ve introduced that are mentioned in that post are that we have A) created a generic dynamic buffer system in curl that we try to use everywhere now, to avoid new code that handles buffers, and B) enforced length restrictions on virtually all input strings – to avoid risking integer overflows.
Areas
When I did the tedious job of re-analyzing every single security vulnerability anyway, I also assigned an “area” to each existing curl CVE: the area of curl in which the problem originated or belonged. If we look at where the C related issues were found, can we spot a pattern? I think not.
“Internal” is the number one area, meaning generic code that affected multiple protocols or in several cases was entirely protocol independent.
HTTP was the second largest area, but that might just reflect the fact that it is by far the most commonly used protocol in curl – and probably the one with the largest amount of protocol-specific code. There were a total of 21 vulnerabilities reported in that area, and 8 out of 21 is 38% C mistakes – way below the total average.
Otherwise I think we can conclude that the mistakes were distributed all over, rather nondiscriminatory…
C mistake history
As curl is an old project now and we have a long history to look back at, we can see how we have done in this regard throughout history. I think it shows quite clearly that age hasn’t prevented C related mistakes from slipping in. Even if we are experienced C programmers and aged developers, we still let such flaws slip in. Or at least we don’t find old such mistakes that went in a long time ago – as the reported vulnerabilities in the project have usually been present in the source code for many years at the time of the finding.
The fact is that we only started to take proper and serious counter-measures against such mistakes in the last few years and while the graph below shows that we’ve improved recently, I don’t think we yet have enough data to show that this is a true trend and not just a happenstance or a temporary fluke.
The blue line in the graph shows how big the accumulated share of all security vulnerabilities due to C mistakes has been over time. It shows we went below 50% in 2012, only to go above 50% again in 2018, and we haven’t come down below that since…
The red line shows the percentage share over the last twelve months at each point. It illustrates that we have had several series of vulnerabilities reported over the years that were all C mistakes, and that has happened rather recently too. During the period one year back from the very last reported vulnerability, we did not have a single C mistake among them.
Finding the flaws takes a long time
C mistakes might be easier to find and detect in source code. valgrind, fuzzing, static code analyzers and sanitizers can find them. Logical problems cannot as easily be detected using tools.
I decided to check if this seems to be the case in curl and if it is true, then C mistakes should’ve lingered in the code for a shorter time until found than other mistakes.
I had a script go through the 98 existing vulnerabilities and calculate the average time the flaws were present in the code until reported, splitting out the C mistake ones from the ones not caused by C mistakes. It revealed a (small) difference:
C mistake vulnerabilities are found, on average, in 80% of the time other mistakes need to get found. Or put the other way around: mistakes that were not C mistakes took 25% longer to get reported – on average. I’m not convinced the difference is very significant. C mistakes are still shipped in code for 2,421 days – on average – until reported, while non C mistakes take 3,030 days (2,421/3,030 ≈ 80%). Looking at only the last 10 C mistake vulnerabilities, the average is slightly lower at 2,108 days (76% of the time the 10 most recent non C mistakes needed).
Reproducibility
All facts I claim and provide in this blog post can be double-checked and verified using available public data and freely available scripts.
On February 11th, 2021 18:00 UTC (10am Pacific time, 19:00 Central Europe) we invite you to participate in a webinar we call “curl, Hyper and Rust”. To join us at the live event, please register via the link below:
https://www.wolfssl.com/isrg-partner-webinar/
What is the project about, how will this improve curl and Hyper, how was it done, what lessons can be learned, what more can we expect in the future and how can newcomers join in and help?
Participating speakers in this webinar are:
Daniel Stenberg. Founder and lead developer of curl.
Josh Aas, Executive Director at ISRG / Let’s Encrypt.
The event went on for 60 minutes, including the Q&A session at the end.
Recording
Questions?
If you already have a question you want to ask, please let us know ahead of time. Either in a reply here on the blog, or as a reply on one of the many tweets that you will see about this event from me and my fellow “webinarees”.
You might recall that my Twitter account was hijacked and then again just two weeks later.
The first: brute-force
The first take-over was most likely a case of someone brute-forcing my weak password while I did not have 2FA enabled. I have no excuse for either of those lapses. I had convinced myself I had 2fa enabled, which made me take a (too) lax attitude towards my short 8-character password that was possible to remember. Clearly, 2fa was not enabled and the only remaining wall against the evil world was that weak password.
The second time
After that first hijack, I immediately changed password to a strong many-character one and I made really sure I enabled 2fa with an authenticator app and I felt safe again. Yet it would only take seventeen days until I again was locked out from my account. This second time, I could see how someone had managed to change the email address associated with my account (displayed when I wanted to reset my password). With the password not working and the account not having the correct email address anymore, I could not reset the password, and my 2fa status had no effect. I was locked out. Again.
It felt related to the first case because I’ve had my Twitter account since May 2008. I had never lost it before, and then suddenly, after 12+ years, it happens twice within a period of three weeks?
Why and how
How this happened was a complete mystery to me. The account was restored fairly swiftly but I learned nothing from that.
Then someone at Twitter contacted me. After they investigated what had happened and how, I had a chat with a responsible person there and he explained to me exactly how this went down.
Had Twitter been hacked? Is there a way to circumvent 2FA? Were my local computer or phone compromised? No, no and no.
Apparently, an agent at Twitter who was going through the backlog of issues, where my previous hijack issue was still present, accidentally changed the email address on my account, probably confusing it with another account in another browser tab.
There was no outside intruder, it was just a user error.
Okay, the cynics will say, this is what he told me and there is no evidence to back it up. That’s right, I’m taking his words as truth here but I also think the description matches my observations. There’s just no way for me or any outsider to verify or fact-check this.
A brighter future
They seem to already have identified things to improve to reduce the risk of this happening again and Michael also mentioned a few other items on their agenda that should make hijacks harder to do and help them detect suspicious behavior earlier and faster going forward. I was also happy to provide my feedback on how I think they could’ve made my lost-account experience a little better.
I’m relieved that the second time at least wasn’t my fault and that none of my systems were breached or hacked (as far as I know).
I’ve also now properly and thoroughly gone over all my accounts on practically all online services I use and made really sure that I have 2fa enabled on them. On some of them I’ve also changed my registered email address to one with 30 random letters to make it truly impossible for any outsider to guess what I use.
(I’m also positively surprised by this extra level of customer care Twitter showed for me and my case.)
Am I a target?
I don’t think I am. I think maybe my Twitter account could be interesting to scammers since I have almost 25K followers and a verified account. Me personally, I work primarily with open source and most of my work is already public. I don’t deal in business secrets. I don’t think my personal stuff attracts attackers more than anyone else’s does.
What about the risk or the temptation for bad guys in trying to backdoor curl? It is after all installed in some 10 billion systems world-wide. I’ve elaborated on that before. Summary: I think it is terribly hard for someone to actually manage to do it. Not because of the security of my personal systems perhaps, but because of the entire setup and all processes, signings, reviews, testing and scanning that are involved.
So no. I don’t think my personal systems are a valued, singled-out target for attackers.
Status: 00:27 in the morning of December 4 my account was restored again. No words or explanations on how it happened – yet.
This morning (December 3rd, 2020) I woke up to find myself logged out from my Twitter account on the devices where I was previously logged in, due to “suspicious activity” on my account. I don’t know the exact time this happened. I checked my phone at around 07:30 and by then it had obviously already happened. So sometime over night.
Trying to log back in, I get prompted saying I need to update my password first. Trying that, it wants to send a confirmation email to an email address that isn’t mine! Someone has managed to modify the email address associated with my account.
It has only been two weeks since someone hijacked my account the last time and abused it for scams. When I got the account back, I made very sure I both set a good, long, password and activated 2FA on my account. 2FA with auth-app, not SMS.
The last time I wasn’t really sure about how good my account security was. This time I know I did it by the book. And yet this is what happened.
Communication
I was in touch with someone at Twitter security and provided lots of details about my systems, software, IP address etc while they researched their end of what happened. I was totally transparent and gave them all the info I had that could shed some light.
I was contacted by a Sr. Director from Twitter (late Dec 4 my time). We have a communication established and I’ve been promised more details and information at some point next week. Stay tuned.
Was I breached?
Many people have proposed that the attacker must have come through my local machine to pull this off. If someone did, it was a very polished job as there is no trace at all of that left anywhere on my machine. Also, to reset my password I would imagine the attacker would need to somehow hijack my twitter session, have the 2FA, or trigger a password reset and intercept the email. I don’t receive emails on my machine so the attacker would then have had to (also?) manage to get into my email machine and remove that email – and not too many others, because I receive a lot of email and I’ve kept on receiving a lot of email during this period.
I’m not ruling it out. I’m just thinking it seems unlikely.
If the attacker had breached my phone and installed something nefarious on it, that would not have removed any reset emails, and it seems like a pretty tough challenge to hijack a “live” session from the Twitter client or get the 2FA code from the authenticator app. Not unthinkable either, just unlikely.
Most likely?
As I have no insights into the other end I cannot really say which way I think is the most likely that the perpetrator used for this attack, but I will maintain that I have no traces of a local attack or breach and I know of no malicious browser add-ons or twitter apps on my devices.
Details
Firefox version 83.0 on Debian Linux with Tweetdeck in a tab – a long-lived session started over a week ago (ie no recent 2FA codes used).
Browser extensions: Cisco Webex, Facebook container, multi-account containers, HTTPS Everywhere, test pilot and ublock origin.
I only use one “authorized app” with Twitter and that’s Tweetdeck.
On the Android phone, I run an updated Android with an auto-updated Twitter client. That session also started over a week ago. I used Google Authenticator for 2fa.
While this hijack took place I was asleep at home (I don’t know the exact time of it), on my WiFi, so all my most relevant machines would’ve been seen as originating from the same “NATed” IP address. This info was also relayed to Twitter security.
Restored
The actual restoration happens like this (and it was the exact same the last time): I just suddenly receive an email on how to reset my password for my account.
The email is a standard one without any specifics for this case. Just a template: press the big button and it takes you to the Twitter site where I can set a new password for my account. There is nothing in the mail that indicates a human was involved in sending it. There is no text explaining what happened. Oh, right, the mail also includes a bunch of standard security advice like “use a strong password”, “don’t share your password with others” and “activate two factor” etc, as if I hadn’t done all that already…
It would be prudent of Twitter to explain how this happened, at least roughly and without revealing sensitive details. If it was my fault somehow, or if I just made it easier because of something in my end, I would really like to know so that I can do better in the future.
What was done to it?
No tweets were sent. The name and profile picture remained intact. I’ve not seen any DMs sent or received while the account was “kidnapped”. Given this, it seems possible that the attacker actually only managed to change the associated account email address.
HTTP Strict Transport Security (HSTS) is a standard HTTP response header for sites to tell the client that for a specified period of time into the future, that host is not to be accessed with plain HTTP but only using HTTPS. Documented in RFC 6797 from 2012.
The idea is of course to reduce the risk for man-in-the-middle attacks when the server resources might be accessible via both HTTP and HTTPS, perhaps due to legacy or just as an upgrade path. Every access to the HTTP version is then a risk that you get back tampered content.
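For example, a server can return a response header like this one, telling clients to use HTTPS only for this host for the coming year, subdomains included (the max-age value is in seconds):

Strict-Transport-Security: max-age=31536000; includeSubDomains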
Browsers preload
These headers have been supported by the popular browsers for years already, and they also have a system setup for preloading a set of sites. Sites that exist in their preload list then never get accessed over HTTP since they know of their HSTS state already when the browser is fired up for the first time.
The entire .dev top-level domain is even in that preload list so you can in fact never access a web site on that top-level domain over HTTP with the major browsers.
With the curl tool
Starting in curl 7.74.0, curl has experimental support for HSTS. Experimental means it isn’t enabled by default and we discourage use of it in production. (Scheduled to be released in December 2020.)
You instruct curl to understand HSTS and to load/save a cache with HSTS information using --hsts <filename>. The HSTS cache saved into that file is then updated on exit and if you do repeated invokes with the same cache file, it will effectively avoid clear text HTTP accesses for as long as the HSTS headers tell it.
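A minimal (made up) pair of invocations could look like this – once the first HTTPS response carrying an HSTS header has been cached in hsts.txt, the later plain HTTP attempt against that host is done over HTTPS instead:

curl --hsts hsts.txt https://example.com/
curl --hsts hsts.txt http://example.com/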
I envision that users will simply use a small HSTS cache file for specific use cases, rather than anyone ever really wanting to have or use a “complete” preload list of domains such as the one the browsers use, as that’s a huge list of sites and for most use cases just completely unnecessary to load and handle.
With libcurl
Possibly, this feature is more useful and appreciated by applications that use libcurl for HTTP(S) transfers. With libcurl the application can set a file name to use for loading and saving the cache but it also gets some added options for more flexibility and powers. Here’s a quick overview:
CURLOPT_HSTS – lets you set a file name to read/write the HSTS cache from/to.
CURLOPT_HSTSREADFUNCTION – this callback gets called by libcurl when it is about to start a transfer and lets the application preload HSTS entries – as if they had been read over the wire and been added to the cache.
CURLOPT_HSTSWRITEFUNCTION – this callback gets called repeatedly when libcurl flushes its in-memory cache and allows the application to save the cache somewhere and similar things.
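A minimal sketch of the file-based variant (the cache file name and URL are placeholders; note that the HSTS engine is switched on with the companion CURLOPT_HSTS_CTRL option):

#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    /* switch on the HSTS engine and point it to a cache file */
    curl_easy_setopt(curl, CURLOPT_HSTS_CTRL, (long)CURLHSTS_ENABLE);
    curl_easy_setopt(curl, CURLOPT_HSTS, "hsts-cache.txt");

    /* a plain HTTP URL is done over HTTPS if the host is in the cache */
    curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/");
    curl_easy_perform(curl);
    curl_easy_cleanup(curl); /* the cache file is updated here */
  }
  return 0;
}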
Feedback?
I trust you understand that I’m very very keen on getting feedback on how this works, on the API and your use cases. Both negative and positive. Whatever your thoughts are really!
Earlier this year I was the recipient of a monetary Google patch grant with the expressed purpose of improving security in libcurl.
This was an upfront payout under this Google program describing itself as “an experimental program that rewards proactive security improvements to select open-source projects”.
I accepted this grant for the curl project and I intend to keep working fiercely on securing curl. I recognize the importance of curl security, as curl remains one of the most widely used software components in the world, and one that is doing network data transfers, which typically is a risky business. curl is responsible for a measurable share of all transfers done over the Internet on an average day. My job is to make sure those transfers are done as safely and securely as possible. It isn’t my only responsibility of course, as I have other tasks to attend to as well, but still.
Do more
Security is already and always a top priority in the curl project and for myself personally. This grant will of course further my efforts to strengthen curl and by association, all the many users of it.
What I will not do
When security comes up in relation to curl, some people like to mention and advocate for other programming languages. But curl will not be rewritten in another language. Instead we will increase our efforts in writing good C and detecting problems in our code earlier and better.
Proactive counter-measures
Things we have done lately and working on to enforce everywhere:
String and buffer size limits – all string inputs and all buffers in libcurl that are allowed to grow now have a maximum allowed size that makes sense. This stops malicious uses that could make things grow out of control and it helps detect programming mistakes that would lead to the same problems. Also, by making sure strings and buffers are never ridiculously large, we avoid a whole class of integer overflow risks.
Unified dynamic buffer functions – by reducing the number of different implementations that handle “growing buffers” we reduce the risk of a bug in one of them, even if it is used rarely or the spot is hard to reach and “exercise” by the fuzzers. The “dynbuf” internal API first shipped in curl 7.71.0 (June 2020). (A simplified sketch of the pattern follows after this list.)
Realloc buffer growth unification – pretty much the same point as the previous, but we have earlier in our history had several issues when we had silly realloc() treatment that could lead to bad things. By limiting string sizes and unifying the buffer functions, we have reduced the number of places we use realloc and thus we reduce the number of places risking new realloc mistakes. The realloc mistakes were usually in combination with integer overflows.
Code style – we’ve gradually improved our code style checker (checksrc.pl) over time and we’ve also gradually made our code style more strict, leading to less variations in code, in white spacing and in naming. I’m a firm believer this makes the code look more coherent and therefore become more readable which leads to fewer bugs and easier to debug code. It also makes it easier to grep and search for code as you have fewer variations to scan for.
More code analyzers – we run every commit and PR through a large number of code analyzers to help us catch mistakes early, and we always remove detected problems. Analyzers used at the time of this writing: lgtm.com, Codacy, Deepcode AI, Monocle AI, clang tidy, scan-build, CodeQL, Muse and Coverity. That’s of course in addition to the regular run-time tools such as valgrind and sanitizer builds that run the entire test suite.
Memory-safe components – curl already supports getting built with a plethora of different libraries and “backends” to cater for users’ needs and desires. By properly supporting and offering users to build with components that are written in for example rust – or other languages that help developers avoid pitfalls – future curl and libcurl builds could potentially avoid a whole section of risks. (Stay tuned for more on this topic in a near future.)
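As mentioned in the dynamic buffer item above, here’s a simplified, self-contained sketch of the pattern – an illustration only, not curl’s actual dynbuf code: the buffer gets a hard size limit at init time, and every append is checked against it before any memory is grown.

#include <stdlib.h>
#include <string.h>

/* illustration of the pattern only - not curl's actual dynbuf code */
struct dynbuf {
  char *mem;
  size_t len;
  size_t max;  /* hard upper limit, set once at init */
};

static void dyn_init(struct dynbuf *d, size_t max)
{
  d->mem = NULL;
  d->len = 0;
  d->max = max;
}

/* append n bytes; returns 0 on success, -1 on limit hit or OOM */
static int dyn_addn(struct dynbuf *d, const char *src, size_t n)
{
  char *p;
  if(n > d->max - d->len)  /* enforce the maximum size up front */
    return -1;
  p = realloc(d->mem, d->len + n + 1);
  if(!p)
    return -1;
  memcpy(p + d->len, src, n);
  d->mem = p;
  d->len += n;
  d->mem[d->len] = 0;      /* keep the buffer null-terminated */
  return 0;
}

static void dyn_free(struct dynbuf *d)
{
  free(d->mem);
  d->mem = NULL;
  d->len = 0;
}

Because the limit check happens before the realloc, the size arithmetic can never overflow, which is exactly the class of mistake the unified functions are meant to rule out.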
Reactive measures
Recognizing that whatever we do and however tight a ship we run, we will continue to slip up every once in a while, is important, and we should make sure we find and fix such slip-ups as well and as early as possible.
Raising bounty rewards. While not directly fixing things, offering more money in our bug-bounty program helps us get more attention from security researchers. Our ambition is to gently drive the reward amounts up progressively, to perhaps multi-thousand dollars per flaw, for as long as we have funds to pay for them and we manage to keep the security vulnerabilities at a reasonably low frequency.
More fuzzing. I’ve said it before but let me say it again: fuzzing is really the top method to find problems in curl once we’ve fixed all flaws that the static analyzers we use have pointed out. The primary fuzzing for curl is done by OSS-Fuzz, that tirelessly keeps hammering on the most recent curl code.
Good fuzzing needs a certain degree of “hand-holding” to allow it to really test all the APIs and dig into the dustiest corners, and we should work on adding more “probes” and entry-points into libcurl for the fuzzer to make it exercise more code paths to potentially detect more mistakes.