The I in LLM stands for intelligence

I have held back on writing anything about AI or how we (not) use AI for development in the curl factory. Now I can’t hold back anymore. Let me show you the most significant effect of AI on curl as of today – with examples.

Bug Bounty

Having a bug bounty means that we offer real money in rewards to hackers who report security problems. The chance of money attracts a certain amount of “luck seekers”. People who basically just grep for patterns in the source code or maybe at best run some basic security scanners, and then report their findings without any further analysis in the hope that they can get a few bucks in reward money.

We have run the bounty for a few years by now, and the rate of rubbish reports has never been a big problem. The rubbish reports have also typically been very easy and quick to detect and discard. They have rarely caused any real problems or wasted much of our time. A little like the most stupid spam emails.

Our bug bounty has resulted in over 70,000 USD paid in rewards so far. We have received 415 vulnerability reports. Out of those, 64 were ultimately confirmed security problems. 77 of the reports were informative, meaning they typically were bugs or similar. That leaves 66% of the reports as neither a security issue nor a normal bug.
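For anyone who wants to check that percentage, here is a minimal Python sketch of the arithmetic. The numbers are the ones quoted above; the variable and function names are only illustrative and are not part of any curl or Hackerone tooling.

```python
# Figures quoted in the paragraph above.
total_reports = 415
confirmed_security = 64
informative = 77


def share_neither(total, confirmed, informative_count):
    """Return (count, percentage) of reports that were neither a
    confirmed security problem nor an informative (bug-like) report."""
    neither_count = total - confirmed - informative_count
    return neither_count, 100 * neither_count / total


neither, pct = share_neither(total_reports, confirmed_security, informative)
print(f"{neither} of {total_reports} reports ({pct:.0f}%) were neither")
# prints "274 of 415 reports (66%) were neither"
```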

Better crap is worse

When reports are made to look better and to appear to have a point, it takes a longer time for us to research and eventually discard it. Every security report has to have a human spend time to look at it and assess what it means.

The better the crap, the longer time and the more energy we have to spend on the report until we close it. A crap report does not help the project at all. It instead takes away developer time and energy from something productive. Partly because security work is considered one of the most important areas, so it tends to trump almost everything else.

A security report can take a developer away from fixing a really annoying bug, because a security issue is always more important than other bugs. If the report turned out to be crap, we did not improve security and we lost time we could have spent fixing bugs or developing a new feature. Not to mention how it drains you of energy having to deal with rubbish.

AI generated security reports

I realize AI can do a lot of good things. As any general purpose tool it can also be used for the wrong things. I am also sure AIs can be trained and ultimately get used even for finding and reporting security problems in productive ways, but so far we have yet to find good examples of this.

Right now, users seem keen on using the current set of LLMs, throwing some curl code at them and then passing on the output as a security vulnerability report. What makes it a little harder to detect is of course that users copy and paste and include their own language as well. The entire thing is not exactly what the AI said, but the report is nonetheless crap.

Detecting AI crap

Reporters are often not totally fluent in English and sometimes their exact intentions are hard to understand at once. It might take a few back and forths until things reveal themselves correctly – and that is of course totally fine and acceptable. Language and cultural barriers are real things.

Sometimes reporters use AIs or other tools to help them phrase themselves or translate what they want to say. As an aid to communicate better in a foreign language. I can’t find anything wrong with that. Even reporters who don’t master English can find and report security problems.

So: just the mere existence of a few give-away signs that parts of the text were generated by an AI or a similar tool is not an immediate red flag. It can still contain truths and be a valid issue. This is part of the reason why a well-formed crap report is harder and takes longer to discard.

Exhibit A: code changes are disclosed

In the fall of 2023, I alerted the community about a pending disclosure of CVE-2023-38545. A vulnerability we graded severity high.

The day before that issue was about to be published, a user submitted this report on Hackerone: Curl CVE-2023-38545 vulnerability code changes are disclosed on the internet

That sounds pretty bad and would have been a problem if it actually was true.

The report however reeks of typical AI style hallucinations: it mixes and matches facts and details from old security issues, making up something new that has no connection with reality. The changes had not been disclosed on the Internet. The changes that actually had been disclosed were for previous, older issues. As intended.

In this particular report, the user helpfully told us that they used Bard to find this issue. Bard being a Google generative AI thing. It made it easier for us to realize the craziness, close the report and move on. As can be seen in the report log, we did not have to spend much time on researching this.

Exhibit B: Buffer Overflow Vulnerability

A more complicated issue, less obvious, done better but still suffering from hallucinations. Showing how the problem grows worse when the tool is better used and better integrated into the communication.

On the morning of December 28 2023, a user filed this report on Hackerone: Buffer Overflow Vulnerability in WebSocket Handling. It was morning in my time zone anyway.

Again this sounds pretty bad just based on the title. Since our WebSocket code is still experimental, and thus not covered by our bug bounty, I could keep a relaxed attitude when I started looking at this report. It was filed by a user I had never seen before, but their “reputation” on Hackerone was decent – this was not their first security report.

The report was pretty neatly filed. It included details and was written in proper English. It also contained a proposed fix. It did not stand out as wrong or bad to me. It appeared as if this user had detected something bad and as if the user understood the issue enough to also come up with a solution. As far as security reports go, this looked better than the average first post.

In the report you can see my first template response informing the user their report had been received and that we will investigate the case. When that was posted, I did not yet know how complicated or easy the issue would be.

Nineteen minutes later I had looked at the code, not found any issue, read the code again and then again a third time. Where on earth is the buffer overflow the reporter says exists here? Then I posted the first question asking for clarification on where and how exactly this overflow would happen.

After repeated questions and numerous hallucinations I realized this was not a genuine problem and on the afternoon that same day I closed the issue as not applicable. There was no buffer overflow.

I don’t know for sure that this set of replies from the user was generated by an LLM but it has several signs of it.

Ban these reporters

On Hackerone there is no explicit “ban the reporter from further communication with our project” functionality. I would have used it if it existed. Researchers get their “reputation” lowered when we close an issue as not applicable, but that is a very small nudge when only done once in a single project.

I have requested better support for this from Hackerone. Update: this function exists, I just did not look at the right place for it…

Future

As these kinds of reports will become more common over time, I suspect we might learn how to trigger on generated-by-AI signals better and dismiss reports based on those. That will of course be unfortunate when the AI is used for appropriate tasks, such as translation or just language formulation help.

I am convinced there will pop up tools using AI for this purpose that actually work (better) in the future, at least part of the time, so I cannot and will not say that AI for finding security problems is necessarily always a bad idea.

I do however suspect that if you just add an ever so tiny (intelligent) human check to the mix, the use and outcome of any such tools will become so much better. I suspect that will be true for a long time into the future as well.

I have no doubt that people will keep trying to find shortcuts even in the future. I am sure they will keep trying to earn that quick reward money. As with the email spammers, the cost of this ends up on the receiving end. The ease of use and wide access to powerful LLMs is just too tempting. I strongly suspect we will get more LLM generated rubbish in our Hackerone inboxes going forward.

Discussion

Hacker news

Credits

Image by Haider Mahmood from Pixabay

18 thoughts on “The I in LLM stands for intelligence”

  1. There are solutions to this problem; unfortunately, they require you to change your ways to such an extent that you would be unable to contribute to cURL anymore and all your work, except for compatibility work would be obsolete (it technically has been obsolete for a long time; it’s just that you are lucky that almost everyone on this planet is incompetent). From my understanding, you consider yourself to be the center of the universe, which is incompatible with such a change.

    You are like that WW1 commander in WW2 that says “those tanks are useless”, seconds before being slaughtered on the battlefield by the Germans.

    You are one of those persons that complains about a problem, instead of just solving it (and not in the simple, short, and wrong way like you have done in the past decades).

    1. Thank you so much for your deep insight. You are clearly very smart. As a mere mortal, I wondered if you could spare a moment to bestow upon us more of your superior intellect and guide cURL into the future by suggesting specifically what the solutions are and how they solve the problem, and specifying in what ways Daniel or the project would have to change.

      O wise Aart, please respond quickly. I fear without your light shining upon us all, cURL will die in the cold dark winter, like a panzer division outside Moscow in the winter of 1941.

    2. @Aart I am just curious what problem you think you are solving with your comment, and if this “solution” is working for you.

      Frankly, it doesn’t work for me. I don’t know you nor do I know Daniel, I would consider myself an objective bystander in this matter. And in my humble opinion, Daniel’s post is helpful and well intended, but your comment is not.

    3. I totally disagree with you. AI brings many advantages, but also the disadvantages described by Daniel. He’s not the only one who has these problems, by the way. Even if his text is emotional and therefore not always friendly, it is completely correct in terms of content.

  2. Have you considered solving this the way Bitcoin does it? Require “Skin in the game”

    This could take many forms: A mandatory $1 donation to the project (https://curl.se/donation.html) when submitting a bug report applying for a security bounty. A required PR that improves (at least) 1 line of documentation. etc.
    A reference to your favorite discussion on the mailing list this month.
    Simply ask the submitter a question: Explain why we should believe you? Elaborate on how you found this issue

    1. That actually could be a good idea, at least it could help to get at least some money back for having to do such wasted research!
      And @Daniel, keep up the good work you do and did over the last decades!

  3. The second exhibit B started their message with “Certainly! Let me elaborate on the concerns raised by the triager:”, I had no doubt in my mind it was written by ChatGPT, haha. But of course I had been primed to look for it by this article.

    We are fortunate that most of these people are too stupid to tune the tone of the LLM to not be the default ChatGPT tone.

    1. The reporter in Exhibit B could also hardly formulate a coherent sentence when asked why he kept addressing another user. I’m guessing that’s the actual human writing there and the rest is all LLM.

  4. Tools like static code analysis already create tons of false issues, for all fools. The combination with another tool that is perfectly suited for creating stories is – for any fool with a tool – a brilliant idea. LLMs' ability is to concat words that they have seen in the past – like a cargo cult.

  5. Interesting post. Now this is only my opinion, but one solution to avoid and/or reduce the phenomenon of AI-generated reports on bug-bounty platforms could be the reporter paying a small fee, which would be refunded only if the report was judged as non-generated by AI. Not the best and ethical solution of course.

  6. For me this is the exact same problem as reports coming from automated tools that perform static analysis: any tool is fine provided that its output is validated by whoever uses it.

    The problem these days is that a lot of people are stupider than the tools they’re using, but they seek fame, so they’ll take their tools’ output verbatim and just forward it as-is without trying to analyze nor understand it.

    If someone tells me “I’m working on an LLM-based tool that helps me spot bugs and it found this which I confirmed as valid”, I’m OK with reading it. But if someone says “could you please check this report from my bug detector” then no, if you can’t validate it yourself, find someone else’s time to waste, and don’t expect to be credited for the discovery in case the problem is real, at best the tool will be.

  7. You will have an LLM that handles bug reports for you in the future up to the point of handoff. Then it will all be okay. Until then yes we are in a chaotic time.

    Yes it is forced, we are in forms of governance where capital is the ism and money talks more than anything else. AI is a bottom line technology for big corporations to run even more efficient and make more money. Why wouldn’t they use them, this isn’t Napster and the copy right infringement as the only issue days. All the big corporations are already making money off LLMs. You are not going to see this go away anytime soon. Plus anyone can run an LLM locally and have the code to add/write one again now. It’s over, we can’t stop the flood. Might as well swim with it or find a way to live without it, yet I don’t see how.

  8. Hi Daniel,

    Thanks for this post, and I also read/saw your replies on the curl mailing list about the last “report”.

    I have to state this: LLMs are today, what we described (in my student years) back in 1992 as *expert* systems: ie. they are there to *assist* the user, but the user, must be an expert to understand and grasp the impacts etc. of the responses given by the system.

    I’ll compare it to radiology, which has been a great example of AI systems being “better” at diagnostics: the data fed into the AI system would come from all over the country/world, so the system is able to see cases that happen perhaps once in every 10 radiologists’ practising lives. When a case like that does appear, the AI system is then able to make that a part of the possible diagnoses – the radiologist can then confirm or deny those more easily than would otherwise be possible… it NEEDS the experts to make the decisions, and is just a TOOL in guiding the diagnosis.

    But yes, as a tool it did pick up something that is a potential issue (strcpy). It should’ve advised, which it sorta did, but it nowhere actually went through the logic of the code to confirm/deny whether an overflow would happen given the tests around the code that basically do what strncpy would’ve done. Perhaps the tests are better at understanding the specifics of the code flow to prevent the overflow.

    Just a “note” on strncpy vs strcpy: When you have capable developers, strcpy should not be a problem, but the reason strncpy is deemed better is ’cause the average Joe Programmer who writes C code is clueless and needs to be forced to use strncpy – but then, idiots are a problem and even strncpy could be creating buffer overflows when the wrong values are fed into it 😀

  9. The “I” means intelligence not just in LLM-s but in these bug reporters, too. Don’t blame a tool because people misuse it.
    Sadly using LLM-s makes it super easy to generate hundreds of (mostly false) bug reports per day. The bug bounty pages should handle this issue.
    I think the mandatory 1$ donation before bug report (at least for new users) suggested by many people here is a good idea.

    Side note: I love to generate small (mostly test or UI-related) code snippets with LLM-s but I would never insert the generated code without full review.

  10. I wonder how many of the people who suggest a donation before accepting a report have fully thought things through. Doing that would mean that you either choose to drop a fair number of useful reports (not everyone *can* even donate, even if they wanted to…), or setting up another channel for security reports that isn’t eligible for bug bounties.

    Curl has a history and tradition of trying to provide as little resistance for contributors as possible. It doesn’t feel like actively raising roadblocks for legitimate reports fits in that.

    1. I totally agree with Frank. As an employee working with cURL I’m not able to make a donation, nor is the company able to spend this money. This would force people to ignore bugs and lead to the attitude that the author of the software should simply see how he/she can clean out the bugs. I’m not convinced by this idea.
