AI-powered code submissions

Who knows, maybe May 18, 2020 will mark some sort of historic change when we look back on this day in the future.

On this day, the curl project received the first “AI-powered” submitted issues and pull-requests. They were submitted by MonocleAI, which is described as:

MonocleAI, an AI bug detection and fixing platform where we use AI & ML techniques to learn from previous vulnerabilities to discover and fix future software defects before they cause software failures.

I’m sure these are still early days and we can’t expect this to be perfected yet, but I would still claim, from the submissions we’ve seen so far, that this is useful stuff! After I tweeted about this “event”, several people expressed interest in how well the service performs, so let me elaborate on what we’ve learned already in this early phase. I hope I can come back in the future with updates.

Disclaimers: I’ve been invited to try this service out as an early (beta?) user. No one is saying that this is complete or that it replaces humans. I have no affiliation with the makers of this service other than as a receiver of their submissions to the project I manage. Also: since this service is run by others, I can’t actually tell how much of this is machine versus human, or how much human “assistance” the AI required to perform these actions.

I’m looking forward to seeing whether we get more contributions from this AI beyond the first batch that we already dealt with and, if so, whether the AI gets better over time. Will it look at how we adjusted its suggested changes? We know humans adapt like that.

Pull-request quality

Monocle still needs to work on adapting the code it produces to follow the existing code style when it submits a PR, as a human would. For example, in curl, when a variable is initialized at declaration time, we always write the assignment on the same line as the declaration. Like this:

int name = 0;

… while Monocle, when fixing cases where it thinks there was an assignment missing, adds it in a line below, like this:

int name;
name = 0;

I can only presume that in some projects that will be the preferred style. In curl it is not.

White space

Other things that shouldn’t be that hard for an AI to figure out and adapt to, you’d imagine, are code style issues such as where to use white space and where not to. For example, in the curl project we write pointers like char * or void *. That is the type, a space and then an asterisk. Our code style script will yell if you do this wrong. Monocle did it wrong and wrote it without the space: void*.

C89

We use and stick to the most conservative ANSI C version in curl: C89/C90 (and we have CI jobs that fail if we deviate from this). In this version of C you cannot mix variable declarations and code. Yet Monocle did this in one of its PRs. It figured out an assignment was missing and added the assignment on a new line immediately below, which of course is wrong if there are more variables declared below!

int missing;
missing = 0; /* this is not C89 friendly */
int fine = 0;
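
For comparison, here is a minimal sketch of what a C89-friendly version of that fix could look like: the initialization moves onto the declaration line, so no statement ends up between declarations. The function name sum_defaults is made up for this illustration and is not curl code.

```c
/* Hypothetical function, for illustration only. All declarations come
   first, each initialized on its own line, so the block is valid C89. */
int sum_defaults(void)
{
  int missing = 0; /* initialized at declaration, no separate statement */
  int fine = 0;

  missing += 1;    /* code only starts after the last declaration */
  return missing + fine;
}
```

If you want a compiler to catch this kind of slip locally, GCC's -Wdeclaration-after-statement warning is one option that flags declarations placed after code.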

NULL

We use the symbol NULL in curl when we zero a pointer. Monocle for some reason decided it should use (void*)0 instead. That also seems like something virtually no human would do, and especially not after having taken a look at our code…
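
To make the two style points concrete, here is a small sketch that puts a space between the type and the asterisk and clears the pointer with NULL. The names is_cleared and clear_example are invented for this example and do not exist in curl.

```c
#include <stddef.h> /* NULL */

/* Hypothetical helper: returns 1 if the pointer has been cleared. */
static int is_cleared(const void *ptr)
{
  return ptr == NULL;
}

/* Hypothetical example function, for illustration only. */
int clear_example(void)
{
  char buffer[8] = "curl";
  char *name = buffer; /* type, space, asterisk */

  name = NULL;         /* NULL rather than (void*)0 */
  return is_cleared(name);
}
```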

The first issues

MonocleAI found a few issues in curl without filing PRs for them, and they were basically all of the same kind of inconsistency.

It found function calls for which the return code wasn’t checked, while it was checked in some other places. The obvious and rightful thinking is that if it was worth checking in one place, it should be worth checking in other places too.

That kind of “suspicious” code is also likely much harder to fix automatically, as a fix involves deciding what the correct action should actually be when checks are added, or whether the checks are perhaps not necessary at all…
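
As a rough illustration of the pattern, and not code taken from curl, the sketch below has one call site that checks a return code and one that ignores it. All names here (store_value, setup_checked, setup_unchecked) are hypothetical.

```c
#include <stddef.h> /* NULL */

/* Hypothetical function that can fail: returns 0 on success, 1 on error. */
static int store_value(int *slot, int value)
{
  if(!slot)
    return 1;
  *slot = value;
  return 0;
}

/* Checked call site: the error is passed on to the caller. */
int setup_checked(int *slot)
{
  int rc = store_value(slot, 42);
  if(rc)
    return rc;
  return 0;
}

/* Unchecked call site: same function, but a failure goes unnoticed.
   This is the kind of inconsistency that would get flagged. */
int setup_unchecked(int *slot)
{
  store_value(slot, 42);
  return 0;
}
```

Adding the check to setup_unchecked is the easy part. Deciding whether it should return the error, retry, or carry on regardless is the judgment call that makes an automatic fix hard.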

Credits

Image by Couleur from Pixabay