curl code coverage

Every once in a while someone brings up the topic of code coverage in relation to curl. What portion of the code is actually exercised when running the tests?

Honestly, we don’t know. We can’t figure it out. We are not trying to figure it out. We have to live with this.

We used to get a number

A few years back we actually did a build and a test run in our CI setup that used one of those cloud services that would monitor the code coverage and warn if we would commit something that drastically reduced coverage.

This had significant drawbacks:

First, the service was unstable which made it occasionally sound the horns because we had gone down to 0% coverage and that is bad.

Secondly, it made parts of the audience actually believe that what was reported by that service for a single build and a single test run was the final and accurate code coverage number. It was far from it.

We ended up ditching that job as it did very little good but some amount of harm.

Different build combinations – and platforms

Code coverage is typically the number of lines of code that were executed as a share out of the total amount of possible lines (lines that were compiled and used in the build, not lines of code that were not included in the complete source). Since curl offers literally many million build combinations, an evaluated code coverage number can only apply to that specific build combination. When using that exact setup and running a particular set of tests on a fixed platform.

Just getting the coverage rate off one of these builds is easy enough but is hardly representing the true number as we run tests on many build combinations doing many different tests.

Can’t do it all in a single test run

We run many different tests and some of the tests we limit and split up into several different specific CI jobs since they are very slow and by doing a smaller portion of the jobs in separate CI jobs, we allow them to run in parallel and thus complete faster. That is super complicated from a code coverage point of view as we would have to merge coverage data between numerous independent and isolated build runs, possibly running on different services, to get a number approaching the truth.

We don’t even try to do this.

Not the panacea

Eventually, even if we would be able to get a unified number from a hundred different builds and test runs spread over many platforms, what would it tell us?

libcurl has literally over 300 run-time options that can be used in combinations. Running through the code with a few different option combinations could theoretically reach almost complete code coverage and yet only test a fraction of the possibilities.

But yes: it would help us identify source code lines that are never executed when the tests run and it would be very useful.

Instead

We rely on manual (and more error-prone) methods of identifying what parts of the code we need to add more tests for. This is hard, and generally the best way to find weak spots is when someone reports a bug or a regression as that usually means that there was a lack of tests for that area that allowed the problem to sneak in undetected.

Of course we also need to make sure that all new features and functions get test cases added in parallel.

This is a rather weak system but we have not managed to make a better one yet.