This is the video recording of my talk with this title, given on February 4, 2024 at 10:00 in room K1.105 at FOSDEM 2024. The room holds some 800 people, but a few hundred seats were still unoccupied. Several people I met up with later insisted that 10 am on a Sunday is way too early for attending talks…
When I was about to start my talk, the slides would not show on the projector. Yeah, sigh. Nothing surprising maybe, but you always hope you can avoid these problems – particularly at the last moment with a huge audience waiting.
There was a separate video monitor laptop that clearly showed that my laptop was outputting the correct thing – in a proper resolution (1280×720 as per auto-negotiation), but the projector refused to play ball. The live stream could also see my output, so the problem was somewhere between the video box and the projector.
Several people eventually got involved, things were rebooted multiple times, and cables were yanked and plugged back in. Only after I installed arandr and forced the resolution of my HDMI output to 1920×1080 did the projector suddenly show my presentation. (Later on I was told that people had the same problem in this room the day before…)
That was about nine minutes of technical difficulties that have been cut out of the recording. Nine minutes that tested my nerves and presentation finesse, as I had to adapt.
A few weeks ago I mentioned how we fund Stefan’s work on improving HTTP(/3) in curl. Now, in a similar spirit, we are funding Dan Fandrich to work on further improving the test infrastructure. Dan has worked fiercely on the introduction of parallel tests over the past year or so, and this work builds on that and continues down that road.
curl contains a regression test suite of over 1900 individual test cases that are run automatically on every commit submission and on every pull request in almost 130 different environments, meaning that every change can result in more than 140,000 tests being run. A spurious test failure rate of a mere 0.001% is likely to cause a perfectly good PR to end up showing a red failure. A new contributor who doesn’t understand this problem can spend hours poring over his or her patch and the related code in curl, searching for a problem that isn’t there.
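To see why such a tiny failure rate matters, here is a quick back-of-the-envelope calculation in Python. It uses the two figures from the paragraph above and assumes, for simplicity, that every test run fails spuriously with the same probability and independently of the others. That is a simplification, not a measured property of the curl CI.

```python
# Figures from the text above. Treating the runs as independent with a
# uniform failure rate is an assumption made only for illustration.
TESTS_PER_CHANGE = 140_000
SPURIOUS_FAILURE_RATE = 0.001 / 100  # 0.001% expressed as a probability


def expected_spurious_failures(runs: int, rate: float) -> float:
    """Average number of spurious failures across `runs` test runs."""
    return runs * rate


def prob_at_least_one_failure(runs: int, rate: float) -> float:
    """Probability that at least one of `runs` independent runs fails."""
    return 1.0 - (1.0 - rate) ** runs


if __name__ == "__main__":
    expected = expected_spurious_failures(TESTS_PER_CHANGE, SPURIOUS_FAILURE_RATE)
    chance = prob_at_least_one_failure(TESTS_PER_CHANGE, SPURIOUS_FAILURE_RATE)
    print(f"expected spurious failures per change: {expected:.2f}")
    print(f"chance of at least one red result: {chance:.0%}")
```

Under these assumptions a change sees about 1.4 spurious failures on average, and roughly three out of four perfectly good changes end up with at least one red result.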
Analyzing 140,000 tests for each change to the curl source code to find failure trends (such as flaky tests) demands an automated solution. Dan has created a system (working name Test Clutch) that has been successfully ingesting curl CI test results for much of the past year and has been used by him to find flaky tests as well as permanently failing tests (often submitted under the mistaken impression that the failed test was merely flaky). It collects individual test results from all the CI systems used in the curl project into a database where they can be analyzed.
This system has the potential to be useful to a broader base of curl developers to help see test trends, test platform coverage and to better determine which tests are flaky and could use improvement. It has been written in a fashion such that test results for other projects besides curl can also be added and analyzed separately.
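As an illustration of the difference between a flaky test and a permanently failing one, here is a minimal Python sketch. This is not how Test Clutch works internally. The function name, the `recent` threshold and the test names are all made up for the example, and the heuristic is deliberately naive.

```python
from collections import defaultdict
from typing import Iterable


def classify_tests(
    results: Iterable[tuple[str, bool]], recent: int = 5
) -> dict[str, str]:
    """Classify tests from (test_name, passed) tuples in chronological order.

    Hypothetical heuristic: a test that failed in all of its last `recent`
    runs counts as permanently failing, a test with a mix of passes and
    failures counts as flaky.
    """
    history: dict[str, list[bool]] = defaultdict(list)
    for name, passed in results:
        history[name].append(passed)

    verdicts: dict[str, str] = {}
    for name, runs in history.items():
        if all(runs):
            verdicts[name] = "stable"
        elif len(runs) >= recent and not any(runs[-recent:]):
            verdicts[name] = "permanently failing"
        elif any(runs):
            verdicts[name] = "flaky"
        else:
            # Only failures so far, but too few runs to call it permanent.
            verdicts[name] = "unknown"
    return verdicts


if __name__ == "__main__":
    # Made-up test names and results, purely for demonstration.
    sample = (
        [("test_a", True)] * 6
        + [("test_b", True), ("test_b", False), ("test_b", True), ("test_b", True)]
        + [("test_c", True)] + [("test_c", False)] * 5
    )
    for name, verdict in classify_tests(sample).items():
        print(name, "->", verdict)
```

A real system would also need to take the environment, the time window and the commits involved into account. The sketch only shows why a history of results, rather than a single run, is needed to tell the two cases apart.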
Make Test Clutch available
The current test ingestion and analysis system will be productionized and the analysis summary table will be integrated into the curl web site for easy access for developers.
Assist in PR work
This task will involve writing code to trigger the test analysis system to retrieve detailed PR test results when available. It must make a reasonable determination of when all the expected tests have completed (since not all tests run for every PR) and then comment on the PR with a summary of the test results and the believability of any test failures.
These are projects that will benefit curl once implemented, but they are not time sensitive and Dan is not going to work full time on them. No exact end dates are set for them.
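The two steps in this task, waiting until all expected results are in and then summarizing the failures, could look roughly like the Python sketch below. Everything in it is hypothetical: the `JobResult` type, the job and test names and the wording of the summary are invented for illustration. It also assumes the set of expected CI jobs is already known, which is the hard part in practice, and it only builds the comment text without talking to any CI service or to GitHub.

```python
from dataclasses import dataclass, field


@dataclass
class JobResult:
    """Result of one CI job for a PR (hypothetical structure)."""
    job: str
    failed_tests: list[str] = field(default_factory=list)


def all_jobs_reported(expected_jobs: set[str], results: list[JobResult]) -> bool:
    """True once every expected CI job has delivered a result."""
    return expected_jobs <= {r.job for r in results}


def summarize(results: list[JobResult], known_flaky: set[str]) -> str:
    """Build the text of a PR comment from the collected job results."""
    lines = []
    for result in results:
        for test in result.failed_tests:
            if test in known_flaky:
                verdict = "known to be flaky, possibly unrelated to this PR"
            else:
                verdict = "not known to be flaky, worth investigating"
            lines.append(f"- {result.job}: {test} failed ({verdict})")
    if not lines:
        return "All reported tests passed."
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented job names, test names and flaky list, for demonstration only.
    expected = {"linux-job", "macos-job"}
    results = [
        JobResult("linux-job", ["test_b"]),
        JobResult("macos-job", ["test_x"]),
    ]
    if all_jobs_reported(expected, results):
        print(summarize(results, known_flaky={"test_b"}))
```

The list of known flaky tests would come from the kind of trend analysis described earlier, which is what lets the comment say how believable each failure is.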
The result of Dan’s work will become visible in PRs and website updates as we go forward.