Nix CI Benchmarks

Benchmarks for Nix CI build times across different CI platforms

We picked some open source projects with lots of stars that already had Nix set up, forked them, and ran several different Nix CIs on the same commits, measuring how long CI took. Because Nix is Nix, these CIs did broadly the same work, which makes the comparisons more meaningful than they would be across wholly different stacks.

We ran the following setups:

These setups span the range from using GitHub Actions for everything (1-2), through adding external caches (3) and external builds (4), to having external evaluators as well and not using GitHub runners for anything (5).

We picked the more popular repos that already had largely working Nix builds. For a start, we also focused on:

In the future, we might add different test types, excluding from them any CIs that don't support the required features.

It's useful to understand the methodology of these benchmarks in order to interpret them correctly. If you're impatient, though, you can skip straight to the results.

Methodology

We wrote a script that:

You can see all the GitHub Actions workflow runs (for 1-4) here and the garnix logs here.
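For illustration, here is a minimal Python sketch of pulling run durations from the GitHub Actions REST API. It shows the general idea rather than the actual script, and the owner, repo, and token are placeholders:

```python
import os
from datetime import datetime

import requests

# Placeholders: point these at a fork and a token with read access to Actions.
OWNER = "some-user"
REPO = "some-fork"
TOKEN = os.environ["GITHUB_TOKEN"]


def workflow_run_durations(owner: str, repo: str) -> list[tuple[str, float]]:
    """Return (head_sha, duration_in_seconds) for completed workflow runs."""
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/actions/runs",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"status": "completed", "per_page": 100},
        timeout=30,
    )
    resp.raise_for_status()
    durations = []
    for run in resp.json()["workflow_runs"]:
        started = datetime.fromisoformat(run["run_started_at"].replace("Z", "+00:00"))
        # updated_at is the last update to the run, a reasonable proxy for completion.
        finished = datetime.fromisoformat(run["updated_at"].replace("Z", "+00:00"))
        durations.append((run["head_sha"], (finished - started).total_seconds()))
    return durations


if __name__ == "__main__":
    for sha, seconds in workflow_run_durations(OWNER, REPO):
        print(f"{sha[:10]}  {seconds / 60:.1f} min")
```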

Note that the packages and checks we ran may differ from the ones enabled on the repo's own CI.

For configuration:

Running it yourself

If you'd like to try it out yourself, you can follow these steps:

Important notes:

The results

We ran the benchmarks for three repos:

The commits picked were the last ten commits at the time the benchmark started. You can see the individual commit hashes by hovering over a datapoint.
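As a sketch (the benchmark may select them slightly differently), listing "the last ten commits" for a checked-out repo amounts to:

```python
import subprocess


def last_n_commits(repo_dir: str, n: int = 10) -> list[str]:
    """Return the hashes of the n most recent commits on the checked-out branch."""
    out = subprocess.run(
        ["git", "rev-list", "--max-count", str(n), "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return out.stdout.split()
```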

Note that by default we exclude the first commit from any calculations. This is for two reasons:

Analysis

A few facts stand out:

Future improvements

There are a few benchmarks missing:

Ideally, we would also be more systematic about which repositories we check. The most starred repositories that are not starter templates or documentation, and whose flake.nix builds a substantial part of the project (rather than, e.g., just devshells), might be a good criterion. Unfortunately, we couldn't figure out how to get GitHub search to accurately list repos with a flake.nix, ordered by stars.
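One possible workaround (not something the benchmark currently does) is to take repository search results sorted by stars and filter client-side for a root flake.nix via the contents API. A hedged sketch, where the query string and the filtering heuristic are assumptions:

```python
import os

import requests

HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}


def starred_repos_with_flake(query: str = "topic:nix", pages: int = 2) -> list[str]:
    """Search repos sorted by stars and keep those with a flake.nix at the root."""
    repos = []
    for page in range(1, pages + 1):
        resp = requests.get(
            "https://api.github.com/search/repositories",
            headers=HEADERS,
            params={"q": query, "sort": "stars", "order": "desc",
                    "per_page": 100, "page": page},
            timeout=30,
        )
        resp.raise_for_status()
        for item in resp.json()["items"]:
            flake = requests.get(
                f"https://api.github.com/repos/{item['full_name']}/contents/flake.nix",
                headers=HEADERS,
                timeout=30,
            )
            if flake.status_code == 200:  # 404 means no root flake.nix
                repos.append(item["full_name"])
    return repos
```

This only approximates the criterion above, since search result caps and the choice of query limit which repositories are seen at all.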

It was probably a mistake to let GitHub Actions (serial) fail on the first error, since that meant it did less work than the other CIs. If we were to rerun this, we would change that, as sketched below.
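A sketch of the change, assuming the serial job builds flake checks one by one with nix build (the exact workflow is not reproduced here):

```python
import subprocess
import sys


def build_all_checks(check_names: list[str], system: str = "x86_64-linux") -> int:
    """Build every flake check in sequence, continuing past failures."""
    failed = []
    for name in check_names:
        proc = subprocess.run(
            ["nix", "build", "--no-link", f".#checks.{system}.{name}"],
            check=False,  # record the failure instead of aborting the whole job
        )
        if proc.returncode != 0:
            failed.append(name)
    if failed:
        print("failed checks:", ", ".join(failed), file=sys.stderr)
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(build_all_checks(sys.argv[1:]))
```

Failing the job at the end rather than on the first broken check keeps the amount of work comparable across CIs.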
