Parallelizing Jest with GitHub Actions
While working on the automated testing of Capacitor Plugins, I realized the same parallelization strategy I was using could be applied to anything using GitHub Actions. Not only was it possible to create multiple jobs from a single definition using strategy.matrix, it was also possible to create them dynamically from a prior job’s output. TLDR: You can run parallel tests in GitHub Actions, and you can also define the scaling rules for your continuous integration testing.
Before we dive in: Jest has some impressive options (--maxConcurrency and --maxWorkers) for tweaking performance on CI servers (but, at the time of writing, no built-in way to chunk tests across machines; see #2330). You can also host your own GitHub Actions runner to use your own dedicated hardware.
True parallelization (splitting tests across multiple machines) is a great strategy for boosting performance, but it might not be the best for your use case!
Let’s take a look at a simple workflow file example.
Whenever someone pushes to our repo, this job will check out our files, install the dependencies, and run jest.
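The workflow file itself is not reproduced in this text, so here is a minimal sketch of what such a file could look like. The file path, workflow name, job name, and the actions/checkout version are assumptions, not the author’s exact file.

```yaml
# .github/workflows/ci.yml (hypothetical path and names)
name: CI

# Run on every push to the repository
on: push

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      # Check out the repository files
      - uses: actions/checkout@v4
      # Install the dependencies
      - run: npm install
      # Run the whole Jest suite in this single job
      - run: npx jest
```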
As an example, let’s say we have four test files: difference.test.js, product.test.js, quotient.test.js, and sum.test.js. To illustrate this, I’ve set up this repo.
To parallelize our tests, we can use the matrix strategy offered by GitHub Actions. In the workflow file below, we hard-code test files into the test-file matrix and use expression syntax to tell Jest which test to run.
The workflow then runs four jobs, one for each test file we’ve specified in our matrix.
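As a rough sketch of the hard-coded matrix described above (the workflow and job names are assumptions; only the test-file matrix key and the four file names come from the text):

```yaml
name: CI

on: push

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # One job is created per entry in this list
        test-file:
          - difference.test.js
          - product.test.js
          - quotient.test.js
          - sum.test.js
    steps:
      - uses: actions/checkout@v4
      - run: npm install
      # Jest treats the argument as a pattern and runs only matching test files
      - run: npx jest ${{ matrix.test-file }}
```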
Great! But also, not great. If we add or remove tests, we’d have to modify our workflow file. Also, chances are a real project would have more than four test files. What we really want is to put our test files into several buckets automatically and run each bucket of test files in its own job. To do that, we need to add a job that runs beforehand to gather test files and split them into groups.
The first thing we need is a way to list test files. We could use our test regex and query the file system directly, but luckily Jest provides a useful option for us, which outputs all test files as JSON:
$ npx jest --listTests --json
Next, we need to split the test files into groups. We could write a script (in fact, Lodash has a .chunk() function), but the GitHub-hosted runners come with a surprising amount of preinstalled software, including jq, a useful utility for manipulating JSON on the command line.
Let’s pipe the Jest output to jq and invoke an expression which will split the list of tests into groups. The _nwise function’s parameter is the size of each group, so we control the number of groups by taking the length of the input array (the number of test files found in the project), dividing it by the number of groups we want, and rounding up with ceil. In this example, we chunk the test files into two groups.
You can change 2 to the number of desired groups, which will ultimately be the number of parallel jobs to run. For example, if your project has 450 test files, you can use 5 to create five groups of 90 test files each. When the count doesn’t divide evenly, the last group is smaller: seven test files split with 2 gives one group of four and one group of three.
$ npx jest --listTests --json | jq '[_nwise(length / 2 | ceil)]'
[
  [
    "/Users/dan/git/parallelizing-jest/difference.test.js",
    "/Users/dan/git/parallelizing-jest/quotient.test.js"
  ],
  [
    "/Users/dan/git/parallelizing-jest/product.test.js",
    "/Users/dan/git/parallelizing-jest/sum.test.js"
  ]
]
Now all we need to do is hook this up to GitHub Actions.
By default, jobs run in parallel in GitHub Actions. To run jobs sequentially, we need to use the needs keyword. This is important because a job can only read the output of the jobs it lists under needs.
Steps can have output, too, but their output is only available within the job and doesn’t need to be explicitly defined.
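We need to create a setup job which will gather and group the test files and store them as JSON in the job’s output. To do this, we’ll use the ::set-output command during setup steps (line 14) and then expose the output using job output syntax (line 7). (GitHub has since deprecated ::set-output in favor of appending name=value lines to the file at $GITHUB_OUTPUT; the job output syntax is unchanged.)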
Notice how we also created an array of indices to use for the chunk matrix (line 17). Using fromJson, we can dynamically define a matrix strategy for parallelizing our tests (line 27). Each job picks out the chunk of tests to run and passes the list to Jest (line 32).
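The workflow file is not reproduced in this text, so the line numbers cited above refer to the author’s original file rather than to the sketch below. This is one possible way to wire the pieces together, not the author’s exact workflow: the step ids, output names (test-chunks, test-chunk-ids), and the CHUNKS environment variable are hypothetical, and it writes to $GITHUB_OUTPUT instead of using ::set-output. It also assumes test file paths contain no spaces, since xargs splits on whitespace.

```yaml
name: CI

on: push

jobs:
  setup:
    runs-on: ubuntu-latest
    # Expose step outputs as job outputs so that later jobs can read them
    outputs:
      test-chunks: ${{ steps.set-test-chunks.outputs.test-chunks }}
      test-chunk-ids: ${{ steps.set-test-chunk-ids.outputs.test-chunk-ids }}
    steps:
      - uses: actions/checkout@v4
      - run: npm install
      - id: set-test-chunks
        # jq -c prints compact single-line JSON, which a name=value line needs
        run: |
          echo "test-chunks=$(npx jest --listTests --json | jq -c '[_nwise(length / 2 | ceil)]')" >> "$GITHUB_OUTPUT"
      - id: set-test-chunk-ids
        # Turn [[...], [...]] into [0, 1]: one index per chunk
        run: |
          echo "test-chunk-ids=$(echo "$CHUNKS" | jq -c 'to_entries | map(.key)')" >> "$GITHUB_OUTPUT"
        env:
          CHUNKS: ${{ steps.set-test-chunks.outputs.test-chunks }}

  test:
    runs-on: ubuntu-latest
    name: test (chunk ${{ matrix.chunk }})
    # Wait for setup; this also makes its outputs available under needs.setup
    needs: setup
    strategy:
      matrix:
        # Build the matrix dynamically from the JSON array of chunk indices
        chunk: ${{ fromJson(needs.setup.outputs.test-chunk-ids) }}
    steps:
      - uses: actions/checkout@v4
      - run: npm install
      - name: jest
        # Select this job's chunk and pass its file paths to Jest as arguments
        run: |
          echo "$CHUNKS" | jq -r '.[${{ matrix.chunk }}] | .[]' | xargs npx jest
        env:
          CHUNKS: ${{ needs.setup.outputs.test-chunks }}
```

Passing the JSON through an environment variable rather than interpolating it straight into the shell command keeps the quoting simple, because the chunk list itself contains double quotes.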
The workflow now runs two jobs (excluding the setup job), each testing its own chunk of test files.
Magic. 🎩✨ As we continue to add tests, our workflow will automatically scale the number of parallel jobs based on our chunk size. Your CI will scale with your test suite and you will never have to edit YAML files ever again, ever.