Getting started with Problem Matchers
Problem matchers are a relatively new concept that lets you watch unstructured log output for specific details and add annotations to your source code based on what they find.
This means that you don’t need to go reading through hundreds of lines of logs. Any relevant information will be added to your code inline, allowing you to see the information in context.
Imagine that you’re using ESLint for your JavaScript code. Instead of finding out on line 312 of a 500-line log file that myVar is declared on line 50 and never used, you’ll see that information on line 50 in your editor (or in your pull request!)
Here's how it looks in VSCode (I'm using the errorlens plugin to show the problem inline):
Then once I push to GitHub it shows in the Actions log:
Plus it shows as an inline annotation in the files list on the pull request page:
If you want to try building your own problem matchers after reading this post, you might find this problem matcher tester (source) useful.
How Problem Matchers work
If we strip problem matchers right back to their core, they’re just a regular expression (just - as though regular expressions don’t strike fear into all of our hearts!)
The way they work is by scanning each line of the output for a specific pattern and populating a predefined set of values using the strings that match the regular expression.
A minimal problem matcher has three values:
file
line
message
There is another type of matcher which specifies {"kind": "file"} rather than a line entry. We’ll cover these later on.
These values allow a message to be shown in the interface in the correct file on the correct line. Imagine that you have a log message containing the following:
src/sample.js:10:This is the message
In this log line, everything up to the first colon is the filename, the next section is the line number and the remainder of the line is the message to add. To match this message using a problem matcher you’d use the following configuration:
json
{"owner": "demo-matcher","pattern": [{"regexp": "([^:]+):(\\d+):(.+)","file": 1,"line": 2,"message": 3}]}
It’s important to note that in your JSON file, you need to escape any backslashes, which means that (\d+) becomes (\\d+).
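To see how the capture groups map onto the three values, here’s a minimal JavaScript sketch that loads the configuration above and runs it against the sample log line. The applyPattern helper is hypothetical and only there for illustration; it isn’t how GitHub or VSCode implement matching internally.

```javascript
// The matcher from above, stored as JSON text. String.raw keeps the
// backslashes exactly as they would appear in the .json file.
const matcherJson = String.raw`{
  "owner": "demo-matcher",
  "pattern": [
    { "regexp": "([^:]+):(\\d+):(.+)", "file": 1, "line": 2, "message": 3 }
  ]
}`;

// Hypothetical helper: run one pattern against one log line and return
// the named values, or null if the line does not match.
function applyPattern(pattern, logLine) {
  const match = new RegExp(pattern.regexp).exec(logLine);
  if (match === null) return null;
  return {
    file: match[pattern.file],
    line: Number(match[pattern.line]),
    message: match[pattern.message],
  };
}

// Parsing the JSON turns the escaped "\\d" back into the regex token \d.
const matcher = JSON.parse(matcherJson);
const result = applyPattern(
  matcher.pattern[0],
  "src/sample.js:10:This is the message"
);
console.log(result);
```

The numbers in the configuration are simply indexes into the regular expression’s capture groups, which is why the order of the groups matters more than the order of the keys.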
In all future examples I’m only going to show the pattern section of this config file to make the examples smaller.
You could take this problem matcher and deploy it and you’d start seeing annotations in your application, but this is just the start.
Single line matchers
Problem matchers can also provide a lot more context than just the file, line and message. You’ll find that most tools also output a severity level and column which you can use to augment your annotations. Using the example above, we can add this additional information into the error message:
ERROR:src/sample.js:10,12:This is the message
To capture the severity and column we need to add some additional capture groups to our regular expression, and add some additional values:
json
{"regexp": "(ERROR|WARNING|INFO):([^:]+):(\\d+),(\\d+):(.+)","file": 2,"line": 3,"message": 5,"severity": 1,"column": 4}
This regular expression will match anything that starts with ERROR, WARNING or INFO (the severity), followed by the file, then a line, a comma and then column, followed by the message.
The example above is well formatted, with colons to help mark the end of fields. Problem matchers can work with unstructured text too:
Sadly there was an error on line 19. "Something went wrong". It was on column 12 in sample.js
To match the above string, you’d use the following problem matcher:
json
{"regexp": "Sadly there was an error on line (\\d+).\\s+\"([^\"]+)\".\\s+It was on column (\\d+) in ([\\w\\.]+)","file": 4,"line": 1,"message": 2,"column": 3}
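With unstructured text the capture groups rarely appear in the same order as the values you want to fill, so it’s worth checking which group number holds what. This is a small sketch, assuming Node.js, that runs the expression against the sample sentence and prints each group. It’s written as a JavaScript regex literal, so the backslashes aren’t doubled like they are in JSON.

```javascript
// The same expression as above, written as a JavaScript regex literal.
const regexp =
  /Sadly there was an error on line (\d+).\s+"([^"]+)".\s+It was on column (\d+) in ([\w\.]+)/;

const logLine =
  'Sadly there was an error on line 19. "Something went wrong". It was on column 12 in sample.js';

const match = regexp.exec(logLine);

// match[0] is the whole match; the capture groups start at index 1.
const groups = match.slice(1);
groups.forEach((value, index) => {
  console.log(`group ${index + 1}: ${value}`);
});
```

There are four groups here: the line number, the message, the column and the file, in that order. There’s no group for severity because the sentence doesn’t contain one.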
Finally, there are a few more properties that can be added in addition to file, line, message, severity and column:
location - a shorthand way to provide line and column in a single group. Allows for the format line, line,column or startLine,startColumn,endLine,endColumn
endLine - for multi-line notices. This will highlight all lines between line and endLine
endColumn - the same as endLine, but for columns
code - capture a standardised error code
I’ve not seen these additional properties used, with the exception of code, which can be useful when capturing the rule name using a linting tool.
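To make the location shorthand a little more concrete, here’s a rough sketch of how a captured location string could be expanded into the individual values. parseLocation is a hypothetical helper of my own based on the three formats listed above, not code taken from VSCode or GitHub.

```javascript
// Hypothetical helper: expand a captured `location` string into the
// individual values, following the three formats listed above.
function parseLocation(location) {
  const parts = location.split(",").map((part) => part.trim());

  // Every part must be a plain number, otherwise we can't interpret it.
  if (!parts.every((part) => /^\d+$/.test(part))) {
    throw new Error(`Unrecognised location: ${location}`);
  }

  const [first, second, third, fourth] = parts.map(Number);
  switch (parts.length) {
    case 1:
      return { line: first };
    case 2:
      return { line: first, column: second };
    case 4:
      return { line: first, column: second, endLine: third, endColumn: fourth };
    default:
      throw new Error(`Unrecognised location: ${location}`);
  }
}

console.log(parseLocation("10"));
console.log(parseLocation("10,12"));
console.log(parseLocation("10,12,14,3"));
```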
Multi-line matches
Problem matchers can also handle messages that span multiple lines, not just single-line output.
Here’s some sample output from the ESLint stylish formatter:
test.js
  1:0   error  Missing "use strict" statement                       strict
  5:10  error  'addOne' is defined but never used                   no-unused-vars

foo.js
  36:10  error  Expected parentheses around arrow function argument  arrow-parens
  37:13  error  Expected parentheses around arrow function argument  arrow-parens

✖ 4 problems (4 errors, 0 warnings)
We can see that the first line provides the file, then each line after that provides the line, column, severity, message and code.
To match these values, problem matchers allow you to specify multiple regular expressions. The expressions are tried in order, and if you set loop to true an expression will match as many lines as it can before the matcher moves on.
json
[{"regexp": "^([^\\s].*)$","file": 1},{"regexp": "^\\s+(\\d+):(\\d+)\\s+(error|warning|info)\\s+(.*)\\s\\s+(.*)$","line": 1,"column": 2,"severity": 3,"message": 4,"code": 5,"loop": true}]
In this configuration, the first regular expression matches a line that doesn’t start with whitespace and stores the whole line in file. The matcher then moves on to the second expression, which matches lines 2 and 3, populating the provided values and looping until the regex no longer matches. At this point it goes back to the first expression and starts again.
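If the looping behaviour is hard to picture, here’s a rough JavaScript model of it. The runMatcher and extract functions are hypothetical helpers of my own, not the code that GitHub or VSCode actually run, and the model assumes that only the last pattern uses loop. It also trims captured values, because the greedy message group can pick up the padding between columns.

```javascript
const patterns = [
  { regexp: "^([^\\s].*)$", file: 1 },
  {
    regexp: "^\\s+(\\d+):(\\d+)\\s+(error|warning|info)\\s+(.*)\\s\\s+(.*)$",
    line: 1, column: 2, severity: 3, message: 4, code: 5, loop: true,
  },
];

const FIELDS = ["file", "line", "column", "severity", "message", "code"];

// Copy each configured capture group into a named value.
function extract(pattern, match) {
  const values = {};
  for (const name of FIELDS) {
    const group = pattern[name];
    if (typeof group === "number" && match[group] !== undefined) {
      values[name] = match[group].trim();
    }
  }
  return values;
}

// Simplified model: walk the log once, remembering which pattern we
// expect next and what earlier patterns captured (here, the file).
function runMatcher(patterns, lines) {
  const annotations = [];
  let index = 0;
  let context = {};

  for (const line of lines) {
    let pattern = patterns[index];
    let match = new RegExp(pattern.regexp).exec(line);

    if (match === null && index > 0) {
      // The expected pattern stopped matching: go back to the first one.
      index = 0;
      context = {};
      pattern = patterns[0];
      match = new RegExp(pattern.regexp).exec(line);
    }
    if (match === null) continue;

    const values = { ...context, ...extract(pattern, match) };
    if (index === patterns.length - 1) {
      annotations.push(values);
      // A looping pattern stays active and keeps the captured file.
      if (!pattern.loop) {
        index = 0;
        context = {};
      }
    } else {
      context = values;
      index += 1;
    }
  }
  return annotations;
}

const output = [
  "test.js",
  '  1:0   error  Missing "use strict" statement      strict',
  "  5:10  error  'addOne' is defined but never used  no-unused-vars",
  "",
  "foo.js",
  "  36:10  error  Expected parentheses around arrow function argument  arrow-parens",
  "  37:13  error  Expected parentheses around arrow function argument  arrow-parens",
  "",
  "✖ 4 problems (4 errors, 0 warnings)",
];

const annotations = runMatcher(patterns, output);
console.log(annotations);
```

Running this against the sample output produces four annotations, two for test.js and two for foo.js. The summary line is captured as a file name, but it never becomes an annotation because no problem line follows it.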
File level matches
I mentioned earlier that you need to provide a line value unless you set kind: file. Let’s take a look at what kind: file allows us to do.
It’s unusual to have an error that doesn’t relate to a specific line in a file when you think about unit tests or style linting. However, when the unit of work you’re looking at gets bigger, sometimes error messages only make sense in the context of a whole file.
Imagine that you’ve got a set of acceptance tests that are configured using a YAML file that contains account information for a test user. The same test will run for multiple users, so adding an annotation on the single test wouldn’t make sense. Instead, we want to flag which configuration files failed.
Here’s the sample output from our testing tool:
[ i ] PROCESS SUITES/TESTS RESULTS ...
----------------------------------------------------------------------
Suite: acceptance_ui_account (1 tests)
----------------------------------------------------------------------
failed: 1
----------------------------------------------------------------------
| 00:00:05 | failed | User can log in | user/login/alice.yml
| 00:00:12 | passed | User can log in | user/login/bob.yml
| 00:00:18 | error | User can log in | user/login/charlie.yml
----------------------------------------------------------------------
Suite start time : 12:41:09
Suite end time : 12:41:27
Suite elapsed time: 00:00:18
----------------------------------------------------------------------
[ ! ] Failed tests list was stored in 'acceptance_ui_account' suite.
The regular expression for this is a bit more complex, but it allows us to specify a whole file to add an annotation:
json
{"regexp": "\\|\\s+\\d+:\\d+:\\d+\\s+\\|\\s+(failed|error)\\s+\\|\\s+([^\\|]+)\\s+\\|\\s+(.*)$","file": 3,"message": 2,"severity": 1,"kind": "file"}
I’ve not seen any real-world usage of kind: file, so if you’re using it let me know.
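As a quick sanity check, this sketch runs the expression above against the three result rows from the sample output. The shape of the annotation object is my own assumption for illustration; the point is that only the failed and error rows match, and that each match carries a file but no line.

```javascript
const pattern = {
  regexp:
    "\\|\\s+\\d+:\\d+:\\d+\\s+\\|\\s+(failed|error)\\s+\\|\\s+([^\\|]+)\\s+\\|\\s+(.*)$",
  file: 3,
  message: 2,
  severity: 1,
  kind: "file",
};

const rows = [
  "| 00:00:05 | failed | User can log in | user/login/alice.yml",
  "| 00:00:12 | passed | User can log in | user/login/bob.yml",
  "| 00:00:18 | error | User can log in | user/login/charlie.yml",
];

const regexp = new RegExp(pattern.regexp);
const annotations = [];

for (const row of rows) {
  const match = regexp.exec(row);
  // Rows marked "passed" don't match the expression and are skipped.
  if (match === null) continue;

  annotations.push({
    // There is no line value: the annotation applies to the whole file.
    file: match[pattern.file].trim(),
    message: match[pattern.message].trim(),
    severity: match[pattern.severity],
  });
}

console.log(annotations);
```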
Top level parameters
If your logs don’t contain a severity entry, you might be wondering how you can set this value to add annotations as an error or a warning. Problem matchers can specify a default severity value which will be applied to all entries:
json
{"owner": "eslint-stylish","severity": "error","pattern": [{"regexp": "^([^\\s].*)$","file": 1},{"regexp": "^\\s+(\\d+):(\\d+)\\s+(.*)\\s\\s+(.*)$","line": 1,"column": 2,"message": 3,"code": 4,"loop": true}]}
Alongside severity and owner, you can set a few other values (which don’t have any effect on GitHub, but do in VSCode):
applyTo - Set to allDocuments to run against all files, not just open files
background - Used to detect if a background task is running
base - The problem matcher to extend. Any properties specified in pattern will override the values in base
fileLocation - specify if the file path is relative or absolute
Actions with built-in matchers
Everything so far has covered writing your own problem matchers, but for common use cases you don’t need to.
If you’re a JavaScript developer using actions/setup-node, you’ll get the problem matchers for ESLint and TypeScript (tsc) added for free. setup-go adds a compiler matcher, as do setup-dotnet and setup-elixir. setup-java adds a matcher for uncaught exceptions and setup-python will show errors.
Those are just the GitHub-maintained setup-* actions too! You can also find available actions that register matchers for known patterns. For example, here’s a matcher for PHPUnit tests that matches the TeamCity output. A quick search also shows matchers for Android, GCC, StyleLint and Sphinx.
Building your own
The “How Problem Matchers work” section above covers almost everything you need to know to build your own problem matchers, but I wanted to mention one last trick that I needed to make a custom problem matcher work.
For annotations to be added automatically, the filename value must be relative to the root of the repository. However, some tools (such as PHPUnit) provide the absolute path to the file.
To handle this, I add the GITHUB_WORKSPACE path to my regular expression dynamically and keep it out of the capture group. As we register new matchers using JavaScript, it’s easy to replace a placeholder in the regex and write out a new matcher file.
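Here’s a minimal sketch of that idea, assuming Node.js and POSIX-style paths. The placeholder token, the buildMatcher helper, the matcher owner, the fallback workspace path and the output file name are all hypothetical choices for this example rather than the exact code I use. The sketch wraps the matcher in a problemMatcher array, which is the file format GitHub Actions expects, and prints the ::add-matcher:: workflow command to register it.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical placeholder token; any unique string would do.
const PLACEHOLDER = "{{GITHUB_WORKSPACE}}";

// Escape characters that are special in a regular expression so that a
// path such as /work/my.repo is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return a copy of the matcher with the placeholder swapped for the
// escaped workspace path plus a trailing slash.
function buildMatcher(template, workspace) {
  const prefix = escapeRegExp(workspace.replace(/\/+$/, "")) + "/";
  return {
    ...template,
    pattern: template.pattern.map((entry) => ({
      ...entry,
      regexp: entry.regexp.split(PLACEHOLDER).join(prefix),
    })),
  };
}

const template = {
  owner: "demo-absolute-paths",
  pattern: [
    {
      // The workspace prefix sits outside the capture group, so the
      // captured file is relative to the repository root.
      regexp: "^" + PLACEHOLDER + "([^:]+):(\\d+):(.+)$",
      file: 1,
      line: 2,
      message: 3,
    },
  ],
};

// GITHUB_WORKSPACE is set by the Actions runner; the fallback is only
// here so the sketch also runs locally.
const workspace = process.env.GITHUB_WORKSPACE || "/home/runner/work/demo/demo";
const matcher = buildMatcher(template, workspace);

// Write the matcher file and ask the runner to register it.
const matcherPath = path.join(os.tmpdir(), "demo-matcher.json");
fs.writeFileSync(
  matcherPath,
  JSON.stringify({ problemMatcher: [matcher] }, null, 2)
);
console.log(`::add-matcher::${matcherPath}`);
```

Escaping the workspace path matters because it often contains characters such as dots, which would otherwise be treated as regex wildcards.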
Conclusion
Problem matchers are used by GitHub Actions and VSCode today, with Actions using a subset of the VSCode functionality.
On GitHub, problem matchers run against the log output for GitHub Actions, and must be registered in a workflow before the output is written to the log.
In VSCode, problem matchers are defined as part of a task, and run on the output of the task that is run.
I’ve not seen any other usages of problem matchers out in the world, but they feel like a good solution to parsing free text and turning it into something that is contextually useful.
If you’ve seen an awesome use of problem matchers out in the wild, I’d love to hear about it - I’m @mheap on Twitter.