Getting started with Problem Matchers

11 Oct 2021 in Infrastructure

Problem matchers are a relatively new concept that lets you scan unstructured log output for specific details and add annotations to your source code based on what they find.

This means that you don’t need to go reading through hundreds of lines of logs. Any relevant information will be added to your code inline, allowing you to see the information in context.

Imagine that you’re using ESLint for your JavaScript code. Instead of learning on line 312 of a 500-line log file that myVar is declared on line 50 and never used, you’ll see that information on line 50 in your editor (or in your pull request!).

Here's how it looks in VSCode (I'm using the errorlens plugin to show the problem inline):

Problem matchers in VSCode

Then once I push to GitHub it shows in the Actions log:

Problem matchers in the GitHub Actions log

Plus it shows as an inline annotation in the files list on the pull request page:

Problem matchers on the pull request page

If you want to try building your own problem matchers after reading this post, you might find this problem matcher tester (source) useful.

How Problem Matchers work

If we strip problem matchers right back to their core, they’re just a regular expression (just - as though regular expressions don’t strike fear into all of our hearts!)

The way they work is by scanning each line of the output for a specific pattern and populating a predefined set of values using the strings that match the regular expression.

A minimal problem matcher has three values:

  • file
  • line
  • message

There is another type of matcher which specifies {"kind": "file"} rather than a line entry. We’ll cover these later on.

These values allow a message to be shown in the interface in the correct file on the correct line. Imagine that you have a log message containing the following:

src/sample.js:10:This is the message

In this log line, everything up to the first colon is the filename, the next section is the line number, and the remainder of the line is the message to add. To match this message using a problem matcher you’d use the following configuration:

```json
{
  "owner": "demo-matcher",
  "pattern": [
    {
      "regexp": "([^:]+):(\\d+):(.+)",
      "file": 1,
      "line": 2,
      "message": 3
    }
  ]
}
```

It’s important to note that in your JSON file, you need to escape any backslashes, which means that (\d+) becomes (\\d+).
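To see the escaping in action, here’s a minimal TypeScript sketch that parses the configuration above and applies it to the sample log line. The matchLine helper and the Demo* types are my own names for illustration and aren’t part of any problem matcher API. This is not how GitHub or VSCode implement matching, just the same idea in a few lines.

```typescript
// The matcher exactly as it would appear in the JSON file. String.raw keeps
// the doubled backslashes intact, so JSON.parse sees what a file on disk holds.
const matcherJson = String.raw`{
  "owner": "demo-matcher",
  "pattern": [
    { "regexp": "([^:]+):(\\d+):(.+)", "file": 1, "line": 2, "message": 3 }
  ]
}`;

interface DemoPattern {
  regexp: string;
  file: number;
  line: number;
  message: number;
}

interface DemoMatcher {
  owner: string;
  pattern: DemoPattern[];
}

interface DemoAnnotation {
  file: string;
  line: number;
  message: string;
}

// Hypothetical helper: apply one single-line pattern to one line of output.
// Each numeric property is the index of a capture group in the regexp.
function matchLine(pattern: DemoPattern, logLine: string): DemoAnnotation | null {
  const match = new RegExp(pattern.regexp).exec(logLine);
  if (match === null) return null;
  return {
    file: match[pattern.file] ?? "",
    line: Number(match[pattern.line]),
    message: match[pattern.message] ?? "",
  };
}

// After JSON.parse, "(\\d+)" in the file has become (\d+) in the regexp string.
const demoMatcher = JSON.parse(matcherJson) as DemoMatcher;
const firstPattern = demoMatcher.pattern[0];
const demoAnnotation = matchLine(firstPattern, "src/sample.js:10:This is the message");
```

Here demoAnnotation holds the file src/sample.js, the line number 10 and the message text, which is everything needed to place an annotation.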

In all future examples I’m only going to show the pattern section of this config file to keep the examples short.

You could take this problem matcher and deploy it and you’d start seeing annotations in your application, but this is just the start.

Single line matchers

Problem matchers can also provide a lot more context than just the file, line and message. You’ll find that most tools also output a severity level and column which you can use to augment your annotations. Using the example above, we can add this additional information into the error message:

ERROR:src/sample.js:10,12:This is the message

To capture the severity and column we need to add some additional capture groups to our regular expression, and add some additional values:

```json
{
  "regexp": "(ERROR|WARNING|INFO):([^:]+):(\\d+),(\\d+):(.+)",
  "file": 2,
  "line": 3,
  "message": 5,
  "severity": 1,
  "column": 4
}
```

This regular expression will match anything that starts with ERROR, WARNING or INFO (the severity), followed by the file, then a line, a comma and then column, followed by the message.
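If it helps to see the group numbers being used, here’s a small TypeScript sketch that looks up each field by its capture group index. The pattern is the one described above with the digits escaped as \\d; the extractFields helper and the field list are hypothetical names I’ve made up for the example.

```typescript
// The severity/column pattern written as a TypeScript object.
// Each number is the index of a capture group in the regexp.
const severityPattern = {
  regexp: "(ERROR|WARNING|INFO):([^:]+):(\\d+),(\\d+):(.+)",
  file: 2,
  line: 3,
  message: 5,
  severity: 1,
  column: 4,
};

const fieldNames = ["file", "line", "message", "severity", "column"] as const;
type FieldName = (typeof fieldNames)[number];

// Hypothetical helper: read every field from the capture group it points at.
function extractFields(logLine: string): Record<FieldName, string> | null {
  const match = new RegExp(severityPattern.regexp).exec(logLine);
  if (match === null) return null;
  const extractedFields = {} as Record<FieldName, string>;
  for (const field of fieldNames) {
    extractedFields[field] = match[severityPattern[field]] ?? "";
  }
  return extractedFields;
}

const extracted = extractFields("ERROR:src/sample.js:10,12:This is the message");
const ignored = extractFields("Build finished in 3.2s");
```

The order of the properties in the config doesn’t matter. Only the group index does, which is why severity can be group 1 even though it’s listed fourth. Lines that don’t match (like the second call) are simply skipped.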

The example above is well formatted, with colons to help mark the end of fields. Problem matchers can work with unstructured text too:

Sadly there was an error on line 19. "Something went wrong". It was on column 12 in sample.js

To match the above string, you’d use the following problem matcher:

```json
{
  "regexp": "Sadly there was an error on line (\\d+)\\.\\s+\"([^\"]+)\"\\.\\s+It was on column (\\d+) in ([\\w.]+)",
  "file": 4,
  "line": 1,
  "message": 2,
  "column": 3
}
```

Finally, there are a few additional properties that can be added in addition to file, line, message, severity and column:

  • location - a shorthand way to provide line and column in a single group. Accepts the formats (line), (line,column) or (startLine,startColumn,endLine,endColumn)
  • endLine - for multi-line notices. This will highlight all lines between line and endLine
  • endColumn - the same as endLine, but for columns
  • code - capture a standardised error code

I’ve not seen these additional properties used, with the exception of code which can be useful when capturing the rule name using a linting tool.
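As I’ve not used location myself, treat the following as a rough TypeScript sketch of how a captured location string could be split into the three formats listed above, rather than a description of what VSCode actually does internally. The parseLocation function and the ParsedLocation type are hypothetical.

```typescript
interface ParsedLocation {
  line: number;
  column?: number;
  endLine?: number;
  endColumn?: number;
}

// Hypothetical helper: interpret a captured location group as one of
// "line", "line,column" or "startLine,startColumn,endLine,endColumn".
function parseLocation(captured: string): ParsedLocation | null {
  const parts = captured.split(",").map((part) => part.trim());
  for (const part of parts) {
    // Every part has to be a plain non-negative integer.
    if (!/^\d+$/.test(part)) return null;
  }
  const n = parts.map(Number);
  if (n.length === 1) return { line: n[0] };
  if (n.length === 2) return { line: n[0], column: n[1] };
  if (n.length === 4) {
    return { line: n[0], column: n[1], endLine: n[2], endColumn: n[3] };
  }
  // Any other number of parts isn't one of the three listed formats.
  return null;
}

const lineOnly = parseLocation("19");
const lineAndColumn = parseLocation("19,12");
const fullRange = parseLocation("19,12,21,4");
const malformed = parseLocation("19,12,21");
```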

Multi-line matches

Problem matchers work on multi-line messages as well as single-line messages.

Here’s some sample output from the ESLint stylish formatter:

```
test.js
  1:0  error  Missing "use strict" statement  strict
  5:10  error  'addOne' is defined but never used  no-unused-vars

foo.js
  36:10  error  Expected parentheses around arrow function argument  arrow-parens
  37:13  error  Expected parentheses around arrow function argument  arrow-parens

✖ 4 problems (4 errors, 0 warnings)
```

We can see that the first line provides the file, then each line after that provides the line, column, severity, message and code.

To match these values, problem matchers allow you to specify multiple regular expressions, which are applied to consecutive lines in order. If you set loop to true on the last expression, it will keep matching lines for as long as it can before the matcher starts again from the first expression.

```json
[
  {
    "regexp": "^([^\\s].*)$",
    "file": 1
  },
  {
    "regexp": "^\\s+(\\d+):(\\d+)\\s+(error|warning|info)\\s+(.*)\\s\\s+(.*)$",
    "line": 1,
    "column": 2,
    "severity": 3,
    "message": 4,
    "code": 5,
    "loop": true
  }
]
```

In this configuration, the first regular expression matches any line that starts with a non-whitespace character and stores the whole line in file. As that expression doesn’t match the next line, the matcher moves on to the second expression. This regular expression matches lines 2 and 3, populating the provided values and looping until the regex no longer matches. At this point it goes back to the first expression and starts again.
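The looping behaviour is easier to follow in code. Here’s a simplified TypeScript sketch of that two-pattern flow run against the ESLint output. It’s my own approximation, not the real GitHub or VSCode implementation: the matchStylish function and StylishAnnotation type are hypothetical, and I’m assuming that a line which fails the looping pattern is retried against the first pattern.

```typescript
interface StylishAnnotation {
  file: string;
  line: number;
  column: number;
  severity: string;
  message: string;
  code: string;
}

// The two regular expressions from the configuration above.
const fileRegexp = new RegExp("^([^\\s].*)$");
const problemRegexp = new RegExp(
  "^\\s+(\\d+):(\\d+)\\s+(error|warning|info)\\s+(.*)\\s\\s+(.*)$"
);

function matchStylish(logLines: string[]): StylishAnnotation[] {
  const annotations: StylishAnnotation[] = [];
  let currentFile: string | null = null;

  for (const logLine of logLines) {
    // Only try the second (looping) pattern once the first has matched.
    const problem = currentFile === null ? null : problemRegexp.exec(logLine);
    if (problem !== null && currentFile !== null) {
      annotations.push({
        file: currentFile,
        line: Number(problem[1]),
        column: Number(problem[2]),
        severity: problem[3] ?? "",
        message: problem[4] ?? "",
        code: problem[5] ?? "",
      });
      continue; // loop: true, so stay on the second pattern
    }
    // The looping pattern stopped matching: go back to the first pattern.
    const fileMatch = fileRegexp.exec(logLine);
    currentFile = fileMatch === null ? null : fileMatch[1];
  }
  return annotations;
}

const stylishOutput = [
  "test.js",
  '  1:0  error  Missing "use strict" statement  strict',
  "  5:10  error  'addOne' is defined but never used  no-unused-vars",
  "",
  "foo.js",
  "  36:10  error  Expected parentheses around arrow function argument  arrow-parens",
  "  37:13  error  Expected parentheses around arrow function argument  arrow-parens",
  "",
  "✖ 4 problems (4 errors, 0 warnings)",
];

const stylishAnnotations = matchStylish(stylishOutput);
```

The file name captured by the first pattern is carried over to every annotation produced by the loop, which is how each problem ends up attached to the right file. The summary line at the end does match the first pattern, but as nothing after it matches the second, it never produces an annotation.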

File level matches

I mentioned earlier that you need to provide a line value unless you set kind: file. Let’s take a look at what kind: file allows us to do.

It’s unusual to have an error that doesn’t relate to a specific line in a file when you think about unit tests or style linting. However, when the unit of work you’re looking at gets bigger, sometimes error messages only make sense in the context of a whole file.

Imagine that you’ve got a set of acceptance tests that are configured using a YAML file that contains account information for a test user. The same test will run for multiple users, so adding an annotation on the single test wouldn’t make sense. Instead, we want to flag which configuration files failed.

Here’s the sample output from our testing tool:

```
[ i ] PROCESS SUITES/TESTS RESULTS ...
----------------------------------------------------------------------
Suite: acceptance_ui_account (1 tests)
----------------------------------------------------------------------
failed: 1
----------------------------------------------------------------------
| 00:00:05 | failed | User can log in | user/login/alice.yml
| 00:00:12 | passed | User can log in | user/login/bob.yml
| 00:00:18 | error | User can log in | user/login/charlie.yml
----------------------------------------------------------------------
Suite start time : 12:41:09
Suite end time : 12:41:27
Suite elapsed time: 00:00:18
----------------------------------------------------------------------
[ ! ] Failed tests list was stored in 'acceptance_ui_account' suite.
```

The regular expression for this is a bit more complex, but it allows us to specify a whole file to add an annotation:

```json
{
  "regexp": "\\|\\s+\\d+:\\d+:\\d+\\s+\\|\\s+(failed|error)\\s+\\|\\s+([^\\|]+)\\s+\\|\\s+(.*)$",
  "file": 3,
  "message": 2,
  "severity": 1,
  "kind": "file"
}
```

I’ve not seen any real-world usage of kind: file, so if you’re using it let me know.
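To check that the expression picks out the right rows, here’s a short TypeScript sketch that runs it over the result lines from the sample output. The matchFileLevel function and FileAnnotation type are hypothetical names. The point to notice is that the resulting annotations have a file but no line.

```typescript
interface FileAnnotation {
  file: string;
  message: string;
  severity: string;
}

// The file-level regular expression from the configuration above.
const fileLevelRegexp = new RegExp(
  "\\|\\s+\\d+:\\d+:\\d+\\s+\\|\\s+(failed|error)\\s+\\|\\s+([^\\|]+)\\s+\\|\\s+(.*)$"
);

// Hypothetical helper: collect one annotation per failing configuration file.
function matchFileLevel(logLines: string[]): FileAnnotation[] {
  const annotations: FileAnnotation[] = [];
  for (const logLine of logLines) {
    const match = fileLevelRegexp.exec(logLine);
    if (match === null) continue; // passed tests and separators are ignored
    annotations.push({
      severity: match[1] ?? "",
      message: match[2] ?? "",
      file: match[3] ?? "",
    });
  }
  return annotations;
}

const suiteOutput = [
  "failed: 1",
  "----------------------------------------------------------------------",
  "| 00:00:05 | failed | User can log in | user/login/alice.yml",
  "| 00:00:12 | passed | User can log in | user/login/bob.yml",
  "| 00:00:18 | error | User can log in | user/login/charlie.yml",
];

const fileAnnotations = matchFileLevel(suiteOutput);
```

Only the alice.yml and charlie.yml rows produce annotations. The bob.yml row is skipped because passed isn’t one of the alternatives in the first capture group.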

Top level parameters

If your logs don’t contain a severity entry, you might be wondering how you can set this value to add annotations as an error or a warning. Problem matchers can specify a default severity value which will be applied to all entries:

```json
{
  "owner": "eslint-stylish",
  "severity": "error",
  "pattern": [
    {
      "regexp": "^([^\\s].*)$",
      "file": 1
    },
    {
      "regexp": "^\\s+(\\d+):(\\d+)\\s+(.*)\\s\\s+(.*)$",
      "line": 1,
      "column": 2,
      "message": 3,
      "code": 4,
      "loop": true
    }
  ]
}
```
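The fallback logic can be sketched in a few lines of TypeScript. To keep it short I’m using a single pattern object rather than an array, and reusing the two single-line log formats from earlier in this post. The severityFor helper is hypothetical, and falling back to "error" when nothing is configured is an assumption on my part.

```typescript
interface SeverityMatcher {
  owner: string;
  severity?: string; // top-level default, applied when a line captures none
  pattern: {
    regexp: string;
    file: number;
    line: number;
    message: number;
    severity?: number; // optional capture group for a per-line severity
  };
}

// Hypothetical helper: a captured severity wins, then the top-level default.
function severityFor(matcher: SeverityMatcher, logLine: string): string | null {
  const match = new RegExp(matcher.pattern.regexp).exec(logLine);
  if (match === null) return null;
  const group = matcher.pattern.severity;
  const captured = group === undefined ? undefined : match[group];
  // Assumption: use "error" if neither the line nor the matcher provides one.
  return captured ?? matcher.severity ?? "error";
}

// No severity in the log line, so the top-level default is used.
const withDefault: SeverityMatcher = {
  owner: "demo-matcher",
  severity: "warning",
  pattern: { regexp: "([^:]+):(\\d+):(.+)", file: 1, line: 2, message: 3 },
};
const fromDefault = severityFor(withDefault, "src/sample.js:10:This is the message");

// The log line carries its own severity, so the default is not needed.
const withCapture: SeverityMatcher = {
  owner: "demo-matcher",
  severity: "warning",
  pattern: {
    regexp: "(ERROR|WARNING|INFO):([^:]+):(\\d+),(\\d+):(.+)",
    severity: 1,
    file: 2,
    line: 3,
    message: 5,
  },
};
const fromCapture = severityFor(withCapture, "ERROR:src/sample.js:10,12:This is the message");
```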

Alongside owner and severity, you can set a few other top-level values. These don’t have any effect on GitHub, but they do in VSCode:

  • applyTo - Set to allDocuments to run against all files, not just open files
  • background - Used to detect if a background task is running
  • base - The problem matcher to extend. Any properties specified in pattern will override the values in base
  • fileLocation - specify if the file path is relative or absolute

Actions with built-in matchers

Everything so far has covered writing your own problem matchers, but for common use cases you don’t need to.

If you’re a JavaScript developer using actions/setup-node, you’ll get the problem matchers for ESLint and TypeScript (tsc) added for free. setup-go adds a compiler matcher, as do setup-dotnet and setup-elixir. setup-java adds a matcher for uncaught exceptions and setup-python will show errors.

Those are just the GitHub maintained setup-* actions too! You can also find available actions that register matchers for known patterns. For example, here’s a matcher for PHPUnit tests that matches the TeamCity output. A quick search also shows matchers for Android, GCC, StyleLint and Sphinx.

Building your own

The “How Problem Matchers work” section above covers almost everything you need to know to build your own problem matchers, but I wanted to mention one last trick that I needed to make a custom problem matcher work.

For annotations to be added automatically, the filename value must be relative to the root of the repository. However, some tools (such as PHPUnit) provide the absolute path to the file.

To handle this, I add the GITHUB_WORKSPACE path to my regular expression dynamically and keep it out of the capture group. As we register new matchers using JavaScript, it’s easy to replace a placeholder in the regex and write out a new matcher file.
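Here’s a rough TypeScript sketch of that placeholder replacement. Everything in it is illustrative: the {{GITHUB_WORKSPACE}} token, the buildMatcher and escapeRegExp helpers, the example workspace path and the file:line:message log format are all assumptions I’ve made for the example, not the output of PHPUnit or the code from my action.

```typescript
interface PathPattern {
  regexp: string;
  file: number;
  line: number;
  message: number;
}

interface PathMatcher {
  owner: string;
  pattern: PathPattern[];
}

// Hypothetical placeholder token that the template uses for the workspace.
const WORKSPACE_PLACEHOLDER = "{{GITHUB_WORKSPACE}}";

// The placeholder sits outside the capture group, so the captured file
// name is what remains after the workspace prefix.
const matcherTemplate: PathMatcher = {
  owner: "absolute-path-demo",
  pattern: [
    { regexp: "^{{GITHUB_WORKSPACE}}([^:]+):(\\d+):(.+)$", file: 1, line: 2, message: 3 },
  ],
};

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function buildMatcher(template: PathMatcher, workspace: string): PathMatcher {
  // Normalise to exactly one trailing slash, then escape for use in a regexp.
  const prefix = escapeRegExp(workspace.replace(/\/+$/, "") + "/");
  return {
    owner: template.owner,
    pattern: template.pattern.map((p) => ({
      regexp: p.regexp.split(WORKSPACE_PLACEHOLDER).join(prefix),
      file: p.file,
      line: p.line,
      message: p.message,
    })),
  };
}

// In a real action the workspace would come from the GITHUB_WORKSPACE
// environment variable. It is hard-coded here so the sketch is self-contained.
const builtMatcher = buildMatcher(matcherTemplate, "/home/runner/work/my.repo/my.repo");

// JSON.stringify re-escapes the backslashes for the matcher file.
const matcherFileContents = JSON.stringify(builtMatcher, null, 2);

const absoluteMatch = new RegExp(builtMatcher.pattern[0].regexp).exec(
  "/home/runner/work/my.repo/my.repo/tests/FooTest.php:12:Failed asserting that false is true."
);
const relativeFile = absoluteMatch === null ? null : absoluteMatch[1];
```

With the prefix outside the capture group, relativeFile comes out as tests/FooTest.php, which is relative to the repository root. The last step, not shown here, is to write matcherFileContents to a file and register it by printing ::add-matcher:: followed by the path to that file.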

Conclusion

Problem matchers are used by GitHub Actions and VSCode today, with Actions using a subset of the VSCode functionality.

On GitHub, problem matchers run against the log output for GitHub Actions, and must be registered in a workflow before the output is written to the log.

In VSCode, problem matchers are defined as part of a task, and run on the output of the task that is run.

I’ve not seen any other uses of problem matchers out in the world, but they feel like a good solution for parsing free text and turning it into something that is contextually useful.

If you’ve seen an awesome use of problem matchers out in the wild, I’d love to hear about it - I’m @mheap on Twitter.