Group by fields with jq

08 May 2021 in TIL

Aggregate your logs using `jq` and get a count of unique tuples based on the contents of various fields

Whilst working with some access logs I wanted to find out how many unique user agents were calling a given endpoint. Instead of writing a script, I wanted to see if jq could handle it - it could!

Given the following input, I want to group by `uri`, `method` and `user_agent`, returning a count of how many rows match each combination:

[
{ "uri": "/pets", "method": "POST", "user_agent": "demo/1.1" },
{ "uri": "/pets", "method": "GET", "user_agent": "demo/1.1" },
{ "uri": "/names", "method": "GET", "user_agent": "another/7.3" },
{ "uri": "/pets", "method": "POST", "user_agent": "none" },
{ "uri": "/pets", "method": "POST", "user_agent": "none" },
{ "uri": "/pets", "method": "POST", "user_agent": "another/7.3" }
]

The expected output looks like the following:

[
{ "uri": "/names", "method": "GET", "user_agent": "another/7.3", "count": 1 },
{ "uri": "/pets", "method": "POST", "user_agent": "another/7.3", "count": 1 },
{ "uri": "/pets", "method": "GET", "user_agent": "demo/1.1", "count": 1 },
{ "uri": "/pets", "method": "POST", "user_agent": "demo/1.1", "count": 1 },
{ "uri": "/pets", "method": "POST", "user_agent": "none", "count": 2 }
]

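To do this with jq, use `group_by` to split the array into sub-lists of rows that share the same `user_agent`, `uri` and `method`, then `map` to collapse each sub-list into a single top-level object: take its first element (`.[0]`) and add a `count` field holding the `length` of the sub-list. `group_by` also sorts the groups by those keys, which is why the output is ordered by `user_agent` first: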

jq 'group_by(.user_agent, .uri, .method) | map(.[0] + {"count": length})' /path/to/file.json
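
If it helps to see what the filter is doing step by step, here is a rough Python sketch of the same logic. It is only an illustration, not something jq runs internally: the `group_key` and `count_by_fields` names are made up for this example, and the rows are the sample input from above.

```python
import json
from itertools import groupby

# Sample rows copied from the input shown earlier in the post.
rows = [
    {"uri": "/pets", "method": "POST", "user_agent": "demo/1.1"},
    {"uri": "/pets", "method": "GET", "user_agent": "demo/1.1"},
    {"uri": "/names", "method": "GET", "user_agent": "another/7.3"},
    {"uri": "/pets", "method": "POST", "user_agent": "none"},
    {"uri": "/pets", "method": "POST", "user_agent": "none"},
    {"uri": "/pets", "method": "POST", "user_agent": "another/7.3"},
]


def group_key(row):
    # Hypothetical helper: same key order as the jq filter
    # (user_agent, then uri, then method).
    return (row["user_agent"], row["uri"], row["method"])


def count_by_fields(rows):
    # Hypothetical helper mirroring `group_by(...) | map(.[0] + {"count": length})`.
    result = []
    # itertools.groupby only groups adjacent items, so sort by the key first.
    for _, group in groupby(sorted(rows, key=group_key), key=group_key):
        members = list(group)
        # Take the first row of the group and add the size of the group.
        result.append({**members[0], "count": len(members)})
    return result


counts = count_by_fields(rows)
for row in counts:
    print(json.dumps(row))
```

The sort before grouping is the part that is easy to forget in a script; jq's `group_by` handles it for you.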