Group by fields with jq
08 May 2021 in TIL
Whilst working with some access logs I wanted to find out how many unique user agents were calling a given endpoint. Instead of writing a script, I wanted to see if jq could handle it - it could!
Given the following input, I want to group by uri, method and user_agent, returning a count of how many matching rows are in each group:
json
[
  { "uri": "/pets", "method": "POST", "user_agent": "demo/1.1" },
  { "uri": "/pets", "method": "GET", "user_agent": "demo/1.1" },
  { "uri": "/names", "method": "GET", "user_agent": "another/7.3" },
  { "uri": "/pets", "method": "POST", "user_agent": "none" },
  { "uri": "/pets", "method": "POST", "user_agent": "none" },
  { "uri": "/pets", "method": "POST", "user_agent": "another/7.3" }
]
The expected output looks like the following:
json
[
  { "uri": "/names", "method": "GET", "user_agent": "another/7.3", "count": 1 },
  { "uri": "/pets", "method": "POST", "user_agent": "another/7.3", "count": 1 },
  { "uri": "/pets", "method": "GET", "user_agent": "demo/1.1", "count": 1 },
  { "uri": "/pets", "method": "POST", "user_agent": "demo/1.1", "count": 1 },
  { "uri": "/pets", "method": "POST", "user_agent": "none", "count": 2 }
]
To do this with jq you use group_by to chunk the rows into lists that share the same user_agent, uri and method, then map to collapse each list into a single top-level element: the first row of the list, plus the length of the list stored in count:
bash
jq '. | group_by(.user_agent,.uri,.method) | map(.[0] + {"count": length})' /path/to/file.json
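To see what each stage is doing, here is a rough sketch that runs against the sample input from above. The variable names (input, group_sizes, per_uri) are made up for illustration, and writing the grouping key as an explicit array is my own choice for readability; as far as I can tell it groups the same way as the comma-separated form. The second filter is one possible way to answer the original question of how many unique user agents call each endpoint, not necessarily the best one.

```shell
#!/bin/sh
set -eu

# Sample rows copied from the input above, kept inline so the sketch is self-contained.
input='[
  { "uri": "/pets", "method": "POST", "user_agent": "demo/1.1" },
  { "uri": "/pets", "method": "GET", "user_agent": "demo/1.1" },
  { "uri": "/names", "method": "GET", "user_agent": "another/7.3" },
  { "uri": "/pets", "method": "POST", "user_agent": "none" },
  { "uri": "/pets", "method": "POST", "user_agent": "none" },
  { "uri": "/pets", "method": "POST", "user_agent": "another/7.3" }
]'

# group_by returns one inner array per distinct key, sorted by that key.
# Mapping length over the result shows how many rows landed in each group.
group_sizes=$(printf '%s' "$input" | jq -c 'group_by([.user_agent, .uri, .method]) | map(length)')
echo "$group_sizes"   # [1,1,1,1,2]

# Variation (hypothetical): group by uri only, then count the distinct
# user_agent values inside each group with unique | length.
per_uri=$(printf '%s' "$input" | jq -c 'group_by(.uri) | map({uri: .[0].uri, user_agents: (map(.user_agent) | unique | length)})')
echo "$per_uri"
```

The -c flag only switches jq to compact, one-line output so the results are easy to eyeball; drop it to get pretty-printed JSON like the examples above.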