Follow GitHub Link headers with Bash
When working with the GitHub API, data may be returned across multiple pages of results. The API signals that another page exists by including a URL marked rel="next" in the Link response header. There are libraries available to help work with this header, but if you’re writing a shell script then it’s not as easy as it could be.
If you’re looking for an example script, here’s one that fetches multiple pages of pull requests and outputs them to the terminal:
PULLS=""
URL="https://api.github.com/repos/:owner/:repo/pulls?per_page=100"
while [ "$URL" ]; do
  RESP=$(curl -i -Ss -H "Authorization: token $GITHUB_TOKEN" "$URL")
  HEADERS=$(echo "$RESP" | sed '/^\r$/q')
  URL=$(echo "$HEADERS" | sed -n -E 's/Link:.*<(.*?)>; rel="next".*/\1/p')
  PULLS="$PULLS $(echo "$RESP" | sed '1,/^\r$/d')"
done
echo $PULLS
Be careful! Each page is a JSON array of objects, so $PULLS ends up holding several arrays one after another and won’t be valid JSON as a whole. Thankfully, jq can process this format just fine, as it works with streams of JSON values.
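If a single array is more convenient than a stream of page-sized arrays, jq can merge them. The following is a minimal sketch using made-up sample data in place of real API output; it assumes jq is installed and relies on its -s (slurp) flag and the add filter.

```shell
# Two pages of results concatenated, as $PULLS would hold them.
# This is sample data for illustration, not real API output.
PULLS='[{"number": 1}, {"number": 2}] [{"number": 3}]'

# -s (slurp) collects every top-level JSON value into one outer array;
# add then concatenates the page arrays into a single array.
# -c prints the result compactly on one line.
MERGED=$(printf '%s\n' "$PULLS" | jq -c -s 'add')
printf '%s\n' "$MERGED"
```

With the sample data above, this prints one array containing all three objects.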
Make an HTTP request:
RESPONSE=$(curl -i -Ss -H "Authorization: token $GITHUB_TOKEN" "$URL")
Extract just the HTTP headers:
echo "$RESPONSE" | sed '/^\r$/q'
Extract the rel="next" URL from the Link header:
echo "$RESPONSE" | sed -n -E 's/Link:.*<(.*?)>; rel="next".*/\1/p'
Extract just the response body:
echo "$RESPONSE" | sed '1,/^\r$/d'
How it works
There are a lot of cool tricks in the script above - let’s take them one at a time.
while [ "$URL" ]; do
This loop works because $URL will be empty if the response has no rel="next" entry in its Link header. We set the initial URL to the first page, and if all the results fit on a single page the loop will only execute once.
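The loop condition can be tried without touching the network. In this sketch, next_page_after is a hypothetical stand-in for "fetch the page and read its Link header", and the page names are placeholders rather than real URLs.

```shell
# next_page_after is a hypothetical helper: it prints the URL of the page
# that follows the one given, or an empty string on the last page.
next_page_after() {
  case "$1" in
    page1) echo "page2" ;;
    page2) echo "page3" ;;
    *)     echo "" ;;
  esac
}

URL="page1"
VISITED=""
# [ "$URL" ] is true for any non-empty string and false for an empty one,
# so the loop ends as soon as there is no next page.
while [ "$URL" ]; do
  VISITED="$VISITED $URL"
  URL=$(next_page_after "$URL")
done
echo "Visited:$VISITED"
```

The loop body runs three times here, once per page, and stops when the helper returns nothing.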
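RESP=$(curl -i -Ss -H "Authorization: token $GITHUB_TOKEN" "$URL")
Next, we use curl to fetch the data from the API. The -i flag includes the response headers in the output, ahead of the JSON payload. The -Ss flags hide the progress meter while still reporting errors.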
sed '/^\r$/q'
This sed command prints its input until it finds a line that matches the supplied pattern, then stops processing (q means quit). HTTP header lines end with a carriage return followed by a line feed, so the blank line that ends the headers contains only a carriage return. By specifying ^\r$ as the match pattern, sed stops as soon as it reaches that line.
This means that once you run HEADERS=$(echo "$RESP" | sed '/^\r$/q'), the variable $HEADERS will contain only the HTTP headers for the response.
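To see the header split in isolation, here is a small sketch that runs the same sed command against a simulated response. The header values and body are invented for illustration, and the example assumes GNU sed, which understands \r in a pattern.

```shell
# Simulated `curl -i` output: CRLF-terminated headers, a CR-only blank line,
# then the body. The values are made up for illustration.
RESP=$(printf 'HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n\r\n[{"number": 1}]\n')

# Print lines up to and including the first line holding only a carriage
# return, then quit before the body is reached.
HEADERS=$(printf '%s\n' "$RESP" | sed '/^\r$/q')

# Count the header lines, ignoring the CR-only separator that sed also prints.
HEADER_COUNT=$(printf '%s\n' "$HEADERS" | grep -c '[^[:space:]]')
echo "$HEADER_COUNT"
```

The count covers the status line and the one Content-Type header; the JSON body never reaches $HEADERS.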
sed -n -E 's/Link:.*<(.*?)>; rel="next".*/\1/p'
Now that we’ve got the headers, we can use sed once again to extract the rel="next" link from the $HEADERS string. It looks for the Link: header, then skips ahead until it finds a string contained between < and >. It captures that string using parentheses, but only if the characters that follow are ; rel="next". The -n flag suppresses sed’s normal output and the trailing p prints only lines where the substitution succeeded, so the command returns just the value of the matching group, \1:
URL=$(echo "$HEADERS" | sed -n -E 's/Link:.*<(.*?)>; rel="next".*/\1/p')
If there is a rel="next" link available, it’ll populate $URL and the loop will run again, fetching the next page. If not, it’ll be empty and the loop will stop executing.
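The Link parsing can also be checked offline. The sketch below uses a slightly different expression from the one in the script, and the changes are assumptions of this example rather than part of the original: [^>]* replaces .*? because sed’s extended regular expressions have no non-greedy quantifier, and [Ll]ink accepts a lowercase header name, which is how curl prints headers for HTTP/2 responses. The next_url function name and the example.test URLs are placeholders.

```shell
# next_url is a hypothetical helper: it reads HTTP headers on stdin and
# prints the rel="next" URL from the Link header, or nothing if absent.
next_url() {
  # [^>]* keeps the capture inside a single pair of angle brackets.
  sed -n -E 's/^[Ll]ink:.*<([^>]*)>; rel="next".*/\1/p'
}

# A middle page: the header lists prev, next and last links.
MIDDLE=$(printf 'HTTP/1.1 200 OK\r\nLink: <https://api.example.test/pulls?page=1>; rel="prev", <https://api.example.test/pulls?page=3>; rel="next", <https://api.example.test/pulls?page=9>; rel="last"\r\n\r\n')

# The last page: no rel="next" entry at all.
LAST=$(printf 'HTTP/1.1 200 OK\r\nlink: <https://api.example.test/pulls?page=8>; rel="prev", <https://api.example.test/pulls?page=1>; rel="first"\r\n\r\n')

URL=$(printf '%s\n' "$MIDDLE" | next_url)
echo "next: $URL"

URL_AFTER_LAST=$(printf '%s\n' "$LAST" | next_url)
[ -z "$URL_AFTER_LAST" ] && echo "no more pages"
```

For the middle page the helper yields the page=3 URL, and for the last page it yields an empty string, which is what ends the while loop.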
sed '1,/^\r$/d'
Finally, we need the JSON response without the headers. sed comes to the rescue again, this time using the d (delete) command. The address 1,/^\r$/ selects everything from line 1 through the first empty line, and d deletes those lines, leaving only the remaining content. This gives us the response body without the HTTP headers.
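As a quick check of the body extraction, this sketch applies the same sed command to a simulated response. The response content is invented for illustration, and GNU sed is assumed for the \r escape.

```shell
# Simulated `curl -i` output: headers, a CR-only blank line, then the body.
# The values are made up for illustration.
RESP=$(printf 'HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n\r\n[{"number": 1}]\n')

# Delete everything from line 1 through the CR-only separator line,
# which leaves only the body.
BODY=$(printf '%s\n' "$RESP" | sed '1,/^\r$/d')
echo "$BODY"
```

Only the JSON array remains, so the result can be appended to $PULLS or piped straight into jq.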