10 April 2021
In Distributed Systems, if System Y performs a request to System X, you can use attributes like transaction_id, trace_id and span_id to navigate all logs belonging to a particular trace, and vice-versa.
But sometimes you have systems interacting which have no good log correlation. You could be still in MVP phase, or the system calling is a legacy system that no one want to touch, or even that the logs between systems are not structured in a way that is easy to query data.
You’ll find yourself looking in surrounding documents near the log you are interested in, or blindingly looking in some other system logs for some information that helps you understand some situation… and if your systems produce a lot of log data, finding the information you want can easily feel like finding a needle on a waystack.
Well… i found myself in this situation this week ¯(シ)/¯ .
Fortunately, the information i was searching was always on the surrounding documents of some specific log.
So i thought: Ok, I’ll spend 20 min of my weekend automating this.
Should be simple enough: Open browser developer tools, click the Kibana log link of View Surrounding Documents
, copy as curl. Tune some of the request fields, perform the request and use jq to look up the fields i want.
The original curl (without headers) is similar to this one:
curl 'https://<some_endpoint>/_msearch?rest_total_hits_as_int=true&ignore_throttled=true'
--data-raw $'{"index":"logstash-default*","ignore_unavailable":true,
"preference":1618072435228}\n{"size":5,"search_after":[1618071886078,23742571],
"sort":[{"@timestamp":{"order":"asc","unmapped_type":"boolean"}},{"_doc":{"order":"desc","unmapped_type":"boolean"}}],
"version":true,"_source":{"excludes":[]},"stored_fields":["*"],"script_fields":{},
"docvalue_fields":[{"field":"@timestamp","format":"date_time"}],
"query":{"bool":{"must":[{"constant_score":{"filter":{"range":{"@timestamp":{"format":"epoch_millis","gte":1618071886078,"lte":1618158286078}}}}}],
"filter":[],"should":[],"must_not":[]}},"timeout":"30000ms"}\n{"index":"logstash-default*","ignore_unavailable":true,"preference":1618072435228}\n
{"size":5,"search_after":[1618071886078,23742571],"sort":[{"@timestamp":{"order":"desc","unmapped_type":"boolean"}},
{"_doc":{"order":"asc","unmapped_type":"boolean"}}],"version":true,
"_source":{"excludes":[]},"stored_fields":["*"],"script_fields":{},
"docvalue_fields":[{"field":"@timestamp","format":"date_time"}],
"query":{"bool":{"must":[{"constant_score":{"filter":{"range":{
"@timestamp":{"format":"epoch_millis","lte":1618071886078,"gte":1617985486078}}}}}],
"filter":[],"should":[],"must_not":[]}},"timeout":"30000ms"}\n'
Ok, this big curl thing that you probably cannot see in your mobile phone screen, uses multisearch to execute several searches with a single API request.
The request makes use of the newline delimited JSON format (NDJSON). In simpler terms, it follows the following structure:
Header\n
Body\n
Header\n
Body\n
This is important and justifies why the curl data raw is –data-raw $’{}’. instead of –data-raw ‘{}’.
The notation $’…’ is a special form of quoting a string. Strings that are scanned for ANSI C like escape sequences.
So in a quick look i could see that two headers and two body, so it is performing two searches.
Each header contains the following information:
{
"index": "logstash-default*",
"ignore_unavailable": true,
"preference": 12353564645
}
The two bodies have a similar format:
{
"size": 5,
"search_after": [
1618071886078,
23742571
],
"sort": [
{
"@timestamp": {
"order": "asc",
"unmapped_type": "boolean"
}
},
{
"_doc": {
"order": "desc",
"unmapped_type": "boolean"
}
}
],
"version": true,
"_source": {
"excludes": []
},
"stored_fields": [
"*"
],
"script_fields": {},
"docvalue_fields": [
{
"field": "@timestamp",
"format": "date_time"
}
],
"query": {
"bool": {
"must": [
{
"constant_score": {
"filter": {
"range": {
"@timestamp": {
"format": "epoch_millis",
"gte": 1618071886078,
"lte": 1618158286078
}
}
}
}
}
],
"filter": [],
"should": [],
"must_not": []
}
},
"timeout": "30000ms"
}
The important fields are the following:
Now it is clear what the curl is doing: it is using the unix timestamp of a particular record to retrieve the 5 previous and following records.
So, if we give it the unix timestamp of the record, we should be able to obtain the neibouring information.
For that, i have choosen the following jq filter:
jq '.responses[0].hits.hits[] | ._source.payload.fields.<field_i_am_looking_for>'
we use the first array index of responses and we will return all of the elements of an array hits. We use the Pipe operator to run a filter for each of those results. In my case i want to retrieve fields _source.payload.fields.
I want to receive as input three fields:
To handle user input, i’ve written the following:
helpFunction()
{
echo ""
echo "Usage: $0 --sort <sort_id> --around <number_of_entries> --field <the field name i am looking for>"
echo -e "\t--sort sort field found on json entry"
echo -e "\t--size number of surrounding documents"
echo -e "\t--field field you are looking"
exit 1
}
ARGUMENT_LIST=(
"sort"
"around"
"field"
)
# read arguments
opts=$(getopt \
--longoptions "$(printf "%s:," "${ARGUMENT_LIST[@]}")" \
--name "$(basename "$0")" \
--options "" \
-- "$@"
)
eval set --$opts
while [[ $# -gt 0 ]]; do
case "$1" in
--sort)
argOne=$2
shift 2
;;
--around)
argTwo=$2
shift 2
;;
--field)
argThree=$2
shift 2
;;
--) shift ; break ;;
esac
done
# Print helpFunction in case parameters are empty
if [ -z "$argOne" ] || [ -z "$argTwo" ]|| [ -z "$argThree" ]
then
echo "Some or all of the parameters are empty";
helpFunction
fi
With sort input, we need to calculate the around timestamps:
around_1=$(( $argOne + 86400000 ))
around_2=$(( $argOne + 1985122 ))
And now we just need to call the curl!
Remeber that the notation $’…’ is being use on data-raw
. That means that each user input variable added to the curl body needs the notations $’…’.
So this:
curl something -d $'{"somefield1":'$argOne$',"somefield2":'$argTwo$'}'
Instead of this:
curl something -d $'{"somefield1":'$argOne',"somefield2":'$argTwo'}'
Did you notice that single $ after the variable and before the string?
So all together, the gist can be found here
This blog was originally posted on Medium–be sure to follow and clap!