Dissecting Check_MK Crash Reports
If, like me, you happen to use Check_MK, sooner or later you will probably run into a check failure which generates a crash report. It will look something like this:
UNKNOWN - check failed - please submit a crash report!
Crash dump:
H4sIAMVjgVYAA+2V32vbMBDH/Zy/4tYXN5DEkmzHw6/dxsbYD9joSylGUZRG1JaMrGSMsf+9ZyfZ1q1lMEjG2H1eBLqv7k6nkzRLoqPDkCLP92MxjGzOduOOiIt8zgUTWcojxlmWpRHkx08tijZdkB4gMsrYG/m4TrZSrfUpMjopsyR4qY67r+G4s+yx8+c8F3j+84yJOS8Y9gnnWSEiYEfNas9/fv4AF2utbsFtQrsJJfSoYWYlTa2XMIW21rLT0G0WjQkgQXnZrcHr1vnwZLR3UL15DZfad8bZEvhMzLJWoO2ZDHrn9IBgPJ+Kp1MugM/LlJeMo+6l68J93SqXOP9B+61RP7p412oLF85arQIG6w4JQPjcftOt8mphbkxbKRT2kldBN/f9v3VWo+G99LLRATM/mL/Ew6K4hHORY3tOIO27dDyBuOvq6mHjV3T1sb9IC4mpnDe4HayQ0jaAknUNtezCuBwdgr/A0sJZspU+qc0iGQpeNbdJi2tc0/aFT7AAZxOojdXA8YpMwFhYugq9VYO+q5yt1hho9H1Tkf3TEGnBiyHETvhLAXsuZb3Rz713vkTlVtZmiYuxdrKGlfM4F87H8MmENSz6juGshDjeLx797U4nHmKWGLtyx43xm/cff4Ds5/c/FSm9/6fg6ipOY3za4utruqEEQRAEQRAEQRAEQRAEQRAEQRAEQRD/MHejC8RxACgAAA==
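To look inside the dump, decode it. It is base64 text, and the decoded data is a gzip-compressed tar archive. Piping it through base64 -d and gzip -cd prints the raw tar stream, which is why tar header fields for the archived files (./trace and ./info) appear inline in the output. Swapping gzip -cd for tar -xzOf - prints only the file contents. So: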
$ echo 'H4sIAMVjgVYAA+2V32vbMBDH/Zy/4tYXN5DEkmzHw6/dxsbYD9joSylGUZRG1JaMrGSMsf+9ZyfZ1q1lMEjG2H1eBLqv7k6nkzRLoqPDkCLP92MxjGzOduOOiIt8zgUTWcojxlmWpRHkx08tijZdkB4gMsrYG/m4TrZSrfUpMjopsyR4qY67r+G4s+yx8+c8F3j+84yJOS8Y9gnnWSEiYEfNas9/fv4AF2utbsFtQrsJJfSoYWYlTa2XMIW21rLT0G0WjQkgQXnZrcHr1vnwZLR3UL15DZfad8bZEvhMzLJWoO2ZDHrn9IBgPJ+Kp1MugM/LlJeMo+6l68J93SqXOP9B+61RP7p412oLF85arQIG6w4JQPjcftOt8mphbkxbKRT2kldBN/f9v3VWo+G99LLRATM/mL/Ew6K4hHORY3tOIO27dDyBuOvq6mHjV3T1sb9IC4mpnDe4HayQ0jaAknUNtezCuBwdgr/A0sJZspU+qc0iGQpeNbdJi2tc0/aFT7AAZxOojdXA8YpMwFhYugq9VYO+q5yt1hho9H1Tkf3TEGnBiyHETvhLAXsuZb3Rz713vkTlVtZmiYuxdrKGlfM4F87H8MmENSz6juGshDjeLx797U4nHmKWGLtyx43xm/cff4Ds5/c/FSm9/6fg6ipOY3za4utruqEEQRAEQRAEQRAEQRAEQRAEQRAEQRD/MHejC8RxACgAAA==' | base64 -d | gzip -cd
./0000755000075700000600000000000012561202431010443 5ustar icingaapache./trace0000644000075700000600000000115212640261705011472 0ustar icingaapache
Check output: check failed - please submit a crash report!
Check_MK Version: 1.2.4p2
Date: 2015-28-12 16:31:01
Host: f5a
Service: Open Connections
Check type: f5_bigip_conns
Item: None
Parameters: {'conns': (25000, 30000), 'ssl_conns': (25000, 30000)}
Traceback (most recent call last):
File "/var/lib/check_mk/precompiled/f5a", line 1115, in do_all_checks_on_host
n
File "/var/lib/check_mk/precompiled/f5a", line 3717, in check_f5_bigip_conns
ValueError: invalid literal for int() with base 10: ''
./info0000644000075700000600000000001412640261705011323 0ustar icingaapache
[['3', '']]
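If you decode dumps often, the same unpacking can be done in Python with only the standard library, and it avoids the tar header noise. This is a minimal sketch that assumes the layout seen above: base64 text wrapping a gzip-compressed tar archive of plain-text files. read_crash_dump is a hypothetical name made up for this example, not part of Check_MK.

```python
import base64
import io
import tarfile


def read_crash_dump(dump_b64):
    """Decode a Check_MK crash dump into {archive member name: text content}.

    Assumes the dump is base64 text wrapping a gzip-compressed tar archive
    whose regular files (./trace, ./info) hold plain text.
    """
    # b64decode ignores non-alphabet characters such as stray newlines
    raw = base64.b64decode(dump_b64)
    members = {}
    # mode "r:gz" lets tarfile strip the gzip layer itself
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip the "./" directory entry
            data = archive.extractfile(member).read()
            members[member.name] = data.decode("utf-8", errors="replace")
    return members
```

With the dump saved to a file, something like read_crash_dump(open("dump.txt").read()) returns the trace and info texts as separate strings.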
What this says is that the data handed to the check (the info structure) holds two values: one for connections and one for SSL connections. In my case, I am getting '3' for connections and '' for SSL connections. The check converts these values with int(), and an empty string is not a valid integer, hence the ValueError in the traceback. Troubleshooting this further requires diving into the check definition to figure out how the values are retrieved from the BIG-IP device.
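The failure is easy to reproduce outside Check_MK using the info row from the dump. The sketch below assumes the check converts both columns with int(). The traceback only confirms the conversion of the empty value, so reproduce_crash is a hypothetical stand-in, not the real check_f5_bigip_conns. parse_counter is an equally hypothetical tolerant variant that shows one way such a check could survive missing data.

```python
# info as recorded in the crash dump: one row holding [conns, ssl_conns]
info = [['3', '']]


def reproduce_crash(info):
    """Convert both columns with int(), assumed to mirror what the check does."""
    conns, ssl_conns = info[0]
    return int(conns), int(ssl_conns)  # int('') raises ValueError


def parse_counter(value):
    """Hypothetical tolerant parse: None instead of a crash on empty data."""
    try:
        return int(value)
    except ValueError:
        return None


try:
    reproduce_crash(info)
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: ''

print([parse_counter(v) for v in info[0]])  # [3, None]
```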
# tail -15 /usr/share/check_mk/checks/f5_bigip_conns
check_info["f5_bigip_conns"] = {
'check_function' : check_f5_bigip_conns,
'inventory_function' : inventory_f5_bigip_conns,
'service_description' : 'Open Connections',
'has_perfdata' : True,
'group' : 'f5_connections',
'default_levels_variable' : 'f5_bigip_conns_default_levels',
'snmp_info' : ( '.1.3.6.1.4.1.3375.2.1.1.2', [
'1.8', # sysStatServerCurConns
'9.2', # sysClientsslStatCurConns
] ),
'snmp_scan_function' : lambda oid: '.1.3.6.1.4.1.3375.2' in \
oid(".1.3.6.1.2.1.1.2.0") and "big-ip" in \
oid(".1.3.6.1.4.1.3375.2.1.4.1.0").lower(),
}
Looking at the check definition, we can see that the info structure is populated from two SNMP OIDs, each formed by appending a column suffix to the base OID in snmp_info:
.1.3.6.1.4.1.3375.2.1.1.2.1.8 for conns
.1.3.6.1.4.1.3375.2.1.1.2.9.2 for ssl_conns
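The mapping from the snmp_info tuple to these two OIDs, and from the info row back to named values, can be sketched in a few lines. This is a simplified illustration built from the values in the check definition, not Check_MK's own SNMP code. full_oids and label_row are hypothetical helper names. The pairing relies on the info row listing values in the same order as the columns.

```python
# Values copied from the snmp_info entry of f5_bigip_conns above
BASE_OID = ".1.3.6.1.4.1.3375.2.1.1.2"
COLUMNS = {
    "conns": "1.8",      # sysStatServerCurConns
    "ssl_conns": "9.2",  # sysClientsslStatCurConns
}


def full_oids(base, columns):
    """Append each column suffix to the base OID."""
    return {name: f"{base}.{suffix}" for name, suffix in columns.items()}


def label_row(columns, row):
    """Pair each value of an info row with its column name (same order)."""
    return dict(zip(columns, row))


for name, oid in full_oids(BASE_OID, COLUMNS).items():
    print(f"{oid} for {name}")

# The row from the crash dump: the empty value belongs to ssl_conns
print(label_row(COLUMNS, ['3', '']))  # {'conns': '3', 'ssl_conns': ''}
```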
Querying the F5 device manually, I can see that the latter OID returns no data, and this is what causes the '' value instead of a number. If I back off the last digit or two from the OID, run snmpwalk on that broader subtree and grep for ssl, I can see other OIDs that do return connection counts. It looks like the version of LTM we are running does not return the data where Check_MK expects it.
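The manual digging can also be scripted. Below is a minimal Python sketch of the two steps. It backs off the last arcs of an OID to get broader subtrees to walk, and it applies a case-insensitive substring filter that stands in for grep. It assumes the walk output is the plain text printed by snmpwalk, presumably with the F5 MIB loaded, since grepping for ssl only matches symbolic names such as sysClientsslStatCurConns, not numeric OIDs. parent_oids and grep_walk are hypothetical helper names.

```python
def parent_oids(oid, levels=2):
    """Return the OID with its last 1..levels arcs removed, nearest first."""
    arcs = oid.strip(".").split(".")
    return ["." + ".".join(arcs[:-n]) for n in range(1, levels + 1)]


def grep_walk(walk_output, needle="ssl"):
    """Case-insensitive stand-in for piping snmpwalk output through grep -i."""
    needle = needle.lower()
    return [line for line in walk_output.splitlines() if needle in line.lower()]


print(parent_oids(".1.3.6.1.4.1.3375.2.1.1.2.9.2"))
# ['.1.3.6.1.4.1.3375.2.1.1.2.9', '.1.3.6.1.4.1.3375.2.1.1.2']
```

The subtrees from parent_oids are what you would hand to snmpwalk, and grep_walk then narrows the captured output down to the SSL-related lines.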
After digging around for a bit, I refreshed the SNMP configuration on the F5 device and ran a ConfigSync between the two units, and the data came back. After that, the checks were again returning counts for SSL connections.
Between the crash report and the Python source of the Check_MK check, you can diagnose most issues that cause a check to crash.