Monitoring CPU usage of a Linux system with check_logfiles
Posted on June 2nd, 2012 by lausser
Keeping an eye on the CPU usage of your servers is one of the basics of system monitoring. For Nagios (and Shinken, of course) you'll find plenty of plugins for this task. However, I was never happy with the way they work. Most of the plugins you can download work like this: read a counter, sleep, re-read the counter. This technique not only adds an extra delay to the execution time of the plugin, it also shows the state of things only within a small time frame. If you run such a plugin every 5 minutes and it sleeps 5 seconds between the two measurements, you don't know what happens in the other 295 seconds. The plugin samples less than 2% of the interval.
Another technique is to read the counter and compare it to the value that was saved when the plugin last ran. The new value is then saved in turn, so it can be used in the next run. This way the calculation is based on a delta covering the whole time range between two subsequent runs of the plugin.
One of the core features of the check_logfiles plugin is to save persistent information after each run, which can be used in the next run. Its other job is to read lines from files. So why not use check_logfiles to read /proc/stat and save the counters between the plugin's runs? The result is this proof of concept, which again shows that check_logfiles is a tool for all kinds of monitoring jobs.
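The second technique, comparing the current counters with the ones saved by the previous run, can be sketched in a few lines of Perl. This is only an illustration of the idea, not the way check_logfiles stores its state internally: the function names load_state, save_state and delta_since_last_run and the simple key=value state file are assumptions made up for this example.

```perl
use strict;
use warnings;

# Load the counters saved by the previous run (hypothetical key=value file).
# On the very first run the file does not exist and an empty hash is returned.
sub load_state {
    my ($state_file) = @_;
    my %state;
    open(my $fh, '<', $state_file) or return \%state;
    while (my $line = <$fh>) {
        chomp $line;
        my ($key, $value) = split /=/, $line, 2;
        $state{$key} = $value if defined $value;
    }
    close($fh);
    return \%state;
}

# Save the current counters so the next run can use them.
sub save_state {
    my ($state_file, $state) = @_;
    open(my $fh, '>', $state_file) or die "cannot write $state_file: $!";
    print {$fh} "$_=$state->{$_}\n" for sort keys %$state;
    close($fh) or die "cannot close $state_file: $!";
}

# Return how much each counter grew since the previous run
# and persist the current values for the next one.
sub delta_since_last_run {
    my ($state_file, $now) = @_;
    my $last = load_state($state_file);
    my %delta;
    for my $name (keys %$now) {
        my $previous = exists $last->{$name} ? $last->{$name} : 0;
        # A smaller value than last time means the counter was reset (e.g. reboot).
        $previous = 0 if $now->{$name} < $previous;
        $delta{$name} = $now->{$name} - $previous;
    }
    save_state($state_file, $now);
    return \%delta;
}
```

Calling delta_since_last_run('/tmp/cpu.state', { user => 150, idle => 1850 }) after an earlier call with { user => 100, idle => 900 } would yield a delta of 50 user ticks and 950 idle ticks, covering the whole time between the two runs. The path /tmp/cpu.state is just a placeholder.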
Please note that you need the newest release of check_logfiles.
=head1 NAME

check_cpu.cfg - A config file for check_logfiles used to monitor cpu usage

=head1 SYNOPSIS

 $ check_logfiles --config check_cpu.cfg

=head1 DESCRIPTION

From man (5) proc:

 /proc/stat
   kernel/system statistics. Varies with architecture.
   Common entries include:

   cpu  3357 0 4313 1362393
     The amount of time, measured in units of USER_HZ
     (1/100ths of a second on most architectures, use
     sysconf(_SC_CLK_TCK) to obtain the right value), that
     the system spent in user mode, user mode with low
     priority (nice), system mode, and the idle task,
     respectively. The last value should be USER_HZ times
     the second entry in the uptime pseudo-file.

     In Linux 2.6 this line includes three additional columns:
     iowait - time waiting for I/O to complete (since 2.5.41);
     irq - time servicing interrupts (since 2.6.0-test4);
     softirq - time servicing softirqs (since 2.6.0-test4).

     Since Linux 2.6.11, there is an eighth column, steal -
     stolen time, which is the time spent in other operating
     systems when running in a virtualized environment.

     Since Linux 2.6.24, there is a ninth column, guest, which
     is the time spent running a virtual CPU for guest operating
     systems under the control of the Linux kernel.

The plugin check_logfiles is used to scan the /proc/stat file and read the
cpu entry above. The numbers in this line are used to calculate the
percentage of time the cpu has spent in each of the listed modes since
check_logfiles was run for the last time.

=head2 An Example

 $ check_logfiles --config check_cpu.cfg
 user: 3.90%, nice: 0.10%, sys: 2.89%, idle: 92.31%, iowait: 0.19%, irq: 0.00%, sirq: 0.40%, steal: 0.23%, guest: 0.00% | user=3.90% nice=0.10% sys=2.89% idle=92.31% iowait=0.19% irq=0.00% sirq=0.40% steal=0.23% guest=0.00%

=head1 SEE ALSO

man (5) proc

=head1 COPYRIGHT

Copyright Gerhard Lausser

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, with no Front-Cover Texts, and with no
Back-Cover Texts.

=cut

# Shared between the search script and the postscript.
my @columns = ();
my $percent = {};

@searches = ({
  tag => 'cpu',
  logfile => '/proc/stat',
  type => 'virtual',
  criticalpatterns => ['^cpu\s+'],
  options => 'script',
  script => sub {
    # The matched line looks like "cpu v1 v2 ...".
    # Drop the leading "cpu" label so that only the counters are counted.
    my @fields = split(/\s+/, $ENV{CHECK_LOGFILES_SERVICEOUTPUT});
    shift @fields;
    my $numcols = scalar(@fields);
    if ($numcols == 4) {
      @columns = (qw(user nice sys idle));
    } elsif ($numcols == 7) {
      @columns = (qw(user nice sys idle iowait irq sirq));
    } elsif ($numcols == 8) {
      @columns = (qw(user nice sys idle iowait irq sirq steal));
    } else {
      @columns = (qw(user nice sys idle iowait irq sirq steal guest));
    }
    my $idx = 0;
    my $nowvalues = {};
    my $lastvalues = {};
    my $deltavalues = {};
    my $ticks = 0;
    foreach my $col (@columns) {
      $nowvalues->{$col} = $fields[$idx];
      $idx++;
      $lastvalues->{$col} = exists $CHECK_LOGFILES_PRIVATESTATE->{$col} ?
          $CHECK_LOGFILES_PRIVATESTATE->{$col} : 0;
      # A counter lower than the saved one means it was reset (e.g. by a reboot).
      if ($nowvalues->{$col} < $lastvalues->{$col}) {
        $lastvalues->{$col} = 0;
      }
      # Save the current counter for the next run.
      $CHECK_LOGFILES_PRIVATESTATE->{$col} = $nowvalues->{$col};
      $deltavalues->{$col} = $nowvalues->{$col} - $lastvalues->{$col};
      $ticks += $deltavalues->{$col};
    }
    foreach my $col (@columns) {
      # Avoid a division by zero if no ticks have elapsed since the last run.
      $percent->{$col} = $ticks ? 100 * $deltavalues->{$col} / $ticks : 0;
    }
  },
});

$options = 'supersmartpostscript';
$postscript = sub {
  my @output = ();
  my @perfdata = ();
  foreach my $col (@columns) {
    push(@output, sprintf '%s: %.2f%%', $col, $percent->{$col});
    push(@perfdata, sprintf '%s=%.2f%%', $col, $percent->{$col});
  }
  printf "%s | %s\n", join(', ', @output), join(' ', @perfdata);
  return 0;
};
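To try out the arithmetic outside of check_logfiles, the same calculation can be sketched as a small standalone Perl script. It takes two snapshots of the cpu line and turns the differences into percentages. The function names parse_cpu_line and cpu_percentages and the sample counter values are made up for illustration; they are not part of check_logfiles.

```perl
use strict;
use warnings;

# Column names in the order they appear in the "cpu" line of /proc/stat.
my @NAMES = qw(user nice sys idle iowait irq sirq steal guest);

# Split a "cpu ..." line into a hash of named counters.
# Kernels that report fewer columns simply yield fewer keys.
sub parse_cpu_line {
    my ($line) = @_;
    my ($label, @values) = split ' ', $line;
    die "not an aggregate cpu line: $line"
        unless defined $label && $label eq 'cpu';
    my $count = @values < @NAMES ? scalar @values : scalar @NAMES;
    my %counters;
    @counters{ @NAMES[0 .. $count - 1] } = @values[0 .. $count - 1];
    return \%counters;
}

# Percentage of ticks spent in each mode between two snapshots.
sub cpu_percentages {
    my ($before, $after) = @_;
    my (%delta, %percent);
    my $ticks = 0;
    for my $name (keys %$after) {
        my $old = defined $before->{$name} ? $before->{$name} : 0;
        $old = 0 if $after->{$name} < $old;    # counter was reset
        $delta{$name} = $after->{$name} - $old;
        $ticks += $delta{$name};
    }
    for my $name (keys %delta) {
        # No elapsed ticks: report 0 instead of dividing by zero.
        $percent{$name} = $ticks > 0 ? 100 * $delta{$name} / $ticks : 0;
    }
    return \%percent;
}

# Two invented snapshots, 1000 ticks apart.
my $before  = parse_cpu_line('cpu  1000 0 500 8000 500');
my $after   = parse_cpu_line('cpu  1100 0 550 8800 550');
my $percent = cpu_percentages($before, $after);
printf "%s: %.2f%%\n", $_, $percent->{$_} for sort keys %$percent;
```

With the two sample lines the deltas are 100, 0, 50, 800 and 50 ticks, so the script reports user at 10.00%, sys and iowait at 5.00% each, and idle at 80.00%.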


