Description

check_logfiles is a plugin for Nagios which scans log files for specific patterns.

Motivation

The conventional plugins which scan log files are not adequate in a mission-critical environment. In particular, their inability to handle log file rotation and to include the rotated archives in the scan leaves gaps in the monitoring. check_logfiles was written because these deficiencies would have prevented Nagios from replacing a proprietary monitoring system.

Features

  • Detection of rotations - Log files are usually rotated and compressed every night. Each operating system or company has its own naming scheme. If a rotation happens between two runs of check_logfiles, the rotated archive has to be scanned as well to avoid gaps. The most common rotation schemes are predefined, but you can describe any strategy (in short: where and under which name a log file is archived).
  • Multiple patterns can be defined, and each of them can be classified as a warning pattern or a critical pattern.
  • Triggered actions - Nagios plugins usually return just an exit code and a line of text describing the result of the check. Sometimes, however, you want to run some code during the scan every time there is a hit. check_logfiles lets you call scripts after every hit, or at the beginning or the end of its runtime.
  • Exceptions - If a pattern matches, the matched line could be a very special case which should not be counted as an error. You can define exception patterns which are more specific versions of your critical/warning patterns. Such a match would then cancel an alert.
  • Thresholds - You can define the number of matching lines which are necessary to activate an alert.
  • Protocol - The matching lines can be written to a protocol file the name of which will be included in the plugin’s output.
  • Macros - Pattern definitions and logfile names may contain macros, which are resolved at runtime.
  • Performance data - The number of lines scanned and the numbers of warnings and criticals are included in the output.
  • Windows - The plugin works with Unix as well as with Windows (e.g. with ActiveState Perl).

Introduction

Usually you call the plugin with the --config option, which takes the name of a configuration file:

nagios$ check_logfiles --config
OK - no errors or warnings

In its simplest form, check_logfiles takes all the essential parameters as command line options. However, not all features can be used this way.

nagios$ check_logfiles --tag=ssh --logfile=/var/adm/messages \
     --rotation SOLARIS \
     --criticalpattern 'Failed password for root'
OK - no errors or warnings |ssh=1722;0;0;0

nagios$ check_logfiles --tag=ssh --logfile=/var/adm/messages \
     --rotation SOLARIS \
     --criticalpattern 'Failed password for root'
CRITICAL - (1 errors in check_logfiles.protocol-2007-04-25-20-59-20) - Apr 25 20:59:15 srvweb8 sshd[10849]: [ID 800047 auth.info] Failed password for root from 172.16.224.11 port 24206 ssh2 |ssh=2831;0;1;0

In principle, check_logfiles scans a log file until the end of the file is reached. The offset is then saved in a so-called seekfile. The next time check_logfiles runs, this offset is used as the starting position inside the log file. If a rotation has occurred in the meantime, the rest of the rotated archive is scanned as well.
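
The seekfile mechanism described above can be sketched in a few lines of POSIX shell. This is not how check_logfiles is implemented; it is a minimal illustration under simplifying assumptions: the state is stored as "<inode> <offset>", a rotation is detected only by a changed inode or a shrunken file, and the rotated archive is not scanned. The helper name scan_new_lines is hypothetical.

```shell
#!/bin/sh
# Sketch of the seekfile idea (NOT check_logfiles' own code): remember how far
# a log file was read and print only the lines appended since the last run.
# scan_new_lines is a hypothetical helper; after a rotation it restarts at the
# beginning of the new file but does not scan the rotated archive.

scan_new_lines() {
    logfile=$1
    seekfile=$2

    # Current inode and size of the log file (portable ls/wc, no GNU stat).
    inode=$(ls -i "$logfile" | awk '{ print $1 }')
    size=$(wc -c < "$logfile" | tr -d ' ')

    # First run: like check_logfiles without "allyoucaneat", start at the end.
    if [ ! -f "$seekfile" ]; then
        printf '%s %s\n' "$inode" "$size" > "$seekfile"
        return 0
    fi

    # Saved state has the form "<inode> <offset>".
    read -r old_inode offset < "$seekfile"

    # A different inode or a shrunken file means the log was rotated or
    # truncated, so read the new file from the start.
    if [ "$inode" != "$old_inode" ] || [ "$size" -lt "$offset" ]; then
        offset=0
    fi

    # Print exactly the bytes between the saved offset and the size measured
    # above; lines appended meanwhile are left for the next run.
    # (head -c is not POSIX but exists on GNU, BSD and BusyBox systems.)
    if [ "$size" -gt "$offset" ]; then
        tail -c +"$((offset + 1))" "$logfile" | head -c "$((size - offset))"
    fi

    # Remember the new position.
    printf '%s %s\n' "$inode" "$size" > "$seekfile"
}
```

Called once per check, the function prints only the lines added since the previous call; the real plugin additionally locates and reads the rest of the rotated archive.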

Documentation

For simple applications it is sufficient to call check_logfiles with command line parameters. More complex scan jobs can be described in a config file.

Command line parameters

  • --tag=<identifier> A short, unique descriptor for this search. It appears in the output of the plugin and is used to distinguish the different services.
  • --logfile=<filename> The name of the log file you want to scan.
  • --rotation=<method> The method by which the log files are rotated.
  • --criticalpattern=<regexp> A regular expression which triggers a critical error.
  • --warningpattern=<regexp> Like --criticalpattern, but a match results in a warning.
  • --criticalexception=<regexp> / --warningexception=<regexp> Exceptions which are not counted as errors.
  • --okpattern=<regexp> A pattern which resets the error counters.
  • --noprotocol Normally all the matched lines are written into a protocol file with this file’s name appearing in the plugin’s output. This option switches this off.
  • --syslogserver With this option you limit the pattern matching to lines originating from the host check_logfiles is running on.
  • --syslogclient=<clientname> With this option you limit the pattern matching to lines originating from the host named in this option.
  • --sticky[=<lifetime>] Errors are propagated through successive runs.
  • --unstick Resets sticky errors.
  • --config The name of a configuration file. The syntax of this file is described in the next section.
  • --configdir The name of a configuration directory. Configfiles ending in .cfg or .conf are (recursively) imported.
  • --searches=<tag1,tag2,…> A list of tags of the searches which are to be run. With this parameter, only the selected searches are run instead of all searches listed in the config file. (--selectedsearches is also possible)
  • --report=[short|long|html] This option turns on multiline output (default: off). The setting html generates a table which displays the last hits in the service details view.
  • --maxlength=[length] With this parameter, long lines are truncated (default: off). Some programs (e.g. TrueScan) generate eventlog entries so long that the output of the plugin exceeds 1024 characters. NSClient++ discards such output.
  • --winwarncrit With this parameter messages in the eventlog are classified by the type WARNING/ERROR (Default: off). Replaces or complements warning/criticalpattern.
  • --rununique This parameter prevents check_logfiles from starting when another instance is already running with the same config file (it exits with UNKNOWN).
  • --timeout=<seconds> This parameter aborts a running search after the given number of seconds. The abort is controlled, so the lines read so far are used to compute the final result.
  • --warning=<number> This passes a warning parameter to complex handler scripts (--critical is possible, too). Inside the scripts the value is accessible as the macro CL_WARNING (or CL_CRITICAL, respectively).
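
The effect of --rununique can be illustrated with a small wrapper. check_logfiles handles this internally; the sketch below uses a different, well-known technique (an atomic mkdir lock), and the function name run_unique and the lock directory are hypothetical. Like the plugin, it reports UNKNOWN (exit code 3) when another instance holds the lock.

```shell
#!/bin/sh
# Illustration only: check_logfiles implements --rununique itself. This
# hypothetical wrapper shows the same idea with an atomic mkdir lock: if the
# lock already exists, refuse to run and report UNKNOWN (exit code 3).
run_unique() {
    lockdir=$1
    shift
    # mkdir either creates the directory or fails, so only one caller wins.
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "UNKNOWN - another instance is already running"
        return 3
    fi
    # Run the wrapped command and keep its exit code.
    rc=0
    "$@" || rc=$?
    rmdir "$lockdir"
    return "$rc"
}
```

A lock left behind by a killed process has to be removed manually in this simplified version.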

Format of a configuration file

The definitions in this file are written with Perl-syntax. There is a distinction between global variables which influence check_logfiles as a whole and variables which are related to the single searches. A “search” combines where to search, what to search for, which weight a hit has, which action will be triggered in case of a hit, and so on…

$seekfilesdir A directory where files with status information are saved after a run of check_logfiles. This status information helps check_logfiles remember up to which position the log file was scanned during the last run, so only newly written lines are read. The default is /tmp or the directory which has been specified with the --with-seekfiles-dir option of ./configure.
$protocolsdir A directory where check_logfiles writes protocol files with the matched lines. The default is /tmp or the directory which has been specified with the --with-protocols-dir option of ./configure.
$protocolretention The lifetime of protocol files in days. After this period the files are deleted automatically. The default is 7 days.
$scriptpath A list of directories where the triggered scripts can be found (separated by : under Unix and ; under Windows). The default is /bin:/usr/bin:/sbin:/usr/sbin or the directories which have been specified with the --with-trusted-path option of ./configure.
$MACROS A hash with user-defined macro definitions. see below.
$prescript An external script which is executed during the startup of check_logfiles. The macro $CL_TAG$ gets the value “startup”. $prescriptparams, $prescriptstdin and $prescriptdelay may be used like scriptparams, scriptstdin and scriptdelay.
$postscript An external script which is executed before check_logfiles terminates. The macro $CL_TAG$ gets the value “summary”. $postscriptparams, $postscriptstdin and $postscriptdelay may be used like scriptparams, scriptstdin and scriptdelay.
$options A list of global options. Known options controlling the influence of pre- and postscript are smartpostscript, supersmartpostscript, smartprescript and supersmartprescript. With the option report="short|long|html" you can customize the plugin’s output. With report=long or report=html, the plugin’s output can become very long. By default it is truncated to 4096 characters (the amount of data an unpatched Nagios is able to process). The option maxlength can be used to raise this limit, e.g. maxlength=8192. The option seekfileerror defines the error level if a seekfile cannot be written, e.g. seekfileerror=unknown (default: critical). The same applies to protocolfileerror (default: ok). Usually the last error message is shown in the first line of the output. With preview=5, for example, you can tell check_logfiles to show the last 5 hits (default: preview=1).
@searches An array whose elements (hash references) describe the actual work of check_logfiles. The keys for these hash references can be found in the next table.  
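
check_logfiles removes old protocol files itself according to $protocolretention. If you want to clean up a protocol directory by hand, for example after lowering the retention, a find command along these lines could be used. The file name pattern is an assumption based on the sample output shown in the introduction, and purge_protocols is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical manual equivalent of $protocolretention: delete protocol files
# older than N days. check_logfiles already does this by itself; the file name
# pattern below is assumed from the sample output on this page.
purge_protocols() {
    dir=$1
    days=$2
    # -mtime +N: the age in whole days (remainder discarded) is greater than N.
    find "$dir" -type f -name 'check_logfiles.protocol-*' -mtime +"$days" -exec rm -f {} +
}
```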

The single searches are further specified by the following parameters:

tag A unique identifier.
logfile The name of the log file to scan.
archivedir The name of the directory where archives will be moved to after a log file rotation. The default is the directory where the logfile resides.
rotation One of the predefined methods or a regular expression, which helps identify the rotated archives. If this key is missing, check_logfiles assumes that the log file will be simply overwritten instead of rotated.
type One of “rotating” (default if rotation was given), “simple” (default if no rotation was given), “virtual” (for files which will strictly be scanned from the beginning), “errpt” (if instead of a logfile the output of the AIX errpt command should be scanned), “ipmitool” (if the IPMI System Event Log should be scanned), “oraclealertlog” (if the alertlog of an Oracle database should be scanned through a database connection) or “eventlog” if the windows Eventlog should be scanned.
criticalpatterns A regular expression or a reference to an array of such expressions. If one of these expressions matches a line in the logfile, this is considered a critical error. If the expression begins with a “!”, then the meaning is reversed. It counts as a critical error if no match for this pattern is found.
criticalexceptions One or more regular expressions which invalidate a preceding match of criticalpatterns.
warningpatterns Corresponds to criticalpatterns, except that a warning instead of a critical error is created.
warningexceptions Corresponds to criticalexceptions, but invalidates a preceding match of warningpatterns.
okpatterns A regular expression or a reference to an array of such expressions. If one of these expressions matches a line in the logfile, all previously found warnings and criticals are discarded.
script If a pattern matches, this script will be executed. It must reside under one of the directories specified in $scriptpath. The script gets plenty of information about the hit via environment variables.
scriptparams You can provide command line parameters for the script here. They may contain macros. If $script is a code reference, $scriptparams must be a reference to an array.
scriptstdin If the script expects input through stdin, you can describe it here. The string may also contain macros.
scriptdelay After the script has finished, check_logfiles may sleep for <delay> seconds before continuing its work.
options A string with a comma-separated list of options which let you fine-tune the search. Each option can be switched off by prefixing its name with “no”. The options are explained in detail in the next table:
template Instead of a tag, a search can also be identified by a template name. If you call check_logfiles with the --tag option, the corresponding search is run as if it had been defined with that tag name. See the examples.
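
How criticalpatterns, criticalexceptions, warningpatterns, warningexceptions and okpatterns interact can be modeled with a short awk filter. This is a simplified sketch, not the plugin's logic: it uses POSIX extended regular expressions instead of Perl ones, handles only one pattern of each kind, ignores inverted (“!”) patterns, and assumes that a line matching an okpattern is not examined further. As documented for the preferredlevel option, a line matching both a critical and a warning pattern counts twice. The function name count_hits is hypothetical.

```shell
#!/bin/sh
# Hypothetical model of the counting rules of one search (not the plugin code).
# Patterns are POSIX extended regexps evaluated by awk; check_logfiles itself
# uses Perl regular expressions. Empty arguments disable a pattern.
#   $1 criticalpattern  $2 criticalexception
#   $3 warningpattern   $4 warningexception   $5 okpattern
count_hits() {
    awk -v crit="$1" -v critexc="$2" -v warn="$3" -v warnexc="$4" -v ok="$5" '
        # In this sketch an okpattern line resets both counters and is not
        # examined further.
        ok != "" && $0 ~ ok { c = 0; w = 0; next }
        # A critical hit counts unless a criticalexception also matches.
        crit != "" && $0 ~ crit && (critexc == "" || $0 !~ critexc) { c++ }
        # Warnings are checked independently, so a line can count twice.
        warn != "" && $0 ~ warn && (warnexc == "" || $0 !~ warnexc) { w++ }
        END { printf "criticals=%d warnings=%d\n", c, w }
    '
}
```

For example, piping a log excerpt through count_hits 'failed' 'test' 'slow' '' 'all clear' counts "failed" lines as criticals unless they contain "test", and forgets everything seen before an "all clear" line.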

Options

[no]script Controls whether a script can be executed. default: off
[no]smartscript Controls whether exitcode and output of the script shall be treated like an additional match. default: off
[no]supersmartscript Controls whether exitcode and output of the script should replace the triggering match. default: off
[no]protocol Controls whether the matching lines are written to a protocol file for later investigation. default: on
[no]count Controls whether hits are counted and decide over the final exit code. If not you can use check_logfiles also just to execute the triggered scripts. default: on
[no]syslogserver If set, only lines originating from the local host are taken into account. This is important if check_logfiles runs on a syslog server where many other hosts report their events to. default: off
[no]syslogclient=string A prefilter. Only lines matching the string are further examined. default:off
[no]perfdata Controls whether performance data should be added to the output. default: on
[no]logfilenocry Controls how to react if the log file does not exist. By default this is a reason for an UNKNOWN error. If nologfilenocry is set, the missing log file is silently accepted. default: on
logfilemissing Is used to change this UNKNOWN to a different status. With logfilemissing=critical you can have check_file_existence-functionality. default: unknown
[no]case Controls whether regular expressions are case-sensitive default: on
[no]sticky[=seconds] Controls whether an error is propagated through successive runs of check_logfiles. Once an error was found, the exit code stays non-zero until an okpattern resets it or until the error expires after <seconds> seconds. Do not use this option unless you know exactly what you are doing. default: off
[no]savethresholdcount Controls whether the hit counter is saved between runs. If yes, hit numbers are added up until a threshold (criticalthreshold) is reached. Otherwise each run begins with reset counters. default: on
[no]encoding=string The logfile is encoded in Unicode. (e.g. ucs-2) default: off
[no]maxlength=number Truncates very long lines at the <number>-th character default: off
[no]winwarncrit Can be used instead of patterns to find all events of type WARNING/ERROR in the Windows-Eventlog default: off
[no]criticalthreshold=number A number which denotes how many lines have to match a pattern until they are considered a critical error. default: off
[no]warningthreshold=number A number which denotes how many lines have to match a pattern until they are considered a warning. default: off
[no]allyoucaneat With this option check_logfiles scans the entire logfile during the initial run (when no seekfile exists) default: off
[no]eventlogformat This option allows you to rewrite the message text of a Windows event. Normally it only consists of the field Message. You can enrich this string with additional information (EventID, Source, …). See the section on the Windows EventLog below for details. default: off
[no]preferredlevel If warningpattern and criticalpattern were chosen in a way that a specific line matches both of them (so the output looks like “1 error, 1 warning”), you can use this option to count only one of them. (e.g. with preferredlevel=critical the output would be “1 error”). default: off
[no]randominode This is used for a very special case, where the inode of the logfile is constantly changing. (for example because with every appended line the logfile is written entirely new) default: off
[no]randomdevno This is used for a very special case, where the device number of the device, where the logfile resides, is constantly changing. (this can happen with lvm and kvm disks) default: off
[no]savestate This option forces the creation of a seekfile for searches of type virtual default: off
[no]capturegroups If a pattern contains round parentheses for grouping, the variables $1, $2, … are stored in the macros CL_CAPTURE_GROUP1, CL_CAPTURE_GROUP2, … The number of these macros (the highest counter of CL_CAPTURE_GROUPx) can be found in CL_CAPTURE_GROUPS. These macros are best used as environment variables in a handler script. default:off
maxage=timerange Can be used for an extra check of the last modification time of the logfile. The timerange has the form <number>[s|m|h]. If the logfile has not been changed within this time range (e.g. 2h), this counts as CRITICAL. default: off
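
The maxage timerange format <number>[s|m|h] can be illustrated with two small shell functions. The function names are hypothetical, and treating a number without a unit as seconds is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helpers modelling the maxage option: a timerange of the form
# <number>[s|m|h] is converted to seconds and compared with the file age.
# A bare number is treated as seconds here, which is an assumption.
timerange_to_seconds() {
    num=${1%[smh]}
    # Reject anything that is not digits followed by an optional unit.
    case $num in
        ''|*[!0-9]*) return 1 ;;
    esac
    case $1 in
        *h) echo $((num * 3600)) ;;
        *m) echo $((num * 60)) ;;
        *)  echo "$num" ;;
    esac
}

# maxage_exceeded MTIME NOW TIMERANGE (times in epoch seconds):
# succeeds (CRITICAL in check_logfiles terms) if the file is older than allowed,
# returns 2 if the timerange cannot be parsed.
maxage_exceeded() {
    limit=$(timerange_to_seconds "$3") || return 2
    [ $(($2 - $1)) -gt "$limit" ]
}
```

On GNU systems, stat -c %Y file and date +%s provide the modification time and the current time in epoch seconds.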

Predefined macros

$CL_USERNAME$ The name of the user executing check_logfiles
$CL_HOSTNAME$ The hostname without domain
$CL_DOMAIN$ The DNS-domain
$CL_FQDN$ Both together
$CL_IPADDRESS$ The IP address
$CL_DATE_YYYY$ The current year
$CL_DATE_MM$ The current month (1..12)
$CL_DATE_DD$ The day of the month
$CL_DATE_HH$ The current hour (0..23)
$CL_DATE_MI$ The current minute
$CL_DATE_SS$ The current second
$CL_DATE_CW$ The current calendar week (ISO 8601:1988)
$CL_SERVICEDESC$ The name of the config file without extension.
$CL_NSCA_SERVICEDESC$ the same
$CL_NSCA_HOST_ADDRESS$ The local address 127.0.0.1
$CL_NSCA_PORT$ 5667
$CL_NSCA_TO_SEC$ 10
$CL_NSCA_CONFIG_FILE$ send_nsca.cfg
The following macros change their value during the runtime:
$CL_TAG$ The tag of the current search ($CL_tag$ is the tag in lowercase letters)
$CL_TEMPLATE$ The name of the template used (if any).
$CL_LOGFILE$ The file to be scanned next
$CL_SERVICEOUTPUT$ The last matched line.
$CL_SERVICESTATEID$ The error level as a number 0..3
$CL_SERVICESTATE$ The error level as a word (OK, WARNING, CRITICAL, UNKNOWN)
$CL_SERVICEPERFDATA$ The performance data.
$CL_PROTOCOLFILE$ The file where all matching lines are written.

These macros are also available in scripts called out of check_logfiles. Their values are stored in environment variables, whose names are derived from the macro’s names. The preceding CL_ is replaced by CHECK_LOGFILES_. You can also access user defined macros. Their names are also prefixed with CHECK_LOGFILES_.

nagios:~> cat check_logfiles.cfg
$scriptpath = '/usr/bin/my_application/bin:/usr/local/nagios/contrib';
$MACROS = {
    MY_FUNNY_MACRO => 'hihihihohoho',
    MY_VOLUME => 'loud'
};

@searches = (
  {
    tag => 'fun',
    logfile => '/var/adm/messages',
    criticalpatterns => 'a funny pattern',
    script => 'laugh.sh',
    scriptparams => '$MY_VOLUME$',
    options => 'noprotocol,script,perfdata'
  },
);

nagios:~> cat /usr/bin/my_application/bin/laugh.sh
#! /bin/sh
if [ -n "$1" ]; then
  VOLUME=$1
fi
printf "It is %d:%d and my status is %s\n" \
  $CHECK_LOGFILES_DATE_HH \
  $CHECK_LOGFILES_DATE_MI \
  $CHECK_LOGFILES_SERVICESTATE

printf "I found something funny: %s\n" "$CHECK_LOGFILES_SERVICEOUTPUT"
if [ "X$VOLUME" = "Xloud" ]; then
  echo "$CHECK_LOGFILES_MY_FUNNY_MACRO" | tr 'a-z' 'A-Z'
else
  echo "$CHECK_LOGFILES_MY_FUNNY_MACRO"
fi
printf "Thank you, %s. You made me laugh.\n" "$CHECK_LOGFILES_USERNAME"
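
Combined with the capturegroups option, the environment variable naming rule above means that a handler script finds the groups in CHECK_LOGFILES_CAPTURE_GROUPS and CHECK_LOGFILES_CAPTURE_GROUP1, CHECK_LOGFILES_CAPTURE_GROUP2, and so on. The following sketch (the function name print_capture_groups is hypothetical) prints them using only POSIX sh; a pattern such as 'Failed password for (\S+) from (\S+)' would, for example, yield two groups.

```shell
#!/bin/sh
# Hypothetical handler-script body: print every capture group passed in by
# check_logfiles (requires the capturegroups option on the search).
print_capture_groups() {
    n=${CHECK_LOGFILES_CAPTURE_GROUPS:-0}
    i=1
    while [ "$i" -le "$n" ]; do
        # POSIX sh has no ${!name}; eval performs the indirect lookup of
        # CHECK_LOGFILES_CAPTURE_GROUP<i>.
        eval "value=\${CHECK_LOGFILES_CAPTURE_GROUP$i}"
        printf 'group %d: %s\n' "$i" "$value"
        i=$((i + 1))
    done
}
```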

Performance data

The number of scanned lines as well as the number of pattern matches (critical, warning and unknown) are appended to the plugin’s output in performance data format. You can suppress this by using the noperfdata option.

nagios$ check_logfiles --logfile /var/adm/messages \
     --criticalpattern 'Failed password' --tag ssh
CRITICAL - (4 errors) - May  9 11:33:12 localhost sshd[29742] Failed password for invalid user8 ... |ssh_lines=27 ssh_warnings=0 ssh_criticals=4 ssh_unknowns=0

nagios$ check_logfiles --logfile /var/adm/messages \
     --criticalpattern 'Failed password' --tag ssh --noperfdata
CRITICAL - (2 errors) - May  9 11:58:48 localhost sshd[29813] Failed password for invalid user8 ...
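
A script that post-processes the plugin output may need a single counter from the performance data. The helper below is a hypothetical sketch that assumes the space-separated label=value[;warn;crit;...] items shown above and no quoted labels.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one performance data label.
#   $1 complete plugin output line, $2 label such as ssh_criticals
perfdata_value() {
    # Everything after the first "|" is performance data.
    perf=${1#*|}
    # One item per line, then split label=value and drop ;warn;crit;... parts.
    printf '%s\n' "$perf" | tr ' ' '\n' | awk -F= -v label="$2" '
        $1 == label { split($2, parts, ";"); print parts[1] }
    '
}
```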

Scripts

It is possible to execute external scripts from check_logfiles: during the startup phase ($prescript), before termination ($postscript), or every time a pattern matches a line (see the example above). With the option “smartscript”, output and exit code of the script are treated like a match in the logfile and reflected in the overall result. The option “supersmartscript” makes output and exit code of the script replace those of the triggering match. Pre- and postscripts declared as supersmart scripts directly influence the processing of check_logfiles. The option “supersmartprescript” causes an immediate abort of check_logfiles if the prescript has a non-zero exit code. In this case output and exit code of check_logfiles correspond to those of the prescript.

With the option “supersmartpostscript”, output and exit code of check_logfiles are determined by the postscript. This allows a more meaningful output.
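
A supersmart postscript could look roughly like the following sketch. It assumes, without confirmation from this page, that CHECK_LOGFILES_SERVICESTATEID, CHECK_LOGFILES_SERVICESTATE and CHECK_LOGFILES_SERVICEOUTPUT describe the overall result when the postscript runs; the function name summarize and the message texts are hypothetical.

```shell
#!/bin/sh
# Hypothetical supersmart postscript body. Assumes the CHECK_LOGFILES_*
# variables hold the overall result when the postscript (tag "summary") runs.
summarize() {
    state=${CHECK_LOGFILES_SERVICESTATEID:-3}
    case $state in
        0)
            printf 'OK - logs on %s are clean\n' "${CHECK_LOGFILES_HOSTNAME:-unknown host}" ;;
        1|2)
            printf '%s - last hit: %s\n' "${CHECK_LOGFILES_SERVICESTATE:-PROBLEM}" \
                "${CHECK_LOGFILES_SERVICEOUTPUT:-}" ;;
        *)
            printf 'UNKNOWN - no usable result\n'
            state=3 ;;
    esac
    # The return code becomes the exit code of the script (see usage note).
    return "$state"
}
```

As a real postscript, the file would end with summarize; exit $? so that its output and exit code become those of check_logfiles.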

Integration in Nagios

If you have just one service which uses check_logfiles, you can hard-code the config file in your services.cfg/nrpe.cfg:

define service {
  service_description   check_sanlogs
  host_name              oaschgeign.muc
  check_command       check_nrpe!check_logfiles
  is_volatile           1
  check_period          7x24
  max_check_attempts    1
  ...
}

define command {
  command_name          check_nrpe
  command_line          $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

command[check_logfiles]=/opt/nagios/libexec/check_logfiles --config logdefs.cfg

If multiple services are based on check_logfiles, you need multiple config files. I propose naming them after the service_description. In the following example there would be a directory cfg.d with the config files solaris_check_sanlogs and solaris_check_apachelogs.

define service {
  service_description  logfilescan
  register             0
  is_volatile          1
  check_period         7x24
  max_check_attempts   1
  ...
}

define service {
  service_description  solaris_check_sanlogs
  host_name            oaschgeign.muc
  check_command        check_nrpe_arg!20!check_logfiles!cfg.d/$SERVICEDESC$
  contact_groups       sanadmin
  use                  logfilescan
}

define service {
  service_description  solaris_check_apachelogs
  host_name            oaschgeign.muc
  check_command        check_nrpe_arg!20!check_logfiles!cfg.d/$SERVICEDESC$
  contact_groups       webadmin
  use                  logfilescan
}

define command {
  command_name         check_nrpe_arg
  command_line         $USER1$/check_nrpe -H $HOSTADDRESS$ -t $ARG1$ -c $ARG2$ -a $ARG3$
}

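The corresponding line in the host’s nrpe.cfg looks like this (NRPE only accepts command arguments such as $ARG1$ if it was built with --enable-command-args and dont_blame_nrpe=1 is set):

command[check_logfiles]=/opt/nagios/libexec/check_logfiles --config $ARG1$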

If you use NSClient++ under Windows, the entry in the NSC.ini looks like this:

check_logfiles=C:\Perl\bin\perl C:\libexec\check_logfiles --config $ARG1$

Installation

  • After unpacking the tar archive you have to call ./configure. With ./configure --help you can list the options for modifying the default settings. These settings can later be overridden by variables in the config file.
  • Linux systems are more restrictive regarding the permissions of log files: /var/log/messages is not readable for non-root users. If you run check_logfiles as an unprivileged user, look for a workaround in the example configurations linked in the Examples section.
  • --prefix=BASEDIRECTORY The directory where you want to install check_logfiles. (default: /usr/local/nagios)
  • --with-nagios-user=SOMEUSER The user which will own the check_logfiles script. (default: nagios)
  • --with-nagios-group=SOMEGROUP The group (default: nagios)
  • --with-perl=PATH_TO_PERL The path to your perl binary. (default: the perl in the current PATH)
  • --with-gzip=PATH_TO_GZIP The path to your gzip binary. (default: the gzip in the current PATH)
  • --with-trusted-path=PATH_YOU_TRUST The path where you expect your triggered scripts. (default: /sbin:/usr/sbin:/bin:/usr/bin)
  • --with-seekfiles-dir=SEEKFILES_DIR The directory where status files will be kept. (default: /tmp)
  • --with-protocols-dir=PROTOCOLS_DIR The directory where protocol files will be written to. (default: /tmp)
  • Under Windows you build the plugin with perl winconfig.pl. This results in plugins-scripts/check_logfiles.
  • The file README.exe contains instructions on how to build a Windows binary check_logfiles.exe.

Scanning of an Oracle-Alertlog with the operating mode “oraclealertlog”

If you want to scan the alert log of an Oracle database without having access to the database server at the operating system level (e.g. because it is a Windows server or you are not allowed to log in to a Unix server for security reasons), you have no access to the alert file. In that case the file can be mapped to a database table, whose contents are visible through a database connection by executing SQL SELECT statements. If you specify the type “oraclealertlog” in a check_logfiles configuration, this method is used to scan the alert log. It requires some extra parameters in the configuration:

# extra parameters in the configuration file
@searches = ({
  tag => 'oratest',
  type => 'oraclealertlog',
  oraclealertlog => {
    connect => 'db0815',       # connect identifier
    username => 'nagios',      # database user
    password => 'hirnbrand',   # database password
  },
  criticalpatterns => [
...

Preparations on the part of the database administrator

Mapping external files to database tables is possible since Oracle version 9. Use this script to prepare your database:
create_alert_log_table.sql

Preparations on the part of the Nagios administrator

Install the Perl modules DBI and DBD::Oracle (http://search.cpan.org/~pythian/DBD-Oracle-1.74/lib/DBD/Oracle.pm).

Scanning the Windows EventLog with the operating mode “eventlog”

The eventlog of Windows systems can be processed by check_logfiles like any other logfile. Each event is treated like a line. As with log files, only the events which appeared since the last run of check_logfiles are analyzed.

In its simplest form an eventlog search looks like this:

@searches = ({
  tag => 'evt_sys',
  type => 'eventlog',
  criticalpatterns => ['error', 'fatal', 'failed', ....
  # specifying a logfile is not necessary here, as it would be meaningless.

If the evaluation of events should not be based on patterns but on the Windows-internal event types WARNING and ERROR, use the option winwarncrit.

@searches = ({
  tag => 'evt_sys',
  type => 'eventlog',
  options => 'winwarncrit',

It is also possible to analyze only a subset of all the events in the eventlog. You can use include- and exclude-filters for that.

@searches = ({
  tag => 'winupdate',
  type => 'eventlog',
  eventlog => {
    eventlog => 'system',
    include => {
      source => 'Windows Update Agent',
      eventtype => 'error,warning',
    },
    exclude => {
      eventid => '15,16',
    },
  },
  criticalpatterns => '.*',

With these settings, only those events are fetched from the eventlog which comply with the following requirements:
* The System-Eventlog is used
* Only events with the source “Windows Update Agent” are read.
* Only errors and warnings are read.
* Events with the IDs 15 and 16 are discarded.

Please be aware that the single include-requirements are combined by logical AND and the exclude-requirements are combined by logical OR. The comma-separated lists are always combined by OR.

filter = ((source == "Windows Update Agent") AND ((eventtype == "error") OR (eventtype == "warning"))) AND NOT ((eventid == 15) OR (eventid == 16))

You can change this behavior with the key “operation”. It takes the arguments “and” or “or”.

@searches = ({
  tag => 'winupdate',
  type => 'eventlog',
  eventlog => {
    eventlog => 'system',
    include => {
      source => 'Windows Update Agent',
      eventtype => 'error,warning',
      operation => 'or',
    },
    exclude => {
      eventid => '15,16',
    },
  },
  criticalpatterns => '.*',

Now the filter means: “Windows Update Agent” OR (“error” OR “warning”)

  type => 'eventlog',
  eventlog => {
    eventlog => 'system',                 # system (default), application, security
    include => {
      source => 'Windows Update Agent',   # the source of the event
      eventtype => 'error,warning',       # error, warning, info, success, auditsuccess, auditfailure
      operation => 'or',                  # the logical operator. Default is "and"
    },
    exclude => {
      eventid => '15,16',                 # the ID of the event
    },
  },
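
The combination rules for include and exclude can be reproduced with a small shell model of the 'winupdate' example. It only illustrates the logic described above (include requirements combined by AND or, with operation => 'or', by OR; comma-separated lists and exclude requirements combined by OR). The function names are hypothetical, and check_logfiles applies these filters to real events, not to strings.

```shell
#!/bin/sh
# Hypothetical model of the include/exclude logic of the 'winupdate' example.

# in_list VALUE LIST: succeeds if VALUE is one of the comma-separated LIST items.
in_list() {
    case ",$2," in
        *,"$1",*) return 0 ;;
        *) return 1 ;;
    esac
}

# passes_filter SOURCE EVENTTYPE EVENTID [and|or]
passes_filter() {
    src=$1 etype=$2 eid=$3 op=${4:-and}
    src_ok=0 type_ok=0
    in_list "$src" "Windows Update Agent" && src_ok=1
    in_list "$etype" "error,warning" && type_ok=1

    # include: requirements combined by AND (default) or OR (operation => 'or')
    if [ "$op" = or ]; then
        [ $((src_ok + type_ok)) -ge 1 ] || return 1
    else
        [ $((src_ok * type_ok)) -eq 1 ] || return 1
    fi

    # exclude: any matching requirement discards the event
    if in_list "$eid" "15,16"; then
        return 1
    fi
    return 0
}
```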

Filters can also be used in command line mode.

check_logfiles --type "eventlog:eventlog=application,include,source=Windows Update Agent,eventtype=error,eventtype=warning,exclude,eventid=15,eventid=16"

With another option it is possible to rewrite an event’s message text. Normally check_logfiles sees the field Message when it tries to match a pattern, and this is also what is shown in the plugin’s output. The option eventlogformat can be used to include the fields EventType, EventID, Source, Category, Timewritten and TimeGenerated as well.

EventType: ERROR
EventID: 16
Source: W32Time
Category: None
Timewritten: 1259431241
TimeGenerated: 1259431241
Message: Der NtpClient verfügt über keine Quelle mit genauer Zeit.

options => 'eventlogformat="%w src:%s id:%i %m"',

With this eventlogformat the message text of the above event will be rewritten to:

2009-11-28T19:04:16 src:W32Time id:16 Der NtpClient verfügt über keine Quelle mit genauer Zeit.

The formatstring knows the following tokens:

%t EventType 
%i EventID 
%s Source 
%c Category 
%w Timewritten 
%g TimeGenerated 
%m Message

With %<number>m you can shorten the message to <number> characters.
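
The token substitution of eventlogformat can be sketched with awk. This is a hypothetical model, not the plugin's formatter: in particular it inserts %w and %g as raw epoch seconds, whereas check_logfiles prints a date as in the example above, and the function name format_event and its argument order are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical model of eventlogformat token expansion.
# Arguments: FORMAT TYPE ID SOURCE CATEGORY WRITTEN GENERATED MESSAGE
# %w and %g are inserted as given (epoch seconds), not converted to a date.
format_event() {
    awk -v fmt="$1" -v t="$2" -v i="$3" -v s="$4" -v c="$5" \
        -v w="$6" -v g="$7" -v m="$8" 'BEGIN {
        out = ""
        while (fmt != "") {
            p = index(fmt, "%")
            if (p == 0) { out = out fmt; break }
            out = out substr(fmt, 1, p - 1)
            fmt = substr(fmt, p + 1)
            # optional length, as in %<number>m
            n = ""
            while (fmt ~ /^[0-9]/) { n = n substr(fmt, 1, 1); fmt = substr(fmt, 2) }
            tok = substr(fmt, 1, 1)
            fmt = substr(fmt, 2)
            if (tok == "t") out = out t
            else if (tok == "i") out = out i
            else if (tok == "s") out = out s
            else if (tok == "c") out = out c
            else if (tok == "w") out = out w
            else if (tok == "g") out = out g
            else if (tok == "m") out = out (n == "" ? m : substr(m, 1, n + 0))
            else out = out "%" n tok
        }
        print out
    }'
}
```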

Scanning the Windows EventLog with the operating mode “wevtutil”

Windows operating systems prior to Windows Vista use the standard (EVT) event log format. Windows Vista and later clients and Windows Server 2008 and later servers use the newer EVTX (Crimson) event log format. More importantly, besides the Windows Logs “application”, “system” and “security”, there is a large number of Applications and Services Logs.
Examples are Microsoft-Windows-PowerShell/Operational, Microsoft/Exchange/HighAvailability/Operational and many more.

Since version 3.7 check_logfiles can search these channels, too.

@searches = ({
    tag => "msps",
    type => "wevtutil",
    criticalpatterns => ["Microsoft", "PowerShell.*(ready|bereit)" ],
    warningpatterns => ["PowerShell.*(started|gestartet)" ],
    wevtutil => {
      eventlog => "Microsoft-Windows-PowerShell/Operational",
    }
});

Examples

Here you can find example configurations for several scenarios.

Download

check_logfiles-4.1.1.tar.gz

Changelog

  • 4.1.1 2022-11-21
    shorten/truncate seekfile filenames exceeding a length of 250 characters
  • 4.1.0.1 2022-06-29
    update dmesg reset
  • 4.1 2022-06-20
    new type dmesg
  • 4.0.1.7 2022-06-09
    errpt no longer uses its own unstick
  • 4.0.1.6 2022-05-01
    bugfix in unstick
  • 4.0.1.5 2022-03-25
    bugfix in all external modules (errpt, wevtutil,…)
  • 4.0.1.4 2022-02-01
    randomdevno and randomino are now command line parameters
  • 4.0.1.3 2022-01-29
    add option randomdevno to searches, fix issue #65
  • 4.0.1.2 2021-10-06
    reenable sys/resource.ph
  • 4.0.1.1 2021-09-28
    rm the // operator which fails with perl 5.8
    normally i would not do this, but customers with a support contract get their
    wishes fulfilled any time.
  • 4.0.1 2021-01-09
    sticky errors are numbered but not saved.
  • 4.0 2021-01-08
    rewrote the sticky code, matches expire now based on individual timestamps
  • 3.13 2020-09-21
    command line arguments containing special characters can be encoded with rfc3986://encoded_string
  • 3.12 2020-06-01
    update stdout redirection
  • 3.11.0.3 2020-04-09
    add --nosavethresholdcount and --thresholdexpiry to the command line parameters
  • 3.11.0.2 2019-03-05
    fix CL_HOSTNAME again, first check hostname() for “.”
  • 3.11.0.1 2019-03-04
    fix CL_HOSTNAME
  • 3.11 2019-02-22
    resolve path before do (pull request #46 datamuc)
  • 3.10 2018-11-30
    detect systemctl
    improve message parsing for type wevtutil
  • 3.9 2018-06-11
    add option maxage
  • 3.8.1.4 2017-11-13
    increase number of capture groups to 20
  • 3.8.1.3 2017-10-26
    fix issue #33
  • 3.8.1.2 2017-10-17
    fix getfilefingerprint for automounted nfs-shares (device-no jumps)
  • 3.8.1.1 2017-07-26
    fix rununique for windows
  • 3.8.1 2017-07-13
    fix eventlog rewind&unstick
    pull request #22, journald filtering
  • 3.8.0.3 2017-07-05
    bugfix in type wevtutil when eventlog contains spaces
  • 3.8.0.2 2017-04-25
    reorg Makefile.am
  • 3.8.0.1 2017-04-24
    add forgotten files to the dist
  • 3.8 2017-04-22
    add type systemd (journald), pull request #15 from adrianlzt
  • 3.7.6.3 2016-11-14
    fix perl undef mesg when truncating with %(0-9)s part II
  • 3.7.6.2 2016-11-11
    fix perl undef mesg when truncating with %(0-9)s
  • 3.7.6.1 2016-11-11
    truncate eventlog message with %(0-9)s
  • 3.7.6 2016-11-10
    add option preview
  • 3.7.5.2 2016-07-13
    fix debian packaging infrastructure
  • 3.7.5.1 2016-07-13
    move debian files into debian/
  • 3.7.5 2016-07-13
    add debian package build tools (Thanks Hannes Hoerl)
  • 3.7.4.2 2016-06-22
    bugfix for lastoffset and 16-bit-encodings
  • 3.7.4.1 2016-04-14
    bugfix for flatfiles using privatestate
    bugfix for protocolfileerror
  • 3.7.4 2015-12-08
    add global option protocolfileerror
  • 3.7.3.1 2015-11-02
    add logfilemissing as command line option
  • 3.7.3 2015-09-29
    bugfix for inversepat&allyoucaneat (pull request by ibpl)
  • 3.7.2 2015-09-22
    add option logfilemissing=[warning|critical]
  • 3.7.1.5 2015-09-07
    bugfix in --rununique; also add $options="rununique"
  • 3.7.1.4 2015-08-04
    bugfix in loadstate(), very strange scenario
  • 3.7.1.3 2015-07-22
    bugfix in --warning/--critical macros
  • 3.7.1.2 - 2015-06-11
    clean up my pidfile
  • 3.7.1.1 - 2015-06-03
    fix a macro-bug
  • 3.7.1 - 2015-04-22
    add homevartmp as another autodetect location for seekfiles
  • 3.7 - 2015-04-01
    add type wevtutil to support EVTX (Crimson) event logs.
  • 3.6.3 - 2014-12-21
    resolve macros in seekfilesdir
  • 3.6.2.1 - 2014-04-09
    fix eventid-format for tecad_win
  • 3.6.2 - 2014-04-08
    eventlogformat tecad_win
  • 3.6.1.1 - 2014-02-04
    fix a race-condition (pid file) in unix-daemon-mode (thanks Klaus Wagner)
  • 3.6.1 - 2014-01-25
    added search-option “capturegroups”
    add forgotten --allyoucaneat
  • 3.6 - 2013-11-14
    added global option “nooutputhitcount”
    added search-option “thresholdexpiry=”
    okpattern resets threshold counters
  • 3.5.3.3 - 2013-09-24
    exe files without x-bit can now run in a cygwin environment (Thanks Michael Glaser)
  • 3.5.3.2 - 2013-03-28
    fixed a bug in allyoucaneat (if used with rotations)
  • 3.5.3.1 - 2012-11-29
    --verbose finally works on the command line
    htmlencode can also be an option inside a config file
  • 3.5.3 - 2012-10-26
    add option htmlencode (Thanks Sven Nierlein)
  • 3.5.2.1 - 2012-09-19
    fix a bug related to nfs-mounted logfiles under linux
  • 3.5.2 - 2012-06-21
    fix a bug in CL_PATTERN_KEY (Thanks Frank Rothaupt)
  • 3.5.1 - 2012-06-02
    add parameters --warning and --critical (they become CL_WARNING/CL_CRITICAL)
    add option “savestate” for type “virtual”
  • 3.5 - 2012-04-23
    --timeout aborts searches in a controlled manner
  • 3.4.7.1 - 2012-01-16
    fix a bug in maxmemsize and solaris
    fix a bug where a supersmartpostscript’s output was overwritten by longoutput
  • 3.4.7 - 2012-01-10
    add new type dumpel (customer’s request)
    bugfix in errpt’s unstick method (Thanks Jim Winkle)
  • 3.4.6.1 - 2012-01-05
    make rotatewait a global option
    make logfileerror a global option
  • 3.4.6 - 2012-01-04
    add maxmemsize
    clean up tab indentation
    add option logfileerror (unlike seekfileerror it is local)
    add option rotatewait (sleep until chaos during rotation is over)
    [selected]searches can be regexp
    Eliminate “Use of qw(…) as parentheses is deprecated” warnings in perl 5.14 (Thanks Tommi)
  • 3.4.5.2 - 2011-11-08
    set the path to gzip for hpux (/opt/contrib..)
    fix a bug where % in error messages caused ugly perl-errors when used with scriptstdin (Thanks Thomas Klaradic)
  • 3.4.5.1 - 2011-09-28
    seekfilesdir can be “autodetect” with a configfile
    also protocolsdir (dirname(dirname(cfgfile)) + [/var/tmp|/tmp])
    also scriptpath (dirname(dirname(cfgfile)) + [/local/lib/nagios/plugins|/lib/nagios/plugins])
    type executable
    fix a perl undef (patternkey stuff which I don’t remember)
  • 3.4.5
    add parameter --rununique
  • 3.4.4.2 - 2011-08-03
    patterns can be hashes
  • 3.4.4.1 - 2011-05-31
    seekfilesdir is now local (./var/tmp) in an OMD environment
  • 3.4.4 - 2011-04-19
    add parameter patternfile
  • 3.4.3.2 - 2011-03-15
    fix a bug with --type rotating::uniform on the command line
  • 3.4.3.1 - 2011-03-10
    create the pidfile’s directory if it doesn’t exist
    new option unstick (Thanks Holger Reif)
  • 3.4.3 - 2011-01-19
    add pid file handling to avoid concurrent processes with --daemon
  • 3.4.2.2 - 2010-09-29
    add pattern loglog0bz2log1bz2 (Thanks Christian Schulz)
    add pattern ehl (Thanks Daniel Haist)
  • 3.4.2.1 - 2010-08-04
    add %u (User) to option eventlogformat
  • 3.4.2 - 2010-06-29
    fixed a bug where exceptions only worked if patterns were defined. (Thanks Heiko)
    small patch so filenames can be specified with --tag
  • 3.4.1 - 2010-05-08
    new option archivedirregexp
    fixed a bug in eventlogs. (take care of type EVENTLOG_SUCCESS)
  • 3.4 - 2010-05-07
    used a new version of PAR::Packer for check_logfiles.exe (there were problems if PERL5LIB was set by an oracle/veritas/… perl installation)
  • 3.3 - 2010-04-27
    speedup in pattern matching
    new (global) option seekfileerror
    added Win32::Daemon to the windows version
  • 3.2 - 2010-04-08
    better error handling for type=eventlog: non-existing eventlogs and dead remote servers result in UNKNOWN
    type=eventlog now opens a secure channel to ipc$ if necessary
    type=eventlog now checks if the desired eventlog exists (reads registry)
    speedup in tivoli mode
    add 099benchmark.t
  • 3.1.5 - 2010-03-05
    lookback option is now allowed in the config file
    fixed a bug which caused a perl-warning (only if criticalpattern=.* and last line is empty). (Thanks Sven Nierlein)
    matching empty lines are displayed as (null)
  • 3.1.4 - 2010-02-24
    just beautify the release string
  • 3.1.3.2 - 2010-02-24
    added option randominode (Thanks Sergio)
    implemented the allyoucaneat option in Eventlog
    added option preferredlevel
  • 3.1.3.1 - 2010-01-14
    made the logfile name visible in PRIVATESTATE
    changed HOMEPATH to USERPROFILE for the Windows HOME (Thanks Richard Tryzna)
  • 3.1.3 - 2009-12-12
    fixed a bug in module Ipmitool
  • 3.1.2 - 2009-12-08
    fixed a bug in scriptparams+macros+batch scripts
  • 3.1.1 - 2009-12-02
    max plugin output length is now configurable with $options="maxlength=8192"
  • 3.1 - 2009-11-22
    report can now be set in a cfgfile (global, e.g. $options="report=long")
    new option “allyoucaneat” (the initial run starts from line 0)
    new option “eventlogformat” (e.g. options='eventlogformat="id:%i %m",..')
    Eventlog can now be filtered with include and exclude
    new module Esxdiag
  • 3.0.4 - 2009-09-20
    accept the contents of a config file as encoded string
  • 3.0.3.1 - 2009-09-07
    Fixed a bug where incorrect EventIDs were read from the EventLog
  • 3.0.3 - 2009-08-26
    Speedup in Eventlog scans
    Under some OSs the daemon did not detach itself from the terminal
  • 3.0.2 - 2009-07-23
    fixed a bug for --config (Windows uses HOMEPATH instead of HOME)
    fixed a bug in Eventlog+Tivoli (Thanks Werner Breitschmid)
  • 3.0.1 - 2009-06-25
    fixed a bug in Eventlog+Tivoli
    added match_them_all and match_never_ever as predefined patterns
  • 3.0 - 2009-06-19
    added the ability to run as a windows service. (needs Win32::Daemon)
  • 2.6 2009-05-26
    added the --lookback parameter to simulate filter-written of CheckEventLog
    --critical/warningpattern can now be “match_them_all” instead of “.*”
    --archivedir is now also a cmd line parameter
    added the --daemon parameter.
    warning/criticalthreshold moved into options.
    added --warning/criticalthreshold to the list of possible cmdline parameters
    Sven Nierlein wrote a module which reads patterns from a Tivoli Format File.
    fixed incorrect calculation of protocolretention. (Thanks Rainer Rose)
  • 2.5.6.1 - 2009-03-25
    there was some debugging output left from 2.5.6
  • 2.5.6 - 2009-03-25
    fixed a bug in oraclealertlog+sticky
    rewrote oraclealertlog so that the key is database time and not the plugin’s system time
    added the --macro parameter, e.g. --macro CL_LOGIN=nagios --macro CL_PASS=secret
    added errorresource to type errpt
    added the --nocase parameter
    fixed a bug with line endings in unicode files
  • 2.5.5.2 - 2009-02-20
    added the report variable to config files
    more extensive testing of the logfile’s permissions
    added the option maxlength which truncates lines (Thanks Thomas Borger)
    added the option winwarncrit which uses EventLog types instead of patterns
  • 2.5.5.1 - 2009-02-02
    another bugfix for blanks in protocolsdir
    I accidentally published a messed-up version of 2.5.5
  • 2.5.5 - 2008-01-23
    multiline output with --report=long/html
    bugfix in rotation patterns (Thanks Elbert Lai and Prasana Iyengar)
    bugfix in type=oraclealertlog
    bugfix in scripts and windows pathnames with blanks. (Thanks Juergen Walker)
  • 2.5 - 2008-11-04
    native support for Windows eventlog (type=eventlog)
  • 2.4.1.9 - 2008-10-30
    bugfix in handling of config file paths. (Thanks Ken Harford)
  • 2.4.1.8 - 2008-10-24
    bugfix in Windows scriptpath. (Thanks Markus Wagner)
    relative pathnames for config files are now possible under windows
  • 2.4.1.7 - 2008-10-10
    bugfix in rotating::uniform and macros in rotation
    bugfix scriptparams with CL_TAG (Thanks Markus Wagner)
  • 2.4.1.6 - 2008-09-03
    added parameter --environment
  • 2.4.1.5 - 2008-08-15
    syslogclient hostnames can be case insensitive with option nocase
  • 2.4.1.4 - 2008-07-20
    scripts have access to a state hash and to the environment variables LAST_RUNTIME and RUN_COUNT
    bugfix in type=uniform
  • 2.4.1.3 - 2008-06-24
    fixed a bug in --sticky= (Thanks Severin Rossignol)
  • 2.4.1.2 - 2008-06-18
    fixed a bug in CL_DATE_YY (Thanks beboy)
  • 2.4.1.1 - 2008-05-29
    archivedir can contain macros
  • 2.4.1 - 2008-05-22
    fixed a bug in sticky code (Thanks Nils Mueller)
  • 2.4 - 2008-05-07
    added support for oracle alert log through database connection
  • 2.3.3 - 2008-04-10
    introduced -F which allows directories full of configfiles
    (ending in .cfg or .conf)
    fixed a typo in LOGLOG0LOG1 definition
  • 2.3.2.1 - 2008-02-26
    fixed a bug which appeared with perl 5.10
    tinkered with encoding.
  • 2.3.2 - 2008-02-12
    added support for ipmitool system event log.
    fixed a small errpt bug.
    added decoding of ucs-2 encoded files as proposed by Dominic Horn.
  • 2.3.1.3 - 2008-01-28
    small change to make it work with perl 5.10
  • 2.3.1.2 - 2007-12-27
    added macro CL_PROTOCOLFILE
    more commandline options
    Fixed a bug in conjunction with very big logfiles.
  • 2.3.1.1 - 2007-11-16
    Fixed a bug concerning sticky. (Thanks Marc Richter)
    New option savethresholdcount. (Thanks Hannu Kivimäki)
  • 2.3.1 - 2007-10-14
    Added search templates. Thanks Axel.
    Threshold counters are remembered.
    Fixed a bug in scriptparams found by Niall Downie.
    Support for bzip2’ed archives
  • 2.3 - 2007-09-10
    Added AIX errpt as a new type of logfile.
    Performance data are now in the expected format.
    Added the sticky option. (I hate it. No support!)
    Added the syslogclient option.
    Error counters can now be reset with okpatterns.
    Bugfixes for supersmart postscript output.
  • 2.2.4.1 - 2007-06-11
    Fixed a bug (--searches) found by Mark Petersen
  • 2.2.4 - 2007-06-06
    Added support for “virtual” files. (like /proc/*)
  • 2.2.3 - 2007-06-05
    Fixed a bug with non-linux unices.
    Banged my head against the table.
  • 2.2.2 - 2007-06-02
    Added support for supersmart pre/postscripts with no output
  • 2.2.1 - 2007-06-01
    Added parameters to perl-based scripts
    Fixed bugs in DOS batch files
  • 2.2 - 2007-05-31
    Scripts can now be code references.
    Added smart scripts.
  • 2.1.2 - 2007-05-24
    Added the [no]case option to enable case insensitive searches.
    Fixed a bug related to acls and linux. (thx Marcus Fleige).
  • 2.1.1 - 2007-05-21
    Removed sloppiness from release 2.1
  • 2.1 - 2007-05-21
    Added support for Windows (ActiveState Perl)
    Added the mod_log_rotate method for Apache and Windows.
    Fixed a bug in configure for solaris/awk
    Added “selectedsearches” as proposed by Lars Stavholm.
  • 2.0 - 2007-05-09
    New layout of code using perl objects.
    Added handling of nonrotating logfiles as proposed by Kai Nielsen.
    Added performance data.
    Bugs, improvements, cosmetics, tests
  • 1.3.6.1 - 2006-10-20
    Corrected a bug which created protocol files even if no pattern matched.
    Added a delay option as proposed by some users of the syslog check method.
  • 1.3.6
    Added execution of scripts if inverse patterns are not found.
    Corrected typos in README
    Added command line parameters as proposed by Hendrik “Andurin” Baecker.
  • 1.3.5 - 2006-08-23
    Code cleanup
    Removed nsca support in favour of a more flexible script handling.
    Added support for script parameters and modeling of the script’s input.
    Matches are now passed as environment parameters to scripts.
  • 1.3.4 - 2006-08-06
    Added support for shifting logfilenames through macros.
  • 1.3.3 - 2006-07-03
    Added nsca support for standalone use of check_logfiles
  • 1.3.2 - 2006-07-27
    Added “watchdog” patterns which raise an alert when not found.
    Added exceptions for patterns which stop processing of a preceding match.
    More documentation in README
    Fixed syntax errors found by Doug Lochart.
  • 1.3.1 - 2006-07-24
    Added automatic deletion of old protocol files.
    Added handling of an unreadable logfile.
  • 1.3 - 2006-07-04
    Added an option for syslogservers to filter out foreign log entries.
    Added macros in patterns.
    Fixed a bug in timeout handling.
    Added Debian to the list of predefined rotation methods.
  • 1.2.6 - 2006-07-03
    Added options {critical,warning}threshold as proposed by jorge cabrera.
  • 1.2.5 - 2006-04-17
    Fixed a bug in the fake seek algorithm.
  • 1.2.4 - 2006-04-15
    Added a workaround to enable seeking in a pipe.
    Minor bugfixing in tracing output.
    Added some more examples to README.
  • 1.2.3 - 2006-04-11
    Minor modifications to tracing.
    Duplicate file detection to prevent scanning the same file twice.
    Added examples to README.
  • 1.2.2 - 2006-03-31
    Added a new item to tracing as requested
  • 1.2.1 - 2006-03-27
    Fixed a small bug in mtime comparison
  • 1.2 - 2006-03-27
    Changed the default timeout to 60 seconds
    Added a better rotation detection
    Added -d option to activate extensive tracing
    Bugfixing and commenting
  • 1.1 - 2006-03-24
    Added the first match to the plugin’s output
  • 1.0 - 2006-03-12
    Initial release

Gerhard Laußer

Check_logfiles is distributed under the GNU General Public License.

Author

Gerhard Laußer (gerhard.lausser@consol.de) is happy to answer questions about this plugin.