check_logfiles

Posted on July 12th, 2009 by lausser

Description

check_logfiles is a Plugin for Nagios which scans log files for specific patterns.

Motivation

The conventional plugins which scan log files are not adequate in a mission-critical environment. In particular, their inability to handle logfile rotation and to include the rotated archives in the scan leaves gaps in the monitoring. check_logfiles was written because these deficiencies would have prevented Nagios from replacing a proprietary monitoring system.

Features

  • Detection of rotations – log files are usually rotated and compressed nightly. Each operating system or company has its own naming scheme. If a rotation happens between two runs of check_logfiles, the rotated archive has to be scanned as well to avoid gaps. The most common rotation schemes are predefined, but you can describe any strategy (in short: where and under which name a logfile is archived).
  • More than one pattern can be defined which again can be classified as warning patterns and critical patterns.
  • Triggered actions – usually Nagios plugins return just an exit code and a line of text describing the result of the check. Sometimes, however, you want to run some code during the scan every time you get a hit. check_logfiles lets you call scripts either after every hit or at the beginning or end of its runtime.
  • Exceptions – If a pattern matches, the matched line could be a very special case which should not be counted as an error. You can define exception patterns which are more specific versions of your critical/warning patterns. Such a match would then cancel an alert.
  • Thresholds – You can define the number of matching lines which are necessary to activate an alert.
  • Protocol – The matching lines can be written to a protocol file the name of which will be included in the plugin’s output.
  • Macros – Pattern definitions and logfile names may contain macros, which are resolved at runtime.
  • Performance data – The number of lines scanned and the number of warnings/criticals is output.
  • Windows – The plugin works with Unix as well as with Windows (e.g. with ActiveState Perl).

Introduction

Usually you call the plugin with the --config option, which takes the name of a configuration file:

nagios$ check_logfiles --config 
OK - no errors or warnings

In its simplest form check_logfiles can get all the essential parameters as command line options. However, not all features can be used this way.

nagios$ check_logfiles --tag=ssh --logfile=/var/adm/messages
     --rotation=SOLARIS
     --criticalpattern="Failed password for root"
OK - no errors or warnings |ssh=1722;0;0;0
nagios$ check_logfiles --tag=ssh --logfile=/var/adm/messages
     --rotation=SOLARIS
     --criticalpattern="Failed password for root"
CRITICAL - (1 errors in check_logfiles.protocol-2007-04-25-20-59-20)
     - Apr 25 20:59:15 srvweb8 sshd[10849]:
[ID 800047 auth.info] Failed password for root
     from 172.16.224.11 port 24206 ssh2 |ssh=2831;0;1;0

In principle check_logfiles scans a log file until the end-of-file is reached. The offset will then be saved in a so-called seekfile. The next time check_logfiles runs, this offset will be used as the starting position inside the log file. In the event that a rotation has occurred in the meantime, the rest of the rotated archive will be scanned also.

Documentation

For the simplest applications it is sufficient to call check_logfiles with command line parameters. More complex scan jobs can be described with a config file.

Command line options

  • --tag=<identifier> A short unique descriptor for this search. It will appear in the output of the plugin and is used to separate the different services.
  • --logfile=<filename> The name of the log file you want to scan.
  • --rotation=<method> The method by which log files are rotated.
  • --criticalpattern=<regexp> A regular expression which will trigger a critical error.
  • --warningpattern=<regexp> The same, except a match results in a warning.
  • --criticalexception=<regexp> / --warningexception=<regexp> Exceptions which are not counted as errors.
  • --okpattern=<regexp> A pattern which resets the error counters.
  • --noprotocol Normally all matched lines are written into a protocol file whose name appears in the plugin's output. This option switches that off.
  • --syslogserver With this option you limit the pattern matching to lines originating from the host check_logfiles is running on.
  • --syslogclient=<clientname> With this option you limit the pattern matching to lines originating from the host named in this option.
  • --sticky[=<lifetime>] Errors are propagated through successive runs.
  • --unstick Resets sticky errors.
  • --config The name of a configuration file. The syntax of this file is described in the next section.
  • --configdir The name of a configuration directory. Config files ending in .cfg or .conf are (recursively) imported.
  • --searches=<tag1,tag2,…> A list of tags of those searches which are to be run. With this parameter, not all searches listed in the config file are run, but only the selected ones. (--selectedsearches is also possible)
  • --report=[short|long|html] This option turns on multiline output (default: off). The setting html generates a table which displays the last hits in the service details view.
  • --maxlength=<length> With this parameter long lines are truncated (default: off). Some programs (e.g. TrueScan) generate eventlog entries of such a length that the output of the plugin becomes longer than 1024 characters. NSClient++ discards these.
  • --winwarncrit With this parameter messages in the eventlog are classified by the type WARNING/ERROR (default: off). Replaces or complements warning-/criticalpattern.
  • --rununique This parameter prevents check_logfiles from starting when another instance using the same config file is already running. (exits with UNKNOWN)
  • --timeout=<seconds> This parameter aborts a running search after the given number of seconds. The abort happens in a controlled manner, so the lines read so far are used for the computation of the final result.
  • --warning=<number> Complex handler scripts can be provided with a warning parameter this way (--critical is possible, too). Inside the scripts the value is accessible as the macro CL_WARNING (resp. CL_CRITICAL).

Format of the configuration file

The definitions in this file are written with Perl-syntax. There is a distinction between global variables which influence check_logfiles as a whole and variables which are related to the single searches. A “search” combines where to search, what to search for, which weight a hit has, which action will be triggered in case of a hit, and so on…

$seekfilesdir A directory where files with status information are saved after a run of check_logfiles. This status information helps check_logfiles remember up to which position the log file was scanned during the last run. This way only newly written lines of the log files are read. The default is /tmp or the directory which has been specified with --with-seekfiles-dir of ./configure.
$protocolsdir A directory where check_logfiles writes protocol files with the matched lines. The default is /tmp or the directory which has been specified with --with-protocols-dir of ./configure.
$protocolretention The lifetime of protocol files in days. After these days the files are deleted automatically. The default is 7 days.
$scriptpath A list of directories where the triggered scripts can be found. (Separated by : under Unix and ; under Windows) The default is /bin:/usr/bin:/sbin:/usr/sbin or the directories which have been specified with --with-trusted-path of ./configure.
$MACROS A hash with user-defined macro definitions. See below.
$prescript An external script which will be executed during the startup of check_logfiles. The macro $CL_TAG$ gets the value "startup". $prescriptparams, $prescriptstdin and $prescriptdelay may be used like scriptparams, scriptstdin and scriptdelay.
$postscript An external script which will be executed before the termination of check_logfiles. The macro $CL_TAG$ gets the value "summary". $postscriptparams, $postscriptstdin and $postscriptdelay may be used like scriptparams, scriptstdin and scriptdelay.
$options A list of options which control the behavior of pre- and postscript. Known options are smartpostscript, supersmartpostscript, smartprescript and supersmartprescript. With the option report="short|long|html" you can customize the plugin's output. With report=long/html the plugin's output can become very long. By default it is truncated to 4096 characters (the amount of data an unpatched Nagios is able to process). The option maxlength can be used to raise this limit, e.g. maxlength=8192. The option seekfileerror defines the error level if a seekfile cannot be written, e.g. seekfileerror=unknown (default: critical).
@searches An array whose elements (hash references) describe the actual work of check_logfiles. The keys for these hash references can be found in the next table.
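
The global variables above can be combined in a config file like the following minimal sketch. The directory paths, retention value and pattern are made-up examples, not defaults.

```perl
# check_logfiles.cfg -- global settings, followed by one search.
# All paths and patterns here are illustrative.
$seekfilesdir = '/var/tmp/check_logfiles/seek';
$protocolsdir = '/var/tmp/check_logfiles/protocol';
$protocolretention = 3;                  # keep protocol files for 3 days
$options = 'report=long,maxlength=8192'; # multiline output, raised limit

@searches = (
  {
    tag => 'messages',
    logfile => '/var/log/messages',
    criticalpatterns => 'kernel panic',
  },
);
```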

The single searches are further specified by the following parameters:

tag A unique identifier.
logfile The name of the log file to scan.
archivedir The name of the directory where archives will be moved to after a log file rotation. The default is the directory where the logfile resides.
rotation One of the predefined methods or a regular expression, which helps identify the rotated archives. If this key is missing, check_logfiles assumes that the log file will be simply overwritten instead of rotated.
type One of "rotating" (default if a rotation scheme was given), "simple" (default if no rotation was given), "virtual" (for files which are always scanned from the beginning), "errpt" (if the output of the AIX errpt command should be scanned instead of a logfile), "ipmitool" (if the IPMI System Event Log should be scanned), "oraclealertlog" (if the alert log of an Oracle database should be scanned through a database connection) or "eventlog" (if the Windows eventlog should be scanned).
criticalpatterns A regular expression or a reference to an array of such expressions. If one of these expressions matches a line in the logfile, this is considered a critical error. If the expression begins with a “!”, then the meaning is reversed. It counts as a critical error if no match for this pattern is found.
criticalexceptions One or more regular expressions which invalidate a preceding match of criticalpatterns.
warningpatterns Corresponds to criticalpatterns, except that a warning instead of a critical error is created.
warningexceptions see above
okpatterns A regular expression or a reference to an array of such expressions. If one of these expressions matches a line in the logfile, all previous found warnings and criticals are discarded.
script If a pattern matches, this script will be executed. It must reside under one of the directories specified in $scriptpath. The script gets plenty of information about the hit via environment variables.
scriptparams You can provide command line parameters for the script here. They may contain macros. If $script is a code reference, $scriptparams must be a reference to an array.
scriptstdin If the script expects input through stdin, you can describe it here. The string may also contain macros.
scriptdelay After the script has finished, check_logfiles may sleep for <delay> seconds before continuing its work.
options This is a string with a comma-separated list of options which let you fine-tune the search. Each option can be switched off by preceding its name with "no". The options are explained in detail in the next table:
template Instead of a tag, a search can also be identified by a template name. If you call check_logfiles with the --tag option, the corresponding search will be run as if it had been defined with a tag name. See examples.
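
A search that combines several of these keys might look like the following sketch. The patterns, the tag and the threshold are illustrative, not taken from a real setup.

```perl
@searches = (
  {
    tag => 'san',
    logfile => '/var/adm/messages',
    rotation => 'SOLARIS',
    # an array of patterns; a leading "!" reverses the match
    criticalpatterns => ['scsi.*timeout', '!heartbeat ok'],
    # a more specific expression which cancels a critical hit
    criticalexceptions => 'scsi.*timeout.*cdrom',
    warningpatterns => 'link down',
    okpatterns => 'link up',                     # resets previously found hits
    options => 'noprotocol,criticalthreshold=5', # alert only after 5 matches
  },
);
```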

Options

[no]script Controls whether a script can be executed. default: off
[no]smartscript Controls whether exitcode and output of the script shall be treated like an additional match. default: off
[no]supersmartscript Controls whether exitcode and output of the script should replace the triggering match. default: off
[no]protocol Controls whether the matching lines are written to a protocol file for later investigation. default: on
[no]count Controls whether hits are counted and decide over the final exit code. If not you can use check_logfiles also just to execute the triggered scripts. default: on
[no]syslogserver If set, only lines originating from the local host are taken into account. This is important if check_logfiles runs on a syslog server where many other hosts report their events to. default: off
[no]syslogclient=string A prefilter. Only lines matching the string are further examined.  
[no]perfdata Controls whether performance data should be added to the output. default: on
[no]logfilenocry Controls how to react if the log file does not exist. By default this is a reason for an UNKNOWN error. If nologfilenocry is set, a missing log file is silently accepted. default: on
[no]case Controls whether regular expressions are case-sensitive default: on
[no]sticky[=seconds] Controls whether an error is propagated through successive runs of check_logfiles. Once an error was found, the exit code will be non-zero until an okpattern resets it or until the error expires after <seconds> seconds. Do not use this option unless you know exactly what you are doing. default: off
[no]savethresholdcount Controls whether the hit counter is saved between runs. If so, hit counts are added up until a threshold (criticalthreshold) is reached. Otherwise each run begins with reset counters. default: on
[no]encoding=string The logfile is encoded in Unicode. (e.g. ucs-2) default: off
[no]maxlength=number Truncates very long lines at the <number>-th character default: off
[no]winwarncrit Can be used instead of patterns to find all events of type WARNING/ERROR in the Windows-Eventlog default: off
[no]criticalthreshold=number A number which denotes how many lines have to match a pattern until they are considered a critical error. default: off
[no]warningthreshold=number A number which denotes how many lines have to match a pattern until they are considered a warning. default: off
[no]allyoucaneat With this option check_logfiles scans the entire logfile during the initial run (when no seekfile exists) default: off
[no]eventlogformat This option allows you to rewrite the message text of a Windows event. Normally it consists only of the field Message. You can enrich this string with additional information (EventID, Source, …). See the eventlog section below for details. default: off
[no]preferredlevel If warningpattern and criticalpattern were chosen in a way that a specific line matches both of them (so the output looks like “1 error, 1 warning”), you can use this option to count only one of them. (e.g. with preferredlevel=critical the output would be “1 error”). default: off
[no]randominode This is used for a very special case, where the inode of the logfile is constantly changing. (for example because with every appended line the logfile is written entirely new) default: off
[no]savestate This option forces the creation of a seekfile for searches of type virtual default: off
[no]capturegroups If a pattern contains round parentheses for grouping, the variables $1, $2, … are stored in the macros CL_CAPTURE_GROUP1, CL_CAPTURE_GROUP2, … The number of these macros (the highest counter of CL_CAPTURE_GROUPx) can be found in CL_CAPTURE_GROUPS. These macros are best used as environment variables in a handler script.
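
As a sketch, a capturegroups search whose handler script receives the captured user name and address (the pattern and the script name notify.sh are hypothetical):

```perl
@searches = (
  {
    tag => 'badlogins',
    logfile => '/var/log/auth.log',
    # two capture groups: the user name and the client address
    criticalpatterns => 'Failed password for (\w+) from ([\d.]+)',
    # notify.sh can read CHECK_LOGFILES_CAPTURE_GROUP1, _GROUP2, ...
    script => 'notify.sh',
    options => 'script,capturegroups',
  },
);
```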
 

Predefined macros

$CL_USERNAME$ The name of the user executing check_logfiles
$CL_HOSTNAME$ The hostname without domain
$CL_DOMAIN$ The DNS-domain
$CL_FQDN$ Both together
$CL_IPADDRESS$ The IP address
$CL_DATE_YYYY$ The current year
$CL_DATE_MM$ The current month (1..12)
$CL_DATE_DD$ The day of the month
$CL_DATE_HH$ The current hour (0..23)
$CL_DATE_MI$ The current minute
$CL_DATE_SS$ The current second
$CL_DATE_CW$ The current calendar week (ISO 8601:1988)
$CL_SERVICEDESC$ The name of the config file without extension.
$CL_NSCA_SERVICEDESC$ the same
$CL_NSCA_HOST_ADDRESS$ The local address 127.0.0.1
$CL_NSCA_PORT$ 5667
$CL_NSCA_TO_SEC$ 10
$CL_NSCA_CONFIG_FILE$ send_nsca.cfg
The following macros change their value during runtime.
$CL_TAG$ The tag of the current search ($CL_tag$ is the tag in lower case)
$CL_TEMPLATE$ The name of the template used (if any).
$CL_LOGFILE$ The file to be scanned next
$CL_SERVICEOUTPUT$ The last matched line.
$CL_SERVICESTATEID$ The error level as a number 0..3
$CL_SERVICESTATE$ The error level as a word (OK, WARNING, CRITICAL, UNKNOWN)
$CL_SERVICEPERFDATA$ The performance data.
$CL_PROTOCOLFILE$ The file where all matching lines are written.

These macros are also available in scripts called out of check_logfiles. Their values are stored in environment variables, whose names are derived from the macro’s names. The preceding CL_ is replaced by CHECK_LOGFILES_. You can also access user defined macros. Their names are also prefixed with CHECK_LOGFILES_.

nagios$ cat check_logfiles.cfg
$scriptpath = '/usr/bin/my_application/bin:/usr/local/nagios/contrib';
$MACROS = {
    MY_FUNNY_MACRO => 'hihihihohoho',
    MY_VOLUME => 'loud' 
};
 
@searches = (
  {
    tag => 'fun',
    logfile => '/var/adm/messages',
    criticalpatterns => 'a funny pattern',
    script => 'laugh.sh',
    scriptparams => '$MY_VOLUME$',
    options => 'noprotocol,script,perfdata'
  },
);
 
nagios$ cat /usr/bin/my_application/bin/laugh.sh
#! /bin/sh
if [ -n "$1" ]; then
  VOLUME=$1
fi
printf 'It is %d:%d and my status is %s\n' \
  $CHECK_LOGFILES_DATE_HH \
  $CHECK_LOGFILES_DATE_MI \
  $CHECK_LOGFILES_SERVICESTATE

printf "I found something funny: %s\n" "$CHECK_LOGFILES_SERVICEOUTPUT"
if [ "X$VOLUME" = "Xloud" ]; then
  echo "$CHECK_LOGFILES_MY_FUNNY_MACRO" | tr 'a-z' 'A-Z'
else
  echo "$CHECK_LOGFILES_MY_FUNNY_MACRO"
fi
printf "Thank you, %s. You made me laugh.\n" "$CHECK_LOGFILES_USERNAME"
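
Outside of check_logfiles, a handler script like laugh.sh can be dry-run by exporting the environment variables the plugin would set. The values below are made up for illustration; the script is assumed to be in the current directory.

```shell
# Fake the environment check_logfiles would provide for a handler script,
# then call the script directly (assumes laugh.sh from the listing above
# is in the current directory).
export CHECK_LOGFILES_DATE_HH=11
export CHECK_LOGFILES_DATE_MI=33
export CHECK_LOGFILES_SERVICESTATE=CRITICAL
export CHECK_LOGFILES_SERVICEOUTPUT="Failed password for root"
export CHECK_LOGFILES_MY_FUNNY_MACRO=hihihihohoho
export CHECK_LOGFILES_USERNAME=nagios
sh ./laugh.sh loud
```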

Performance data

The number of scanned lines as well as the number of pattern matches (critical, warning and unknown) are appended to the plugin’s output in performance data format. You can suppress this by using the noperfdata option.

nagios$ check_logfiles --logfile=/var/adm/messages
     --criticalpattern="Failed password" --tag=ssh
CRITICAL - (4 errors) - May  9 11:33:12 localhost sshd[29742]
     Failed password for invalid user8 ... |ssh_lines=27
     ssh_warnings=0 ssh_criticals=4 ssh_unknowns=0
 
nagios$ check_logfiles --logfile=/var/adm/messages
     --criticalpattern="Failed password" --tag=ssh --noperfdata
CRITICAL - (2 errors) - May  9 11:58:48 localhost sshd[29813]
     Failed password for invalid user8 ...

Scripts

It is possible to execute external scripts out of check_logfiles: at the startup phase ($prescript), before termination ($postscript), or every time a pattern matches a line (see the example above). With the option "smartscript", output and exit code of the script are treated like a match in the logfile and reflected in the overall result. The option "supersmartscript" makes output and exit code of the script replace those of the triggering match. Pre- and postscripts declared as supersmart scripts directly influence the behavior of check_logfiles: the option "supersmartprescript" causes an immediate abort of check_logfiles if the prescript has a non-zero exit code; in this case output and exit code of check_logfiles correspond to those of the prescript. With the option "supersmartpostscript", output and exit code of check_logfiles can be determined by the postscript, which allows a more meaningful output.
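
A supersmartpostscript setup might be sketched like this in the config file. The script name summarize.sh and its behavior are hypothetical; the point is that the postscript's exit code and output become those of the plugin.

```perl
$options = 'supersmartpostscript';
$postscript = 'summarize.sh';   # hypothetical; must reside in $scriptpath
# summarize.sh could inspect CHECK_LOGFILES_SERVICESTATEID and print a
# friendlier message; its exit code becomes the plugin's exit code.
```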

Using check_logfiles with Nagios

If you have just one service which uses check_logfiles you can hard-code the config file in your services.cfg/nrpe.cfg

define service {
  service_description   check_sanlogs
  host_name              oaschgeign.muc
  check_command       check_nrpe!check_logfiles
  is_volatile           1
  check_period          7x24
  max_check_attempts    1
  ...
}
 
define command {
  command_name          check_nrpe
  command_line          $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
 
command[check_logfiles]=/opt/nagios/libexec/check_logfiles
     --config logdefs.cfg

If multiple services are based on check_logfiles you need multiple config files. I propose to name them after the service_description. In the following example we would have a directory cfg.d with config files solaris_check_sanlogs and solaris_check_apachelogs.

define service {
  service_description  logfilescan
  register             0
  is_volatile          1
  check_period         7x24
  max_check_attempts   1
  ...
}
 
define service {
  service_description  solaris_check_sanlogs
  host_name            oaschgeign.muc
  check_command
       check_nrpe_arg!20!check_logfiles!cfg.d/$SERVICEDESC$
  contact_group        sanadmin
  use                  logfilescan
}
 
define service {
  service_description  solaris_check_apachelogs
  host_name            oaschgeign.muc
  check_command
       check_nrpe_arg!20!check_logfiles!cfg.d/$SERVICEDESC$
  contact_group        webadmin
  use                  logfilescan
}
 
define command {
  command_name         check_nrpe_arg
  command_line         $USER1$/check_nrpe
       -H $HOSTADDRESS$ -t $ARG1$ -c $ARG2$ -a $ARG3$
}
 

The corresponding line in the host's nrpe.cfg looks like this:

[check_logfiles]=/opt/nagios/libexec/check_logfiles --config $ARG1$

If you use nsclient++ under Windows, the entry in the NSC.ini looks like this:

check_logfiles=C:\Perl\bin\perl C:\libexec\check_logfiles --config $ARG1$

Installation

  • After unpacking the tar archive you have to call ./configure. With ./configure --help you can show the options if you want to modify the default settings. However, these settings can later be overridden again by variables in the config file.
  • Linux systems are more restrictive regarding the permissions of log files. The /var/log/messages file is not readable for non-root users. If you run check_logfiles as an unprivileged user, follow the link below and look for a trick in the examples.
  • --prefix=BASEDIRECTORY The directory where you want to install check_logfiles. (default: /usr/local/nagios)
  • --with-nagios-user=SOMEUSER The user which will own the check_logfiles script. (default: nagios)
  • --with-nagios-group=SOMEGROUP The group. (default: nagios)
  • --with-perl=PATH_TO_PERL The path to your perl binary. (default: the perl in the current PATH)
  • --with-gzip=PATH_TO_GZIP The path to your gzip binary. (default: the gzip in the current PATH)
  • --with-trusted-path=PATH_YOU_TRUST The path where you expect your triggered scripts. (default: /sbin:/usr/sbin:/bin:/usr/bin)
  • --with-seekfiles-dir=SEEKFILES_DIR The directory where status files will be kept. (default: /tmp)
  • --with-protocols-dir=PROTOCOLS_DIR The directory where protocol files will be written to. (default: /tmp)
  • Under Windows you build the plugin with perl winconfig.pl. This will result in plugins-scripts/check_logfiles.
  • The file README.exe contains instructions on how to build a Windows binary check_logfiles.exe.

Scanning of an Oracle-Alertlog with the operating mode “oraclealertlog”

If you want to scan the alert log of an Oracle database without operating-system-level access to the database server (e.g. because it is a Windows server, or you are not allowed to log in to a Unix server for security reasons) and therefore have no access to the alert file, this file can be mapped to a database table. The contents of the file are then visible through a database connection by executing SQL SELECT statements. If you specify the type "oraclealertlog" in a check_logfiles configuration, this method is used to scan the alert log. You need some extra parameters in the configuration.

# extra parameters in the configuration file
@searches = ({
  tag => 'oratest',
  type => 'oraclealertlog',
  oraclealertlog => {
    connect => 'db0815',       # connect identifier
    username => 'nagios',      # database user
    password => 'hirnbrand',   # database password
  },
  criticalpatterns => [
...

Preparations on the part of the database administrator

Mapping external files to database tables is possible since Oracle version 9. Use this script to prepare your database.

Preparations on the part of the Nagios administrator

Installation of the Perl-Modules DBI and DBD::Oracle (http://search.cpan.org/~pythian/DBD-Oracle-1.21/Oracle.pm).

Scanning the Windows EventLog with the operating mode “eventlog”

The eventlog of Windows systems can be processed by check_logfiles like any other logfile. Each event is treated like a line, and only those events are analyzed which appeared since the last run of check_logfiles.

In its simplest form an eventlog search looks like this:

@searches = ({
  tag => 'evt_sys',
  type => 'eventlog',
  criticalpatterns => ['error', 'fatal', 'failed', ....
  # logfile is not necessary. It doesn't make sense here.

If the evaluation of events should not be based on patterns but on the Windows-internal severities WARNING and ERROR, use the option winwarncrit.

@searches = ({
  tag => 'evt_sys',
  type => 'eventlog',
  options => 'winwarncrit',

It is also possible to analyze only a subset of all the events in the eventlog. You can use include- and exclude-filters for that.

@searches = ({
  tag => 'winupdate',
  type => 'eventlog',
  eventlog => {
    eventlog => 'system',
    include => {
      source => 'Windows Update Agent',
      eventtype => 'error,warning',
    },
    exclude => {
      eventid => '15,16',
    },
  },
  criticalpatterns => '.*',

With these settings, only those events are fetched from the eventlog which comply with the following requirements:

  • The System-Eventlog is used
  • Only events with the source “Windows Update Agent” are read.
  • Only errors and warnings are read.
  • Events with the IDs 15 and 16 are discarded.

Please be aware that the single include-requirements are combined by logical AND and the exclude-requirements are combined by logical OR. The comma-separated lists are always combined by OR.

filter = ((source == "Windows Update Agent") AND ((eventtype == "error") OR (eventtype == "warning"))) 
         AND NOT ((eventid == 15) OR (eventid == 16))

You can change this behavior with the key "operation". It takes the arguments "and" or "or".

@searches = ({
  tag => 'winupdate',
  type => 'eventlog',
  eventlog => {
    eventlog => 'system',
    include => {
      source => 'Windows Update Agent',
      eventtype => 'error,warning',
      operation => 'or',
    },
    exclude => {
      eventid => '15,16',
    },
  },
  criticalpatterns => '.*',

Now the filter means: “Windows Update Agent” OR (“error” OR “warning”)

  type => 'eventlog',
  eventlog => {
    eventlog => 'system',                 # system (default), application, security
    include => {
      source => 'Windows Update Agent',   # The source of the event
      eventtype => 'error,warning',       # error, warning, info, success, auditsuccess, auditfailure
      operation => 'or'                   # The logical operation. Default is "and"
    },
    exclude => {
      eventid => '15,16',                  # The ID of the event
    },
  },

Filters can also be used in commandline-mode.

check_logfiles --type "eventlog:eventlog=application,include,source=Windows Update Agent,eventtype=error,eventtype=warning,exclude,eventid=15,eventid=16"

With another option it is possible to rewrite an event’s message text. Normally check_logfiles sees the field Message when it tries to match a pattern. This is also what is shown in the plugin’s output. The option eventlogformat can be used to include the fields EventType, Source, Category, Timewritten and TimeGenerated in the output.

EventType: ERROR
EventID: 16
Source: W32Time
Category: None
Timewritten: 1259431241
TimeGenerated: 1259431241
Message: Der NtpClient verfügt über keine Quelle mit genauer Zeit.
  options => 'eventlogformat="%w src:%s id:%i %m"',

With this eventlogformat the message text of the above event will be rewritten to:

2009-11-28T19:04:16 src:W32Time id:16 Der NtpClient verfügt über keine Quelle mit genauer Zeit.

The formatstring knows the following tokens:

%t EventType
%i EventID
%s Source
%c Category
%w Timewritten
%g TimeGenerated
%m Message
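
For example, a format string combining several of these tokens could look like this (illustrative, inside a search definition):

```perl
# Prefix each message with the generation time, type, source and id.
options => 'eventlogformat="%g %t %s(%i): %m"',
```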

Examples

Here you can find example configurations for several scenarios.

Download

check_logfiles-3.6.2.1.tar.gz

check_logfiles-3.5.3.2.zip

Changelog

  • 3.6.2.1 – 2014-04-09 – fix eventid format for _tecad_win_
  • 3.6.2 – 2014-04-08 – eventlogformat _tecad_win_
  • 3.6.1.1 – 2014-02-04 – fix a race condition (pid file) in unix daemon mode (Thanks Klaus Wagner)
  • 3.6.1 – 2014-01-25 – added search option "capturegroups", add forgotten --allyoucaneat
  • 3.6 – 2013-11-14 – added global option "nooutputhitcount", added search option "thresholdexpiry=", okpattern resets threshold counters
  • 3.5.3.3 – 2013-09-24 – exe files without x-bit can now run in a cygwin environment (Thanks Michael Glaser)
  • 3.5.3.2 – 2013-03-28 – fixed a bug in allyoucaneat (if used with rotations)
  • 3.5.3.1 – 2012-11-29 – --verbose finally works on the command line; htmlencode can also be an option inside a config file
  • 3.5.3 – 2012-10-26 – add option htmlencode (Thanks Sven Nierlein)
  • 3.5.2.1 – 2012-09-19 – fix a bug related to nfs-mounted logfiles under linux
  • 3.5.2 – 2012-06-21 – fix a bug in CL_PATTERN_KEY (Thanks Frank Rothaupt)
  • 3.5.1 – 2012-06-02 – add parameters --warning and --critical (they become CL_WARNING/CL_CRITICAL); add option "savestate" for type "virtual"
  • 3.5 – 2012-04-23 – fix --timeout. Searches are now aborted in a controlled manner (sponsored feature. You like it? Buy an Audi!)
  • 3.4.7.1 – 2012-01-16 – fix a bug in maxmemsize and solaris; fix a bug where a supersmartpostscript's output was overwritten by longoutput
  • 3.4.7 – 2012-01-10 – add new type dumpel (customer's request. If this feature is useful for you and you want to thank them, buy an Audi); bugfix in errpt's unstick method (Thanks Jim Winkle)
  • 3.4.6.1 – 2012-01-05 – make rotatewait a global option; make logfileerror a global option
  • 3.4.6 – 2012-01-04 – add maxmemsize; cleanup tab indentation; add option logfileerror (unlike seekfileerror it is local); add option rotatewait (sleep until chaos during rotation is over); [selected]searches can be regexp; eliminate "Use of qw(…) as parentheses is deprecated" warnings in perl 5.14 (Thanks Tommi)

  • 3.4.5.2 – 2011-11-08 – set the path to gzip for hpux (/opt/contrib…); fix a bug where % in error messages caused ugly perl errors when used with scriptstdin
  • 3.4.5.1 – 2011-09-28 – seekfilesdir can be "autodetect" with a configfile; also protocolsdir (dirname(dirname(cfgfile)) + [/var/tmp|/tmp]); also scriptpath (dirname(dirname(cfgfile)) + [/local/lib/nagios/plugins|/lib/nagios/plugins]); type executable; fix a perl undef (patternkey stuff which I don't remember)
  • 3.4.5add parameter –rununique
  • 3.4.4.2 – 2011-08-03patterns can be hashes
  • 3.4.4.1 – 2011-05-31seekfilesdir is now local (./var/tmp) in an OMD environment
  • 3.4.4 – 2011-04-19add parameter patternfile
  • 3.4.3.2 – 2011-03-15 Fix a bug with –type rotating::uniform on the command line
  • 3.4.3.1 – 2011-03-10 Add option –nostick. Create the pidfile’s directory if it doesn’t exist
  • 3.4.3 – 2011-01-19 Add pid file handling to avoid concurrent processes with –daemon
  • 3.4.2.2 – 2010-10-01 Add rotation pattern loglog0bz2log1bz2 (Thanks Christian Schulz). Add rotation pattern ehl (Thanks Daniel Haist)
  • 3.4.2.1 – 2010-08-04 Add %u (User) to option eventlogformat
  • 3.4.2 – 2010-06-30 Bugfix, criticalexceptions now work without criticalpatterns. The argument to –tag can now contain special characters (like a file name)
  • 3.4.1 – 2010-05-09 Bugfix in type=eventlog (EVENTLOG_SUCCESS was shown as UnknType). New option archivedirregexp
  • 3.4 – 2010-05-06 check_logfiles.exe was built with a newer compiler (PERL5LIB problems under Windows)
  • 3.3 – 2010-04-27 Performance tuning. New (global) option seekfileerror

    The exe-file now contains Win32::Daemon

  • 3.2 – 2010-04-12 type=eventlog now handles remote eventlogs. Options computer, username, password can contain macros. Faster pattern matching in Tivoli mode.
  • 3.1.5 – 2010-03-08 The loopback option is now allowed in the config file. Matching empty lines are displayed as _(null)_
  • 3.1.4 – 2010-02-25 Bugfix in the IPMI module. $PRIVATESTATE now contains the logfile name

    New option preferredlevel

    new option randominode

  • 3.1.2 – 2009-12-08 Bugfix in the resolving of macros in scriptparams + external bat file
  • 3.1.1 – 2009-12-03New (global) option maxlength.
  • 3.1 – 2009-11-22New option allyoucaneat. New option eventlogformat. New (global) option report. More filter options for eventlog entries.
  • 3.0.4 – 2009-09-20

    accept the contents of a config file as encoded string
  • 3.0.3.1 – 2009-09-07

    Fixed a bug where incorrect EventIDs were read from the EventLog
  • 3.0.3 – 2009-08-26

    Speedup in Eventlog scans. Under some OSs the daemon did not detach itself from the terminal.
  • 3.0.2 – 2009-07-23

    fixed a bug for –config. (Windows uses HOMEPATH instead of HOME)

    fixed a bug in Eventlog+Tivoli (Thanks Werner Breitschmid)

  • 3.0.1 – 2009-06-25

    fixed a bug in Eventlog+Tivoli

    added match_them_all and match_never_ever as predefined patterns

  • 2009-06-19 3.0 new parameters –service, –install, –deinstall. check_logfiles now runs as Windows-Service.
  • 2009-05-25 2.6 new parameters –lookback, –archivedir, –daemon, –warning/criticalthreshold. warning/criticalthresholds moved to options, match_them_all instead of .* on the command line
  • 2009-03-27 2.5.6.1 I forgot to delete debugging output from 2.5.6
  • 2009-03-27 2.5.6 Bugfixes in oraclealertlog+sticky, new parameter –macro, new parameter –nocase
  • 2009-02-20 2.5.5.2 Option maxlength truncates long lines. Option winwarncrit uses Eventlog Type WARNING/ERROR instead of Patterns.
  • 2009-02-02 2.5.5.1 2.5.5 was crap
  • 2009-01-23 2.5.5 Bugfixes, support for Windows eventlog with Win32, multiline output
  • 2008-10-30 2.4.1.9 Bugfix which allows absolute configfile-paths again
  • 2008-10-24 2.4.1.8 Bugfix in $scriptpath under Windows (Thanks Markus Wagner).
  • 2008-10-10 2.4.1.7 Bugfix in rotating::uniform and Macros in rotation. Bugfix in scriptparams with $CL_TAG$. Thanks Markus Wagner.
  • 2008-09-03 2.4.1.6 new parameter –environment
  • 2008-08-15 2.4.1.5 syslogclient hostnames can be case-insensitive (with nocase)
  • 2008-07-28 2.4.1.4 Bugfix in type=uniform, scripts have access to a state-hash
  • 2008-06-24 2.4.1.3 Bugfix (–sticky=<…>). Thanks Severin Rossignol.
  • 2008-06-18 2.4.1.2 Bugfix in CL_DATE_YY
  • 2008-05-29 2.4.1.1 Archivedir can now contain Macros
  • 2008-05-27 2.4.1 Bugfix in sticky-Code. A warningpattern could downgrade a Critical to Warning. Thanks Nils Müller.
  • 2008-05-07 2.4 Support for Oracle Alertlogs through a database connection.
  • 2008-05-06 2.3.3 Option -F which is used to search multiple configfiles in a directory.
  • 2008-02-26 2.3.2.1 Bugfix to support Perl 5.10. More encoding tinkering.
  • 2008-02-12 2.3.2 Support for IPMI System Event Log, Errpt Bugfix, ucs-2 encoded files for Windows.
  • 2007-12-27 2.3.1.2 Can now handle very large files, $CL_PROTOCOLFILE$, $CL_SERVICEPERFDATA$, more commandline options.
  • 2007-11-16 2.3.1.1 Bugfix in sticky code. Thanks Marc Richter. New option savethresholdcount. Thanks Hannu Kivimäki.
  • 2007-10-16 2.3.1 Templates, bzip2 archives, scriptparam bugfix, threshold counters are inherited.
  • 2007-09-10 2.3 Bugfixes. Type errpt. Okpatterns. Options sticky and syslogclient. New format for performance data.
  • 2007-06-08 2.2.4.1 Bugfix (–searches)
  • 2007-06-06 2.2.4 Support for “virtual” files like Linux /proc/*
  • 2007-06-05 2.2.3 Bugfixes
  • 2007-06-02 2.2.2 Support for supersmart scripts with empty output.
  • 2007-06-01 2.2.1 Smart scripts. Scripts can be embedded perl code.
  • 2007-05-21 2.1.1 Bugfixes
  • 2007-05-21 2.1 Native Windows now supported. New option –selectedsearches. New rotation method mod_log_rotate.
  • 2007-05-10 2.0 Complete Redesign. Official handling of non-rotating logfiles. Performancedata.

Copyright

Gerhard Laußer Check_logfiles is released under the GNU General Public License. GPL

Author

Gerhard Laußer (gerhard.lausser@consol.de) will gladly answer your questions.

512 Responses to “check_logfiles”

  1. Charles Says:
    October 12th, 2009 at 22:44

    How can I pass a regular expression like “ORA-(03113|24762)” as an argument for –criticalexception ? Do I have to use a config file on the client side? I’m running the check via check_nrpe on my nagios server. The whole command should be like this:

    [Mon Oct 12 16:32:48 root@gator: nagios] # /usr/local/nagios/libexec/check_logfiles --logfile=/ORACLE/clfyprod/oraadmin/bdump/alert_clfyprod.log --tag=clfyprod --criticalpattern=ORA- --criticalexception="ORA-[03113|24761]"
    OK – no errors or warnings|clfyprod_lines=32 clfyprod_warnings=0 clfyprod_criticals=0 clfyprod_unknowns=0
    [Mon Oct 12 16:43:35 root@gator: nagios] #

    Which as you see works from the local command line, but on my nagios server using check_nrpe it fails due to illegal metacharacters.

    [Reply]

    lausser Reply:

    Hi Charles, a config file on the remote side would be the preferred solution. But there is a very, very ugly hack which might help you. It is possible to transform the contents of a config file into a flat, encoded string and use this as the argument instead of the filename. Create a script “encodeconfig” with the following code:

    #! /usr/bin/perl
    if (-f $ARGV[0]) {
      # slurp the whole config file into one string
      my $contents = do { local (@ARGV, $/) = ($ARGV[0]); <> };
      # percent-encode everything except letters and digits
      $contents =~ s/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg;
      printf "%s\n", $contents;
    } else {
      printf STDERR "usage: encodeconfig <configfile>\n";
    }

    Then create a config file /tmp/clfyprod.cfg

    @searches = ({
      tag => 'clfyprod',
      logfile => '/ORACLE/clfyprod/oraadmin/bdump/alert_clfyprod.log',
      criticalpattern => 'ORA\-',
      criticalexception => 'ORA\-(03113|24761)'
    });

    Encode this configuration file with:

    $ ./encodeconfig /tmp/clfyprod.cfg 
    %40searches%20%3D%20%28%7B%0A%20%20tag%20%3D%3E%20%27clfyprod%27%2C%0A%20%20logfile%20%3D%3E%20%27%2FORACLE%2Fclfyprod%2Foraadmin%2Fbdump%2Falert%5Fclfyprod%2Elog%27%2C%0A%20%20criticalpattern%20%3D%3E%20%27ORA%5C%2D%27%2C%0A%20%20criticalexception%20%3D%3E%20%27ORA%5C%2D%2803113%7C24761%29%27%0A%7D%29%3B%0A

    Now you have an encoded string which contains your configuration. Use this as the argument for the –config parameter.

    check_logfiles --config %40searches%20.......

    Gerhard

    [Reply]

  2. Vitalik Says:
    October 21st, 2009 at 2:49

    In a scenario with multiple criticalpatterns/warningpatterns in on search, is there a way to return all lines if multiple lines are matched in one search with separate patterns? In other words, if there is a critical pattern “panic” and a warning pattern “scsi timeout,” configured within a same search in /var/adm/messages, and if it happened so that two matching messages were written within a second one after another, how to include both matched lines in the alert? The goal is to guarantee that all matches are displayed in the alert.

    Thanks!

    [Reply]

    lausser Reply:

    You can use the parameter “--report long” (or the global variable $report = 'long'; in a config file) if you want all the matches to be displayed.

    [nagios@nagsrv1 ~]$ echo "dev:0:1:2 error scsi timeout" >> /tmp/test.log
    [nagios@nagsrv1 ~]$ echo "panic: cannot read device" >> /tmp/test.log
    [nagios@nagsrv1 ~]$ check_logfiles --tag scsi --logfile /tmp/test.log \
    --warningpattern 'scsi timeout' --criticalpattern 'panic' \
    --report long
    CRITICAL - (1 errors, 1 warnings in check_logfiles.protocol-2009-10-21-09-02-02) - panic: cannot read device |scsi_lines=2 scsi_warnings=1 scsi_criticals=1 scsi_unknowns=0
    tag scsi CRITICAL
    panic: cannot read device
    dev:0:1:2 error scsi timeout
    
    If you only want to display the check results in the Nagios web interface, you can even use “--report html”, which outputs an HTML table with colors.

    [Reply]

    Vitalik Reply:

    @lausser, thanks a lot, that is perfect! Overall, very impressed with the plug-in.

    [Reply]

  3. Frode Says:
    October 27th, 2009 at 3:01

    Looks good, but I think I’ve found a bug: if you have a logfile with CRLF (DOS) line endings, the search will never find the pattern you are looking for and always returns the OK status.

    I’m seeing this on a Linux box, searching for items in a logfile that was made on a Windows box.

    Maybe I’m doing it wrong?

    [Reply]

    flo Reply:

    Same problem here – any hints?

    [Reply]

  4. Frode Says:
    October 27th, 2009 at 7:42

    Ignore my previous comment – I didn’t realise that it doesn’t process a log the first time it sees it – I was deleting the .seek files as I was testing the regex I was using… Ooops. :P

    [Reply]

  5. ARPwatch – Netzwerk Anomalien schnell und einfach erkennen « ROOT ON FIRE Says:
    October 31st, 2009 at 10:04

    […] verschiedensten Gründen nicht möglich, dann kann man die Logfileauswertung z.B. mit dem Plugin check_logfiles in eine Nagios Monitoring Umgebung […]

  6. Ryan Kovar Says:
    November 4th, 2009 at 16:26

    Hi! I love your check and use it extensively. I do have one question, however: is it possible to have it alert on anything other than acceptable log entries? For example, if the log writes anything other than “none”, spit out a critical. Example log output:

    20091103 15:31:44.52 “none”
    20091103 15:32:36.10 “none”
    20091103 15:36:31.89 “none”
    20091103 15:37:01.25 “ReadOnly”
    20091103 15:37:09.08 “none”

    Alert on Read Only

    [Reply]

    lausser Reply:

    You can reverse the pattern matching by adding an exclamation mark to the regular expression. criticalpattern => ‘!none’ should do the trick. Now you get a CRITICAL each time a line does not match “none”. Gerhard
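
    As a config-file sketch (the tag and logfile below are made-up placeholders; note that config files use the plural key criticalpatterns):

    @searches = ({
      tag => 'readonly',               # hypothetical tag
      logfile => '/var/log/app.log',   # hypothetical logfile
      criticalpatterns => '!none',     # CRITICAL for every line that does NOT match "none"
    });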

    [Reply]

  7. Hombre Says:
    November 5th, 2009 at 23:42

    I have exactly the same problem with the type=errpt as described under the following link:

    http://www.icinga-portal.org/wbb/index.php?page=Thread&postID=103725

    Every time I execute the script, no patterns are found, although there are lots of lines which match the regex pattern.

    thanks for your help

    PS: my Version is: check_logfiles v3.0.4

    [Reply]

    lausser Reply:

    Hi, there’s a difference between errpt and ordinary files. When checking the latter, check_logfiles remembers the last position in “logoffset”. Errpt, however, is more time-based, so check_logfiles saves a timestamp in “logtime”. Edit your statefile and set logtime to 1 (not 0!!!). The next time you run check_logfiles it will scan the entire errpt. (You can watch what happens behind the scenes if you create a trace file with “touch /tmp/check_logfiles.trace; tail -f /tmp/check_logfiles.trace”. Don’t forget to delete it later.)
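
    A sketch of what the edited statefile entry might look like, assuming the Data::Dumper-style layout check_logfiles seekfiles use (the variable name and surrounding layout are illustrative and may differ between versions):

    # excerpt from the seek/state file of the errpt search
    $state = {
      'logtime' => 1,   # 1 (not 0!) forces the next run to scan the entire errpt
    };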

    [Reply]

  8. Stephen Says:
    November 16th, 2009 at 20:40

    I was wondering if there is a way to check for a line position? Here is the logfile line:

    [11/12/09 10:28:32:131 EST] 0000000a TrustAssociat E com.ibm.ws.security.spnego.TrustAssociationInterceptorImpl initialize CWSPN0009

    The important character is the E in the 52nd position. This application always places it in the 52nd position, but searching for E would result in all lines returning an error due to other E’s in the line, such as EST. The application returns E, I, or W in that position to indicate the type of message…

    Is there any way to do this with check_logfiles?

    Thanks

    [Reply]

    lausser Reply:

    What about the TrustAssociat? Does this label change, or is it always in the lines you’re interested in? You could use “TrustAssociat E” as criticalpattern (and “TrustAssociat W” as warningpattern).

    [Reply]

  9. Stephen Says:
    November 17th, 2009 at 15:26

    The Trust Associat is one of many applications in that group that can be in error. That was the first thing I asked the apps guys :) But according to them, they all put the severity flag at the same place….But the app name can change.

    [Reply]

    lausser Reply:

    Maybe this works: criticalpatterns => '\[.*?\]\s+\w+\s+\w+\s+E\s+'. It matches the date string, the 0000000a (or any other code), another word (which is the application name) and then a standalone E.
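
    Put into a config-file sketch (the tag and logfile path are placeholders; the W variant follows the earlier warningpattern suggestion):

    @searches = ({
      tag => 'websphere',                       # hypothetical tag
      logfile => '/var/log/SystemOut.log',      # hypothetical path
      criticalpatterns => '\[.*?\]\s+\w+\s+\w+\s+E\s+',
      warningpatterns  => '\[.*?\]\s+\w+\s+\w+\s+W\s+',
    });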

    [Reply]

  10. Michael Koeppl Says:
    November 25th, 2009 at 11:43

    Hello, have you ever thought about a “dryrun” parameter which does not change the offset value, so that searches can be evaluated without changing the seek file? If you implement this feature, maybe the seek file info from a real run could still be written to the file, but commented out. That way the tester can check the changes the run would have made.

    [Reply]

  11. Vikas Vysetti Says:
    November 25th, 2009 at 21:14

    Hi

    I am looking to use check_logfiles on a standalone system. I just can’t figure out how to use it with nagios. It would be great if someone can help me out.

    [Reply]

  12. Norbert Says:
    December 1st, 2009 at 14:27

    Does anyone have a working spec file for check_logfiles?

    [Reply]

  13. Michal Says:
    December 2nd, 2009 at 15:40

    Hello

    I think there is a bug in the report parameter (in the config file).

    I’m using the following config file:

    $seekfilesdir = '/tmp';
    $report = "html";

    @searches = (
      {
        tag => 'PHPError',
        logfile => '/var/log/httpd/httpd_error.$CL_DATE_YYYY$$CL_DATE_MM$$CL_DATE_DD$.log',
        criticalpatterns => '.PHP Fatal.',
        options => 'noprotocol,noperfdata,nologfilenocry,nosavetreshholdcount,allyoucaneat'
      },
    );

    but I’m always getting short output.

    I did some debugging and found out the following:

    if ($self->get_option('report') ne "short") {
      $self->formulate_long_result();
    }

    Here $self->get_option('report') is always "short", BUT in the same place $self->{report} is "html". What could be the problem?

    If I change the code to:

    if ($self->{report} ne "short") {
      $self->formulate_long_result();
    }

    check_logfiles works as expected.

    What is the reason for this, please? / What am I doing wrong?

    Next:

    It would be great to make the my $maxlength = 1024; inside the formulate_long_result function configurable via the config file.

    Thank you for your help.

    [Reply]

    lausser Reply:

    In 3.1 report has become an option. You have to move it into $options. I just published release 3.1.1, which lets you adjust the maximum output length in the config file.

    $options = "report=html,maxlength=1024";

    [Reply]

  14. Michal Says:
    December 4th, 2009 at 1:13

    It looks like that download link for 3.1.1 does not work.

    [Reply]

    lausser Reply:

    Sorry, it’s fixed now.

    [Reply]

  15. Ovidiu Says:
    December 15th, 2009 at 17:39

    Hello, nice plugin, but the criticalexception feature doesn’t work for me:

    @searches = (
      {
        tag => 'test',
        logfile => '/tmp/test.log',
        criticalpatterns => 'ORA-',
        criticalexception => 'ORA-(03113|24761)'
      },
    );

    I have version 3.1.2. When I execute echo "ORA-03113 dfafa" >> /tmp/test.log I get a critical return.

    [Reply]

    lausser Reply:

    Hi, it’s criticalexception_s_, not criticalexception. The distinction matters: on the command line you say –criticalpattern … –criticalexception …, because only one pattern is possible there. In a config file you have to use the plural criticalpatterns/criticalexceptions, because there you can specify a whole array of patterns. Gerhard
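
    Applied to a config like the one above, the corrected file would use the plural keys (the patterns themselves are unchanged):

    @searches = (
      {
        tag => 'test',
        logfile => '/tmp/test.log',
        criticalpatterns => 'ORA-',
        criticalexceptions => 'ORA-(03113|24761)'
      },
    );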

    [Reply]

  16. Benny Says:
    December 16th, 2009 at 19:36

    Hallo!

    Ich spreche nicht gut Deutche. :(

    Do you have an English version of this page? It contains a LOT of very useful information, but I haven’t found an English translation of it yet, and I don’t trust Babelfish’s accuracy that much.

    Thank you so much for the plugin!

    Benny

    [Reply]

    lausser Reply:

    Hi, if you write “ich spreche nicht gut Deutsch”, then it’s perfect! Do you see the English flag in the top right corner of the page? Click it and you’re there. Gerhard

    [Reply]

    Benny Reply:

    @lausser, Thanks… I looked for a while, but managed to overlook that. How excellent. Thank you for the plugin and all the great documentation!

    [Reply]

  17. matejo Says:
    December 17th, 2009 at 15:45

    Hi! I have quite huge logs – sometimes they grow to over 1 GB. Is there an option to configure it so that it continues the scan from the last line it scanned before?

    [Reply]

    lausser Reply:

    That’s the default behaviour of check_logfiles. It scans a file until it hits the end-of-file and saves the position in the so-called seek-file (usually in /var/tmp/check_logfiles). When it runs the next time, check_logfiles “remembers” this position and starts reading there. This way, only the lines which were appended between the single runs of check_logfiles are scanned. The position is also used to detect logfile rotations.
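
    If the seek-files should live somewhere other than the default directory, you can set it in the config file; the path below is just an example, and the directory must be writable by the user running check_logfiles:

    $seekfilesdir = '/tmp/logCheck';   # example path; must be writable by the Nagios user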

    [Reply]

    matejo Reply:

    @lausser, It seems it doesn’t work for me… It doesn’t create any seek-files in that directory, nor in a directory I specify with $seekfilesdir = '/tmp/logCheck';. File permissions are OK.

    [Reply]

    matejo Reply:

    @matejo, Forgot to tell: I have created /tmp/check_logfiles.trace, and it says that it starts every time from the beginning… moving to position 0…

    [Reply]

    lausser Reply:

    Strange… can you please mail me the config file? Also please create a fresh trace-file, run check_logfiles two times and send me the trace-file too. Gerhard p.s. did you run check_logfiles as root? Maybe the seekfile(-dir) belongs to root and the nagios-user cannot write it.

    [Reply]

  18. Craig Says:
    December 17th, 2009 at 18:13

    Awesome plugin… I have it in use on several systems and am having a small problem with one of them. The script takes FOREVER to run, though it eventually returns accurate results. On a similar system doing Oracle alert log checks it runs in under 10 seconds; on the second system it takes in excess of 2-3 minutes. I checked the Perl modules and versions – identical; Oracle versions – identical; OS – identical. Not sure where to go from here.

    Thanks

    [Reply]

  19. Craig Says:
    December 17th, 2009 at 18:14

    Sorry forgot to mention this is running on HPUX 11.31

    [Reply]

    lausser Reply:

    Create the file /tmp/check_logfiles.trace with touch. If this file exists, check_logfiles writes a lot of debugging stuff into it. Watch it with tail -f /tmp/check_logfiles.trace, esp. the timestamps. This might help to find out where it hangs or spends its time. Does name resolution work correctly on this machine? In rare cases this was the reason for a hanging plugin, because check_logfiles tries to find out the hostname.

    [Reply]

    Ryan Ash Reply:

    @lausser, You mentioned that /tmp/check_logfiles.trace can be used to further debug a nix client. Is there a similar file that can be done on windows?

    TY sir.

    [Reply]

    Ryan Ash Reply:

    @Ryan Ash, Nevermind…found it C:/TEMP/ thanks

    [Reply]

  20. Carl Says:
    December 22nd, 2009 at 20:54

    Is there a way to trigger CRIT or OK based on a multiline match?

    Basically I have an application that produces a multiline entry per log entry. I want to return CRIT when a pattern is matched on the first line but only if the next line does not contain a specific string.

    I attempted to utilize the ‘okpattern’ option but this resets to OK regardless of previous real CRIT conditions.

    [Reply]

    lausser Reply:

    check_logfiles reads the logfile line by line and treats every line separately (this means, when it reads a line, it already has forgotten the previous lines). So it is not possible to use regular expressions which span several lines. What you can do is to add your own logic with a script.

    my $flag = 0;
    @searches = ({
      tag => 'carl',
      logfile => 'log.log',
      criticalpatterns => '.*',
      options => 'supersmartscript',
      script => sub {
        my $line = $ENV{CHECK_LOGFILES_SERVICEOUTPUT};
        $flag++ if $flag;
        if ($line =~ /critical phase/) { # line 1
          $flag = 1;
          return 0;
        } elsif ($flag == 2 && $line !~ /fixed/) { # line 2 and not pattern 2
          $flag = 0;
          print $line;
          return 2;
        } else { # line 2 and pattern 2 or line not following line 1
          $flag = 0;
          return 0;
        }
      },
    });

    [Reply]

  21. john Says:
    December 26th, 2009 at 7:52

    Is it possible to associate an action in the next run of check_logfiles even though there is no new lines being added to the log?

    [Reply]

    john Reply:

    @john,

    when there is no new line being added, I would like to have logic inside “script => sub {…}”. Is this possible?

    @searches = (
      {
        tag => 'n',
        logfile => '/tmp/n.log',
        options => 'supersmartscript',
        criticalpatterns => ['ERROR', 'WARN', 'FIX'],
        script => sub {
          if ($ENV{CHECK_LOGFILES_SERVICEOUTPUT} =~ /ERROR/) {
            # do error logic...
            return 2;
          } elsif ($ENV{CHECK_LOGFILES_SERVICEOUTPUT} =~ /WARN/) {
            # do warning logic...
            return 1;
          } elsif ($ENV{CHECK_LOGFILES_SERVICEOUTPUT} =~ /FIX/) {
            # do fix logic...
            return 0;
          } else {
            # do NO-NEW-LINE logic...
            # return 0, 1, 2 or 3 based on logic;
          }
        }
      }
    );

    [Reply]

    lausser Reply:

    Like (supersmart)script, which is executed after each pattern match, there is also the option supersmartpostscript. It can be used to rewrite the plugin’s output and exitcode. With a criticalpattern of ‘.*’ you can call a handlerscript after each line and increase a linecounter. If this linecounter is 0, then the supersmartpostscript can handle the “no new lines”-case. You can keep your own pattern matching, but instead of “do xx logic” you code “set xx flag”.

    my $linecounter = 0;
    my $errflag = 0; my $warnflag = 0; my $fixflag = 0;
    $options = 'supersmartpostscript';
    @searches = ....
    $postscript = sub {
      if ($linecounter == 0) {
        printf "no new lines&#92;n";
        # do no-new-lines-logic
        return 0;
      } .....
    };

    [Reply]

  22. john Says:
    December 26th, 2009 at 8:16

    I’m using the “sticky” option to preserve the last CRIT condition. I’m monitoring a variable that has 3 possible states: FIX (okpattern), WARN (warningpatterns) and ERROR (criticalpatterns). The FIX status will reset the CRIT, but also I would like to reset CRIT when the next event is a WARNING (warningpatterns). As of today, when the var. goes from CRIT to WARN check_logfiles shows

    CRITICAL – (1 errors, 1 warnings) – |s5_lines=1 s5_warnings=1 s5_criticals=1 s5_unknowns=0.

    This is misleading in my case because the variable can have only 1 state at any given time. Is there a way to do this? Thanks,

    [Reply]

  23. Harish Says:
    December 30th, 2009 at 16:20

    Hi All,

    I have installed this script on my Nagios box, but when running it I always get an “OK” status.

    Please help me.

    [nagios@Nagios libexec]$ echo "dev:0:1:2 error scsi timeout" >> /tmp/t.log
    [nagios@Nagios libexec]$ echo "panic: cannot read device" >> /tmp/t.log
    [nagios@Nagios libexec]$ /usr/local/nagios/libexec/check_logfiles --tag scsi --logfile /tmp/test.log \
    --warningpattern 'scsi timeout' --criticalpattern 'panic' \
    --report long
    OK – no errors or warnings|scsi_lines=0 scsi_warnings=0 scsi_criticals=0 scsi_unknowns=0
    [nagios@Nagios libexec]$ cat /tmp/t.log
    dev:0:1:2 error scsi timeout
    panic: cannot read device

    [Reply]

    lausser Reply:

    Was this the first time you ran check_logfiles? From the scsi_lines=0 you see that 0 lines were scanned. This is normal behavior. The first run only initialises, that is, seeks the end of the logfile and saves the position reached. Then, with the next run, it will operate normally, do a fast forward to this saved position and then scan the lines which were added (or simply exit if no new lines were added). So call check_logfiles, call the 2 echo commands and call check_logfiles again. You should see a CRITICAL then.

    [Reply]

  24. Benny Says:
    December 31st, 2009 at 19:52

    I’m getting the hang of this plugin, and I’m happy that it is working the way it is.

    However, there is one gotcha… I am experimenting with using a config file to define a bunch of searches (actually, so the end users can write their own), and I notice that the alerts spit out with only the line matched, not with the log file matched.

    I guess I could write a script and use $CL_TAG, but is there something built-in that I’m missing? I’m trying to get away with a single service per host that checks all the searches the users have defined…

    Thanks!

    Benny

    [Reply]

    lausser Reply:

    Maybe you should try –report long or $options = “report=long”; in the configfile. Then you will see the matching lines grouped by tags.

    [Reply]

  25. haitauer Says:
    January 2nd, 2010 at 1:59

    Hi,

    The report option (command line or config file) does not work in v3.1.2. It’s always short.

    [Reply]

    lausser Reply:

    I can’t reproduce this.

    [Reply]

    haitauer Reply:

    @lausser, Hi lausser,

    /dev/shm/check_logfiles-v3.1.2.pl -t 50 -f test.conf --searches="test-cacti-partial-snmp,test-cacti-partial-cmd"
    WARNING – (2 warnings) – 01/28/2010 09:49:20 AM – CMDPHP: Poller[0] Host[86] DS[1543] WARNING: Result from SNMP not valid. Partial Result: U …

    cat test.conf
    @searches = (

        {
                tag => 'test-cacti-partial-cmd',
                options => 'noperfdata,noprotocol,sticky=86400,nocase,report=long,maxlength=512',
                logfile => '/var/log/test-cacti/cacti.log',

            warningpatterns => [
                    'Result from CMD not valid',
            ],
            warningthreshold => 1,
    
            criticalpatterns => [
                    'Result from CMD not valid',
            ],
            criticalthreshold => 200,
    },
    
    {
            tag => 'test-cacti-partial-snmp',
            options => 'noperfdata,noprotocol,sticky=86400,nocase,report=long,maxlength=512',
            logfile => '/var/log/test-cacti/cacti.log',
    
            warningpatterns => [
                    'Result from SNMP not valid',
            ],
            warningthreshold => 1,
    
            criticalpatterns => [
                    'Result from SNMP not valid',
            ],
            criticalthreshold => 200,
    },
    

    );

    /dev/shm/check_logfiles-v3.1.2.pl -t 50 -f test.conf --searches="test-cacti-partial-snmp,test-cacti-partial-cmd" --report long
    WARNING – (11 warnings) – 01/28/2010 09:50:25 AM – CMDPHP: Poller[0] Host[86] DS[1543] WARNING: Result from SNMP not valid. Partial Result: U …

    /dev/shm/check_logfiles-v3.1.2.pl -t 50 -f test.conf --searches="test-cacti-partial-snmp,test-cacti-partial-cmd" --report=long
    WARNING – (2 warnings) – 01/28/2010 09:50:25 AM – CMDPHP: Poller[0] Host[86] DS[1543] WARNING: Result from SNMP not valid. Partial Result: U …

    [Reply]

  26. haitauer Says:
    January 2nd, 2010 at 19:12

    Hi,

    with check_logfiles v3.1.2, every entry read from the eventlog is listed twice in the check_logfiles output.

    [Reply]

    lausser Reply:

    I can’t reproduce this. Maybe you have criticalpatterns so that an event matches twice?

    [Reply]

  27. haitauer Says:
    January 3rd, 2010 at 23:29

    Hi,

    how do I reverse the output of report=long or html? i.e. newest errors/warnings first … thanks.

    [Reply]

    lausser Reply:

    That’s not possible.

    [Reply]

  28. haitauer Says:
    January 5th, 2010 at 12:34

    Hi,

    is it possible to do something like this:

    exclude => {
      source => 'Userenv',
      eventid => '1085',
      operation => 'and',
    },

    exclude => {
      source => 'PureMessage',
      eventid => '8',
      operation => 'and',
    },

    i.e. I want to exclude some event IDs from a defined source, as event IDs are not unique in Windows, so I have to specify the source as well to exclude things.

    [Reply]

    lausser Reply:

    No, more than one exclude key is not possible. But I understand the problem. I’ll have a look at this.

    [Reply]

  29. charleshb Says:
    January 5th, 2010 at 22:25

    What happened to English language version of this page?

    [Reply]

    lausser Reply:

    Just click on the english flag you see in the right upper corner of this page.

    [Reply]

  30. haitauer Says:
    January 6th, 2010 at 0:14

    hello? anyone awake here? :)

    [Reply]

    lausser Reply:

    Holidays. Sorry for not providing free 24×7 support. I write and maintain this software in my leisure time.

    [Reply]

  31. Gene Siepka Says:
    January 6th, 2010 at 18:19

    Hi all… seem to be having an issue on Solaris watching /var/adm/messages. At random times during the day I’ll get “cannot open file /var/adm/messages”, and last night at 3:10am when the log rotated it seems like check_logfiles got stuck, until I got into the office and ran it manually. Running through NRPE, if it makes any difference. Saw this in the trace file I created:

    Wed Jan 6 03:10:03 2010: ==================== /var/adm/messages ==================
    Wed Jan 6 03:10:03 2010: found seekfile /usr/local/encap/nagios/var/check_logfiles._var_adm_messages.messages
    Wed Jan 6 03:10:03 2010: LS lastlogfile = /var/adm/messages
    Wed Jan 6 03:10:03 2010: LS lastoffset = 1953 / lasttime = 1262712406 (Tue Jan 5 12:26:46 2010) / inode = 67174402:384568
    Wed Jan 6 03:10:03 2010: found private state $VAR1 = { 'runcount' => 502, 'lastruntime' => 1262765207 };
    Wed Jan 6 03:10:03 2010: this is not the same logfile 67174402:384568 != 67174402:382266
    Wed Jan 6 03:10:03 2010: Log offset: 1953
    Wed Jan 6 03:10:03 2010: looking for rotated files in /var/adm with pattern messages.*\.[0-9]+
    Wed Jan 6 03:10:03 2010: archive /var/adm/messages.2 matches (modified Tue Dec 22 11:38:27 2009 / accessed Mon Jan 4 12:58:41 2010 / inode 377118 / inode changed Wed Jan 6 03:10:00 2010)
    Wed Jan 6 03:10:03 2010: archive /var/adm/messages.1 matches (modified Thu Dec 31 10:41:59 2009 / accessed Sun Jan 3 01:37:05 2010 / inode 384614 / inode changed Wed Jan 6 03:10:00 2010)
    Wed Jan 6 03:10:03 2010: archive /var/adm/messages.3 matches (modified Mon Jan 4 14:12:33 2010 / accessed Tue Jan 5 01:32:51 2010 / inode 366513 / inode changed Wed Jan 6 03:10:00 2010)
    Wed Jan 6 03:10:03 2010: archive /var/adm/messages.0 matches (modified Tue Jan 5 12:26:46 2010 / accessed Wed Jan 6 03:09:46 2010 / inode 384568 / inode changed Wed Jan 6 03:10:00 2010)
    Wed Jan 6 03:10:03 2010: archive /var/adm/messages.0 was modified after Tue Jan 5 12:26:46 2010
    Wed Jan 6 03:10:03 2010: archive messages.0 cannot be opened
    Wed Jan 6 03:10:03 2010: although a logfile rotation was detected, no archived files were found
    Wed Jan 6 03:10:03 2010: stat (/var/adm/messages) failed, try access instead
    Wed Jan 6 03:10:03 2010: could not open logfile /var/adm/messages
    Wed Jan 6 03:10:03 2010: first relevant files:
    Wed Jan 6 03:10:03 2010: relevant files:
    Wed Jan 6 03:10:03 2010: nothing to do
    Wed Jan 6 03:10:03 2010: keeping position 1953 and time 1262712406 (Tue Jan 5 12:26:46 2010) for inode 67174402:384568 in mind

    Any ideas? This is a great plugin and seems to be the only one that can pattern match and then do exceptions for crap we dont want to be alerted on..

    [Reply]

    lausser Reply:

    The trace looks normal. Well, normal for a situation where the nagios user cannot read the logfile. I know there are a lot of Solaris users running check_logfiles, but I never heard of a problem like this. It looks like during/shortly after the rotation, the logfiles are not world-readable. The rotation detection works. You see inode=67174402:384568. This is the device/inode of the messages file when check_logfiles was run last time. Now its inode has changed. The old 67174402:384568 also appears, but as that of messages.0. If only check_logfiles could open the files... How is this rotation managed? Is there something like /etc/logrotate.conf? Any chance to add some chmod to the rotation script?

    [Reply]

    Gene Siepka Reply:

    @lausser,

    Yes, it's /etc/logadm.conf in Solaris 10. It rotates the log weekly and renames /var/adm/messages to /var/adm/messages.0, then .0 to .1, etc.

    I did a forced log rotation just now and see the same results. I checked the permissions on /var/adm/messages and /var/adm/messages.0 and they are fine; the nagios userid should be able to read them. Here is some more info:

    ls -la /var/adm/messages

    -rw-r----- 1 root sysadmin 0 Jan 7 11:14 /var/adm/messages

    ls -la /var/adm/messages.0

    -rw-r----- 1 root sysadmin 224 Jan 6 12:32 /var/adm/messages.0

    id -a nagios

    uid=502(nagios) gid=14(sysadmin) groups=500(nagios)

    and trace entry again:

    Thu Jan 7 11:19:01 2010: ==================== /var/adm/messages ==================
    Thu Jan 7 11:19:01 2010: found seekfile /usr/local/encap/nagios/var/check_logfiles._var_adm_messages.messages
    Thu Jan 7 11:19:01 2010: LS lastlogfile = /var/adm/messages
    Thu Jan 7 11:19:01 2010: LS lastoffset = 224 / lasttime = 1262799146 (Wed Jan 6 12:32:26 2010) / inode = 67174402:382266
    Thu Jan 7 11:19:01 2010: found private state $VAR1 = { 'runcount' => 982, 'lastruntime' => 1262880561 };

    Thu Jan 7 11:19:01 2010: this is not the same logfile 67174402:382266 != 67174402:384476
    Thu Jan 7 11:19:01 2010: Log offset: 224
    Thu Jan 7 11:19:01 2010: looking for rotated files in /var/adm with pattern messages.*\.[0-9]+
    Thu Jan 7 11:19:01 2010: archive /var/adm/messages.2 matches (modified Thu Dec 31 10:41:59 2009 / accessed Thu Jan 7 00:22:02 2010 / inode 384614 / inode changed Thu Jan 7 11:14:26 2010)
    Thu Jan 7 11:19:01 2010: archive /var/adm/messages.1 matches (modified Tue Jan 5 12:26:46 2010 / accessed Thu Jan 7 00:22:02 2010 / inode 384568 / inode changed Thu Jan 7 11:14:26 2010)
    Thu Jan 7 11:19:01 2010: archive /var/adm/messages.3 matches (modified Tue Dec 22 11:38:27 2009 / accessed Thu Jan 7 00:22:02 2010 / inode 377118 / inode changed Thu Jan 7 11:14:26 2010)
    Thu Jan 7 11:19:01 2010: archive /var/adm/messages.0 matches (modified Wed Jan 6 12:32:26 2010 / accessed Thu Jan 7 11:10:13 2010 / inode 382266 / inode changed Thu Jan 7 11:14:26 2010)
    Thu Jan 7 11:19:01 2010: archive /var/adm/messages.0 was modified after Wed Jan 6 12:32:26 2010
    Thu Jan 7 11:19:01 2010: archive messages.0 cannot be opened
    Thu Jan 7 11:19:01 2010: although a logfile rotation was detected, no archived files were found
    Thu Jan 7 11:19:01 2010: stat (/var/adm/messages) failed, try access instead
    Thu Jan 7 11:19:01 2010: could not open logfile /var/adm/messages
    Thu Jan 7 11:19:01 2010: first relevant files:
    Thu Jan 7 11:19:01 2010: relevant files:
    Thu Jan 7 11:19:01 2010: nothing to do
    Thu Jan 7 11:19:01 2010: keeping position 224 and time 1262799146 (Wed Jan 6 12:32:26 2010) for inode 67174402:382266 in mind

    If it helps, this is my config file. Kind of lengthy, sorry for the wall of text:

    cat /usr/local/encap/nagios/etc/check_logfiles.cfg

    @searches = ({
        tag => 'messages',
        logfile => '/var/adm/messages',
        rotation => 'SOLARIS',
        criticalpatterns => [
            'pamsmb', 'offlin', 'Offlin', 'OFFLINE',
            'fault', 'Fault', 'FAULT',
            'fail', 'Fail', 'FAIL',
            'down', 'Down',
            'emerg', 'Emerg', 'EMERG',
            'alert', 'Alert', 'ALERT',
            'crit', 'Crit', 'CRIT',
            'err', 'Err', 'ERR',
            'xntpd.time reset', 'kern',
        ],
        criticalexceptions => [
            'My unqualified host name.unknown',
            'WARNING.forceload',
            'Command terminated on signal 9',
            'sshd',
            'TLD.going to UP state',
            'ntpdate',
            'ttsession',
            'Tt_session ',
            'GMT LOM time reference ',
            'Automatic cleaning of',
            'MQSeries.FFST',
            'using kernel phase-lock loop',
            'chiunix-mq.FFST record created',
            'postfix.watchdog timeout',
            'named.enforced delegation-only',
            'Computer Associates Licensing',
            'failure detection time',
            'myin.incorrect password',
            'kern.info.devinfo0',
            'named.* dispatch .* connection reset',
            'no cleaning tape available',
            'postfix.timeout.status',
            'LOGOUT for port id',
            'itmpt0.RESCAN',
            'rsync. name lookup failed',
            '/stage. file system full',
            'IOCStatus = 804b',
            'lw8. . Main, up',
            'zcons.online',
            'rsync error.some files could not be transferred',
            'incorrect password attempt',
            'WARNING pools facility is disabled',
            'rsyncd.*daemon.warning',
        ],
    });

    And again, if I run check_logfiles manually on the server it runs correctly, notices the logfile was rotated, and is happy. I'm starting to think maybe something is wrong running this through NRPE.

    [Reply]

    lausser Reply:

    Maybe nrpe runs as nagios:nagios as opposed to nagios:sysadmin?

    [Reply]

    Gene Siepka Reply:

    @lausser,

    While at first I shrugged this off, knowing that the nagios user did have its primary group set to "sysadmin", the same as the file's group...

    But it got me thinking, and actually you were right. I had compiled nrpe before making the group-id change, and because of that the nrpe daemon was indeed running as nagios:nagios instead of nagios:sysadmin. I re-compiled nrpe and rotated my log several times; check_logfiles picked it up right away.

    Thanks for the suggestion and great plugin!

    [Reply]

  32. matejo Says:
    January 12th, 2010 at 14:09

    Hello!

    Is there an option so that the output of the plugin includes all error messages which it discovered since the last scan?

    I have used: $options = "report=long,maxlength=8192";

    But all I see in Nagios is the last of the 13 error strings it has found.

    [Reply]

    lausser Reply:

    Strange... so this means you only get a single line?

    [Reply]

    matejo Reply:

    @lausser, yes… only single line…

    [Reply]

    lausser Reply:

    Do you have the latest version of the plugin? Can you mail me the config file and the command line parameters you used?

    [Reply]

    flo Reply:

    @lausser, I have the same problem. My application always logs TWO lines containing 'ERROR' but only the first line is useful. With my config attached below I always get the second line as output for Nagios. My version is 3.1.2; the command line includes the -f option only.

    my config-file:

    $protocolretention = 14;
    $options = "report=long";
    @searches = ({
        tag => 'Source',
        logfile => '/var/icoserve/logs/Source.log',
        criticalpatterns => ['.*WARNING.*', '.*ERROR.*'],
        archivedir => '/var/icoserve/logs/archive',
        rotation => 'Source\.log\.\d+\.gz',
    });

    [Reply]

    lausser Reply:

    I create some test messages:

    echo "text" >> Source.log
    echo "1ERROR1" >> Source.log
    echo "1WARNING1" >> Source.log
    echo "2ERROR2" >> Source.log
    echo "2WARNING2" >> Source.log
    
    then I call check_logfiles and I get 4 lines:
    check_logfiles --config cfg.cfg
    CRITICAL - (4 errors in cfg.protocol-2010-01-22-01-16-01) - 2WARNING2 ...|Source_lines=5 Source_warnings=0 Source_criticals=4 Source_unknowns=0
    tag Source CRITICAL
    1ERROR1
    1WARNING1
    2ERROR2
    2WARNING2
    
    With a Nagios 3 you should see all the lines in the web interface. But notifications usually only show the first line, because the macro $SERVICEOUTPUT$ is used in the notification command. The long output is in $LONGSERVICEOUTPUT$.
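For reference, the long output can reach notification mails by using that macro in the notification command. A minimal sketch of a Nagios 3 object definition; the macros are standard Nagios built-ins, while the command name and mail invocation are only examples (note the long-output macro is spelled $LONGSERVICEOUTPUT$):

```
define command {
    command_name  notify-service-by-email-long
    command_line  /usr/bin/printf "%b" "$SERVICEOUTPUT$\n$LONGSERVICEOUTPUT$\n" | /bin/mail -s "$SERVICEDESC$ is $SERVICESTATE$" $CONTACTEMAIL$
}
```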

    Stephen Sunners Reply:

    @lausser,

    Hi, I seem to have the same issue. If I specify report=long/report=html on the command line it works fine, but the $options seem to be ignored in the config file, so I must be doing something wrong :-)

    I am running version 3.1.2

    put values in log file

    $ echo "1ERROR1" >> SS.log
    $ echo "1ERROR1" >> SS.log
    $ echo "1WARNING1" >> SS.log
    $ echo "21ERROR12" >> SS.log

    run on the command line

    $ /usr/local/nagios/libexec/check_logfiles --logfile=/opt/nagios/ss-nagios/SS.log --tag=abc --criticalpattern='ERROR' --warningpattern='WARNING' --report html
    CRITICAL - (3 errors, 1 warnings in check_logfiles.protocol-2010-02-26-11-35-17) - 21ERROR12 ...|abc_lines=4 abc_warnings=1 abc_criticals=3 abc_unknowns=0

    tag abc1ERROR11ERROR121ERROR121WARNING1

    works fine

    show config file

    $ cat cfg.cfg

    @searches = ({
        tag => 'abc',
        logfile => '/opt/nagios/ss-nagios/SS.log',
        criticalpatterns => [
            'ERROR'    # error in reading control file
        ],
        warningpatterns => [
            'WARNING'  # end of file on communication channel
        ],
        options => ['noprotocol', 'report=html'],
    });

    put values in logfile

    $ echo "1WARNING1" >> SS.log
    $ echo "21ERROR12" >> SS.log

    run using cfg.cfg

    [nagios@localhost ss-nagios]$ /usr/local/nagios/libexec/check_logfiles -f cfg.cfg
    CRITICAL - (1 errors, 1 warnings in cfg.protocol-2010-02-26-11-36-31) - 21ERROR12 |abc_lines=2 abc_warnings=1 abc_criticals=1 abc_unknowns=0

    $options ignored

    [Reply]

    lausser Reply:

    report does not belong to the options of a single search. It's a global setting, because long/html output affects all the @searches members (even if the array has only one element). Try this:

    $options = 'report=long';
    @searches = ({
       ...
       options => 'noprotocol',
    });

    Stephen Sunners Reply:

    @Stephen Sunners,

    Thanks a lot, that worked fine!

  33. flo Says:
    January 14th, 2010 at 11:17

    Hi,

    I have this situation: my logfile rotation is almost as described in the scheme loglog0gzlog1gz. The only difference is that the rotation starts with log.1.gz (log.0.gz is not created). Can I use this scheme without problems?

    For now I tried with: rotation => 'Source\.log\.\d+\.gz'. But every time a logfile containing errors is rotated again, I get an error raised :(

    I hope you can provide any helpful hints…

    best regards flo

    p.s.: Very GREAT work!!!

    [Reply]

    lausser Reply:

    Yes, loglog0log1gz should work. What kind of error did you get?

    [Reply]

    flo Reply:

    With "I get an error raised" I meant that check_logfiles returns critical when the error gets rotated but no new error is in the main logfile. I'll try with loglog0log1gz and keep an eye on it. Thanks anyway for providing free support here :)

    [Reply]

    lausser Reply:

    Create the file /tmp/check_logfiles.trace and watch it with "tail -f /tmp/check_logfiles.trace" while the plugin is running. You will see some debugging output which might shed light on this.

    [Reply]

  34. Sergio Guzman Says:
    January 25th, 2010 at 21:52

    Hi, great product!!! I'm trying to work with a Windows share mounted in Linux, where I run check_logfiles against files created by Windows. The problem I have is that the "devino" value keeps changing even though it is the same file. It works OK with an old version of Linux (2.6.17), but with 2.6.29 the devino keeps changing. Do you have any idea what I can do?

    Maybe modifying the plugin so it ignores the devino and treats the file as the same file?

    Thanks in advance for any help you can give me.

    [Reply]

    lausser Reply:

    Ignoring devino would render the plugin completely useless, as the rotation detection depends on this value. I have no idea what has changed in the Linux kernel, but maybe there is a mount option to get the old behavior?

    [Reply]

    Sergio Guzman Reply:

    @lausser, The log file in this case is created once per day and it's called MQlog-1.27.2010.log, so there should be no problem ignoring the rotation as the file changes every day. I have the logfile configured as:

    logfile => '/mnt/shares/logs/MQlog-$CL_DATE_mm$.$CL_DATE_dd$.$CL_DATE_YYYY$.log'

    (I modified your plugin to add these new variables.)

    CL_DATE_mm => 1 -> January
    CL_DATE_dd => 9 instead of 09

    Thanks, Sergio,

    [Reply]

    lausser Reply:

    Ah, ok. Instead of modifying the plugin (which you would have to repeat with every new release) you could also create the logfile name in the config file:

    my ($sec, $min, $hour, $mday, $mon, $year) = (localtime)[0, 1, 2, 3, 4, 5];
    $logfile = sprintf "MQlog-%d.%d.%d.log", ...;

    and then in @searches: logfile => $logfile
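Completing that sketch into a full config file might look like this (a sketch only; the share path is taken from the posting above, the tag is made up, and the date arithmetic assumes the non-padded month.day.year naming shown there):

```perl
# build today's logfile name when the config file is parsed
my ($mday, $mon, $year) = (localtime)[3, 4, 5];
my $logfile = sprintf '/mnt/shares/logs/MQlog-%d.%d.%d.log',
    $mon + 1, $mday, $year + 1900;   # e.g. MQlog-1.27.2010.log

@searches = ({
    tag     => 'mqlog',              # illustrative tag
    logfile => $logfile,
});
```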

    [Reply]

  35. Coda Says:
    January 29th, 2010 at 13:26

    Hello lausser! I really love your script!

    I have a quick question that I'm trying to figure out: if I use a config file with multiple searches, is there any way to use the '--logfile=' parameter (on the command line) instead of setting it in the config file?

    I mean, I execute your script remotely, and I have many Oracle alert logs from different servers that I would like to check, but they are not all located in the same directory, so I would like to use the same config file (multiple searches) and be able to specify the logfile name from the command line. Is that possible?

    Best regards, and sorry for my poor English. Pablo.

    [Reply]

    lausser Reply:

    Sorry, there is no way to mix a config file and command line parameters. You might consider writing a little Perl code in the config file where you set a $logfile variable.

    foreach ("/path1/alertlog", "/path2/alertlog",...) {
        $logfile = $_ if -f $_;
    }
    @searches = ({
        logfile => $logfile,
    ...
    Somehow you have to find out which is the correct logfile path for the machine check_logfiles is running on.

    [Reply]

  36. Matt Hawkins Says:
    February 1st, 2010 at 23:15

    Lausser,

    This is a great plugin and I use it a lot.

    I was wondering if there is an option to limit the number of lines written to the protocol file. This would help in situations where thousands of matching lines are being written to the protocol file, which can fill up the /tmp file system if not caught in time.

    Matt

    [Reply]

    lausser Reply:

    I understand, but... no, there is no such limit. You could, however, set the $protocolretention parameter to 1 (default is 7), so protocol files older than 1 day will be deleted automatically.
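In config-file terms, $protocolretention is a global setting placed alongside the other globals (the tag, path, and pattern below are illustrative only):

```perl
# global settings at the top of the config file
$protocolretention = 1;   # delete protocol files older than 1 day (default: 7)

@searches = ({
    tag              => 'mylog',
    logfile          => '/var/log/mylog.log',
    criticalpatterns => ['ERROR'],
});
```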

    [Reply]

  37. Matt Hawkins Says:
    February 2nd, 2010 at 18:42

    Lausser,

    Thanks for the response.

    Matt

    [Reply]

  38. Ben Says:
    February 3rd, 2010 at 4:23

    Hi, I have an odd application that uses log file rotation (it appends .0 .. .9) but doesn't have a main log file. That means it just overwrites the .1, .2, ... files, so the only way to know which is the current log file is to sort by date. Do you have any advice for how to handle that? I'm running on Windows but I could set up a script if you give me an idea how to do it. Thanks!!

    [Reply]

  39. lausser Says:
    February 3rd, 2010 at 17:35

    You mean there is always a fixed set of files (x.0, x.1, x.2, ...) and your application just selects one of them, overwrites it, and after a certain amount of time/lines it overwrites another one?

    [Reply]

    Ben Reply:

    @lausser, yes, it's a "ring" where it goes from .9 to .0, .1, ... .9, .0, .1 etc., and the only way to know which one is the current one is to sort by date.

    I can't see a pattern in the file changes; neither file size nor file date (when files are switched) follows any logic. It just jumps to the next .X file after "a while" and usually goes through a few files per day.

    [Reply]

    Ben Reply:

    @Ben, I noticed that this DOS command will list my logs by date:

    dir /o:d /t:w /b "C:\myapp\log\log*"

    but I'm not sure how to make it work inside the config file. I tried

    foreach (`dir /o:d /t:w /b "C:\myapp\log\log*"`) { $logfile = $_ if -f $_; }

    but this doesn't work and fails with errors:

    Use of uninitialized value in pattern match (m//) at script/check_logfiles line 1183.
    Use of uninitialized value $_[0] in substitution (s///) at C:/strawberry/perl/lib/File/Basename.pm line 338.
    fileparse(): need a valid pathname at script/check_logfiles line 1993

    Help is appreciated. Thanks!!

    [Reply]

    Ben Reply:

    @Ben, i’m sorry, the command inside the “foreach” is enclosed in backticks but they were stripped by the board apparently.

    [Reply]

    lausser Reply:

    Try this:

    $tracefile = 'C:\TEMP\check_logfiles.trace';
    @searches = ({
      type => 'rotating::uniform',
      # this is a regexp, that's why you need double backslashes
      rotation => 'C:\\myapp\\log\\log\.\d+',
      # no logfile => necessary
    });
    This should do what you intend. The file with the newest modification time is always considered the current logfile. Please create an empty file C:\TEMP\check_logfiles.trace and keep an eye on it. As long as this file exists, check_logfiles writes debugging information into it. You will see what goes on behind the scenes. (Change the tracefile parameter in the config file if you prefer another path.)

    [Reply]

    Ben Reply:

    @lausser, Thanks for the reply! I tried this but it's not working; I'm getting errors for some reason...

    Use of uninitialized value in pattern match (m//) at script/check_logfiles line 1183.
    Use of uninitialized value $_[0] in substitution (s///) at C:/strawberry/perl/lib/File/Basename.pm line 338.
    fileparse(): need a valid pathname at script/check_logfiles line 1993

    [Reply]

    lausser Reply:

    Please run a DIR C:\myapp\log. I'd like to know which files exist, with their timestamps and sizes. If it's not well formatted in a response posting here, please mail me the output.

    [Reply]

    Ben Reply:

    @lausser, Hi, here's the dir C:\myapp\log as you requested:

    02/08/2010 11:00 AM 98 log.0
    02/08/2010 11:11 AM 100 log.1
    02/08/2010 11:05 AM 99 log.2
    02/08/2010 11:02 AM 100 log.3
    4 File(s) 397 bytes

    This is on a test machine so the log files are just dummy ones. The trace file is not created and I'm only using the exact code you provided earlier for the config file. Thanks again for taking the time!

    lausser Reply:

    Sorry, I forgot to mention you have to create the tracefile yourself (simply with "echo start > C:\TEMP\check_logfiles.trace"). As soon as check_logfiles detects the existence of a tracefile, it starts writing debugging stuff into it. (When you delete the file later, it will not be written any more.)

    Ben Reply:

    @lausser, I created the file manually but it stays empty due to the error... When I'm not using the regex but rather assign a log file, it automatically creates the TEMP dir and the trace file. So there's no debug information with the settings you suggested. Could there be anything that we're missing here? Thanks!!

    lausser Reply:

    My fault. I haven't used rotating::uniform for a long time, and I showed you a wrong config. Try this:

    $tracefile = 'C:\TEMP\check_logfiles.trace';
    @searches = ({
      type => 'rotating::uniform',
      # a dummy logfile entry. it is used only because it
      # shows the plugin which directory to look in
      logfile => 'C:\myapp\log\i_dont_exist',
      # now the pattern for rotated files (in c:/myapp/log)
      rotation => 'log\.\d+',
      criticalpatterns => '........
    });

    Ben Reply:

    @Ben, this works perfectly! Thanks so much!!

    [Reply]

  40. Matt Hawkins Says:
    February 10th, 2010 at 16:45

    Lausser,

    A lot of my logs have pipe '|' symbols in their lines. Example: "syslog1[2843]: A|AEiSBh|Feb 9 07:48:27 2010|log.log.app.xmlProxySvr.5010|5010|server| 2843|det |router_utils.cpp| error"

    This causes issues with the service view in Nagios because it puts everything after the | into the perf data. Is there any way to have Nagios ignore that? Or would I have to create a postscript to replace the | symbols?

    [Reply]

  41. Matt Hawkins Says:
    February 10th, 2010 at 19:52

    This is what we did to remove the “|” character from the check_logfiles service output. Let me know if there is a better way of doing this.

    @searches = ({
        tag => 'nagios',
        logfile => '/tmp/mylog.log',
        criticalpatterns => '.*',
        options => 'supersmartscript,protocol,count',
        script => sub {
            my $line = $ENV{CHECK_LOGFILES_SERVICEOUTPUT};
            if ($line =~ m/error/) {
                $line =~ s/\|/\;/g;
                print $line;
                return 2;
            }
        },
    });

    [Reply]

    lausser Reply:

    A supersmartscript which replaces the pipe symbols on the fly is OK; you already found the best solution. Why do you configure .* as criticalpattern and then check for /error/ in the handler script? Why not write

    ...
    criticalpatterns => 'error',
    options => 'supersmartscript',
    script => sub {
      (my $line = $ENV{CHECK_LOGFILES_SERVICEOUTPUT}) =~ s/\|/\;/g;
      print $line;
      return 2;
    }

    [Reply]

    S. Groth Reply:

    @lausser, We have many tags in one check_logfiles definition, so supersmartscript is not the best solution in this case. If I'm using smartpostscript or supersmartpostscript, I always get an additional output "tag postscript WARNING". The only thing I want to do is replace | with ; in the whole output on about 20 tags without changing the return code. Any ideas?

    $options = "report=long,smartpostscript";
    @searches = ...

    Replace | with ;

    $postscript = sub { (my $line = $ENV{CHECK_LOGFILES_SERVICEOUTPUT}) =~ s/\|/\;/g; print $line; }

    [Reply]

    lausser Reply:

    Can you add this to the supersmartscript?

    return $ENV{CHECK_LOGFILES_SERVICESTATEID};

    [Reply]

    S. Groth Reply:

    @lausser, ...on holiday for the last 4 weeks... So I have to add options => '...,supersmartscript' on each tag? With

    script => sub { (my $line = $ENV{CHECK_LOGFILES_SERVICEOUTPUT}) =~ s/\|/\;/g; print $line; return $ENV{CHECK_LOGFILES_SERVICESTATEID}; }

    No shorter way?

    [Reply]

    lausser Reply:

    The config file is a Perl program.

    my $handler = sub {
     ....
    };
    @searches = ({
      tag => 'tag1',
      script => $handler,
    ...
      tag => 'tag2',
      script => $handler,
    ...

    S. Groth Reply:

    @S. Groth, Thanks a lot, it seems to work!

  42. Matt Hawkins Says:
    February 11th, 2010 at 17:06

    I tried that but I kept getting an exit code of 1 even if there were no matching lines in the log file. I believe I had a misplaced bracket somewhere. :)

    Anyway I have updated it to use the criticalpattern instead and it is working now.

    Thanks for the help

    [Reply]

  43. isnochys Says:
    February 19th, 2010 at 16:12

    Hi,

    Using the Windows executable, it cannot create a status file:

    cannot write status file c:\temp/export.log

    It looks like the filename contains a "/" but should use "\" under Windows. Using $seekfilesdir doesn't change it.

    [Reply]

    lausser Reply:

    Can you show me your configuration file? The backslash is only necessary with the CMD.EXE shell. Inside a program or a Perl script you can use the normal slash as separator as well.

    [Reply]

    isnochys Reply:

    @lausser, "cannot write status file C:\opt\/ExportTestO"

    $seekfilesdir = "C:\\opt\\";
    $protocolssdir = 'C:\opt';
    $MACROS = {
        GOMESDIR => 'D:\Projects\xxx',
        GOMESDIRP => 'D:\Projects\xxx',
    };

    @searches = ({
        tag => 'xxxITUQA',
        logfile => '$GxxDIR$\export\testorder\log\*.log',
        warningpatterns => ["Warning"],
        options => 'noprotocol',
    });

    [Reply]

    lausser Reply:

    You can’t use wildcards in the logfile …testorder\log\*.log

    The status filename is derived from the logfile name, that’s why it doesn’t work.
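Since the config file is Perl, one possible workaround is to expand the wildcard in the config file itself and generate one search per matching file (a sketch only; the directory, pattern, and options are modeled on the config above):

```perl
use File::Basename;

@searches = ();
foreach my $logfile (glob 'D:/Projects/xxx/export/testorder/log/*.log') {
    push @searches, {
        tag             => basename($logfile),   # a unique tag per file
        logfile         => $logfile,
        warningpatterns => ['Warning'],
        options         => 'noprotocol',
    };
}
```

Each file then gets its own seekfile, derived from its concrete name.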

    [Reply]

  44. Ben Says:
    February 19th, 2010 at 19:01

    Hi, I'm trying to check the Windows event log for a faulting application (error / warning). I have an event from a few weeks ago recorded ("Faulting application nstray.exe") but the "allyoucaneat" option does not appear to work on the event log, because I'm getting "OK - no errors or warnings|evt_log_lines=0" even when removing the seek file. Can I force it somehow to read ALL of it? Here's my config:

    @searches = ({
        tag => 'evt_log',
        criticalpatterns => '.*',
        type => 'eventlog',
        options => 'eventlogformat="%w src:%s id:%i %m",noprotocol,nocase,maxlength=1024,report=long,allyoucaneat',
        eventlog => {
            eventlog => 'application',
            include => {
                source => 'Application Error',
                eventtype => 'error,warning',
            },
        },
    });

    Thanks!!

    [Reply]

    lausser Reply:

    You’re right. I didn’t implement allyoucaneat for the eventlog type. I’ll have a look at it. Meanwhile you can try

    check_logfiles --config <cfgfile> --reset
    It will reset the data in the seekfile so that the plugin should scan all of the eventlog.

    [Reply]

    Ben Reply:

    @lausser, great that worked perfectly! Thanks again!

    [Reply]

  45. angry_admin Says:
    February 24th, 2010 at 13:49

    http://ideas.nagios.org/a/dtd/22035-3955

    [Reply]

  46. Derek Says:
    March 2nd, 2010 at 19:25

    Is there no way to handle a situation like this?

    /etrade/home/edwinst1/scripts/logs/backuplog/db2backup_dw_prd1_20100116_0743.log

    The datestamp is handled using the $CL_DATE_ variables, but I have no way of knowing what the timestamp of the log will be. The stupid app adds HHMM to the log name for some reason. It would be great if I could just use _????.log and check_logfiles would use the most recent matching log file name.

    [Reply]

    lausser Reply:

    check_logfiles already handles the weirdest situations; guessing filenames or finding logfiles by regular expression, alas, is beyond the scope of this tool. What you can do:

    The config file is simply a piece of Perl code. Why not write

    code...code...code
    $logfile = what i found to be the current logfile;
    @searches = ({
    ....
      logfile => $logfile,
      options => 'allyoucaneat', #start from the beginning
    ....
    });
    An alternative would be rotating::uniform. (look above in these comments, there already is an example)
    $tracefile = 'a file where debugging will be written to';
    @searches = ({
      type => 'rotating::uniform',
      # a dummy logfile entry. it is used only because it
      # shows the plugin which directory to look in
      logfile => '/home/edwinst....backuplog/i_dont_exist',
      # now the pattern for rotated files (in ..../backuplog)
      rotation => 'db2backup_dw_prd\d+_\d{8}_\d+\.log',
      criticalpatterns => '........
    });
    Now the newest logfile is always considered the current, active logfile and all the others are considered rotated archives.

    Create the tracefile with the touch command and watch its contents with tail -f.

    Play around with it, it should work.

    [Reply]

    Derek Reply:

    @lausser,

    Ah, cool, thanks. I replied before I saw that you already had.

    [Reply]

  47. Derek Says:
    March 2nd, 2010 at 23:06

    I did this but it still gets an UNKNOWN error if there are no valid log files. I "think" it works other than that... Do you see any issues?

    $scriptpath = '/pkgs/linux/intel/nagiosplug/et0.1/libexec';
    @searches = ();
    foreach my $logfile (glob '/home/edwinst1/scripts/logs/backuplog/db2backup_dw_prd1_*.log') {
        next if (-M "$logfile" > 8);
        push(@searches, {
            tag => basename($logfile),
            logfile => $logfile,
            options => 'protocol,count',
            criticalpatterns => ['ERROR', 'DB2\sbackup.+failed'],
        });
    }
    1;

    [Reply]

    lausser Reply:

    If you expect situations where no valid logfile exists, you need to set options=>’nologfilenocry’ to suppress the UNKNOWN.
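One way to apply that to the glob-based config above is a fallback search that only exists when the glob found nothing, so the plugin reports OK instead of UNKNOWN (a sketch; the tag and the dummy filename are made up):

```perl
# fall back to a dummy search when the glob matched no files;
# nologfilenocry suppresses the UNKNOWN for the missing logfile
if (!@searches) {
    push @searches, {
        tag     => 'backuplog',
        logfile => '/home/edwinst1/scripts/logs/backuplog/db2backup_dummy.log',
        options => 'nologfilenocry',
    };
}
```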

    [Reply]

  48. JK Says:
    March 4th, 2010 at 16:26

    Thank you for that great plugin. I have one issue: is it possible to include the tag and logfile parameters in the output of the plugin?

    [Reply]

    lausser Reply:

    If you use the option report=long, then the tag is also shown in the output. Tag and logfile are available as environment variables to handler scripts. Something like this is possible:

    @searches = ({
        tag => 'xyxy',
        logfile => 'xyxy.log',
        options => 'supersmartscript',
        script => sub {
          printf "%s - %s - %s\n",
              $ENV{CHECK_LOGFILES_TAG},
              $ENV{CHECK_LOGFILES_LOGFILE},
              $ENV{CHECK_LOGFILES_SERVICEOUTPUT};
          return $ENV{CHECK_LOGFILES_SERVICESTATEID};
        },
       ....
    This adds tag and current logfile as prefix to the matched lines. Of course, this will bloat your output. Alternatively you might have a look at supersmartpostscript. (See the examples page).

    [Reply]

  49. Ruben Says:
    March 5th, 2010 at 12:17

    Hi,

    thanks a lot for your plugin, it's very useful. I have a problem with the Windows version of the plugin: if I execute it with the "criticalexception" param, I get an error message that says "Unknown option". The whole error message is below.

    Best regards.

    C:\ARCHIV~1\NAGIOS~1\plugins>check_logfiles.exe –logfile=”test.log” –criticalpattern=”Error” –criticalexception=”Invalid credentials”
    Unknown option: ûcriticalexception
    This Nagios Plugin comes with absolutely NO WARRANTY. You may use it on your own risk! Copyright by ConSol Software GmbH, Gerhard Lausser.

    This plugin looks for patterns in logfiles, even in those who were rotated since the last run of this plugin.

    You can find the complete documentation at http://www.consol.com/opensource/nagios/check-logfiles or http://www.consol.de/opensource/nagios/check-logfiles

    Usage: check_logfiles [-t timeout] -f

    The configfile looks like this:

    $seekfilesdir = '/opt/nagios/var/tmp';

    where the state information will be saved.

    $protocolsdir = '/opt/nagios/var/tmp';

    where protocols with found patterns will be stored.

    $scriptpath = '/opt/nagios/var/tmp';

    where scripts will be searched for.

    $MACROS = { CL_DISK01 => "/dev/dsk/c0d1", CL_DISK02 => "/dev/dsk/c0d2" };

    @searches = (
        { tag => 'temperature',
          logfile => '/var/adm/syslog/syslog.log',
          rotation => 'bmwhpux',
          criticalpatterns => ['OVERTEMP_EMERG', 'Power supply failed'],
          warningpatterns => ['OVERTEMP_CRIT', 'Corrected ECC Error'],
          options => 'script,protocol,nocount',
          script => 'sendnsca_cmd' },
        { tag => 'scsi',
          logfile => '/var/adm/messages',
          rotation => 'solaris',
          criticalpatterns => 'Sense Key: Not Ready',
          criticalexceptions => 'Sense Key: Not Ready /dev/testdisk',
          options => 'noprotocol' },
        { tag => 'logins',
          logfile => '/var/adm/messages',
          rotation => 'solaris',
          criticalpatterns => ['illegal key', 'read error.$CL_DISK01$'],
          criticalthreshold => 4,
          warningpatterns => ['read error.$CL_DISK02$'] }
    );

    C:\ARCHIV~1\NAGIOS~1\plugins>

    [Reply]

    lausser Reply:

    Did you copy&paste the error message into your comment? I see a strange character here: ...Unknown option: ûcriticalexception...

    The --criticalexception option does work; I just double-checked it. Please try it again.

    check_logfiles.exe --logfile "test.log" --criticalpattern "Error" --criticalexception "Invalid credentials"
    Please type the command yourself, do not copy&paste from this website. I have a suspicion that WordPress messes up the page contents (as it does with the double dash).

    [Reply]

  50. Steven Says:
    March 10th, 2010 at 19:10

    “In principle check_logfiles scans a log file until the end-of-file is reached. The offset will then be saved in a so-called seekfile. The next time check_logfiles runs, this offset will be used as the starting position inside the log file”

    Where is this so-called seekfile? I want to delete it during my tests...

    [Reply]

    lausser Reply:

    By default it will be in the directory /var/tmp/check_logfiles (unless you specify another directory with the $seekfilesdir parameter). The filename of the seekfile is composed from the tag and the logfile’s name.

    [Reply]

  51. jack Says:
    March 23rd, 2010 at 15:46

    Hi,

    Problem with check_logfiles on the command line: I have a logfile with the messages "SCR 313 KO" and "SCR 313 OK" (without quotes). With the following command line, check_logfiles sends a critical alert for both KO and OK messages.

    /usr/lib64/nagios/plugins/check_by_ssh -H xx.xx.xx.xx -C './libexec/check_logfiles --logfile='/tmp/jmi.log' --criticalpattern='SCR 313 KO"

    I try with the following command but i have the message Could not open pipe :

    /usr/lib64/nagios/plugins/check_by_ssh -H xx.xx.xx.xx -C './libexec/check_logfiles --logfile='/tmp/jmi.log' --criticalpattern="SCR 313 KO"' Could not open pipe: /usr/bin/ssh xx.xx.xx.xx './libexec/check_logfiles --logfile=/tmp/jmi.log --criticalpattern="SCR 313 KO"'

    Any ideas ? Many thanks

    [Reply]

    lausser Reply:

    That’s a problem with check_by_ssh and quoting/escaping. This may help: http://osdir.com/ml/network.nagios.plugins/2007-06/msg00047.html I don’t use check_by_ssh myself; ask the nagios-users mailing list and surely somebody will show you the trick.

    [Reply]

  52. Benny Says:
    March 23rd, 2010 at 21:55

    Hi all,

    I am using this plugin to check Windows event logs, and I’m pretty damned happy that it’s working great! It’s the only plugin/agent I’ve found yet that is accurate and consistent. Happy day.

    However, I notice that I can’t seem to get command-line checking working. I.e., from your example above:

    check_logfiles --type 'eventlog:eventlog=application,include,source=Windows Update Agent,eventtype=error,eventtype=warning,exclude,eventid=15,eventid=16'

    Any permutation of command-line stuff I try simply gives me the usage message, including your example. Is this no longer supported? It would be nice to build a command like this rather than having to distribute a bunch of .cfg files like I am at the moment while I test.

    Thank you!

    Benny

    [Reply]

    lausser Reply:

    Good catch. The example works in the Windows PowerShell, but it fails in the DOS box. It’s the single quotes. I debugged it and saw that when you use single quotes in a DOS box, the argument of the --type parameter is truncated after the “Windows”. I didn’t know that a space character inside single quotes behaves like that. Please use double quotes instead; then it will work. I’ll also correct the example.

    check_logfiles --type "eventlog:eventlog=application,include,source=Windows Update Agent,eventtype=error,eventtype=warning,exclude,eventid=15,eventid=16"

    [Reply]

  53. Martin Baddie Says:
    March 23rd, 2010 at 22:42

    How can I create different alarms for every detected line? For example, between two consecutive runs 10 errors were detected, but when check_logfiles runs it only detects and prints out the last error line (the 10th error). How can I change this behaviour so I can see all errors in one alarm, even if it contains a lot of text?

    [Reply]

    lausser Reply:

    Have a look at the options. report=long might be what you’re looking for. It will output not only the last matching line but all of the matches.
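    If the option should live in a config file rather than on the command line, it presumably goes into the global $options variable; this is a sketch, assuming report is accepted there just as --report is on the command line:

    ```perl
    # assumed config-file equivalent of --report=long
    $options = 'report=long';
    ```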

    [Reply]

    Martin Baddie Reply:

    @lausser,

    Still the same. I have also tried to use http://labs.consol.de/nagios/check_logfiles/check_logfiles-beispielecheck_logfiles-examples/ “Example 3: Again, but this time with a notification for each single hit”, but I think I am missing a point here. I can successfully use send_nsca on its own, but using send_nsca from within check_logfiles doesn’t work

    [Reply]

    lausser Reply:

    First, have a look at the internal processing with

    touch /tmp/check_logfiles.trace
    tail -f /tmp/check_logfiles.trace
    Don’t forget to delete this file later. As long as it exists, check_logfiles will write debugging info into it. Did you set the scriptpath correctly? It must contain the directory where your send_nsca binary can be found.

    [Reply]

    Martin Baddie Reply:

    @lausser, I found the problem. In your examples at “Example 3: Again, but this time with a notification for each single hit” you forgot to mention putting

    options => 'script,protocol,nocount'
    

    into @searches, so no notification was sent via send_nsca. Please correct http://labs.consol.de/nagios/check_logfiles/check_logfiles-beispielecheck_logfiles-examples/ so everyone can benefit.

    Regards.

    [Reply]

    lausser Reply:

    You’re right. Thanks a lot for pointing me to this!

  54. Benny Says:
    March 24th, 2010 at 17:02

    Hmmmm, almost… I use the double quotes now, like: check_logfiles --type "eventlog:eventlog=application,include,eventtype=error,eventid=9999,options=winwarncrit" and it gives me a default_lines=1 (so it seems to have matched my test 9999 event), but the output is “OK – no errors or warnings”. I’ve also tried it with criticalpatterns=.* appended, no change. Oooooooh, I’m so close to getting this working!

    [Reply]

    lausser Reply:

    Very close :-)

    check_logfiles \
    --type "eventlog:eventlog=application,include,eventtype=error,eventid=999" \
    --winwarncrit

    [Reply]

  55. Mike Says:
    March 24th, 2010 at 17:27

    Hi all

    I’m trying to set up a config for monitoring the Windows system log and so far I have it working

    but I’d like to set it to ignore errors with a specific text string and I’m having trouble getting the syntax correct

    would this example be a way around it?

    @searches = ( { tag => 'system', type => 'eventlog', eventlog => { eventlog => 'system', eventtype => 'error,warning', criticalexception=> 'SomeText' }, }, );

    [Reply]

    lausser Reply:

    Be careful: it’s criticalexception (singular) when used as a command line parameter, but criticalexceptions (plural) when used in the config file.
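    To make the distinction concrete, here is a sketch of both spellings; the tag, path, and pattern texts are hypothetical placeholders:

    ```perl
    # Command line: singular --criticalexception
    #   check_logfiles --logfile /var/log/messages --criticalpattern 'ERROR' \
    #                  --criticalexception 'ERROR: harmless'
    # Config file: plural criticalexceptions (scalar or array reference)
    @searches = ({
      tag                => 'messages',           # hypothetical tag
      logfile            => '/var/log/messages',  # hypothetical path
      criticalpatterns   => 'ERROR',
      criticalexceptions => ['ERROR: harmless'],
    });
    ```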

    [Reply]

    Mike Reply:

    @lausser,

    ah thanks that may explain why it didn’t work before

    I’ll set up the config

    @searches = ( { tag => 'system', type => 'eventlog', eventlog => { eventlog => 'system', eventtype => 'error,warning', criticalexceptions=> 'SomeText' }, }, );

    and see how that works

    [Reply]

    Mike Reply:

    @Mike,

    been having no luck can anyone help?

    all I want to do is create a config that will check the Windows system eventlog and notify for all warning or error conditions (using winwarncrit) except for errors which contain a specified text string

    I’ve tried all sorts of settings, and either winwarncrit overrides any criticalexceptions strings I use or real errors don’t get picked up

    there is a pint in it for the first correct answer

    [Reply]

    lausser Reply:

    @searches = ( {
      tag => 'system', 
      type => 'eventlog', 
      eventlog => { 
        eventlog => 'system',
        eventtype => 'error,warning',
      },
      criticalexceptions => 'SomeText',
    });

    [Reply]

  56. lausser Says:
    March 26th, 2010 at 0:00

    This won’t work. Please don’t mix criticalexceptions with eventlog. criticalexceptions do not belong inside this hash.

    [Reply]

  57. Benny Says:
    March 26th, 2010 at 20:06

    Just a quick comment on this page – in the section talking about oraclealertlog, you have what is intended to be a link to a script (I think), that has empty anchors around it. Probably not intentional?

    Benny

    [Reply]

    lausser Reply:

    Thanks! I fixed it.

    [Reply]

  58. Mike Says:
    March 29th, 2010 at 10:26

    hi all

    just a quick question: if I want to specify more than one criticalexception, what would be the format in a config file?

    would it be: 1) criticalexceptions => 'Error text', 'Error text2',

    or

    2) criticalexceptions => 'Error text', criticalexceptions => 'Error text2',

    or is there another way?

    cheers

    Mike

    [Reply]

    lausser Reply:

    It’s an array reference.

    criticalexceptions => ['pattern1', 'pattern2', 'pattern3', ...]

    [Reply]

    Mike Reply:

    @lausser,

    cheers mate, sorry for the slow reply, been busy on other things

    [Reply]

  59. ledskof Says:
    March 31st, 2010 at 21:27

    I don’t see how to get allyoucaneat to work. I delete the seekfile and it just goes CRITICAL again:

    $seekfilesdir = 'c:\temp';

    $MACROS = { LOGDIR => 'C:\temp' };

    @searches = ({ tag => 'test.log', type => 'rotating', logfile => '$LOGDIR$\test.log', rotation => "test.log\\d{8}", options => 'noprotocol,allyoucaneat', criticalpatterns => '!20', });

    [Reply]

  60. Ray Says:
    April 13th, 2010 at 15:32

    On windows, I am calling this config:

    @searches = ({ tag => 'InternalServerError', logfile => 'C:\tianshan\logs\rtspproxy.log', criticalpatterns => [ "500 Internal Server" ] });

    C:\ProgramFiles\NSClient++\scripts\check_logfiles3.2>check_logfiles.exe --config check_streamsmith.cfg
    OK - no errors or warnings|InternalServerError_lines=0 InternalServerError_warnings=0 InternalServerError_criticals=0 InternalServerError_unknowns=0

    It returns OK, but if I run a findstr on the log, I get the entries that should be throwing a critical.

    findstr /c:"500 Internal Server" c:\tianshan\logs\rtspproxy*

    c:\tianshan\logs\RtspProxy.log:03-12 10:22:23.661 [ INFO ] Request processed: session[2060169217] seq[1191] verb[SETUP] duration=[110/110]msec startline [RTSP/1.0 500 Internal Server Error] socket(000009bc 10.12.105.88:4349)
    c:\tianshan\logs\RtspProxy.log:03-12 10:22:23.661 [ DEBUG ] SOCKET: 000009bc 10.12.105.88:4349 RTSP/1.0 500 Internal Server Error..CSeq: 1191..Method-Code: SETUP..Notice: RTSP/1.0 500 Internal Server Error..OnDemandSessionId: 2f5c9a13000000408000000041258762..Server: ssm_NGOD2/1.10..Session: 2060169217..Supported: com.comcast.ngod.r2,com.comcast.ngod.r2.decimal_npts..Date: 12 Mar 2010 15:22:23.661 GMT....
    c:\tianshan\logs\RtspProxy.log:03-12 10:22:25.364 [ INFO ] Request processed: session[2060234754] seq[1192] verb[SETUP] duration=[16/16]msec startline [RTSP/1.0 500 Internal Server Error] socket(000009bc 10.12.105.88:4349)
    c:\tianshan\logs\RtspProxy.log:03-12 10:22:25.364 [ DEBUG ] SOCKET: 000009bc 10.12.105.88:4349 RTSP/1.0 500 Internal Server Error..CSeq: 1192..Method-Code: SETUP..Notice: RTSP/1.0 500 Internal Server Error..OnDemandSessionId: 2f5c9a13000000408000000041258762..Server: ssm_NGOD2/1.10..Session: 2060234754..Supported: com.comcast.ngod.r2,com.comcast.ngod.r2.decimal_npts..Date: 12 Mar 2010 15:22:25.364 GMT....
    c:\tianshan\logs\RtspProxy.log:03-12 10:22:27.208 [ INFO ] Request processed: session[2060300291] seq[1193] verb[SETUP] duration=[16/16]msec startline [RTSP/1.0 500 Internal Server Error] socket(000009bc 10.12.105.88:4349)
    c:\tianshan\logs\RtspProxy.log:03-12 10:22:27.208 [ DEBUG ] SOCKET: 000009bc 10.12.105.88:4349 RTSP/1.0 500 Internal Server Error..CSeq: 1193..Method-Code: SETUP..Notice: RTSP/1.0 500 Internal Server Error..OnDemandSessionId: 2f5c9a13000000408000000041258762..Server: ssm_NGOD2/1.10..Session: 2060300291..Supported: com.comcast.ngod.r2,com.comcast.ngod.r2.decimal_npts..Date: 12 Mar 2010 15:22:27.208 GMT....

    [Reply]

    lausser Reply:

    That’s normal behaviour. check_logfiles only scans lines which were added since the last run of check_logfiles. Add a new line to the logfile and you will see.

    [Reply]

    ray Reply:

    @lausser,

    Confused, so it will never find it in the initial scan? Even if I delete the seek files that keep track of where in the file it has looked?

    [Reply]

    lausser Reply:

    If you delete the seekfile, check_logfiles will think it’s running for the very first time, so it’ll not scan the entire file but only initialize itself (set the pointer to the end-of-file and save that position in the seekfile). However, you can force it to start from the beginning of the logfile during the initial run by using the option “allyoucaneat”:

    @searches = ({ 
      tag => 'InternalServerError',
      logfile => 'C:\tianshan\logs\rtspproxy.log',
      criticalpatterns => [ "500 Internal Server" ],
      options => 'allyoucaneat',
    });

    [Reply]

    Dan Wittenberg Reply:

    @lausser, “If you delete the seekfile, check_logfiles will think it’s running for the very first time, so it’ll not scan the entire file but only initialize itself.” – I’m not seeing this work right. It seems that when I run the first time it finds really old stuff, plus I’ve also noticed that every night at 1:00am it seems to re-scan, though I don’t have any reset commands and the protocol retention=3, so I’m not sure why it would keep finding old stuff or alerting every day at the same time. Any ideas?

    [Reply]

    lausser Reply:

    Create a file /tmp/check_logfiles.trace and look inside. The 1:00am is not hard-coded in check_logfiles; you have to find out what happens at 1:00 on your machine, I can’t tell you that. The tracefile will tell you which logfiles were accessed and why.

  61. Günter Says:
    April 14th, 2010 at 12:03

    Hello, I try to call check_logfiles via check_mk’s mrpe. My problem is that the output of check_logfiles does not seem to be the standard Nagios output, which is what check_mk expects. I get the following output:

    OK - no errors or warnings|OneBaseLogfile_lines=0 OneBaseLogfile_warnings=0 OneBaseLogfile_criticals=0 OneBaseLogfile_unknowns=0

    The standard Nagios output should be “SERVICE STATUS: Information text”, as you can read at http://nagiosplug.sourceforge.net/developer-guidelines.html#PLUGOUTPUT Is it possible to include the tag in front of the status, like

    OneBaseLogfile OK - no errors or warnings|OneBaseLogfile_lines=0 OneBaseLogfile_warnings=0 OneBaseLogfile_criticals=0 OneBaseLogfile_unknowns=0

    Or did I miss something and there is already an option?

    [Reply]

    Günter Reply:

    @Günter, Hello, I think there is something wrong with check_mk. Have to check some more details. Sorry!

    [Reply]

  62. Ollie Bridges Says:
    April 16th, 2010 at 8:00

    Hi there,

    Is it possible to exit check_logfiles after a certain number of errors (and then move on to the postscript)? I cannot find a way to access the hitcount anywhere; am I missing something?

    [Reply]

    lausser Reply:

    You cannot abort the search. But you can set a counter variable in the config file which you increment, with the help of a handler script, every time a pattern matches. Then you have the number of hits available in the postscript.
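    A minimal sketch of such a counter; since the config file is plain Perl, a package variable can be shared between the handler sub and the postscript. The tag, path, and threshold are hypothetical, and this assumes the script/supersmartpostscript hooks behave as shown on the examples page:

    ```perl
    our $hitcount = 0;                 # shared counter

    @searches = ({
      tag              => 'app',               # hypothetical tag
      logfile          => '/var/log/app.log',  # hypothetical path
      criticalpatterns => 'ERROR',
      options          => 'script',
      script           => sub { $hitcount++; return 0; },  # count every hit
    });

    $options = 'supersmartpostscript';
    $postscript = sub {
      printf "%d hits seen\n", $hitcount;
      return $hitcount >= 10 ? 2 : 0;   # go CRITICAL once 10 hits accumulated
    };
    ```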

    [Reply]

  63. Pablo Says:
    April 21st, 2010 at 13:18

    Hi there, I think I’m having some problems with the value of the variable “devino” in the tracefiles. As far as I know, this variable lets the script detect the position of the logfile in the filesystem and allows it to search for rotations. The length of these values may vary from tracefile to tracefile, but there are many tracefiles where this value seems to depend on the size of the logfile found or on something else, leading to tracefile sizes which vary between 1 KB and several KB (even MB sometimes). Is this normal behaviour or am I doing something wrong? Thank you!

    [Reply]

    lausser Reply:

    devino is composed of the device number (which identifies the filesystem) and the inode number. It is a unique identifier for a file and helps check_logfiles to detect situations where example.log is not the same file as the example.log of the last run. You should not have tracefiles at all, except when something is not working correctly. So just delete them.

    [Reply]

    Pablo Reply:

    @lausser, Hi again, first of all, I would like to thank you Lausser for the description of devino. I’m afraid I have misused the word “tracefiles”. I meant the seekfiles, where the script saves the current status (with all these variables like runcount, serviceoutput, thresholdcnt and the devino variable self), I hope this one is the correct word :-) Sorry for the inconvenience and thank you again.

    [Reply]

  64. Benny Says:
    April 21st, 2010 at 22:07

    Hello all! I just have a quick question about --sticky. I am using check_logfiles to check Windows event logs, and I’m using the sticky flag to require more than one failure before Nagios notifies (this cuts down on nuisance alerts drastically). Is the --sticky “timer” started when check_logfiles first registers a hit on an event ID, or is it when the event was logged? I have had several hits in my event logs that should have tripped notifications but didn’t, and I’m wondering if my assumption that the timer starts when check_logfiles registers the first hit is incorrect. Thank you!

    [Reply]

    Benny Reply:

    @Benny, Any clarification on this? If someone could tell me when the timer starts counting for the --sticky option, I would very much appreciate it!

    [Reply]

    lausser Reply:

    Hi, the timer starts with the regular expression match. (the runtime of the plugin).

    [Reply]

    Benny Reply:

    @lausser, Thank you Lausser! With your verification, I think I know what I’m doing wrong now (my sticky value needs to be longer). I appreciate it!

    [Reply]

  65. Anatoly Rabkin Says:
    April 22nd, 2010 at 14:52

    Hi,

    First of all, thanks for this great script, it really makes life easier :).

    I have the following question: I need to search for some string and to present not just the line that contains the pattern, but also the X lines before and Y lines after it. I know that it’s possible to do such logic with the supersmartpostscript option, but I’m running into some issues here. I don’t want to add this logic to your script or anything like that; can you please suggest how best to do it?

    Thanks in advance

    [Reply]

    lausser Reply:

    Sorry, there is no way to display the “surroundings” of a matching line, except implementing it with supersmart scripts.

    [Reply]

  66. Matthias Says:
    April 22nd, 2010 at 17:17

    Hello Lausser,

    first of all, thank you very much for this plugin. I use it to monitor nginx (a webserver). Since yesterday we have been getting a lot of overlong URIs, and Nagios has real problems with what nrpe/check_logfiles puts out. I tried to set a maxlength, but this does not make the output shorter. check_logfiles does read my config file (I put ‘noperfdata’ in there, and it stopped emitting the perfdata afterwards).

    Here is my config

    @searches = ( { tag => 'nginx errors', logfile => '/data/log/nginx/error_log', options => 'noperfdata,maxlength=50', criticalpatterns => '[error]', criticalexceptions => ['prematurely','http:/tag','client sent invalid userid','keepalive'], } );

    and here is what the output looks like:

    nagios@ccp3 /var/run $ /usr/lib64/nagios/plugins/check_logfiles –config /usr/lib64//nagios/plugins/check_logfiles.cfg CRITICAL – (4 errors in check_logfiles.protocol-2010-04-22-10-10-53) – 2010/04/22 10:10:49 [info] 3745#0: *22630386 client sent too long URI while reading client request line, client: 64.106.215.72, server: ccp3, request: “GET /am_bidder?admeld_publisher_id=221&admeld_request_id=e6059968-ccb9-42c2-81da-d9e8499cab93&admeld_tag_id=259056&admeld_user_id=304a8a05-3ca3-4a29-b3c4-96ecc6b98a9a&admeld_website_id=541&external_user_id=C3D0C0AD5F85BB4BB8454457023F7004&ip_address=98.229.46.144&language=en-us&max_response_time=150&position=below&refer_url=http%3a%2f%2ftag.admeld.com%2fad%2fiframe%2f221%2ftmz%2f300x250%2faf-bottom-right%3ft%3d1271949048496%26tz%3d240%26hu%3d%26ht%3djs%26hp%3d0%26url%3dhttp%253A%252F%252Fwww.tmz.com%252Fa0e4cc1a-08e3-42ec-bbba-4e3447d7e890%253Fd757bd5d029f4634864ef694fcbbf9d8%252F3f2a29b915b64a509188eacff137fe43d757bd5d029f4634864ef694fcbbf9d8.js%253Fspd%253D2%2526atdmt%253D%2526a4eclickmacro%253Dhttp%25253A%25252F%25252Fg.va.bid.invitemedia.com%25252Fpixel%25253FreturnType%25253Dredirect%252526key%25253DClick%252526message%25253D%2525257B%25252522nhpgvbaVQ%25252522%2525253A%25252B%25252522n0r4pp1n-08r3-42rp-ooon-4r3447q7r890%25252522%2525252C%25252B%25252522havirefny_fvgr%25252522%2525253A%25252B5702%2525252C%25252B%25252522fhoVQ%25252522%2525253A%25252Bahyy%2525252C%25252B%25252522yvarvgrzVQ%25252522%2525253A%25252B97022%2525252C%25252B%25252522cho_yvarvgrz%25252522%2525253A%25252B29822%2525252C%25252B%25252522vai_fvmr%25252522%2525253A%25252B70074%2525252C%25252B%25252522mvc_pbqr%25252522%2525253A%25252B%2525252201841%25252522%2525257D%252526redirectURL%25253Da4edelim%2526a4ehtm%253Da0e4cc1a-08e3-42ec-bbba-4e3447d7e890a4edelima4eflag%2526fn%253Da4edelim%2526imgSrv%253DHTTP%253A%252F%252Frmd.atdmt.com%252Ftl%252FO6SOHINVEINV%252Fa4edelim%2526armver%253Difb.9%26refer%3dhttp%253A%252F%252Ftag.admeld.com%252Fad%252Fiframe%252F22
1%252Ftmz%252F300x250%252Fbf-bottom-right%253Ft%253D1271949043010%2526tz%253D240%2526hu%253D%2526ht%253Djs%2526hp%253D0%2526url%253Dhttp%25253A%25252F%25252Fwww.tmz.com%25252F%2526refer%253D&size=300×250&time_zone=240&url=http%3a%2f%2fwww.t …

    any idea why no cutoff happens there?

    Thanks,

    Matthias

    [Reply]

    lausser Reply:

    My fault. Maxlength seems to be ignored in the last release(s). I’ll fix it. A new version 3.3 will come soon.

    [Reply]

  67. Max Says:
    April 27th, 2010 at 16:22

    Hi Lausser,

    I have logfiles that are rotated in the following format: AssetServer.log, AssetServer.log.2010-04-26,

    How can I specify this kind of format in the rotation parameter?

    [Reply]

    lausser Reply:

    Hi, this should do the trick:

    ...
    logfile => '..../path/.../AssetServer.log',
    rotation => 'AssetServer\.log\.\d{4}\-\d{2}\-\d{2}',
    ...

    [Reply]

  68. Max Says:
    April 30th, 2010 at 15:49

    Lausser,

    Sometimes I don’t get the line in the logfile that was flagged by check_logfiles in the Nagios email.

    Example: Date/Time: Fri Apr 30 09:14:07 EDT 2010

    CRITICAL - (1 errors) - 09:13:55:151

    Additional Info:

    tag AssetServer CRITICAL 09:13:55:151

    The actual line that was flagged was:

    09:13:55:151|0226-TABLE {SYS_R_LAST_SYSERR} ACTION {DEL} DWL {0} – {Source CREDIT_BBG_OTF} {Component CREDIT_BBG_OTF} {Daemon MKV_DS_CREDIT} {ErrorLevel 3} {ErrorType 8} {Date 20100430} {Time 9134600} {Error STATUS Disconnected} {SeqId 377434}

    Is there something in this line that is preventing Nagios/check_logfiles from sending it in the email? It looks like maybe it does not like the "|" symbol?

    [Reply]

  69. Michael Says:
    May 3rd, 2010 at 16:58

    Hi,

    it seems there is a problem when a Windows eventlog branch does not exist. We try to place an identical basic cfg-file template for check_logfiles on all our servers, and we configured searches for all possible event sources (System, Application, Security, PowerShell, and so on). If, for example, PowerShell is not installed on a system, this eventlog branch does not exist. Then the warningexceptions no longer work and all warning events are reported, regardless of whether they are excluded by the filter or not.

    [Reply]

    lausser Reply:

    Please post your configuration file. If an eventlog branch does not exist and you have an eventlog=… search configured for it, you normally get an error. With the option “nologfilenocry” the missing eventlog is silently ignored.
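    A sketch of that workaround in config form; the tag and eventlog name below are placeholders for whichever branch may be absent:

    ```perl
    @searches = ({
      tag      => 'powershell',                          # hypothetical tag
      type     => 'eventlog',
      eventlog => { eventlog => 'windows powershell' },  # branch may not exist on every host
      options  => 'nologfilenocry',  # silently skip hosts without this branch
    });
    ```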

    [Reply]

    Dan Wittenberg Reply:

    @lausser, We’ve seen a similar problem to this, where a machine might be in a power-down state (say from a UPS shutdown) and the check runs, and then suddenly I’ll see a flood of “open Eventlog The interface is unknown.” errors from every single check. If I add the nologfilenocry, would it also address these errors?

    [Reply]

    Dan Wittenberg Reply:

    @Dan Wittenberg, After testing the answer is no, it still appears to complain, I’m guessing because of the error coming back. Is this a bug or expected behavior?

    check_logfiles EE_UU_TTCHECK_LOGFILES INTERNAL ERROR open Eventlog The interface is unknown.

    That’s the error I’m seeing.

    [Reply]

  70. haitauer Says:
    May 6th, 2010 at 9:19

    Hi Lausser,

    any news on post 25, 26 and 28?

    thank you!

    [Reply]

  71. Xiaoming Says:
    May 6th, 2010 at 16:28

    Hi, can I check that the logfile has changed between two checks? If the log is unchanged, the concerned process is locked, so a critical event must be sent.

    Thanks

    [Reply]

    lausser Reply:

    @searches = ({
      tag => 'xx',
      logfile => 'test.log',
      criticalpatterns => '!.*',
      options => 'nologfilenocry',
    });
    This will alert when no lines have been added since the last run.

    [Reply]

  72. Ryan Says:
    May 12th, 2010 at 17:42

    This is an exceptional tool and I thank you for your continued development.

    Are you aware of anyone that has developed a conversion tool to take HP OVO logfile templates and convert them to check_logfiles cfgs? If it doesn’t already exist I am assuming I will have to write one because the number of templates I will have to convert is, um, insane.

    [Reply]

    lausser Reply:

    Hi, unfortunately I never heard of anybody who migrated this kind of templates to check_logfiles.

    I never worked with OVO myself, but I’d like to see what such a template looks like. Can you mail me an example, please?

    [Reply]

    Ryan Reply:

    @lausser, mailed….

    [Reply]

  73. haitauer Says:
    May 18th, 2010 at 15:31

    Hi Lausser,

    hmm, any news on post 25, 26 and 28?

    thank you!

    [Reply]

  74. Benny Says:
    May 25th, 2010 at 14:59

    --report=long & --maxlength seem to cancel each other out?

    I am testing a Windows 2008 server, and while testing some event log stuff, I started getting errors from NSClient++ because the output was too long.

    I started experimenting with --maxlength, and it didn’t clear up the error until I removed --report=long from the command.

    Is this intended behavior? I see a lot of examples above using --maxlength and --report=long together…

    Thank you!

    [Reply]

    lausser Reply:

    That’s a problem with NSClient++ and long/multiple lines.

    [Reply]

  75. René Says:
    May 25th, 2010 at 16:32

    Hello Mr. Lausser,

    Is there a way to have check_logfiles not output a counter of the matched log lines, but instead send an output to Nagios for every single matched line?

    The goal would be to generate an alert for each individually matched line.

    [Reply]

    lausser Reply:

    http://labs.consol.de/nagios/check_logfiles/check_logfiles-beispielecheck_logfiles-examples/ Example 3. The key script can be followed by an arbitrary Perl function. Either you do it as in the example, or you program your own method for delivering the hit in that script.

    ...
    script => sub {
      my $trefferzeile = $ENV{CHECK_LOGFILES_SERVICEOUTPUT};
      # you can do anything you like with $trefferzeile
      # e.g. send it with send_nsca
      # or write it to a file
      # or pass it to any script, e.g. to HP OpenView....
      return 0; # ignore the hit after it has been processed
    },
    ...

    [Reply]

    lausser Reply:

    I think your Oracle tool just looks at used space vs. allocated space. Now if your data grows, the database automatically allocates more space and the percentage drops. (For example, it approaches 95%, then space is allocated and the percentage drops to 90%. This might lead to unnecessary alerts, because the db handles the “error” automatically.) check_oracle_health takes the maximum allocatable space into account; that’s why you usually get far lower percentages.

    [Reply]

  76. Micha Bloch Says:
    May 27th, 2010 at 16:21

    Hi many thanks for the plugin :)

    I have a little problem with the plugin: it always says the logfiles look good, but they didn’t look good. When I try

    ./check_logfiles -t 30 --logfile=/var/log/server/mail/error-mail.log --warningpattern='ldap'

    it prints out:

    OK - no errors or warnings|default_lines=0 default_warnings=0 default_criticals=0 default_unknowns=0

    I don’t really understand this :/. When I try it with my config I get the same message:

    /usr/lib/nagios/plugins/check_logfiles -f checklog.cfg

    OK - no errors or warnings|Mailserver_lines=102 Mailserver_warnings=0 Mailserver_criticals=0 Mailserver_unknowns=0

    Configfile:

    $seekfilesdir = '/var/log/server'; $protocolsdir = '/var/log/server';

    @searches = ( { tag => 'Mailserver', logfile => '/var/log/server/mail/error-mail.log', archive => '/Var/log/server/mail/archive/', rotation => 'LOGLOGDATE8', options => 'allyoucaneat', criticalpattern => 'ldap' } )

    I use a syslog server which is also installed on the Nagios server. Rsyslog puts every message into a different file according to severity.

    i hope you can help me :)

    [Reply]

    lausser Reply:

    Be careful: It must be criticalpatterns (plural) in the configfile.

    [Reply]

    Micha Bloch Reply:

    @lausser, oh man, I worked too long yesterday -.-°

    Thank you, now it works. :)

    Last question… check_logfiles should read my logrotated files as well, but I don’t know the right declaration for the rotation option. My logrotated files have names like name.of.the.server.log-20100525

    Maybe you can help me again :)

    Thx a lot

    [Reply]

  77. IT-COW | Icinga - erweiterte Konfiguration *UPDATE* Says:
    June 1st, 2010 at 5:07

    […] nächstes wird darauf aufbauend das PlugIn check_logfiles benötigt, auf jeden Fall für den Windows-Server – der Link enthält auch direkt eine […]

  78. haitauer Says:
    June 2nd, 2010 at 10:01

    Hi Lausser,

    are you blind??

    Ignoring questions again and again is not nice!

    :-(

    [Reply]

  79. Frank Says:
    June 7th, 2010 at 17:59

    Hello Gerhard,

    I have the problem that we are scanning various logs on application servers which grow quite large during the day (rolling over some time during the night). For now we are checking the logs for certain strings every 5 minutes. Now we have come to the point that the checks start stacking up, as five minutes is not long enough to parse the logs. Therefore we see the load of the system increase steadily, since the checks are waiting for each other. Before we start writing a wrapper that checks, as the process starts, whether another process is still running, I was wondering if you, or anyone else reading this, has made the same experience and maybe already has a solution at hand.

    Thanks for any hint!

    Cheers Frank

    [Reply]

    lausser Reply:

    I found your post in the spam folder :-) The size of a logfile isn’t a problem. It will not be scanned entirely, only the portion which was added since the last run of check_logfiles. Do your files really grow that fast in 5 minutes? Do you have lots of patterns? If it’s really impossible to keep up within a check interval, you might try the daemon mode and run check_logfiles independently of Nagios.

    check_logfiles --config <cfgfile> --daemon [interval]
    With the daemon parameter, check_logfiles puts itself into the background and starts a search every 5 minutes (which is the default; you can change it, interval being the number of seconds to sleep between runs). You can stop the daemon with SIGTERM. But: in this standalone mode you must take care of sending the check result back to Nagios yourself. Have a look at the examples; there is a configuration which uses send_nsca to report every hit.

    [Reply]

    Frank Reply:

    @lausser, good you found it! :) And thanks for your reply! Well, for now there are 6 application servers behind a loadbalancer and each app-server writes a log which at the end of the day has a size of 6-10GB for the application only. Not sure if you consider that “large” ;) The patterns to search for on those app-servers are not that many, but apart from the app-logs there are a couple of others which add to the amount of time the check runs. So overall – and yes, I need to dig deeper – I assume that the processes I see and that at one point actually prevented the server from keeping up the service(!!) which really hurt – are processes that got stuck from previous runs. I might try that daemon version but then I have to get the NSCA up on those servers as well to get the results back to the Nagios server… I will have an eye on the duration first and the try the daemon mode.

    Thanks again for ALL the plugins you have provided :)

    [Reply]

    lausser Reply:

    10GB at the end of the day is not large, but the 37MB every 5 minutes are :-) Depending on your hardware and the processes running at the same time, you might have hit a limit here. I tried it on a weak test machine (1GHz Atom):

    shinken$ ls -l maillog 
    -rw-rw-r-- 1 shinken shinken 36266022 Jun 14 17:38 maillog
    shinken$ cat cfg.cfg 
    @searches = ({
      logfile => 'maillog',
      criticalpatterns => ['woot', 'w00t', 'error', 'FATAL',
          '\d+\s+\d+\s+timestamp', 'he?a?d.?[ -]?(ache)', 'na?u?a?se?a?'],
    });
    shinken$ time check_logfiles --config cfg.cfg
    CRITICAL - (328266 errors in cfg.protocol-2010-06-14-17-40-04) - Jun  3 22:00:33 schinken sendmail[2762]: accepting connections again for daemon MTA ...|default_lines=417825 default_warnings=0 default_criticals=328266 default_unknowns=0
     
    real    1m19.646s
    user    1m18.920s
    sys     0m0.715s
    The processor is not really powerful and at least two of the regexps are not trivial, but 1m19 at 100% cpu means some load. I know people who are running logfile checks with such big amounts of data, but on pure syslog servers. The picture changes when you add that load to a production machine whose priority is to provide a service. So packing all the checks into one @searches array and running check_logfiles in daemon mode (maybe with a lowered priority) should be the best alternative. Unfortunately I see no performance tuning potential any more. I already profiled and tweaked the perl code and the regexp routines to the max.

    [Reply]

    Frank Reply:

    @lausser, Hello again Gerhard, thank you very much for the research and the suggestions!! I will forward them to the admins who are now taking care of the Nagios-System I set up. Thanks again – for all the plugins you provide! Keep it up! :D

    Cheers, Frank

    [Reply]

  80. Joe Says:
    June 16th, 2010 at 15:09

    Hi Lausser,

    at first thank you for your great plug-in and support.

    I have logfiles that are started as a new file with the beginning of each day. They are named in the following format: logfileYYYY_MM_DD.log, e.g. logfile2010_06_16.log.

    There are criticalpatterns defined in the conf that should be sticky. Each night, with the beginning of a new logfile, a new seekfile is written with no sticky error from the last run (see the excerpt from the tracefile) and the error disappears in Nagios. Is there a possibility to keep the sticky information?

    Thanks in advance.

    /etc/nagios/cl.conf:

    @searches = ( {
      tag => 'test',
      logfile => '/var/log/logfile$CL_DATE_YYYY$$CL_DATE_MM$$CL_DATE_DD$.log',
      rotation => 'logfile\d{4}_\d{2}_\d{2}\.log',
      criticalpatterns => '.CRITICAL.',
      options => 'nologfilenocry,sticky=85400,noprotocol',
    } )

    /tmp/check_logfiles.trace:

    Tue Jun 15 23:58:24 2010: ==================== /var/log/logfile2010_06_15.log ==================
    Tue Jun 15 23:58:24 2010: found seekfile /var/tmp/check_logfiles_test._var_log_logfile2010_06_15.log.test
    Tue Jun 15 23:58:24 2010: LS lastlogfile = /var/log/logfile2010_06_15.log
    Tue Jun 15 23:58:24 2010: LS lastoffset = 11983936 / lasttime = 1276638778 (Tue Jun 15 23:52:58 2010) / inode = 26630:560640
    Tue Jun 15 23:58:24 2010: the logfile grew to 12024996
    Tue Jun 15 23:58:24 2010: opened logfile /var/log/logfile2010_06_15.log
    Tue Jun 15 23:58:24 2010: logfile /var/log/logfile2010_06_15.log (modified Tue Jun 15 23:57:59 2010 / accessed Tue Jun 15 23:53:24 2010 / inode 560640 / inode changed Tue Jun 15 23:57:59 2010)
    Tue Jun 15 23:58:24 2010: first relevant files: logfile2010_06_15.log
    Tue Jun 15 23:58:24 2010: /var/log/logfile2010_06_15.log has fingerprint 26630:560640:12024996
    Tue Jun 15 23:58:24 2010: relevant files: logfile2010_06_15.log
    Tue Jun 15 23:58:24 2010: moving to position 11983936 in /var/log/logfile2010_06_15.log
    Tue Jun 15 23:58:24 2010: stopped reading at position 12024996
    Tue Jun 15 23:58:24 2010: an error level of 2 is sticking at me
    Tue Jun 15 23:58:24 2010: stay sticky until Wed Jun 16 09:59:45 2010
    Tue Jun 15 23:58:24 2010: keeping position 12024996 and time 1276639079 (Tue Jun 15 23:57:59 2010) for inode 26630:560640 in mind
    Wed Jun 16 00:03:24 2010: ==================== /var/log/logfile2010_06_16.log ==================
    Wed Jun 16 00:03:24 2010: try seekfile /var/tmp/check_logfiles_test.logfile2010_06_16.log.test instead
    Wed Jun 16 00:03:24 2010: no seekfile /var/tmp/check_logfiles_test._var_log_logfile2010_06_16.log.test found
    Wed Jun 16 00:03:24 2010: but logfile /var/log/logfile2010_06_16.log found
    Wed Jun 16 00:03:24 2010: ILS lastlogfile = /var/log/logfile2010_06_16.log
    Wed Jun 16 00:03:24 2010: ILS lastoffset = 28742 / lasttime = 1276639380 (Wed Jun 16 00:03:00 2010) / inode = 26630:560660
    Wed Jun 16 00:03:24 2010: the logfile did not change
    Wed Jun 16 00:03:24 2010: nothing to do
    Wed Jun 16 00:03:24 2010: keeping 0
    Wed Jun 16 00:03:24 2010: no sticky error from last run
    Wed Jun 16 00:03:24 2010: keeping position 28742 and time 1276639380 (Wed Jun 16 00:03:00 2010) for inode 26630:560660 in mind

    [Reply]

    lausser Reply:

    Use type rotating::uniform when current logfile and archived logfiles use the same naming.

    ...
      type => 'rotating::uniform',
      logfile => '/var/log/dummy',
      rotation => 'logfile\d{4}\-\d{2}\-\d{2}.log',
    ...

    [Reply]

    Joe Reply:

    @lausser, Hello Lausser,

    thanks for your quick reply. If I use the following conf

    @searches = ( {
      tag => 'test',
      type => 'rotating::uniform',
      logfile => '/var/log/dummy',
      rotation => 'logfile\d{4}_\d{2}_\d{2}\.log',
      criticalpatterns => '.CRITICAL.',
      options => 'sticky=85400,noprotocol',
    } )

    I get “cannot create rotating::uniform search test”. On another server where I adjusted the confs there was no such problem. It’s not an issue with permissions – I tried as root. Do you have any hints? Thanks in advance. Joe

    [Reply]

    lausser Reply:

    Maybe a syntax error. Check it with “perl “

    [Reply]

  81. Joe Says:
    June 17th, 2010 at 15:05

    Thanks for your reply, but there is no syntax error in the conf file. Do you have any other hint?

    [Reply]

  82. Gerhard P Says:
    June 24th, 2010 at 14:32

    Hallo Gerhard,

    Thanks for the Check … simply perfect!

    I have an issue with a Win2k3 Enterprise Edition 32Bit box, where I have check_logfiles version 3.4.1. When checking a log file ~42MB large, it shows:

    $state = {
      'runcount' => 15,
      'serviceoutput' => '',
      'thresholdcnt' => {},
      'logoffset' => 42020840,
      'privatestate' => {
        'runcount' => 15,
        'lastruntime' => 1277381367,
        'logfile' => 'E:\\logs\\FILE.log'
      },
      'devino' => 'fffe320031002e00300036002e0032003000310030002000310038003a00300030003a00330035003a003a0033003800300037003a003a0049004e0046004f003a003a002a002a002a0020005300740061007200740020004f00720067005000750062006c00690073006800650072002000730063007200690070007400200020002a002a002a000d000a',
      'runtime' => 1277381969,
      'logtime' => 1277136035,
      'servicestateid' => 0,
      'tag' => 'FILE',
      'logfile' => 'E:\\logs\\FILE.log'
    };

    1;

    the ‘logoffset’ is always 42020840… is this an OS limit? The problem is it cannot find the ERROR….

    Other checks on the same box (whose logfiles are smaller) are working perfectly.

    the configfile:

    $seekfilesdir = 'C:\nagios';
    $protocolsdir = 'C:\nagios';
    @searches = ( {
      tag => 'FILE',
      type => 'simple',
      logfile => 'E:\logs\FILE.log',
      criticalpatterns => 'ERROR',
      options => 'protocol',
    }, );

    [Reply]

    lausser Reply:

    When you add a dummy error with

    ECHO dummyERRORdummy >> E:\logs\FILE.log
    does check_logfiles end with a critical? And if logoffset is still at 42020840 (which is too small to be a problem with 32bit), do at least logtime and runtime change? You can have a deeper look at the inner workings by adding the following line:
    $tracefile = 'C:\TEMP\check_logfiles_FILE.trace';
    After you run the plugin you will find lots of details in it.

    [Reply]

  83. Gerhard P Says:
    June 24th, 2010 at 18:08

    Hallo Gerhard,

    thanks for the tip with the tracefile … I reduced the file size and now it works… I need to run additional tests with large files and see what I can see in the trace. But logtime and runtime are changing…

    [Reply]

    Gerhard P Reply:

    @Gerhard P, I found the problem … encoding=ucs-2… now it works perfectly.

    Thanks Gerhard

    [Reply]

  84. Louis G Says:
    June 25th, 2010 at 15:37

    Hello Gerhard,

    We use check_logfiles to check alert logs of several Oracle DBs, with one command run and one config file for each DB. I define the SID using $MACROS$ within the config file :

    $MACROS = { CL_ORASID => 'xxxx' };

    This works fine. But the --macro option seems a better way to do this; I just have no idea how to use it. Could you provide an example? Thanks

    [Reply]

    lausser Reply:

    You might try this approach: http://www.nagios-portal.org/wbb/index.php?page=Thread&postID=70498#post70498 Use the configuration from the first code box (template instead of tag and $CL_TAG$ in the logfile name). Then call the plugin as usual but with --tag ORACLESID

    [Reply]

  85. IT-COW | Icinga - erweiterte Konfiguration Says:
    June 26th, 2010 at 16:52

    […] next, building on this, the plugin check_logfiles is needed, definitely for the Windows server – the link also directly contains a […]

  86. Tina D Says:
    June 30th, 2010 at 16:14

    I’ve just started experimenting with check_logfiles for Windows Event Log. It’s way more powerful than what I’ve been using so far :) However I’ve run into a bit of a problem. When multiple errors are found, only one error is printed. Is there a way to print all errors? This is my config file:

    @searches = ({
      tag => 'crit',
      type => 'eventlog',
      options => 'eventlogformat="EventID %i: (%w%) %s",maxlength=1024,allyoucaneat',
      eventlog => {
        eventlog => 'application',
        include => {
          eventtype => 'error',
        },
      },
      criticalpatterns => '.*',
    })

    BR Tina

    [Reply]

    lausser Reply:

    Use the global option report by adding

    $options = 'report=long';
    to the configfile. This will output additional lines with the single hits. report=html is also possible if you like colors. But as it generates html code, it only makes sense if you look at the output in “service details”. Using it in the text of a notification mail (or sms), for example, can make it unreadable.

    [Reply]

  87. Louis G Says:
    June 30th, 2010 at 17:15

    Using $ENV{ORACLE_SID} as explained in

    http://www.nagios-portal.org/wbb/index.php?page=Thread&postID=70498#post70498

    permits me to solve the problem, that is to have a generic check of alert logs whatever the DB.

    Thanks.

    Louis

    [Reply]

  88. Abhinav Gupta Says:
    July 1st, 2010 at 10:45

    Is there any way the growth of a log file can be checked by check_logfiles? Say, throw CRITICAL if the log file is not growing.

    Thanks for help. Abhinav

    [Reply]

    lausser Reply:

    You might try to set a negative pattern.

    check_logfiles --tag nogrow --criticalpattern '!.' --logfile /var/adm/messages
    
    which means: if . was NOT found, then get critical. So it would go critical every time there were no new lines in the logfile. For example, if you configure the service with “max_attempts 5, is_volatile 0, check_interval 1, retry_interval 1”, then you should get an alarm when the logfile didn’t grow during the last 5 minutes.

    [Reply]

  89. Jan Schampera Says:
    July 8th, 2010 at 10:38

    Hi!

    Is there a way to pass the logfile as a parameter to check_logfiles, so that I can scan multiple logfiles with no standardized location using a template? Specifying $CL_LOGFILE$ for logfile in the config doesn’t work to catch --logfile, obviously.

    [Reply]

    lausser Reply:

    With the newest release try:

    ...
       template => 'test',
       logfile => '$CL_TAG$',
       criticalpatterns ....
    and then
    check_logfiles --config the_above_configfile  --tag full_pathname_of_logfile

    [Reply]

  90. Hamza Maal Says:
    July 8th, 2010 at 15:28

    Hi.

    Nice little tool. I seem to be having a problem matching single patterns for ORA- errors. The log always seems to say everything is OK when I know there are errors. I am doing the following:

    criticalpatterns => 'ORA-3136'

    which I can see in the log but does not pick up.

    After that, how would I apply this for multiple ORA-[3136|3136] type patterns?

    [Reply]

    lausser Reply:

    check_logfiles only finds new error messages (which were added to the logfile since the plugin ran last time). In your output you find the performance data xxx_lines=0, which means: no lines were scanned at all. Either you wait until there are new ORA-messages or you manually add one with “echo ORA-3136-test >> logfile”

    criticalpatterns can be an array of patterns like in example17 of http://labs.consol.de/nagios/check_logfiles/check_logfiles-beispielecheck_logfiles-examples/

    [Reply]

    Hamza Maal Reply:

    @lausser, Hi

    OK, so I have added an ORA code at the bottom of the log and it works fine. However, how do I get it to scan through the entire log file the first time on a machine? I have already cleared out the state file in /var/tmp/clear_logs.

    [Reply]

    lausser Reply:

    The very first run is always an initialization run, where check_logfiles positions at the end-of-file (and starts reading from this position when it is run next time). Removing the seekfile only makes check_logfiles think, that it has never run before -> init again. If you want to scan the entire file during the initialization run (including error messages which are very very old), use the option allyoucaneat.

       ....
       criticalpatterns => ['ORA.....
       options => 'allyoucaneat',
       ....

    [Reply]

  91. Hamza Maal Says:
    July 9th, 2010 at 16:01

    Thanks, I have sorted this out. I have the following config file

    @searches = ({
      tag => 'testalert',
      logfile => '/app/oracle/admin/lpbtest/bdump/alert_db.log',
      criticalpatterns => [
        'ORA\-[0-4][0-9][0-9][1-9][^\d]', # ORA-0440 - ORA-0485 background process failure
      ],
      warningpatterns => [
        'ORA\-06501[^\d]',  # PL/SQL internal error
        'ORA\-0*1140[^\d]', # follows WARNING: datafile #20 was not in online backup mode
        'Archival stopped, error occurred. Will continue retrying',
      ]
      options => 'report=short'
    });

    When I try to run I get UNKNOWN – syntax error syntax error at /export/home/nagios/.scripts/neworalog.cfg line 12, near “options”

    What am I doing wrong with the options parameter?

    [Reply]

    lausser Reply:

    There’s a comma missing.

    criticalpatterns => [....
    ],
    options => ....

    [Reply]

  92. pirx Says:
    July 15th, 2010 at 10:20

    Hi,

    I’m having a hard time to figure out how to scan a postgres log file for certain types of messages.

    2010-07-12 11:18:09 CEST WARNUNG: Relation »pg_catalog.pg_largeobject« enthält mehr als »max_fsm_pages« Seiten mit nutzbarem freiem Platz

    I tried

    ./check_logfiles --logfile=/var/log/postgresql/postgresql-8.3-main.log --type virtual --criticalpattern "FEHLER"

    CRITICAL - (7 errors in check_logfiles.protocol-2010-07-15-10-17-51) - 2010-07-14 17:05:52 CEST FEHLER: Tabelle »temporary« existiert nicht …|postgres_lines=477 postgres_warnings=0 postgres_criticals=7 postgres_unknowns=0

    But for each run a new status file in /tmp is created:

    -rw-r--r-- 1 root root  571 15. Jul 10:10 check_logfiles.protocol-2010-07-15-10-10-34
    -rw-r--r-- 1 root root  571 15. Jul 10:10 check_logfiles.protocol-2010-07-15-10-10-35
    -rw-r--r-- 1 root root  571 15. Jul 10:10 check_logfiles.protocol-2010-07-15-10-10-36
    -rw-r--r-- 1 root root  571 15. Jul 10:11 check_logfiles.protocol-2010-07-15-10-11-05
    -rw-r--r-- 1 root root  571 15. Jul 10:11 check_logfiles.protocol-2010-07-15-10-11-06
    -rw-r--r-- 1 root root  571 15. Jul 10:17 check_logfiles.protocol-2010-07-15-10-17-51
    -rw-r--r-- 1 root root 1142 15. Jul 10:18 check_logfiles.protocol-2010-07-15-10-18-04

    Any ideas?

    [Reply]

    lausser Reply:

    These are protocol files, not status files. Their purpose is to show the admin all the error messages in one file. This way you don’t have to browse through the logfile to find the single hits. If you don’t want them, use --noprotocol (or options => 'noprotocol' in a config file).

    [Reply]

    pirx Reply:

    @lausser,

    hm, I’m not sure if I was clear in my first message.

    If I use the check command like above, it generates a CRITICAL warning in each run.

    ./check_logfiles --logfile=/var/log/postgresql/postgresql-8.3-main.log --type virtual --criticalpattern "FEHLER" --tag=postgres --noprotocol

    CRITICAL - (7 errors) - 2010-07-14 17:05:52 CEST FEHLER: Tabelle »temporary« existiert nicht …|postgres_lines=479 postgres_warnings=0 postgres_criticals=7 postgres_unknowns=0

    I thought the protocol file would log the already caught events. So my question is: a) do I use the plugin the right way to search for the critical pattern, because the postgres log file is not in the same format as syslog, and b) why does it create a new protocol file with each run?

    [Reply]

    lausser Reply:

    Of course you get a CRITICAL in each run, because you use the --virtual option, which means “scan the logfile always from the beginning”. The format (postgres, syslog) doesn’t matter at all. If you leave --virtual away, you’ll get a CRITICAL only if new lines with FEHLER appeared in the logfile since the last run.

    [Reply]

    pirx Reply:

    @lausser,

    Thank you for your very fast replies!

    Ok, virtual is the problem. But without this option no event is detected at all when searching for FEHLER or HINWEIS as the critical pattern.

    2010-07-13 08:37:03 CEST HINWEIS: Anzahl der benötigten Page-Slots (2469136) überschreitet max_fsm_pages (204800)

    ./check_logfiles --logfile=/var/log/postgresql/postgresql-8.3-main.log --criticalpattern HINWEIS

    OK - no errors or warnings|default_lines=0 default_warnings=0 default_criticals=0 default_unknowns=0

    Thus my question if I use the command the right way to detect this kind of events in the postgres log. I’m sure I’m missing something obvious…

    [Reply]

    lausser Reply:

    check_logfiles shows _____new______ error messages which were appended since the last run. Please read the documentation.

  93. Ledskof Says:
    July 16th, 2010 at 20:20

    if (matches < 50) then error

    Is there a quick way to do this, or does it require a supersmartscript and a line counter?

    [Reply]

    lausser Reply:

    No. There is an option (warning|critical)threshold, but only for (matches > 50). You need a script which increments a counter on each match and a supersmartpostscript which finally looks at the sum. Something like this:

    my $counter = 0;
    @searches = ({
    ...
      options => 'script',
      script => sub {
          $counter++;
      },
    ...
    });
    $options = 'supersmartpostscript';
    $postscript = sub {
      if ($counter < 50) {
        printf "CRITICAL - under 50\n";
        return 2;
      } else {
        printf "OK - more than 50\n";
        return 0;
      }
    };

    [Reply]

  94. Gilles Says:
    July 22nd, 2010 at 13:32

    Your plugin worked perfectly, many thanks for your job . Just for tuning, with regular expression how can i consider a criticalpattern only with the combination of 2 alerts code ? For example :

    criticalpatterns => [ 'ERROR1' and 'ERROR2', # combination of error codes 1 and 2

    Is that correct? Thanks

    [Reply]

    lausser Reply:

    No, ‘and’ is not possible here. criticalpatterns is an array (a list) of comma-separated strings which represent regular expressions. I am not sure what your intention is. Is it “ERROR1 and ERROR2 in one line”? Or is it “only if both ERRORs were found during one run of check_logfiles, no matter if they were in one line or in different lines”? Please show me an example or describe it a bit more.

    [Reply]

    Gilles Reply:

    @lausser, ERROR1 and ERROR2 are on different lines during the check_logfiles run.

    Here is an example:

    10/07/22 10:01:50 CFTF01E local file [/CFT/CFT_IN/TEMP/xxx] creation error
    10/07/22 10:01:50 CFTT82E transfer aborted

    The “transfer aborted” error should be critical if associated with ‘creation error’ practically at the same time.

    Gilles

    [Reply]

    lausser Reply:

    That requires using handler scripts (option supersmartscript). So after each hit, in the handler script you look at the environment variable CHECK_LOGFILES_SERVICEOUTPUT: if it is ERROR1, set a flag. If it is ERROR2 and the flag is set, return 2, else return 0.
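    A rough sketch of that logic as a config file; the tag, logfile path and the exact patterns are illustrative, not taken from the question:

    ```perl
    # Sketch: only alert on 'transfer aborted' if a 'creation error'
    # was seen earlier in the same run. Paths and patterns are examples.
    my $creation_error_seen = 0;
    @searches = ({
      tag => 'cft',
      logfile => '/var/log/cft.log',
      criticalpatterns => ['creation error', 'transfer aborted'],
      options => 'supersmartscript',
      script => sub {
        my $hit = $ENV{CHECK_LOGFILES_SERVICEOUTPUT};
        if ($hit =~ /creation error/) {
          $creation_error_seen = 1;
          return 0;                     # not critical on its own
        }
        if ($hit =~ /transfer aborted/ and $creation_error_seen) {
          return 2;                     # both errors seen -> critical
        }
        return 0;
      },
    });
    ```

    With supersmartscript the script’s return code replaces the severity of the matched line, which is what makes this kind of correlation possible.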

    [Reply]

    Gilles Reply:

    @lausser, Many thanks for your response 8) I will test this script soon.

    [Reply]

  95. Mariano Says:
    July 28th, 2010 at 4:48

    Hi, I’m trying to set up check_logfiles in a way that if a pattern appears once it reports a Warning. If the same pattern appears more than once before the okpattern, it should report a Critical state. In both cases it should stay in that state until an okpattern is received. I could get this behaviour BUT the problem I’m facing is that the thresholdcnt is not reset every time an okpattern is found. The thresholdcnt resets only when it reaches criticalthreshold. So, in a sequence “PATTERN -> OKPATTERN -> PATTERN” I get “WARNING, OK, CRITICAL” states, when I would expect “WARNING, OK, WARNING”.

    Here’s my config:

    $MACROS = { LOGFILE => 'test-$CL_DATE_YYYY$.$CL_DATE_MM$.$CL_DATE_DD$.log' };
    @searches = ({
      tag => 'test',
      logfile => '$LOGFILE$',
      warningpatterns => 'jdbc',
      criticalpatterns => 'jdbc',
      criticalthreshold => 2,
      okpatterns => 'ok',
      options => 'sticky',
    });

    Any clues? Thanks in advance.

    [Reply]

    lausser Reply:

    Use the option nosavethresholdcnt. With this setting, counting begins at 0 at every run of the plugin.

    [Reply]

  96. Mariano Says:
    July 28th, 2010 at 15:23

    Thank you Lausser for your reply. I tried nosavethresholdcnt but the count is still saved (maybe because of the sticky option?). Anyway, I don’t need to reset the count at every run, but only when an okpattern is found. I need to keep the previous state (sticky?) until the recovery pattern is found. So, is there a way to make okpattern reset the counter?

    [Reply]

    lausser Reply:

    It’s nosavethresholdcount. I thought you would look it up in the documentation.

    [Reply]

    Mariano Reply:

    @lausser, OK, sorry, I didn’t notice the typo reading the docs. Maybe I’m not explaining myself right (not native speaker), but I need to keep thresholdcnt to use it with criticalthreshold. It’s just that thresholdcnt is not reset when an okpattern is found. It seems it doesn’t work that way. Thanks for your work and help.

    [Reply]

    lausser Reply:

    Yes, the threshold counter is not reset when an okpattern is found. It is reset whenever the number of hits reaches the threshold. There’s the supersmartscript option which allows you to implement any kind of logic (the config file is just another piece of perl code).
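    For example, counting the hits yourself could look roughly like this; note that the recovery string is deliberately listed among the patterns so the handler script gets to see it, and all names and paths are illustrative:

    ```perl
    # Sketch: first 'jdbc' hit -> warning, second hit -> critical,
    # an 'ok' line resets the counter. Patterns/paths are examples.
    my $hits = 0;
    @searches = ({
      tag => 'test',
      logfile => '/var/log/app.log',
      criticalpatterns => ['jdbc', 'ok'],
      options => 'supersmartscript',
      script => sub {
        my $line = $ENV{CHECK_LOGFILES_SERVICEOUTPUT};
        if ($line =~ /ok/) {
          $hits = 0;                 # recovery pattern resets the counter
          return 0;
        }
        $hits++;
        return $hits >= 2 ? 2 : 1;   # 1 = warning, 2 = critical
      },
    });
    ```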

    [Reply]

  97. Benny Says:
    August 4th, 2010 at 15:44

    Hello!

    I was wondering if adding the User field to the available format string options for type eventlog would be possible.

    Almost all of my requests for eventlog checks request that the user be included in the alert (for example, the user that initiated an event 7035), and not all events logged by Windows include the username in the event text.

    It sure would be nice. :) Thank you for all your hard work on this plugin – it really is fantastic!

    Benny

    [Reply]

    lausser Reply:

    Look here: http://github.com/lausser/check_logfiles and try whether the %u works. If you don’t want to get it with git, just copy the file Eventlog.pm (that’s where the modifications are) into your existing source tree and run make.

    [Reply]

    Benny Reply:

    @lausser, Ohhh… I use the pre-compiled check_logfiles.exe binary I downloaded from this site… I don’t have a Windows build environment, sorry.

    [Reply]

    Benny Reply:

    @lausser, I don’t know if it’s possible or if you are willing/have time, but I’d be happy to test if someone could roll a beta Windows binary.

    At previous jobs, I had a Windows build environment where I could do so, but not at this job…

    Thank you!

    [Reply]

  98. DAmash Says:
    August 6th, 2010 at 8:10

    Hello, I compiled the plugin for AIX 5.3. On the local machine everything works fine. But in my Nagios Webinterface i get the error

    ” UNKNOWN – syntax error Insecure dependency in require while running setgid at /usr/local/nagios/libexec/check_logfiles line 900. “

    any idea?

    DAmash

    [Reply]

    lausser Reply:

    I think that was answered on nagios-portal.de. Something with loading modules belonging to another group.

    [Reply]

  99. Bridgie Says:
    August 9th, 2010 at 18:09

    Hi there,

    After reading all the above posts I noticed that a couple of your posters have a similar issue to the one I am facing.

    The problem is that I need to scan a new log file from the beginning and have it pick up a search pattern defined by criticalpatterns, i.e. bypass the initialisation stage:

    @searches = ({
      tag => 'test1',
      logfile => '/tmp/testlogfile.log',
      criticalpatterns => ['BAD'],

    The process of events could be as follows..

    a) check_logfiles is called, and it correctly reports that ‘/tmp/testlogfile.log’ is not found (now goes back to sleep for 10 mins).
    b) 5 minutes pass, /tmp/testlogfile.log is created by a system process and the word ‘BAD’ is written out along with some other error message text.
    c) 5 more minutes pass and check_logfiles runs again but fails to pick up the criticalpattern ‘BAD’.
    d) Thereafter check_logfiles will not pick up this event.

    I noted this is default behaviour, but would like to know if there are any workarounds.

    The type => virtual will not work in this case, as it will keep re-reading the logfile from the beginning each time the check runs.

    I also see a comment on options => ‘allyoucaneat’ but am unable to find any text on what this does.

    There also appear to be options that can be changed in the statefile (set logtime to 1 etc.), which appear to be similar to the virtual option?

    I’m sure there is a simple solution to the above, appreciate any help on this one.

    Thanks

    [Reply]

    lausser Reply:

    The option allyoucaneat is relevant for the very first run (in this case, when testlogfile.log was created). Normally, during the first run, check_logfiles only seeks the end-of-file, saves that position and exits. No scanning of lines is done, because you certainly don’t want to get an alarm caused by a 4-weeks-old error message when you configure a check_logfile-based service. But if you really want to read a logfile from the beginning during the very first run of check_logfiles, then use ‘allyoucaneat’.

    [Reply]

  100. Bruce Says:
    August 23rd, 2010 at 7:49

    Hi~~ I execute:

    /usr/local/nagios/libexec/check_logfiles --tag=DDB2 --logfile=/mnt/db2/db2diag.log --criticalpattern="Error"

    It returns: OK - no errors or warnings|smit_lines=0 smit_warnings=0 smit_criticals=0 smit_unknowns=0

    But in fact there were lots of errors:

    nagios:/mnt/samba$52 cat /mnt/samba/db2diag.log | grep Error | tail
    2010-06-24-07.47.54.267438+480 I1579508A437 LEVEL: Error
    2010-06-24-09.52.54.383083+480 I1582381A437 LEVEL: Error

    Any suggestion is welcome.

    [Reply]

    lausser Reply:

    Please read the previous posts carefully. check_logfiles does not alert on old messages when it is run for the first time. Create a new error message and run it again.

    [Reply]

  101. Bruce Says:
    August 24th, 2010 at 3:23

    How can I reset check_logfiles (for some files) as if it were the first run?

    I tried to use allyoucaneat, but the doc says “…when no seekfile exists”.

    I need to test my sub script; using logger or echo to create a new test message every time is inconvenient.

    Is there any check_logfiles option which I can use to change this setting?

    [Reply]

    lausser Reply:

    check_logfiles --config cfgfile --reset
    rewinds the file position pointer to the beginning of file.

    [Reply]

    Bruce Reply:

    @lausser, Thanks!

    [Reply]

  102. Taras Says:
    August 28th, 2010 at 19:04

    Hello,

    Can you consider adding an option to check_logfiles.exe so that when the date changes, the file position for the first run on the next day is reset to the end of the file (the --reset option resets the position to zero)? Otherwise check_logfiles returns the critical state for what happened during the night, even though we don’t care about those events.

    Nagios is triggered on schedule (via nrpe) to send requests to a remote windows server to find a pattern in some log file, say, from 9:00 to 21:00. Here we are not interested in what was going on at night, but check_logfiles, when started for its first run in the morning, will show a critical state.

    I even wrote a batch file to determine the moment of this first morning run, but for some reason calling check_logfiles.exe from the batch always returns “OK”, while the exact same call from a command line window returns critical…

    @ECHO OFF
    echo %1
    rem ping 127.0.0.1 -n 2 -w 1000 > NUL
    set current_date=%date:~6,4%%date:~3,2%%date:~0,2%
    echo %current_date%
    rem Last-Modified Date file attribute
    @FOR %%? IN (c:\temp\flag.txt) DO (
      @SET file_date=%%~t?
    )
    set file_date=%file_date:~6,4%%file_date:~3,2%%file_date:~0,2%
    echo %file_date%

    IF "%file_date%"=="%current_date%" (
      rem echo 1 > c:\temp\flag.txt
      c:\temp\check_logfiles.exe -f c:\temp\check.cfg --tag %1
    ) else (
      c:\temp\check_logfiles.exe -f c:\temp\check.cfg --reset
      c:\temp\check_logfiles.exe -f c:\temp\check.cfg --tag %1
      del c:\temp\flag.txt
      echo 1 > c:\temp\flag.txt
    )

    [Reply]

    lausser Reply:

    I suggest setting a notification_period which ignores errors during the night.
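    In Nagios that could look roughly like this (a sketch; the object names are placeholders). The check still runs at night, but notifications are suppressed outside the timeperiod:

    ```
    define timeperiod {
        timeperiod_name  daytime
        alias            9:00 to 21:00
        monday           09:00-21:00
        tuesday          09:00-21:00
        wednesday        09:00-21:00
        thursday         09:00-21:00
        friday           09:00-21:00
    }

    define service {
        ...
        notification_period  daytime
    }
    ```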

    [Reply]

  103. Taras Says:
    August 28th, 2010 at 21:26

    Funny thing. For this pattern: criticalpatterns => 'fail' (the difference is one symbol):

    11111111111111111fail   CRITICAL
    1111111111111111fail    OK
    11111111111111111fai    CRITICAL
    1111111111111111fail    OK

    [Reply]

  104. Kazwa Says:
    August 30th, 2010 at 9:17

    Hi, thanks for this plugin.

    I have a problem. I would like to know how to config this plugin under situations below.

    The logfiles named “catalina.YYYYMMDD.out” are rotated at 5:00 am every day. Tomcat always writes to the log that has the latest date, so during 0:00 am – 5:00 am the “active” logfile carries the previous day’s name. All logfiles, active and rotated, are named “catalina.YYYYMMDD.out”.

    ex.

    ===20100825 0-5am===
    catalina.20100824.out <- active

    ===20100825 5am- ===
    catalina.20100825.out <- active
    catalina.20100824.out

    ===20100826 0-5am===
    catalina.20100825.out <- active
    catalina.20100824.out

    ===20100826 5am- ===
    catalina.20100826.out <- active
    catalina.20100825.out
    catalina.20100824.out

    [Reply]


    lausser Reply:

    type => 'rotating::uniform',
    logfile => '/path/where/the/logfiles/are/dummy', # used to find the files which match the rotation pattern
    rotation => 'catalina\.\d{8}\.out',
    ...

    [Reply]

    Kazwa Reply:

    @lausser,

    Perfect!! Thanks for your reply.

    [Reply]

  105. Daniel Says:
    September 2nd, 2010 at 9:25

    First of all thanks for the great software. I had to check for some error strings in a log file, and when I found this plugin it brought a smile to my face. Anyway, I got everything working until I tried to use rotation… If I use the following config it works:

    $seekfilesdir = 'd:\\';
    $tracefile = 'd:\\check_logfiles.trace';
    @searches = (
      {
        tag => 'cnxfazit',
        logfile => 'd:\\Parent.FazitAdminPearl.FazitAdminPearl00.tr',
        criticalpatterns => 'PONG message missing for PING message',
        okpatterns => 'received PONG REPLY with message',
        options => 'allyoucaneat,sticky,noprotocol,maxlength=1023'
      },
    );

    But when I add the rotation parameter, it doesn’t find anything (returns OK):

    rotation => 'Parent\.FazitAdminPearl\.FazitAdminPearl00\.tr\.[1-9]'

    As you can see, the rotation scheme is file.tr, file.tr.1, file.tr.2…

    Thanks for the comments


    Daniel Reply:

    @Daniel, don’t bother… my fault: the program works flawlessly, I was simulating the logs wrongly.

    Thanks anyway


  106. David Says:
    September 14th, 2010 at 23:50

    What would be the proper syntax to match something like the line below and assign the matched strings to variables?

    criticalpatterns => [ 'ANS1007E Sending of object .* failed.-.-.-.-.*', ],

    Above, I would like to assign a variable name to each .*; thanks in advance!


    lausser Reply:

    First of all, the version you use is really, really outdated; please use 3.4.2. You have to create the tracefile manually with ‘touch /tmp/check_logfiles.trace’; then the plugin will write debug info into it. The config looks OK. I created some test logs and it worked, even the rotation detection. The tracefile will show you detailed information.


    lausser Reply:

    That’s not possible. The only way is to write a handler script (options => ‘script,..’). Inside this script you can repeat the pattern match, but this time with brackets (capture groups) around the substrings.
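
    A minimal sketch of such a handler, modeled on the supersmartscript mechanism used later in this thread: the matched line is exposed to the script via the CHECK_LOGFILES_SERVICEOUTPUT environment variable, so the pattern can be repeated with capture groups (the group layout here is illustrative, based on the ANS1007E pattern from the question):

```perl
# fragment of a search definition in the config file (sketch)
options => 'supersmartscript',
script => sub {
    # repeat the pattern, this time with capture groups around the substrings
    if ($ENV{CHECK_LOGFILES_SERVICEOUTPUT} =~
        /ANS1007E Sending of object (.*) failed.-(.*)-(.*)-(.*)-(.*)/) {
        my $object = $1;    # first captured substring
        print "failed object: $object";
    }
    # keep the state check_logfiles already decided on
    return $ENV{CHECK_LOGFILES_SERVICESTATEID};
},
```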


  107. David Says:
    September 15th, 2010 at 14:24

    Thanks, Gerhard. Is there an example script somewhere I could use to get my brain wrapped around the (options => ‘script,..’) mechanism? Thanks in advance!


  108. Per Klitgaard Madsen Says:
    September 16th, 2010 at 12:22

    Hi,

    I don’t quite understand this behaviour:

    My command is like this:

    C:\Users\pkm\Documents\ok test\uc4 logs>check_logfiles.exe -f cl.conf
    OK - no errors or warnings|UC4_lines=1 UC4_warnings=0 UC4_criticals=0 UC4_unknowns=0

    So no errors are returned.

    But this is from the trace file:

    Thu Sep 16 12:18:28 2010: ==================== WPsrv_log_001_01.txt ==================
    Thu Sep 16 12:18:28 2010: found seekfile C:\TEMP/cl.WPsrv_log_001_01.txt.UC4
    Thu Sep 16 12:18:28 2010: LS lastlogfile = WPsrv_log_001_01.txt
    Thu Sep 16 12:18:28 2010: LS lastoffset = 20980069 / lasttime = 1284552546 (Wed Sep 15 14:09:06 2010) / inode = 32303130303931332f3137353134332e303630202d205530303033343739204c6f6767696e6720776173206368616e6765642e0a
    Thu Sep 16 12:18:28 2010: found private state $VAR1 = { 'runcount' => 1, 'lastruntime' => 0, 'logfile' => 'WPsrv_log_001_01.txt' };
    Thu Sep 16 12:18:28 2010: magic: 20100913/175143.060 - U0003479 Logging was changed.
    Thu Sep 16 12:18:28 2010: the logfile grew to 20980261
    Thu Sep 16 12:18:28 2010: opened logfile WPsrv_log_001_01.txt
    Thu Sep 16 12:18:28 2010: magic: 20100913/175143.060 - U0003479 Logging was changed.
    Thu Sep 16 12:18:28 2010: logfile WPsrv_log_001_01.txt (modified Thu Sep 16 12:18:23 2010 / accessed Wed Sep 15 14:09:06 2010 / inode 0 / inode changed Wed Sep 15 14:09:06 2010)
    Thu Sep 16 12:18:28 2010: first relevant files: WPsrv_log_001_01.txt
    Thu Sep 16 12:18:28 2010: WPsrv_log_001_01.txt has fingerprint 32303130303931332f3137353134332e303630202d205530303033343739204c6f6767696e6720776173206368616e6765642e0a:20980261
    Thu Sep 16 12:18:28 2010: relevant files: WPsrv_log_001_01.txt
    Thu Sep 16 12:18:28 2010: moving to position 20980069 in WPsrv_log_001_01.txt
    Thu Sep 16 12:18:28 2010: MATCH CRITICAL abnormally with 20100915/151306.595 - U0011007 Job '400.J0007.DISTRIBUTION.C.ORDRER_TIL_TOP' (RUN# '0010719441' / JobPlan-RUN#: '0010719435') on Host 'OKPRD1095' ended abnormally (return code='0000000011').
    Thu Sep 16 12:18:28 2010: skip match and the next 3
    Thu Sep 16 12:18:28 2010: magic: 20100913/175143.060 - U0003479 Logging was changed.
    Thu Sep 16 12:18:28 2010: stopped reading at position 20980261
    Thu Sep 16 12:18:28 2010: keeping position 20980261 and time 1284632303 (Thu Sep 16 12:18:23 2010) for inode 32303130303931332f3137353134332e303630202d205530303033343739204c6f6767696e6720776173206368616e6765642e0a in mind

    So it does match an error, but why is the response then OK?

    Regards, Per.


    lausser Reply:

    I see “skip match and the next 3” in the trace. It looks like you defined warning-/criticalthresholds in your cl.conf and there are not enough matches yet.
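
    A sketch of the kind of threshold that produces this trace message (the pattern string is illustrative; with criticalthreshold => 4, only one critical is raised per four matching lines, which is why a single match can still yield OK):

```perl
# fragment of a search definition (sketch)
criticalpatterns => ['ended abnormally'],
criticalthreshold => 4,   # a critical is counted only once per 4 matches
```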


  109. Per Klitgaard Madsen Says:
    September 16th, 2010 at 13:26

    What can I say – you’re right. Thanks.


  110. Ryan Ash Says:
    September 23rd, 2010 at 22:52

    Is there any plan to extend the .exe functionality to include the “-tag”? We would love to see this added.


    lausser Reply:

    The exe has exactly the same functionality as the script version. It is the script check_logfiles, only in compiled form.


  111. rep1 Says:
    September 24th, 2010 at 0:10

    Works great! Now I am trying to do this: if the “command successful” pattern did not occur in the last 10 minutes, generate an alert. Could you please let me know if that’s doable?

    Thanks,


    lausser Reply:

    Is

    ...
    criticalpatterns => '!command successful',
    ...
    and a check_interval of 10 minutes an option?


  112. Acorn Says:
    September 24th, 2010 at 4:42

    When using the config below check_logfiles reports “could not execute cscript.exe Alert.vbs” when a match is found.

    @searches = (
      {
        tag => 'Win2k3.log',
        options => 'scripts',
        logfile => 'Win2k3.log',
        script => 'cscript.exe',
        scriptparams => 'Alert.vbs',
        criticalpatterns => ['ERROR', 'CRITICAL']
      }

    What is the correct syntax to run a VBScript on a Windows system?


    lausser Reply:

    Did you set the $scriptpath variable?


    Acorn Reply:

    @lausser, Sorry for the delay in replying, but yes I’ve added the following line with the same error.

    $scriptpath = 'C:\Windows\System32;C:\Admin\Scripts'

    Note: cscript.exe is in the System32 folder and Alerts.vbs is in the Scripts folder.


    lausser Reply:

    How about writing options => ‘script’ instead of options => ‘scripts’? :-)

    And in scriptparams you must give the full path ‘C:\Admin\Scripts\Alerts.vbs’. The scriptpath is only relevant for the command given after script =>.


    Acorn Reply:

    @lausser, Thanks that fixed it. Entering the full path to the VBScript in scriptparams has worked.


  113. Ryan Says:
    September 28th, 2010 at 6:05

    Having issues with -searches or -selectedsearches

    @searches = (
      {
        tag => 'Event_7024_W2K3_BASE__(SCM)The_service_service_has_stopped_error_error',
        type => 'eventlog',
        eventlog => {
          eventlog => 'System',
          include => { eventid => '7024', },
          exclude => { },
        },
        criticalpatterns => [ '.*', ],
        options => 'noperfdata,preferredlevel=critical',
      },
      {
        tag => '_Event_7025_W2K3_BASE_(SCM)_At_least_one_driver_or_service_failed_during_startup',
        type => 'eventlog',
        eventlog => {
          eventlog => 'System',
          include => { eventid => '7025', },
          exclude => { },
        },
        warningpatterns => [ '.*', ],
        options => 'noperfdata,preferredlevel=critical',
      },
    };

    Running:

    E:\APPS\NSClient\scripts>check_logfiles.exe -f=NG-BASE-W2K3_plugin.cfg -searches=Event_7024_W2K3_BASE__(SCM)_The_service_service_has_stopped,_error_error
    UNKNOWN - configuration incomplete

    E:\APPS\NSClient\scripts>check_logfiles.exe -f=NG-BASE-W2K3_plugin.cfg
    OK - no errors or warnings

    As you can see, it works fine without “-searches”, but with it I get an UNKNOWN configuration error.

    Any ideas? Sorry for the formatting.


    lausser Reply:

    Your tag descriptions are too complicated :-) Inside check_logfiles these strings are used for pattern matching. Your tags contain brackets, which are special characters in regular expressions, so none of your tags matches. Either leave out the brackets or call them just ‘Event_7024’ and ‘Event_7025’; then it will work. P.S. the last bracket in your config must be a round one, not a curly one.
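
    A simplified sketch of the first search with an unproblematic tag (the round closing bracket on the last line is the fix lausser mentions):

```perl
# sketch: one eventlog search with a plain, regex-safe tag
@searches = (
  {
    tag => 'Event_7024',
    type => 'eventlog',
    eventlog => {
      eventlog => 'System',
      include => { eventid => '7024' },
    },
    criticalpatterns => [ '.*' ],
    options => 'noperfdata,preferredlevel=critical',
  },
);  # round bracket, not a curly one
```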


  114. Richard Says:
    September 28th, 2010 at 7:47

    Hello,

    We have an internal tool that does a similar thing to check_logfiles; one thing we really like about it is its ability to raise alerts for NEW events that match neither the critical nor the warning patterns. A lot of log-checking tools require you to ‘train’ the tool to identify false positives and critical alerts.

    What if a new event, never seen before, occurs in the logfile and then check_logfiles is run?

    Does check_logfiles quietly ignore this new event or does it raise an alarm?

    With our tool, over time it gets trained to ignore messages that are not critical.

    How can we be alerted to events that fail to match the patterns you pass to check_logfiles?

    Thanks!


    lausser Reply:

    Look at http://labs.consol.de/lang/de/blog/nagios/damit-dem-windows-team-nichts-mehr-entgeht-anwendungsbeispiel-fr-check_logfiles/ It’s in German, but you might get the idea. There are three arrays:

    • criticalpatterns – events you already know about and which must be reported
    • warningexceptions – events you already know about and which you consider harmless
    • warningpatterns – a wildcard for everything else. These are events you have never heard about (new events which you categorized neither as critical nor as harmless); they result in a warning
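
    Put together, such a categorizing search might look like this (a sketch; the logfile path and pattern strings are illustrative):

```perl
# sketch: known-bad, catch-all-new, and known-harmless categories
@searches = (
  {
    tag => 'app',
    logfile => '/var/log/app.log',
    criticalpatterns => ['ERROR', 'FATAL'],   # known events that must be reported
    warningpatterns => ['.*'],                # wildcard: every other line counts as "new"
    warningexceptions => ['INFO', 'DEBUG'],   # known events considered harmless
  },
);
```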


  115. Moshe Says:
    September 29th, 2010 at 2:25

    Hi,

    Receiving an error when I run the check_logfiles plugin from Nagios or manually:

    Use of uninitialized value in gethostbyname at /usr/local/nagios/libexec/check_logfiles line 1259.

    Any ideas why?

    Thanks in advance, Moshe


  116. Moshe Says:
    September 29th, 2010 at 3:32

    Please disregard the previous posting; there was a hostname error in the /etc/hosts file. Thank you.


  117. David Says:
    September 29th, 2010 at 13:52

    Good morning! What is the highest number of characters I can enter into a plugin.ini description, and is there an upper limit on the number of different conditions I can enter into one check_logfiles plugin.ini? I have the feeling I’m losing messages if I have too many listed in a single plugin.ini.

    Thanks!


    lausser Reply:

    Plugin.ini? You mean the config file? Theoretically there is no limit, and practically… I never heard of such problems. Of course you can run into a timeout if you pack too many jobs into one config. Can you describe it a bit more? What does the error look like?


  118. David Says:
    September 29th, 2010 at 14:42

    Sure, I’ve been seeing the following error:

    d NSClient++.cpp(1143) Injected Performance Result: "
    d \NSCAThread.cpp(261) Sending to server...
    d \NSCAThread.cpp(268) Looked up TESTSERVER
    e \NSCAThread.cpp(322) Failed to make command: description to long
    e \NSCAThread.cpp(322) Failed to make command: description to long
    d \NSCAThread.cpp(340) Finnished sending to server...

    I have, for example, this in the plugin.ini matching up to the .cfg:

    ANS3410E_ANS5251E_ANS5271E_ANS5272E_ANS5273E_ANS5274E_ANS7650E_ANS0103E_ANS0109E_ANS0360E_ANS1752E_ANS2037W_ANS2047E_ANS2052E_ANS2202E_ANS2203E=check_logfiles -config=scripts\NG-TSM-WIN_plugin.cfg -searches=ANS3410E_ANS5251E_ANS5271E_ANS5272E_ANS5273E_ANS5274E_ANS7650E_ANS0103E_ANS0109E_ANS0360E_ANS1752E_ANS2037W_ANS2047E_ANS2052E_ANS2202E_ANS2203E

    And I was under the impression that the error was pointing me to these descriptions (TSM error codes).


    lausser Reply:

    Ah, ok. That’s a problem with nsclient++ and some length limitations. I think this posting matches your problem: http://nsclient.org/nscp/ticket/400


  119. Moshe Says:
    September 30th, 2010 at 3:44

    1) Scan #1: scan a logfile with the plugin – OK result.
    2) Scan #2: scan the same logfile; a pattern matches and produces a CRITICAL result.
    3) Scan #3: scan the same logfile again; it returns OK (no change has occurred in the logfile since #2).
    4) Scan #4: scan the logfile after some change to it; it returns the same error as in #2, even though the new data in the logfile does NOT match the criticalpatterns.

    I am confused. Why is the plugin reporting the same error condition as in #2 even though the additional lines in the logfile do not pattern match?

    Shouldn’t the seek cause the criticalpattern matching to start only at the “new” lines in the logfile?

    What am I missing?

    Thanks, Moshe


    lausser Reply:

    What exactly happened between #1 and #2? New lines (with an error message) were appended? How?


    Moshe Reply:

    @lausser,

    Thank you for your reply.

    Apologies if i was not clear.

    To test I manually added a line to the end of the logfile in Step #1.5 and Step #3.5.

    ~Moshe


    lausser Reply:

    And how did you add the line? With ‘echo lilililine >> logfile’ or with an editor? Here you have to pay attention: if you edit a file with vi, it changes its inode number, so check_logfiles thinks the file has been rotated. Always use the echo method.


    Moshe Reply:

    Yes, I used vi. Let me retest using echo. I knew it must be something I was doing wrong. Thank you, ~Moshe


  120. Max Says:
    September 30th, 2010 at 9:56

    Hi,

    I’m trying to do some monitoring of a Java exception log file. Basically I have a log that records all the exceptions that occurred during execution. I need to check the log file every hour and, if any new exceptions occurred, send an email notification to the admin containing the full stack traces, or at least the exception lines. Can I do that using check_logfiles? I saw this script keeps all of the matching lines, which is what I need, but I’m not sure how to send that information by email. Can you advise me on this?

    Thanks, Max


  121. al_mic Says:
    October 3rd, 2010 at 15:13

    just wanted to say that in Nagios, if you want both output lines (i.e. the complete output), you have to place $SERVICEOUTPUT$ and $LONGSERVICEOUTPUT$ in the notification command

    it could be that $SERVICELONGOUTPUT$ exists too, but for my Nagios 3.0.6 it worked with $LONGSERVICEOUTPUT$


    Max Reply:

    @al_mic,

    Where should I place the $SERVICEOUTPUT$ and $LONGSERVICEOUTPUT$? Is it inside the message passed to nsca_send()?

    Thank you


    al_mic Reply:

    @Max, I placed $LONGSERVICEOUTPUT$ in the main Nagios installation (Nagios Core), in the file that defines the email command.

    In commands.cfg :

    define command{
        command_name    notify-service-by-email
        command_line    /usr/bin/printf "%b" "To: $CONTACTEMAIL$\nSubject: ** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **\n\n***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n$LONGSERVICEOUTPUT$" | /usr/sbin/sendmail $CONTACTEMAIL$
        }


    al_mic Reply:

    @al_mic, be careful when you copy-paste this: replace the curly quotes with your own plain double quotes.


    Max Reply:

    @al_mic,

    Thanks for your help, I finally did it. Just one thing I want to ask lausser: sometimes check_logfiles doesn’t continue from the last line read in the previous check; it reads from the beginning of the file instead. If nothing is added to the file, the state remains OK (so I guess it keeps track of the last checked position). But if something is added, sometimes it reports only the newly added content and sometimes it reports matches from the beginning of the file. Can you explain why that is and how to fix it?

    Thank you


    lausser Reply:

    Normally check_logfiles just continues to read from the last position (where it stopped at end-of-file during the last run). But it also checks whether the inode of the file has changed. If it has, check_logfiles ‘thinks’ the file has been moved/rotated away and a fresh logfile was created; that’s why it starts reading from the beginning. I bet you were using ‘vi’ to add some extra content. Please always do this with ‘echo new_line >> logfile’, because vi creates a copy of the file and, when you’re finished editing, moves the copy to the original location, thereby changing the inode.
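
    The inode behaviour described here can be demonstrated with a few lines of Perl (a standalone sketch using a temporary file, not part of check_logfiles):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Demonstrate why appending preserves the inode while a rewrite-and-rename
# (which is effectively what vi does) produces a new one.
my $dir = tempdir(CLEANUP => 1);
my $logfile = "$dir/app.log";

open my $fh, '>', $logfile or die $!;
print $fh "old line\n";
close $fh;

my $inode_orig = (stat $logfile)[1];

# append like 'echo new_line >> logfile': same file, same inode
open my $append, '>>', $logfile or die $!;
print $append "new line\n";
close $append;
my $inode_after_append = (stat $logfile)[1];

# rewrite into a copy and rename it over the original, as vi does:
# the path now points to a different inode, which check_logfiles
# interprets as a rotation, so it starts reading from position 0
open my $copy, '>', "$logfile.tmp" or die $!;
print $copy "old line\nnew line\n";
close $copy;
rename "$logfile.tmp", $logfile or die $!;
my $inode_after_rename = (stat $logfile)[1];

printf "append kept inode: %s\n", ($inode_orig == $inode_after_append ? 'yes' : 'no');
printf "rename kept inode: %s\n", ($inode_orig == $inode_after_rename ? 'yes' : 'no');
```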


    Max Reply:

    @lausser,

    Yeah, I used ‘vi’ to add some extra content to test it. It’s actually the Java log file, and it logs every exception that occurs. So can I say that check_logfiles will work correctly as long as the file is not modified with ‘vi’?

    Thank you

    lausser Reply:

    If you add extra content by appending it with ‘echo content >> logfile’, it should work as expected.

  122. al_mic Says:
    October 3rd, 2010 at 20:35

    Do you know what to place in warningpatterns to get lines from a file that have “404” in the 9th column?

    In bash I can do it like this: awk '{if ($9 == "404") {print $0}}' /var/log/httpd/web-access_log

    But how can I combine this with a config file like:

    $seekfilesdir = '/usr/local/nagios/tmp';
    $protocolsdir = '/usr/local/nagios/tmp';
    $scriptpath = '/usr/local/nagios/tmp';
    $options = 'report=long';

    @searches = (
      {
        tag => '404',
        logfile => '/var/log/httpd/web-access_log',
        warningpatterns => ['404'],
        warningthreshold => 1,
        options => 'noperfdata,noprotocol',
      }
    );

    Thanks!


    lausser Reply:

    criticalpatterns => ['^.+?\s+.+?\s+.+?\s+.+?\s+.+?\s+.+?\s+.+?\s+.+?\s+404\s'],
    should do the trick. If you want a shorter output:
    ...
      options => 'supersmartscript',
      script => sub {
        $ENV{CHECK_LOGFILES_SERVICEOUTPUT} =~ /..the above pattern.../;
        my $url = $7;
        print $url;
        return $ENV{CHECK_LOGFILES_SERVICESTATEID};
      },
    ...
    The script will replace the matching line with just the url (captured at position 7, assuming you put capture groups around the fields when you repeat the pattern)


    al_mic Reply:

    @lausser, Thank you!


  123. mark Says:
    October 13th, 2010 at 19:47

    Lausser, excellent plugin. I have an issue where it sometimes seems to randomly start seeking from the beginning of a file rather than from where it left off last time. The inode number does not change, so I can’t find an explanation for why this happens.

    Here is my config file:

    $tracefile = '/home/nagios/check_logfiles/logs/tracefile';
    $seekfilesdir = '/home/nagios/check_logfiles/logs';
    $protocolsdir = '/home/nagios/check_logfiles/search_matches';

    @searches = (
      {
        tag => 'pinkfix',
        type => 'rotating::uniform',
        logfile => '/home/pinkfix/log/dummy',
        rotation => 'pinkfix1\-CLUSTER\d{1}\-PRIMARY\-gateway\-pinkfix1\-$CL_DATE_YYYY$\-$CL_DATE_MM$\-$CL_DATE_DD$',
        criticalpatterns => ['in-memory queue is full'],
      },
    );

    Here is a tracefile excerpt from when it unexpectedly starts at the beginning of the log file:

    Wed Oct 13 10:20:00 2010: rewrote uniform seekfile to /home/nagios/check_logfiles/logs/logdefs._home_pinkfix_log_uniformlogfile.pinkfix
    Wed Oct 13 10:20:00 2010: ==================== /home/pinkfix/log/dummy ==================
    Wed Oct 13 10:20:00 2010: the newest uniform logfile i found is /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13
    Wed Oct 13 10:20:00 2010: rewrote uniform seekfile to /home/nagios/check_logfiles/logs/logdefs._home_pinkfix_log_uniformlogfile.pinkfix
    Wed Oct 13 10:20:00 2010: found seekfile /home/nagios/check_logfiles/logs/logdefs._home_pinkfix_log_uniformlogfile.pinkfix
    Wed Oct 13 10:20:00 2010: LS lastlogfile = /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13
    Wed Oct 13 10:20:00 2010: LS lastoffset = 1276058728 / lasttime = 1286979400 (Wed Oct 13 10:16:40 2010) / inode = 15466564:60126
    Wed Oct 13 10:20:00 2010: found private state $VAR1 = { 'runcount' => 159, 'lastruntime' => 1286979221, 'logfile' => '/home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13' };
    Wed Oct 13 10:20:00 2010: Log file modified time: Wed Oct 13 10:20:00 2010, last modified time: Wed Oct 13 10:16:40 2010
    Wed Oct 13 10:20:00 2010: Log file is not zero bytes
    Wed Oct 13 10:20:00 2010: Log offset: 1276058728
    Wed Oct 13 10:20:00 2010: looking for rotated files in /home/pinkfix/log with pattern pinkfix1\-CLUSTER\d{1}\-PRIMARY\-gateway\-pinkfix1\-2010\-10\-13
    Wed Oct 13 10:20:00 2010: archive /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13 matches (modified Wed Oct 13 10:20:00 2010 / accessed Wed Oct 13 10:16:40 2010 / inode 60126 / inode changed Wed Oct 13 10:20:00 2010)
    Wed Oct 13 10:20:00 2010: archive /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13 was modified after Wed Oct 13 10:16:40 2010
    Wed Oct 13 10:20:00 2010: opened logfile /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13
    Wed Oct 13 10:20:00 2010: logfile /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13 (modified Wed Oct 13 10:20:00 2010 / accessed Wed Oct 13 10:16:40 2010 / inode 60126 / inode changed Wed Oct 13 10:20:00 2010)
    Wed Oct 13 10:20:00 2010: first relevant files: pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13, pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13
    Wed Oct 13 10:20:00 2010: /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13 has fingerprint 15466564:60126:1299695936
    Wed Oct 13 10:20:00 2010: /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13 has fingerprint 15466564:60126:1299694542
    Wed Oct 13 10:20:00 2010: relevant files: pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13, pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13
    Wed Oct 13 10:20:00 2010: moving to position 1276058728 in /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13
    Wed Oct 13 10:20:01 2010: stopped reading at position 1300130954
    Wed Oct 13 10:20:01 2010: moving to position 0 in /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13
    Wed Oct 13 10:20:58 2010: stopped reading at position 1307263803
    Wed Oct 13 10:20:58 2010: keeping position 1307263803 and time 1286979658 (Wed Oct 13 10:20:58 2010) for inode 15466564:60126 in mind

    Here is a tracefile excerpt from when it starts where it left off, as expected:

    Wed Oct 13 10:32:19 2010: rewrote uniform seekfile to /home/nagios/check_logfiles/logs/logdefs._home_pinkfix_log_uniformlogfile.pinkfix
    Wed Oct 13 10:32:19 2010: ==================== /home/pinkfix/log/dummy ==================
    Wed Oct 13 10:32:19 2010: the newest uniform logfile i found is /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13
    Wed Oct 13 10:32:19 2010: rewrote uniform seekfile to /home/nagios/check_logfiles/logs/logdefs._home_pinkfix_log_uniformlogfile.pinkfix
    Wed Oct 13 10:32:19 2010: found seekfile /home/nagios/check_logfiles/logs/logdefs._home_pinkfix_log_uniformlogfile.pinkfix
    Wed Oct 13 10:32:19 2010: LS lastlogfile = /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13
    Wed Oct 13 10:32:19 2010: LS lastoffset = 1368650597 / lasttime = 1286980149 (Wed Oct 13 10:29:09 2010) / inode = 15466564:60126
    Wed Oct 13 10:32:19 2010: found private state $VAR1 = { 'runcount' => 163, 'lastruntime' => 1286979969, 'logfile' => '/home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13' };
    Wed Oct 13 10:32:19 2010: Log file modified time: Wed Oct 13 10:32:19 2010, last modified time: Wed Oct 13 10:29:09 2010
    Wed Oct 13 10:32:19 2010: Log file is not zero bytes
    Wed Oct 13 10:32:19 2010: Log offset: 1368650597
    Wed Oct 13 10:32:19 2010: looking for rotated files in /home/pinkfix/log with pattern pinkfix1\-CLUSTER\d{1}\-PRIMARY\-gateway\-pinkfix1\-2010\-10\-13
    Wed Oct 13 10:32:19 2010: archive /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13 matches (modified Wed Oct 13 10:32:19 2010 / accessed Wed Oct 13 10:29:10 2010 / inode 60126 / inode changed Wed Oct 13 10:32:19 2010)
    Wed Oct 13 10:32:19 2010: archive /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13 was modified after Wed Oct 13 10:29:09 2010
    Wed Oct 13 10:32:19 2010: opened logfile /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13
    Wed Oct 13 10:32:19 2010: logfile /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13 (modified Wed Oct 13 10:32:19 2010 / accessed Wed Oct 13 10:29:10 2010 / inode 60126 / inode changed Wed Oct 13 10:32:19 2010)
    Wed Oct 13 10:32:19 2010: first relevant files: pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13, pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13
    Wed Oct 13 10:32:19 2010: /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13 has fingerprint 15466564:60126:1392677842
    Wed Oct 13 10:32:19 2010: /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13 has fingerprint 15466564:60126:1392677842
    Wed Oct 13 10:32:19 2010: skipping /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13 (identical to /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13)
    Wed Oct 13 10:32:19 2010: relevant files: pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13
    Wed Oct 13 10:32:19 2010: moving to position 1368650597 in /home/pinkfix/log/pinkfix1-CLUSTER1-PRIMARY-gateway-pinkfix1-2010-10-13
    Wed Oct 13 10:32:20 2010: stopped reading at position 1392698445
    Wed Oct 13 10:32:20 2010: keeping position 1392698445 and time 1286980340 (Wed Oct 13 10:32:20 2010) for inode 15466564:60126 in mind

    Any thoughts on why it seems to randomly do “moving to position 0” at various times?

    Thanks, mark


  124. Kazwa Says:
    October 14th, 2010 at 10:01

    Hi, thank you for your answer on my previous question.

    Well, I need your help again. Occasionally check_logfiles reports CRITICAL for the same line more than once; I’d like to receive one CRITICAL per line. It happens two or three times a week. check_logfiles never misses error lines.

    I guess the log file’s inode changes during some odd logging. I’d like to use /tmp/check_logfiles.trace, but that requires coordination with an administrator, and it’s impossible to know when the incidents will happen.

    My question is: will the randominode option help to solve this case under the conditions below? It must detect error lines in rotated files, must not re-read whole files (only unread lines should be read), and must not miss error lines.

    If the option won’t help, do I need to compare detected lines using a postscript or something like that?

    My Best Regards and Thank you. Kazwa

    ==env==
    target server: RHEL3 (CPU, MEM and DISK resources are sufficient)
    logfile size: about 50MB
    logfile encoding and $LANG: EUC-JP
    normal check interval: 3min

    ==conf==
    @searches = (
      {
        tag => 'tomcat-maxThreads',
        type => 'rotating::uniform',
        logfile => '/var/log/tomcat/dummy',
        criticalpatterns => 'org.apache.tomcat.util.threads.ThreadPool logFull',
        rotation => 'catalina\.\d{8}\.out',
        options => 'noprotocol',
      },
    );

    ==ex.==
    an actual log file: catalina.20101004.out

    actual line:
    2010/10/04 1:01:47 org.apache.tomcat.util.threads.ThreadPool logFull

    check_logfiles report (reported time in parentheses):
    CRITICAL – (1 errors) – 2010/10/04 1:01:47 org.apache.tomcat.util.threads.ThreadPool logFull (2010/10/04 01:04:42)
    OK – no errors or warnings (2010/10/04 01:07:52)
    CRITICAL – (1 errors) – 2010/10/04 1:01:47 org.apache.tomcat.util.threads.ThreadPool logFull (2010/10/04 04:29:42)
    OK – no errors or warnings (2010/10/04 04:32:32)

    actual line:
    2010/10/04 10:03:37 org.apache.tomcat.util.threads.ThreadPool logFull

    check_logfiles report (reported time in parentheses):
    CRITICAL – (1 errors) – 2010/10/04 10:03:37 org.apache.tomcat.util.threads.ThreadPool logFull (2010/10/04 10:05:43)
    OK – no errors or warnings (2010/10/04 10:08:43)
    CRITICAL – (2 errors) – 2010/10/04 10:03:37 org.apache.tomcat.util.threads.ThreadPool logFull (2010/10/04 12:28:04)
    OK – no errors or warnings (2010/10/04 12:30:44)
    CRITICAL – (2 errors) – 2010/10/04 10:03:37 org.apache.tomcat.util.threads.ThreadPool logFull (2010/10/04 12:34:04)
    OK – no errors or warnings (2010/10/04 12:36:44)


    Kazwa Reply:

    @Kazwa,

    Sorry, let me correct my poor English.

    Wrong: “I’d like to receive one time CRITICAL per one line.”

    Right: “I’d like to receive one CRITICAL per detection,” e.g. CRITICAL – (3 errors) – hogehoge, showing only the newest line. A report like that is what I need.


  125. SD Bill Says:
    October 15th, 2010 at 1:54

    Hi, and thanks for a great piece of code! I’m looking for a way to use multiple critical patterns in a config file (or on the command line) that report a CRITICAL when met. The trick is that the two (or more?) patterns must appear on the same line within the log file. For example, for /var/log/messages I want to report a critical if ‘Status’ and ‘Exit’ both appear on the same line. thanks! bill


    lausser Reply:

    Status.*Exit
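
    In a config file, that answer would look something like this (a sketch; the logfile path is taken from the question, the tag is illustrative):

```perl
# sketch: both words must appear on the same line, in this order
@searches = (
  {
    tag => 'status_exit',
    logfile => '/var/log/messages',
    criticalpatterns => ['Status.*Exit'],
  },
);
```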


  126. @neo Says:
    October 18th, 2010 at 13:33

    My logs are in the following format: “logname.2010-10-18”. Tomorrow’s log = “logname.2010-10-19”.

    When I run the check I get following errors:

    Use of uninitialized value in pattern match (m//) at ./check_logfiles line 1310.
    Use of uninitialized value in concatenation (.) or string at ./check_logfiles line 3690.
    Use of uninitialized value in concatenation (.) or string at ./check_logfiles line 3690.
    Use of uninitialized value in concatenation (.) or string at ./check_logfiles line 3690.
    Use of uninitialized value in concatenation (.) or string at ./check_logfiles line 3690.
    etc.

    I defined the logfile in the config file as follows:

    'logname. .$CL_DATE_YYYY$-$CL_DATE_MM$-$CL_DATE_DD$'

    What am I missing here?

    Many thanks :-)


    lausser Reply:

    Can you please post the complete config? Do you use the newest release of check_logfiles?


  127. @neo Says:
    October 18th, 2010 at 16:06

    Using 3.4.2.2 (and Linux)

    Just trying to grep the number of ‘500’ errors from my access logs. If that works correctly, I will add a Nagios (nrpe) check that polls every hour with a -c value (e.g. more than 25 errors an hour will throw a critical). The examples on this site are quite clear about that.

    It seemed that maxlength (in my config file) is necessary to show all my output, e.g. if I have 60 errors and maxlength=8192 it will show only 30% of the list.

    I noticed that the error in question was caused by the type => ‘rotating::uniform’.

    My archive logs and active log are in the same format: log.yyyy-mm-dd (24h logs).

    If I comment it out, the check works OK, but then I probably still have the rotation issue to deal with. Or does it just continue with the new log when the date changes?

    ==========

    $seekfilesdir = '/tmp/logcheck';
    $options = 'report=long,maxlength=32768';

    @searches = (
      {
        tag => '500',
        type => 'rotating::uniform',
        logfile => '/logs/access_log.$CL_DATE_YYYY$-$CL_DATE_MM$-$CL_DATE_DD$',
        rotation => "here I tried all possibilities",
        criticalpatterns => 'download HTTP/1.1" 500',
        options => 'noprotocol'
      },
    );

    ==========

    Anyway this check_logfiles seems to be the only one on the web that has real potential !


    lausser Reply:

    please configure it like this:

    type => 'rotating::uniform',
    logfile => '/logs/dummy', # this is just a hint to the directory
    rotation => 'access_log\\.\\d{4}\\-\\d{2}\\-\\d{2}',
    ...
    (instead of the double backslash you need to write a single backslash; that’s because of the CMS here, which refuses to accept single backslashes) Remember that nrpe and nagios itself have a length limitation for the output. I think even to reach 8192 you have to patch.

    [Reply]

  128. @neo Says:
    October 19th, 2010 at 11:07

    Worked :-)

    Thanks a lot for your input, really appreciate it!

    I noticed (had a trace file) that the first time I ran the check, it also performed a check on all the old logs (including the ones that are already gzipped) that existed in the same directory.

    [Reply]

  129. burschi Says:
    October 19th, 2010 at 15:35

    Hello,

    sorry to bother you; it might be a stupid question in the eyes of the developer, but I cannot proceed at the moment.

    I want to monitor the status of my Avira Updater, using the following command via nrpe.conf:

    command[check_avupdate]=/usr/local/nagios/libexec/check_logfiles --logfile=/var/log/avupdate.log --tag=avupdate --rotation=simple --criticalpattern='/Update finished successfully/'

    Just to get an output I used “Update finished successfully” instead of a non existing error.

    To my understanding, I should receive alerts every two hours, when avupdate downloads the latest pattern and writes this to the logs. But in Nagios, as well as when running it manually, I just get the positive OK message.

    Thanks for any hints!

    [Reply]

    lausser Reply:

    --criticalpattern='Update finished successfully'

    [Reply]

    burschi Reply:

    @lausser, Pretty sure I also tried this before – anyway, now it works like a charm! Thanks!!

    [Reply]

  130. Ralph Says:
    October 21st, 2010 at 11:23

    Hello Gerhard,

    although I heard of your splendid plug-in already two years ago, while attending your talk at a Nagios conference in Nürnberg where you presented it, I only started using it yesterday, because so far I had been relying on the rather insufficient check_log2.pl plug-in that came in the contrib dir of the official Nagios Plug-ins. It is really fun using check_logfiles, because it is so versatile and excellently documented, which cannot be said of many plug-ins that loiter in the Monitoring Exchange realm.

    My first use of check_logfiles was monitoring an Oracle alert log of an instance that is an HA resource of a package in a ServiceGuard cluster. Via the script tagged in the hashref within the @searches array it churns out SMS notifications to the Oracle DBAs when it encounters one of a couple of ORA-\d+ errors that the DBAs consider critical and provided me with in a list to look out for. Though the text messaging could also have been covered by the Nagios notification system, I chose it this way for now (as I immediately lacked a more useful purpose for the script interface of check_logfiles). This works very well and to my heart’s content.

    However, when I considered that the monitored Oracle alert log is the output of an HA resource, it dawned upon me that, to prevent false (or rather stale) alerts after a package/resource failover has occurred, check_logfiles’ $seekfilesdir actually needs to rest on a cluster-shared volume/filesystem. Would you agree?

    Another question: I noticed that the options parameter appears as a scalar within the global section of check_logfiles’ config file and then as a key of the hashref(s) within the @searches array. I gather that both options params refer to the same set of options, either in global scope or, overridingly, in “local” search context. Is that assumption correct? The documentation on this page isn’t unambiguous here, it seems to me. And while I am grumbling about the doc, if I dare, I would rather refer to “Hashref” than “Hash” in the explanatory table field of the $MACROS entry above.

    Gruß Ralph

    [Reply]

    lausser Reply:

    In a clustered environment, the seekfiles have to be in a shared directory. In the config file you can set it as a global parameter, like $seekfilesdir = '/clusterstorage/var/tmp'; Some global options (like report=long) do not exist in the scope of a search, and vice versa (allyoucaneat).
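    In config-file form, this might look like the following sketch (the paths, tag and the ORA pattern are illustrative examples, not from the thread):

    ```perl
    # global section: seekfiles on shared cluster storage so the scan
    # position survives a package failover (path is an example)
    $seekfilesdir = '/clusterstorage/var/tmp';
    $options      = 'report=long';             # example of a global-only option

    @searches = ({
      tag     => 'alertlog',
      logfile => '/shared/oracle/alert.log',   # hypothetical path
      criticalpatterns => 'ORA\-\d+',
      options => 'allyoucaneat',               # example of a search-scope-only option
    });
    ```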

    [Reply]

  131. Moshe Says:
    October 22nd, 2010 at 2:40

    Some help would be appreciated!

    To test, I am running check_logfiles directly and not via nagios, since nagios was timing out.

    check_logfiles is completing successfully but is taking up to 45 seconds to process any logfile, even one that only contains one line.

    running using a config file or from the command line has no effect.

    i have tried 3.4.2 and 3.4.2.2 with the same results.

    running as root and as an unprivileged user has no effect

    plenty of free space in /tmp and /var/tmp

    cpu utilization is not spiking during the execution of check_logfiles

    kernel is 64bit 2.6.18-194.17.1.el5

    Any help would be appreciated!

    [Reply]

    lausser Reply:

    You need to use strace for a deeper look inside. strace -f check_logfiles ….

    This will show you the system calls and may help you find the cause: whether it’s the startup of the perl interpreter, the opening of the logfile/seekfile, …

    [Reply]

  132. Moshe Says:
    October 24th, 2010 at 17:48

    That did it! I was having DNS resolution problems that were causing timeouts.

    Perhaps consider having the script not require hostname resolution for a local logfile?

    Thank you for your help and work on this project!

    ~Moshe

    [Reply]

  133. Glenn Stewart Says:
    October 27th, 2010 at 7:09

    Trying to determine syntax of log rotation methods based on a log name: logYYMMDD.

    There are examples above, but none explain how to point to a log that contains a date as the current log.

    I note on http://labs.consol.de/lang/en/nagios/check_logfiles/check_logfiles-beispielecheck_logfiles-examples/ the use of "rotation => 'kern\d{4}-\d{2}-\d{2}'", but there is no mention of what this actually means.

    Anywhere I can reference docs on rotation methods expanded on above?

    [Reply]

    lausser Reply:

    There is no special syntax. Like criticalpatterns, logrotations are simply Perl regular expressions. logYYMMDD would simply be log\\d{6} (read a single backslash wherever a double backslash appears here; that’s because of the CMS).
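    Applied to the logYYMMDD scheme from the question, where the current log itself carries the date, the rotating::uniform approach from comment 127 would give a config like this (a sketch; directory, tag and pattern are examples):

    ```perl
    @searches = ({
      tag      => 'app',
      type     => 'rotating::uniform',  # current log and archives share one scheme
      logfile  => '/logs/dummy',        # only a hint to the directory
      rotation => 'log\d{6}',           # matches logYYMMDD, a plain Perl regex
      criticalpatterns => 'ERROR',      # example pattern
    });
    ```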

    [Reply]

  134. neo Says:
    October 27th, 2010 at 10:54

    I am searching my logs for “500” errors

    The script works currently just fine (criticalpatterns for 500 defined in my logsdef.cfg).

    Now I want to search my logs also for “400” errors, so I am adding this also as a critical pattern in my logsdef.cfg file and the script works just fine.

    Let’s say that I have 100 x “500” errors and 200 x “400” errors. My output shows 300 (100+200) as critical. How do I split my critical (output) into 2 subcriticals, so that my output (after the check) would have critical1=100 and critical2=200 (without defining one of the two as warning pattern)?

    [Reply]

    lausser Reply:

    This is only possible if you implement the logic with your own handler scripts. Add options => 'script' and write a script which adds lines containing 500 to a global array @errors500 and lines containing 400 to a global array @errors400. Then you need a supersmartpostscript which counts the elements in these arrays and formulates the final plugin output.
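    A sketch of that outline (the coderef handlers, the CHECK_LOGFILES_SERVICEOUTPUT environment variable and the $postscript global are assumptions drawn from the plugin’s documentation; tag, path and patterns are examples from comment 134):

    ```perl
    # counters shared between handler invocations (all run in one process)
    our (@errors500, @errors400);

    @searches = ({
      tag              => 'http_errors',
      logfile          => '/logs/access_log',
      criticalpatterns => ['HTTP/1\.1" 500', 'HTTP/1\.1" 400'],
      options          => 'script',
      script           => sub {
        # called for every matching line; assumption: the matched line is
        # exported in the CHECK_LOGFILES_SERVICEOUTPUT environment variable
        my $line = $ENV{CHECK_LOGFILES_SERVICEOUTPUT} || '';
        push @errors500, $line if $line =~ m{HTTP/1\.1" 500};
        push @errors400, $line if $line =~ m{HTTP/1\.1" 400};
        return 0;
      },
    });

    # a supersmartpostscript's printed output and exit code become the plugin result
    $options    = 'supersmartpostscript';
    $postscript = sub {
      printf "critical1=%d critical2=%d\n",
          scalar @errors500, scalar @errors400;
      return (@errors500 + @errors400) ? 2 : 0;
    };
    ```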

    [Reply]

  135. Andy Says:
    October 27th, 2010 at 12:48

    Hi,

    I want to use your plugin to monitor for NIC-down events on Solaris. Can you explain how to prevent the critical alert from clearing itself on the next run? I get an error, flagged as critical in the nagios console, but when check_logfiles runs a second time it clears the alert. I want the alert to remain until either a matching okpattern arrives or the alert is manually cleared. Also, given the range of hardware, is it possible to have a regex which would let me specify multiple types of NICs, i.e. hme0, ce1, qfe1 etc., and have this match, or would I need a separate service for each one?

    Thanks

    [Reply]

    lausser Reply:

    This might work:

    @searches = ({
     tag => 'hme0',
     logfile => '/var/adm/messages',
     criticalpatterns => 'hme0.*down',
     okpatterns => 'hme0.*up',
     options => 'sticky',
    }, {
     tag => 'qfe1',
    .... same with qfe1 instead of hme0...

    [Reply]

  136. Andy Says:
    October 27th, 2010 at 16:23

    thanks lausser, do you have any recommendation for the service settings, i.e. whether flap detection should be enabled or not?

    [Reply]

    lausser Reply:

    I would not configure flap detection here. If your interfaces are really nervous and go up-down-up in intervals of a few seconds, that is already covered by check_logfiles. It usually runs every five minutes. So even if you have hundreds of up/down messages, only the first /down/ will raise the level to critical and only the last /up/ will reset it to ok. Everything in between (flapping like crazy) will not be taken into account by check_logfiles.

    [Reply]

  137. Andy Says:
    October 27th, 2010 at 17:26

    lausser can I use the $CL_tag$ in the criticalpattern and okpatterns ?

    [Reply]

    lausser Reply:

    Yes, you can use macros in the patterns.

    [Reply]

  138. Andy Says:
    October 27th, 2010 at 17:51

    hmmm doesn’t appear to work – here’s my config -

    tag => 'hme0',
    logfile => '/var/adm/messages',
    rotation => 'SOLARIS',
    criticalpatterns => 'WARNING: $CL_tag$: No response from Ethernet network',
    okpatterns => 'NOTICE: $CL_tag$: fault cleared in device; service available',
    options => 'sticky',

    [Reply]

  139. Andy Says:
    October 27th, 2010 at 18:01

    Ah it works ok with $CL_TAG$
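    Putting comments 135 to 139 together, the working form of that search would be (a sketch of the config from this thread; the macro name must be uppercase, and it is resolved at runtime):

    ```perl
    @searches = ({
      tag      => 'hme0',
      logfile  => '/var/adm/messages',
      rotation => 'SOLARIS',
      # $CL_TAG$ expands to this search's tag ('hme0')
      criticalpatterns => 'WARNING: $CL_TAG$: No response from Ethernet network',
      okpatterns       => 'NOTICE: $CL_TAG$: fault cleared in device; service available',
      options  => 'sticky',
    });
    ```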

    [Reply]

  140. cwaters Says:
    November 1st, 2010 at 22:11

    I am having some issues getting the sticky parameter to work. Is it possible to have sticky configured in NSC.ini as part of the command but still use the -f flag for a config file? What we want to do is use the sticky option for normal execution but not use it to “reset” the check. So we have two NSC.ini lines, one that includes the --sticky flag and one that does not. They both point to the same config file. The problem is that the sticky flag does not seem to be working. Setting the option in the config file won’t work either, because check_logfiles would write a separate seek file if we had 2 configs (one with sticky and one without). Any ideas?

    Thanks.

    [Reply]

    lausser Reply:

    You can’t use sticky on the commandline together with a config file. Either everything as commandline parameters or everything inside a config file.
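    So if sticky is needed, an all-in-the-config-file setup would look like this (a sketch; tag, path and pattern are examples):

    ```perl
    @searches = ({
      tag              => 'app',
      logfile          => 'C:\logs\app.log',
      criticalpatterns => 'ERROR',
      options          => 'sticky',   # sticky lives in the config, not in NSC.ini
    });
    ```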

    [Reply]

  141. OMD Version 0.44 erschienen » klimmbimm Says:
    November 15th, 2010 at 17:56

    […] … makes it possible to search rotating and non-rotating logfiles for patterns and, based on thresholds, to emit the corresponding exit codes together with text and performance data. It is also possible to call scripts during the run, and much more. More info at http://labs.consol.de/lang/de/nagios/check_logfiles/ […]

  142. Wayne Andersen Says:
    November 18th, 2010 at 1:03

    FYI, I am getting this warning on version check_logfiles-3.4.2.2.

    Unquoted string "c" may clash with future reserved word at /usr/lib/perl5/_h2ph_pre.ph line 162.
    Illegal character in prototype for main::_INT16_C : c at /usr/lib/perl5/_h2ph_pre.ph line 162.

    [Reply]

    lausser Reply:

    What kind of perl interpreter did you use?

    [Reply]

    Wayne Andersen Reply:

    @lausser,

    perl-5.12.2-136.fc14.i686

    perl -v
    This is perl 5, version 12, subversion 2 (v5.12.2) built for i386-linux-thread-multi

    [Reply]

    lausser Reply:

    Before i start the compiler…did you install it as a package? Which distro?

    [Reply]

  143. Paul Kilgour Says:
    November 24th, 2010 at 19:31

    I am trying to use the plugin to check for errors in my windows event log. I have tried to use the simplest cfg file from the example:

    @searches = ( { tag => 'Event Log', type => 'eventlog' } );

    When I pass this to the executable it returns “OK - no errors or warnings” even though there are many errors in the log. It was reporting Event Log_lines=0, so I presumed this was the reason, but even when I make an error go into the log, it says Event Log_lines=1 and still reports no errors or warnings. Do you know why this could be? I am using Windows 2008 with check_logfiles v3.4.2.2.

    Many thanks,

    Paul

    [Reply]

    lausser Reply:

    How much time did you spend reading the documentation? Less than a minute? Less than a second?

    [Reply]

    Paul Kilgour Reply:

    @lausser, :( I’m sorry, I’m not sure why it is not picking up errors. I have also tried through the command line with check_logfiles --type "eventlog:eventlog=application,include,source=Windows Update Agent,eventtype=error,eventtype=warning" but this also returns that the log is OK, even though I have errors in there and it has read the lines. Any help would be appreciated.

    [Reply]

    lausser Reply:

    check_logfiles detects new lines. Maybe the counted lines were actually new (since the last run) but not of category error/warning. Run check_logfiles, manually add a dummy error (with eventcreate or similar tool) and run it again.

    [Reply]

    Paul Kilgour Reply:

    @lausser, Thanks for the reply lausser. I have always ensured that the lines it newly reads are of type error. I have been able to create errors both with wbadmin.exe (Windows Backup) with incorrect parameters and with the eventcreate tool (using "eventcreate /t error /id 1000 /l application /d "Test Error""). Both of these create errors that can be seen in the Event Viewer with Level = Error and Source = Backup or EventCreate. When running the check_logfiles executable, the number of lines it reads after I manually enter errors is only equal to the number of errors I entered using the WB tool. E.g. if I use eventcreate and enter 3 errors, wait a minute and then run the plugin (using "check_logfiles.exe --type eventlog"), lines=0; if I use WB and enter 3 errors, wait a minute and run the plugin, lines=3. In both scenarios the result is OK - no errors or warnings.

    I have manually changed the cfg file to use a non-default seekfile directory. I have also attempted to move the executable so that it is on the C:\ partition (previously was on E:\). I have also tried various permutations for running the plugin both through the CLI and using a cfg file, all resulting in an OK result.

    Could you possibly speculate on what I am doing wrong? All I can think of is that the plugin is not recognising the lines as errors or I am filtering the results in some way.

    Thanks for your help in this matter.

    [Reply]

    lausser Reply:

    Now I see what’s wrong. The eventlog=application,include,source=… is just a pre-filter. Events matching these criteria are treated like lines read from a regular logfile. Then these lines are processed by the pattern-matching routines. In order to produce criticals and warnings, these routines still need the parameters criticalpatterns and/or warningpatterns. You don’t have them, so check_logfiles thinks that the found lines/events are ok. So what you need to do is

    check_logfiles --type 'eventlog:eventlog=application,include,source=Windows Update Agent,eventtype=error,eventtype=warning' --criticalpattern 'Error' --warningpattern 'Warning'
    If you want just criticals, then use only --criticalpattern '.*'. This looks like double work and, yes, in this case it is. But maybe there is a scenario where you want to get an alert on a WUA event which is classified as Success in the eventlog. Then you can still pre-filter on source=WUA and match the characteristic text of this event in a critical/warningpattern. So the more specific your eventlog pre-filter is, the less specific your regular expressions have to be. If you pre-filter on source=, eventtype=, eventid=… then you can use '.*' as the critical/warning regexp. If you filter only on source= or use no pre-filter at all, then your regular expressions must be very specific. Sorry for being rude. When i saw your minimalistic config file, i thought ‘wtf’.

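
    The same check as a config-file stanza might look like this (a sketch following the nested-hash eventlog syntax from the documentation; treat the key names as assumptions if your version differs):

    ```perl
    @searches = ({
      tag  => 'wua',
      type => 'eventlog',
      # pre-filter: which events are handed to the pattern matcher at all
      eventlog => {
        eventlog => 'application',
        include  => {
          source    => 'Windows Update Agent',
          eventtype => 'error,warning',
        },
      },
      # the patterns still decide the state; with a strict pre-filter
      # a catch-all is enough
      criticalpatterns => '.*',
    });
    ```
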
  144. Bill Cattell Says:
    December 3rd, 2010 at 18:53

    When running the command;

    C:\Bin>check_logfiles -f c:\bin\check_logfiles.cfg

    I get the output;

    CRITICAL - (1 errors in check_logfiles.protocol-2010-12-03-11-37-38) - cannot write status file C:\temp/check_logfiles.C__DG_AppLogs_MPOrderEntryService_orderentryservice.log.ERROR:! check your filesystem (permissions/usage/integrity) and disk devices |ERROR:_lines=0 ERROR:_warnings=0 ERROR:_criticals=1 ERROR:_unknowns=0

    My guess is that the ‘c:\temp/’ is the reason I can’t write the file. I just can’t find where the ‘/’ is coming from.

    I’m obviously missing something. Any pointers or suggestions will be appreciated.

    [Reply]

    lausser Reply:

    Make sure you can create files in the C:\temp directory.

    [Reply]

    Bill Cattell Reply:

    @lausser,

    The protocol files get created fine. I have been able to create new text files as well.

    BTW – Great thanks for this plug-in.

    [Reply]

    Bill Cattell Reply:

    @Bill Cattell,

    Ok, access rights have been worked out.

    Now i’m back to getting Nagios to run the check.

    thank you for your input.

    [Reply]

  145. Glenn Stewart Says:
    December 13th, 2010 at 4:41

    Just recently compiled latest version to use report=long.

    Error noted:

    Use of uninitialized value in concatenation (.) or string at /opt/nagios/libexec/check_logfiles line 1542, line 1. Use of uninitialized value in sprintf at /opt/nagios/libexec/check_logfiles line 1543, line 1. Use of uninitialized value in sprintf at /opt/nagios/libexec/check_logfiles line 1544, line 1. Can’t exec “/bin/”: Permission denied at /opt/nagios/libexec/check_logfiles line 1617, line 1. Use of uninitialized value in sprintf at /opt/nagios/libexec/check_logfiles line 3009. CRITICAL – (1 errors, 1 warnings) – the fourth critical line |non 1_lines=0 non 1_warnings=0 non 1_criticals=0 non 1_unknowns=0 non 2_lines=0 non 2_warnings=0 non 2_criticals=0 non 2_unknowns=0 non 3_lines=1 non 3_warnings=1 non 3_criticals=1 non 3_unknowns=0 tag non 3 PATTERN MATCH TEST ANOTHER PATTERN MATCH could not execute

    I am finding it difficult to debug the cause of “sprintf” and “could not execute” errors.

    [Reply]

    lausser Reply:

    Can’t exec “/bin/” means your config is wrong.

    [Reply]

    Glenn Stewart Reply:

    @lausser,

    Thanks. But I have had to simplify completely, and although the errors are reduced, they’re still there.

    The config is now this simple:

    @searches = ( {
      tag => '/logs/filename.log',
      logfile => '/logs/filename.log',
      criticalpatterns => [ 'ERROR' ],
      options => 'nologfilenocry,noprotocol,script,perfdata',
    }, );

    Error is: Use of uninitialized value in concatenation (.) or string at /opt/nagios/libexec/check_logfiles line 1542, line 4. Use of uninitialized value in sprintf at /opt/nagios/libexec/check_logfiles line 1543, line 4. Use of uninitialized value in sprintf at /opt/nagios/libexec/check_logfiles line 1544, line 4. Can’t exec “/bin/”: Permission denied at /opt/nagios/libexec/check_logfiles line 1617, line 4. Use of uninitialized value in sprintf at /opt/nagios/libexec/check_logfiles line 3009.

    Furthermore, main aim is also to have report=long in config. But this breaks it further.

    I understand it is best to include $options = 'report=long'; as the first line, prior to @searches, and to make sure I use one of the later versions of check_logfiles.

    Is this correct?

    [Reply]

  146. Max Says:
    December 13th, 2010 at 11:42

    Hi lausser, Can I use this plugin on a server with no nagios installed? I’m trying to monitor a remote server from the monitoring server, which has Nagios and everything installed. I installed check_logfiles on the remote server (but no nagios), and when I run the script it gives me the error: “CRITICAL - (1 errors) - cannot write status file /usr/local/nagios/var/tmp/check_logfiles._…! check your filesystem (permissions/usage/integrity) and disk devices”

    The /usr/local/nagios/var directory is installed together with nagios, so I’m wondering whether I can use this one without nagios? I don’t want to install nagios on both servers.

    Thank you very much. Max

    [Reply]

    lausser Reply:

    mkdir -p /usr/local/nagios/var/tmp
    The plugin wasn’t built with default parameters, otherwise it would try to write in /var/tmp/check_logfiles. Also look at $seekfilesdir in your config file.

    [Reply]

  147. vash27 Says:
    December 20th, 2010 at 8:16

    Hi lausser, This is a nice plugin, very helpful. I wonder, do you have the check_logfiles plugin as an rpm package?

    [Reply]

    lausser Reply:

    I don’t package my plugins myself (to be honest, my knowledge of generating rpms is very limited). But if you google for check_logfiles+rpm you’ll find several of them.

    [Reply]

    vash27 Reply:

    @lausser, thanks for the information, i found it at http://packages.sw.be/check_logfiles/

    Thank very much :)

    [Reply]

  148. 17 Nagios-Fliegen mit einer Klappe: OMD 0.44 | KenntWas.de - Technische Tips Says:
    December 22nd, 2010 at 0:07

    […] check_logfiles from the labs of Consol is used to check and analyze logfiles (scanning for specific patterns). Usually check_logfiles is called via check_nrpe on the host to be monitored. The documentation is very good, but the command-line parameters, variables and configuration files of check_logfiles take quite some time to learn. […]

  149. Infin1ty Says:
    January 11th, 2011 at 11:37

    Hello, been using check_logfiles for a while, noticed a weird thing today as i have not received any notification on an exception.

    This is my config file:

    $seekfilesdir = '/tmp';
    $protocolsdir = '/tmp';
    $options = 'report=long,maxlength=4096';

    @searches = ( {
      tag => 'check1',
      logfile => '/opt/checks/check1/logs/logs.err',
      warningpatterns => 'Exception',
      criticalpatterns => 'Exception',
      options => 'noprotocol,perfdata,warningthreshold=1,criticalthreshold=5'
    }, {
      tag => 'check2',
      logfile => '/opt/checks/check2/logs/$CL_DATE_YYYY$$CL_DATE_MM$$CL_DATE_DD$.stderrout.log',
      warningpatterns => 'Exception',
      criticalpatterns => 'Exception',
      options => 'noprotocol,perfdata,warningthreshold=1,criticalthreshold=5'
    }, );

    all works well for check1; for check2 it simply won’t report a pattern match in that file, even though it does access the file correctly.

    [Reply]

    lausser Reply:

    How do you see that it accesses the file correctly? Do you have a tracefile /tmp/check_logfiles.trace? Anyway, you should modify the check2 configuration. If the current logfile changes its name on a regular basis, in a way that the current logfile and the archives use the same naming scheme, please configure it as rotating::uniform, like this:

      { tag => 'check2',
        type => 'rotating::uniform',
        logfile => '/opt/checks/check2/logs/dummy',
        rotation => '\d{8}\.stderrout\.log',
        #logfile => '/opt/checks/check2/logs/$CL_DATE_YYYY$$CL_DATE_MM$$CL_DATE_DD$.stderrout.log',
        warningpatterns => 'Exception', 
        criticalpatterns => 'Exception', 
        options => 'noprotocol,perfdata,warningthreshold=1,criticalthreshold=5'
      },
    check_logfiles then looks in the check2/logs directory (the logfile key is used as a hint where to look for files, that’s why there is a dummy filename), selects all files that match the rotation pattern and takes the one with the most recent modification date as the current logfile. Using the $CL_DATE_ macros in the logfile might lead to missed error messages a few minutes around midnight; rotating::uniform does not.

    [Reply]

  150. Dave Says:
    January 19th, 2011 at 15:21

    Hi there; we use an application where the vendor changed the logfile path a bit, so the log is now in 2 places, but I grep for the same strings and would like to use only one rule. Is that possible with an entry like this, or something similar? logfile => 'C:\\PROGRA~1\\TrippLite\\PowerAlert\\data\\paelog.txt','C:\\PROGRA~1\\TrippL~1\\NEU_PFAD\\data\\NEU-LOG.txt'

    Thanks in advance! Dave

    [Reply]

  151. Acorn Says:
    January 21st, 2011 at 1:52

    Can the protocol filename be changed via the config file ?

    [Reply]

    lausser Reply:

    No, only the directory where the protocols are written to.

    $protocolsdir = '/opt/my/protocols';

    [Reply]

  152. Claudiu Says:
    January 24th, 2011 at 9:49

    Hi,

    On Windows platforms I am getting “The process cannot access the file because it is being used by another process.” for services scheduled at the same time. The issue manifests very randomly. I have also tested with the latest version (the perl version with Strawberry, and the exe one too). Do you have any resolution for this? Thank you!

    [Reply]

  153. Dan Wittenberg Says:
    January 24th, 2011 at 18:46

    One issue I have found: some machines have event logs that are several years old, or contain many old errors like the one I’m looking for, which generates tons of alerts when I first add check_logfiles. Is there any way to say, on the first run, just create the protocol file and basically “start from now”, so I don’t get tons of old false positives on first installation?

    [Reply]

    lausser Reply:

    please throw away year-old logfiles. check_logfiles does not look into the past when it is run for the first time. what you describe is impossible.

    [Reply]

    Dan Wittenberg Reply:

    @lausser, Kind of what I thought; unfortunately policy dictates I can’t throw the logs away. What is odd, too: I have noticed that they keep alerting at the same time every night. That should only happen if the seek files get trashed on the client, right?

    [Reply]

    lausser Reply:

    When you delete the seekfile, check_logfiles takes the logfile, positions at end-of-file and saves that position to a new seekfile. Actually no lines are scanned, except if you have set the option allyoucaneat. In that case, when check_logfiles finds no seekfile, it starts scanning from the beginning of the file until it reaches end-of-file, then saves that position. You will get all error messages in the logfile. But if you see the strange behavior every night at the same time, then error-hunting is fun. Create a file named /tmp/check_logfiles.trace, let’s say, 10 minutes before the critical moment (so that check_logfiles runs at least one time correctly with this tracefile). As long as this file exists, check_logfiles will write a log of debug information to it. Move it away 10 minutes after you got the unexpected behavior. Then examine the trace: whether the seekfile is found, whether the position is correct, …

    [Reply]

  154. Ryan Ash Says:
    January 24th, 2011 at 23:54

    We use the following Windows variable to specify directories that may differ between W2k3 and W2k8; let’s call it %ourdir%. On W2k3 it is e:\mydir, on W2k8 it is d:\mydir. We typically use this in our alerting so a single template works. I am finding that it will not work when that variable is used in the seek or protocol file definition. Can you address this in a future upgrade?

    Ty sir

    [Reply]

    lausser Reply:

    environment variables can be used. read the manual. if you want something to be included in future releases, you have to pay.

    [Reply]

  155. Dan Wittenberg Says:
    January 26th, 2011 at 0:14

    I’m trying to have check_logfiles watch for an event in the eventlog and, if it happens more than, say, 5 times in 1 hour, send an alert; otherwise I don’t care. Is this something I can set: savethresholdcount sticky=60 criticalthreshold=5

    The bad thing is, there is no ‘ok’ pattern; it just picks up whether a certain error has been logged in the event log. Am I thinking of this correctly?

    [Reply]

    Dan Wittenberg Reply:

    @Dan Wittenberg, Ok, after some testing I see the critical and warning thresholds work as expected, except there doesn’t seem to be a time component. I was hoping the sticky option would do that, but apparently it doesn’t, and from the above posts the ‘okpattern’ won’t do it either. So I guess maybe a new param is needed, like criticalthresholdtime and warningthresholdtime, to say how long to wait before resetting the counter for that particular check? Or have I missed something?

    [Reply]

    lausser Reply:

    The sticky option takes a parameter, so the status returns to OK after a certain period of time, even if there is no okpattern.

    options => 'sticky=3600' makes sure that a critical/warning sticks for a maximum of an hour; then all is ok again.
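    Combined with the thresholds from the question, a sketch (tag and pattern are examples; note this is not a rolling one-hour window, since the hit counter and the sticky lifetime are independent):

    ```perl
    @searches = ({
      tag  => 'evt',
      type => 'eventlog',
      criticalpatterns => '.*',   # example: every pre-filtered event counts
      # alert from the 5th hit on; an uncleared critical expires after an hour
      options => 'criticalthreshold=5,sticky=3600',
    });
    ```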

    [Reply]

    Dan Wittenberg Reply:

    @lausser, Which is what I was testing before. It says it will keep alerting for an hour, which isn’t really what I need in the end. It needs to track the time at which it found something in the log and see whether those hits fall within the hour. So I don’t care that it sends alerts for an hour; I only want it to alert if I get more than 5 hits in one hour. I’m thinking now, as a short-term hack, I’ll have to manually manipulate the protocol files with state info, and then see if I can add a new option to track time in the state file.

    From reading the posts above, it seems like a general need is some way to reset the critical/warningthreshold counter back to 0, either by a time constraint or by a number of matches.

    [Reply]

    lausser Reply:

    The short answer is: it is not possible with check_logfiles. This tool was never intended to handle such a requirement. You might look at the ‘scripts’ option if you want to implement your own logic or you might look over at http://www.consol.de and request a quotation for an individual implementation for this case.

    [Reply]

  156. centosboy Says:
    January 26th, 2011 at 10:51

    Hi, i am using this script. All is well when i run this command from the nagios client host (i’m using nrpe):

    bash-3.2$ ./check_logfiles -f config
    OK - no errors or warnings|UniversalConnectionPoolException_lines=0 UniversalConnectionPoolException_warnings=0 UniversalConnectionPoolException_criticals=0 UniversalConnectionPoolException_unknowns=0

    I am running as the nagios user and all is fine.

    However, when i run this from the nagios server, i get:

    [root@nagios02 libexec]# ./check_nrpe -n -H prod-d-acqnode-10 -c check_log
    UNKNOWN - can not load configuration file config

    I suspect it could be illegal chars in the config file? The config looks like this:

    $seekfilesdir = '/usr/local/nagios';
    $protocolsdir = '/usr/local/nagios';

    @searches = ( {
      tag => 'UniversalConnectionPoolException',
      logfile => '/opt/apps/sb-prod-acquire01/logs/sb-prod-acquire01.catalina.$CL_DATE_YYYY$-$CL_DATE_MM$-$CL_DATE_DD$.log',
      criticalpatterns => ['UniversalConnectionPoolException'],
      criticalthreshold => 4,
    }, );

    Someone help me pls… :) I would be very grateful :)

    [Reply]

  157. centosboy Says:
    January 26th, 2011 at 11:06

    OK, ignore my comments above. I have figured it out: in my nrpe.cfg file I need to specify the FULL path to the config file for check_logfiles.

    Thanks anyway

    [Reply]

  158. Marco P Says:
    January 31st, 2011 at 18:04

    Hi,

    i try to install check_oracle_health and check_logfiles and have the same problem: the folder is not created in the libexec dir. The last lines of the output are:

    configure: creating ./config.status
    config.status: creating Makefile
    config.status: creating plugins-scripts/Makefile
    config.status: creating plugins-scripts/subst
    config.status: creating t/Makefile
    --with-perl: /usr/bin/perl
    --with-gzip: /bin/gzip
    --with-seekfiles-dir: /var/tmp/check_logfiles
    --with-protocols-dir: /tmp
    --with-trusted-path: /bin:/sbin:/usr/bin:/usr/sbin
    --with-nagios-user: nagios
    --with-nagios-group: nagcmd

    Looks good, or at least I can’t see any error, but:

    ls -l /usr/local/nagios/libexec/check_lo*
    -rwxr-xr-x 1 nagios nagcmd 146423 25. Jan 11:25 /usr/local/nagios/libexec/check_load
    -rwxr-xr-x 1 nagios nagcmd 6020 25. Jan 11:25 /usr/local/nagios/libexec/check_log

    Do I have to copy something manually? Thanks from Unterhaching :-) Marco

    Do I have to copy something manually? Thanks from Unterhaching :-) Marco


    lausser Reply:

    Did you ever hear of the make command? make; make install


  159. Marco P Says:
    February 1st, 2011 at 16:29

    Thanks, that's it. Sorry for the stupid question :-( but as a newbie I followed the installation guide above, and there is no make command in it. Kind regards and "nothing for ungood" [Bavarian English], Marco


  161. Felipe Says:
    February 1st, 2011 at 16:33

    Hi lausser, all…

    Can check_logfiles send a Nagios alert when the counter of scanned lines is 0? (i.e. if the service has been suspended or is in a zombie state)

    OUT_lines=7208 OUT_warnings=0 OUT_criticals=0 OUT_unknowns=0 —> good

    OUT_lines=0 —> send nagios alert.

    thanks for your help.


    lausser Reply:

    Hi, look at http://www.nagios-portal.org/wbb/index.php?page=Thread&postID=105490#post105490. There you find a config file which lets check_logfiles measure the throughput (lines per second or bytes per second) of your logging. In the comments at the beginning of the file you see the thresholds; at the end you see the calculation of the exit codes depending on the throughput. It should be easy to set $crit_rate to zero and reverse the logic at the end, so you get an alert when there are zero lines/bytes per second.
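    If you prefer to stay inside one config file, here is a rough, untested sketch of the same idea: a supersmartpostscript that goes CRITICAL when the lines counter is zero. It assumes the CHECK_LOGFILES_SERVICEPERFDATA environment variable is populated in the postscript like the other CHECK_LOGFILES_* variables; tag and logfile are placeholders.

```perl
# Sketch only: alert when no new lines were scanned at all
# (i.e. logging appears to have stopped).
$options = 'supersmartpostscript';
@searches = ({
  tag => 'OUT',
  logfile => '/var/log/myapp/out.log',   # placeholder path
  criticalpatterns => ['ERROR'],
});
$postscript = sub {
  my $perf = $ENV{CHECK_LOGFILES_SERVICEPERFDATA} || '';
  if ($perf =~ /OUT_lines=(\d+)/ and $1 == 0) {
    print "CRITICAL - no new lines scanned, logging seems to have stopped";
    return 2;
  }
  # otherwise pass the plugin's own verdict through unchanged
  print $ENV{CHECK_LOGFILES_SERVICEOUTPUT};
  return $ENV{CHECK_LOGFILES_SERVICESTATEID};
};
```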


  162. Daniel Says:
    February 1st, 2011 at 19:34

    Good afternoon,

    I’m using check_logfiles under Windows Server 2003 and I’d like to add a simple script in the config file (using script => sub { … } )

    Is perl the only supported language? If so, do I need to install an interpreter or does check_logfiles interpret perl scripts by itself?

    Thanks


    lausser Reply:

    Good evening. The config file itself is just a piece of Perl code which is read by check_logfiles with a require statement. So you can add as much Perl code as you like.
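    For illustration (the path below is made up), the require'd config can therefore contain arbitrary Perl that runs at load time:

```perl
# The config file is ordinary Perl, so code like this is legal in it.
use Sys::Hostname;
my $host = hostname();

@searches = ({
  tag => 'app',
  logfile => "/opt/app/logs/app-$host.log",  # computed when the config is loaded
  criticalpatterns => ['FATAL'],
});
```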


    Daniel Reply:

    @lausser,

    Many thanks. I've noticed that, while inside a postscript sub, "printf $ENV{CHECK_LOGFILES_TAG};" doesn't return the search tag, while for example "printf $ENV{CHECK_LOGFILES_SERVICEOUTPUT};" does return the expected string (that was just for testing whether I was doing it right). The search tag is defined in the search as "tag => 'testtag'", and the desired search is successfully executed, so I must assume that the tag is passed correctly; unfortunately I don't seem to be able to retrieve the tag from within a script. Regards,


    lausser Reply:

    There’s no tag in a postscript. A tag is a descriptor which exists only within a search. As you can have many searches which are run independently, the postscript is a global piece of code. So a tag just makes no sense in the postscript.


    Daniel Reply:

    @lausser,

    Maybe there's a better solution for my problem then. What I'm trying to accomplish is to reformat the output string if I find some particular pattern in the log (I'm using criticalpatterns for that). I want that reformatted output to be sticky until the next time the pattern is found. I've tried: 1) reformatting in a script and printing in the postscript; problem: when the pattern isn't found, the output is shown without the formatting, since the script wasn't executed. 2) using '.*' as the pattern; problem: then the output isn't sticky, since all lines are critical. Any idea? Thanks a lot


    Daniel Reply:

    @Daniel,

    Solved! There were several searches involved: some simple ones, some which needed a script to extract info, and one which needed info from several lines in order to build a single result. Using info from your previous posts plus a supersmartscript plus a postscript I managed to get the results I wanted. Thanks.

  163. Daniel Wittenberg Says:
    February 4th, 2011 at 1:19

    Alerting issue I'm trying to find a good solution for. I have one big .cfg file for all my Windows event log searches. If one of them fails, I get the standard '1 bad error' back for the check in the nsc.ini that called it. I also send an NSCA message back for that particular problem so I can easily route it to whoever needs to handle the issue. The problem is that when something then goes from bad to OK, there is not always an 'okpattern' to reset things, so as far as Nagios is concerned it's still broken and I have to manually submit a fake result to clear it. I was thinking of a NEB module that could reset it when I see an "all ok" message for the entire check, but that's kind of an ugly hack. Anyone else run into this? I could break the .cfg into a bunch of smaller parts, but that could become a management issue.

    Thanks! Dan


    lausser Reply:

    An okpattern is not necessary to reset a critical condition. You can also set a timer:

    options => 'sticky=3600',
    The critical will be automatically reset after an hour.


    Daniel Wittenberg Reply:

    @lausser, I'm not sure that's the same thing, though. Will that force the service check to send an OK message through NSCA back to Nagios? The problem is that the OK messages don't go back to the server, because I have like 50-100 checks in one .cfg file, and only 1 message goes back saying "logs ok", not "tag X ok", "tag Y ok", etc. Make sense?


  164. Richard Says:
    February 4th, 2011 at 15:39

    Hi,

    I have problems using the $CL_PROTOCOLFILE$-Macro with the Windows Binary.

    I want to use it so I can send the results via E-Mail to myself. Is there anything wrong with my syntax?

    Tracelog shows me the following:

    Fri Feb 4 14:32:34 2011: execute C:\scripts\check_logfiles\blat.exe $CL_PROTOCOLFILE$ …..

    And if I want to send the log after the check_log has been running I assume I have to use a postscript ? Is it possible to execute the script only if there are any lines found?

    mail.cfg:

    $seekfilesdir = 'c:\\temp';
    $protocolsdir = 'C:\\scripts\\check_logfiles\\logs';
    $scriptpath = 'C:\\scripts\\check_logfiles';
    $tracefile = 'C:\\scripts\\trace.log';

    @searches = ( {
      tag => 'relayalerts',
      logfile => 'C:\scripts\RECV$CL_DATE_YYYY$$CL_DATE_MM$$CL_DATE_DD$-1.LOG',
      criticalpatterns => 'Unable to relay',
      options => 'allyoucaneat,protocol,script',
      script => 'blat.exe',
      scriptparams => '$CL_PROTOCOLFILE$ -server xxxxxxx -port 25 -to xxxxx -f xxx@xxx',
    });


    lausser Reply:

    The protocolfile is a global thing; it exists once for the whole run of check_logfiles, not for every single search (remember, you can configure multiple searches, each with an individual tag). So if you want to postprocess the protocolfile, like sending it by mail, this needs to be done in a postscript. As I understand your example, blat.exe takes the first argument as a file whose contents are to be used as the mail's body? If you only want to send mail when something noteworthy happened, check the CHECK_LOGFILES_SERVICESTATEID environment variable. If it is 0, then either no lines were scanned at all or none of the critical/warning patterns was found. So you should move the script to a postscript like this:

    ...
    $options = 'supersmartpostscript';
    @searches = ( {
      tag => 'relayalerts',
      type => 'rotating::uniform',
      logfile => 'C:\scripts\dummy',
      rotation => 'RECV\d{8}-1\.LOG',
      criticalpatterns => 'Unable to relay',
      options => 'allyoucaneat,protocol',
    });
    $postscript = sub {
      if ($ENV{CHECK_LOGFILES_SERVICESTATEID}) {
        system('blat.exe', $ENV{CHECK_LOGFILES_PROTOCOLFILE},
            '-server', 'xxxxxxx', '-port', 25, '-to', 'xxxxx', '-f', 'xxx@xxx');
      }
      print $ENV{CHECK_LOGFILES_SERVICEOUTPUT};
      return $ENV{CHECK_LOGFILES_SERVICESTATEID};
    };
    


  165. lausser Says:
    February 4th, 2011 at 17:16

    p.s. use the search function to find out from the posts above, what i mean with rotating::uniform


  166. Dmitriy Ilyin Says:
    February 7th, 2011 at 21:35

    Hi, can this plugin be used to find n occurrences of a pattern within t time? Thank you.


    lausser Reply:

    Not out of the box. With handler scripts this might be implemented.
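    Not from the original posts, but one rough, untested sketch of such a handler script: a per-match script keeps match timestamps in a state file and escalates only when n matches fall within t seconds. The state-file path, the thresholds, and the assumption that a supersmartscript's return code decides the severity of each match are mine.

```perl
# Sketch only: n occurrences within t seconds via a handler script.
@searches = ({
  tag => 'burst',
  logfile => '/var/log/messages',
  criticalpatterns => ['TimeOut'],
  options => 'supersmartscript',
  script => sub {
    my ($n, $t) = (5, 300);                    # 5 hits within 5 minutes (illustrative)
    my $state = '/var/tmp/burst.timestamps';   # hypothetical state file
    my @hits;
    if (open my $in, '<', $state) { chomp(@hits = <$in>); close $in; }
    push @hits, time;
    @hits = grep { $_ > time - $t } @hits;     # keep only hits inside the window
    if (open my $out, '>', $state) { print $out join("\n", @hits), "\n"; close $out; }
    if (@hits >= $n) {
      print 'CRITICAL - ', scalar @hits, " timeouts within $t seconds";
      return 2;                                # count this match as critical
    }
    return 0;                                  # otherwise downgrade the match to OK
  },
});
```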


  167. Matt Hawkins Says:
    February 8th, 2011 at 18:18

    Lausser,

    We have several Nagios check_logfiles services that have a time_periods defined for them. We only have active checks defined for these services between 9:30 – 14:30. The problem that we are having is that at 9:30, check_logfiles scans the entire log file because there is a seekfile from the previous day. We are looking for a way to have check_logfiles only scan lines in the log that are created after 9:30. The logs that we are monitoring get rotated or overwritten around midnight.

    Any suggestions would be appreciated.


    lausser Reply:

    Define a $prescript which deletes the seekfile if the current time is outside the 9:30-14:30 range.
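    Such a $prescript could be sketched like this; the seekfile path is hypothetical, so look in your $seekfilesdir for the real name (it is derived from the logfile and the tag):

```perl
# Sketch: wipe the seekfile before the scan when outside 09:30-14:30,
# so the day's earlier lines are not re-read. Seekfile name is an assumption.
$prescript = sub {
  my ($min, $hour) = (localtime)[1, 2];
  my $now = $hour * 60 + $min;
  if ($now < 9 * 60 + 30 or $now > 14 * 60 + 30) {
    unlink '/var/tmp/check_logfiles/check_app._var_log_app.log.app';
  }
  return 0;
};
```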


  168. Bill Says:
    February 9th, 2011 at 21:55

    Lausser,

    I used check_by_ssh to check logfiles on remote linux host. I used config file to check one of mysql server errorlog, it worked fine if I have logfile=>’/tmp/mysql.log’ defined in config file. If I remove logfile from config file, put it in command line, I will get the following errors:

    /usr/bin/ssh -o StrictHostKeyChecking=no -i xxxx -l root remotehost /usr/local/nagios/libexec/check_logfiles --tag mysql_logcheck --config /usr/local/nagios/etc/objects/logchk -report=long --logfile /tmp/mysql.log

    Use of uninitialized value in pattern match (m//) at /usr/local/nagios/libexec/check_logfiles line 1300.
    fileparse(): need a valid pathname at /usr/local/nagios/libexec/check_logfiles line 2224


    lausser Reply:

    You cannot mix –config and other parameters. Either everything in a configfile or everything on the commandline.
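    So, schematically, either of these two invocations works, but not a mixture (paths taken from the example above):

```
# everything in the config file:
/usr/local/nagios/libexec/check_logfiles --config /usr/local/nagios/etc/objects/logchk

# or everything on the command line, without --config:
/usr/local/nagios/libexec/check_logfiles --tag mysql_logcheck \
    --logfile /tmp/mysql.log --criticalpattern 'ERROR'
```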


  169. Alex Ehrlich Says:
    February 13th, 2011 at 15:23

    Could you please add a rotation algorithm LOGLOG1LOG2GZ? It is the way Ubuntu (at least 10.04.2 LTS) logrotate is configured by default, so maybe also add a synonym UBUNTU for it (like DEBIAN)?


  170. Mirza Dedic Says:
    February 16th, 2011 at 21:41

    Hello,

    I am using check_logfiles to verify some of my database logs, and currently the output comes into my email such as this:

    Additional Info: CRITICAL – (1 errors in usr1_oppy3.protocol-2011-02-16-11-22-31) – SYSTEM ERROR

    My check command in Nagios consists of:

    check_logfiles -t 30 -noprotocol -f /home/nagios/log_conf/usr1_oppy4.lg

    What I want to do is get rid of the "(1 errors in usr1_oppy3.protocol-2011-02-16-11-22-31)" text, but it still comes through even after the -noprotocol on the command line…

    Any idea why?

    I am using check_by_ssh on the Nagios box to locally execute check_logfiles on the remote host.


    Mirza Dedic Reply:

    @Mirza Dedic, Remote host is AIX 5.3, Nagios host is Ubuntu 10.04 LTS.


    lausser Reply:

    You cannot mix command line parameters and a config file. You have to move the --noprotocol into the config file as options => 'noprotocol'.
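    That is, roughly like this inside the config file; tag, path and pattern below are placeholders based on the output above:

```perl
@searches = ({
  tag => 'usr1_oppy4',                 # placeholder tag
  logfile => '/path/to/your/db.log',   # your actual logfile
  criticalpatterns => ['SYSTEM ERROR'],
  options => 'noprotocol',
});
```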


  171. Patrick Says:
    February 18th, 2011 at 21:22

    Hi Lausser, Love the plugin! I have a question. I am feeding the output of a search to a script as an arg, but this script does not like unescaped chars as input. Is there a recommended approach to escape the output of the search before it is sent to the script?


    lausser Reply:

    So I assume you feed every matching line to a script with script => 'your postprocessing script'. What you can do to manipulate the output is to use Perl-based handler scripts.

      options => 'script',
      script => sub {
        my $matching_line = $ENV{CHECK_LOGFILES_SERVICEOUTPUT};
        # do whatever you want with the matching_line, especially
        # escaping and modifying single characters
        my $cmd = sprintf 'your_script %s', $modified_matching_line;
        system($cmd);
      },


    Patrick Reply:

    @lausser, Sorry – didn’t see your response before posting my simplified question – I will try this out, it looks like what I may need to do. Thanks so much!


    Patrick Reply:

    @Patrick, It worked! Thanks a lot.

    Patrick


    Patrick Reply:

    @Patrick, I guess an easier way to explain is – is there a way to escape the content of $CL_SERVICEOUTPUT$ before I pass it to the script using scriptparams?


  172. Mendes Says:
    February 22nd, 2011 at 18:47

    Hello Lausser,

    Would you please help me? My log file rotates as: logfilename.log => logfilename.log.old

    What rotation parameters (rotation and type) shall I use?

    Many thanks in advance!


    lausser Reply:

    logfile => '...../path/path/logfilename.log',
    rotation => 'logfilename\.log\.old',
    ...
    
    that’s all.


  173. cgnatzy Says:
    February 24th, 2011 at 16:23

    Hello Mr. Lausser,

    In some of my seekfiles there is a line with 'logfile', but not in all. What is the reason for this?


    lausser Reply:

    There is one seekfile per search (elements in the @searches-array of all your configs). Maybe you have some windows-eventlog-searches. They have no logfile, this would make no sense. Only if you define a search which scans a ‘real’ file, then there’s a logfile entry in the corresponding seekfile.


  174. Patrick Says:
    March 2nd, 2011 at 19:56

    Hi Lausser, I am trying to figure out why, when I get a notification email from Nagios, the line from the log file that is found by my search is not included in the email message. Here is a sample search:

    @searches = ( {
      tag => 'LISTENER',
      logfile => 'Listener.log',
      rotation => 'Listener-([0-9]{8}).log',
      archivedir => '/',
      criticalpatterns => ['####', 'FATAL', 'Exception',
          'Error occurred during initialization of VM',
          'Could not create the Java virtual machine',
          'Could not reserve enough space for object heap'],
      options => 'nocase'
    },

    Does this look like something that I need to do in my cfg file for check_logfiles, or something I need to look for somewhere else?

    Thanks!


    Patrick Reply:

    @Patrick, I’m sure I need to look somewhere else in my nagios configuration – when I run the check_logfiles command directly, it returns the ServiceOutput.

    Thanks!


  175. Nair Says:
    March 3rd, 2011 at 14:10

    I am using the definition below to report CRITICAL if there are 3 pattern matches.

    {
      tag => 'TimeOut',
      logfile => '/var/log/messages',
      rotation => 'loglog0log1',
      criticalpatterns => ['TEST: .* TimeOut'],
      criticalthreshold => 3,
      options => 'noprotocol,perfdata,nosavethresholdcount',
    },

    ./check_logfiles -f checklog.cfg --tag=TimeOut

    CRITICAL – (1 errors) – Mar 3 08:26:34 TESTHOST07 Server: TEST: ‘test message here’ Timeout |TimeOut_lines=4 TimeOut_warnings=0 TimeOut_criticals=1 TimeOut_unknowns=0

    Instead of showing the number of errors in the plugin output, can I display the number of matches?

    Say, for example: CRITICAL - (4 matches)

    Thank you Nair


    lausser Reply:

    You could implement this with a supersmartpostscript.
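    An untested sketch of such a supersmartpostscript; it assumes the CHECK_LOGFILES_SERVICEPERFDATA environment variable is available in the postscript like the other CHECK_LOGFILES_* variables, and note that which perfdata field counts as "matches" depends on your configuration (with a criticalthreshold, the criticals counter holds the threshold-reduced number of alerts, not the raw hits):

```perl
# Sketch: rewrite the plugin output to show a match count taken from
# the perfdata collected during the run.
$options = 'supersmartpostscript';
$postscript = sub {
  my $perf = $ENV{CHECK_LOGFILES_SERVICEPERFDATA} || '';
  my $matches = ($perf =~ /TimeOut_criticals=(\d+)/) ? $1 : 0;
  if ($ENV{CHECK_LOGFILES_SERVICESTATEID} == 2) {
    printf 'CRITICAL - (%d matches)', $matches;
  } else {
    print $ENV{CHECK_LOGFILES_SERVICEOUTPUT};
  }
  return $ENV{CHECK_LOGFILES_SERVICESTATEID};
};
```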


  176. Robert Mitchell Says:
    March 4th, 2011 at 2:40

    Hi Lausser,

    I’m having problems with the foreach loop. I have populated an array with a list of files that do exist, but I only search the last file in the list. Can you help?

    config file:

    use File::Spec::Unix;

    $tracefile = '/tmp/check_celery.trace';
    $seekfilesdir = '/localfs/nagios';
    $protocolsdir = '/localfs/nagios';
    $scriptpath = '/services/nagios/check_logfiles/bin:/home/rmitchell/celery';

    $check_cmd = "/services/nagios/libexec/check_logfiles";
    $path = "/localfs/celery/";
    $check_cfg = "/tmp/check_celery.cfg";

    opendir(DIR, $path) or die "can not open $path\n";
    @files = grep { /\.log$/ } readdir(DIR);
    closedir(DIR);

    print "@files\n";

    foreach (@files) {
      $logfile = "$path.$_";
    }

    @searches = ({
      tag => 'celery_log',
      logfile => $logfile,
      rotation => 'bmwhpux',
      criticalpatterns => ['ERROR', 'error'],
      #warningpatterns => ['OVERTEMP_CRIT', 'Corrected ECC Error'],
      options => 'nologfilenocry, noprotocol',
    });

    output:

    [rmitchell@app1 celery]$ sudo strace -o strace.out /services/nagios/libexec/check_logfiles -f check_celery.cfg celeryd-quick-stdio-vm15.log celeryd-celery-stdio-vm7.log celeryd-quick-stdio-vm13.log celeryd-quick-vm10.log celeryd-quick-vm8.log celeryd-celery-vm8.log celerybeat-vm9.log celeryd-mass-import-vm4.log celeryd-celery-stdio-vm5.log celeryd-celery-stdio-vm8.log <>

    OK – no errors or warnings|celery_log_lines=0 celery_log_warnings=0 celery_log_criticals=0 celery_log_unknowns=0

    trace:

    Thu Mar 3 19:35:46 2011: ==================== /localfs/celery/.celeryd-mass-import-vm8.log ==================
    Thu Mar 3 19:35:46 2011: try pre2seekfile /localfs/nagios/check_celery..celeryd-mass-import-vm8.log.celery_log instead
    Thu Mar 3 19:35:46 2011: try pre3seekfile /tmp/check_celery.localfs_celery.celeryd-mass-import-vm8.log.celery_log instead
    Thu Mar 3 19:35:46 2011: no seekfile /localfs/nagios/check_celery.localfs_celery.celeryd-mass-import-vm8.log.celery_log found
    Thu Mar 3 19:35:46 2011: and no logfile found
    Thu Mar 3 19:35:46 2011: ILS lastlogfile = /localfs/celery/.celeryd-mass-import-vm8.log
    Thu Mar 3 19:35:46 2011: ILS lastoffset = 0 / lasttime = 0 (Wed Dec 31 19:00:00 1969) / inode = 0:0
    Thu Mar 3 19:35:46 2011: there is no logfile /localfs/celery/.celeryd-mass-import-vm8.log at this moment
    Thu Mar 3 19:35:46 2011: Log offset: 0
    Thu Mar 3 19:35:46 2011: looking for rotated files in /localfs/celery with pattern OLD.celeryd-mass-import-vm8.log|.celeryd-mass-import-vm8.log\.[A-Z][0-9]+_[0-9]+\.gz$
    Thu Mar 3 19:35:46 2011: although a logfile rotation was detected, no archived files were found
    Thu Mar 3 19:35:46 2011: stat (/localfs/celery/.celeryd-mass-import-vm8.log) failed, try access instead
    Thu Mar 3 19:35:46 2011: could not find logfile /localfs/celery/.celeryd-mass-import-vm8.log, but that's ok
    Thu Mar 3 19:35:46 2011: first relevant files:
    Thu Mar 3 19:35:46 2011: relevant files:
    Thu Mar 3 19:35:46 2011: nothing to do
    Thu Mar 3 19:35:46 2011: keeping position 0 and time 0 (Wed Dec 31 19:00:00 1969) for inode 0:0 in mind

    Fantastic plugin!

    Thank you, Robert Mitchell


  177. Robert Mitchell Says:
    March 4th, 2011 at 2:44

    sorry, that last post had no valid log file. please delete


  178. Robert Mitchell Says:
    March 4th, 2011 at 2:54

    Hello lausser,

    Having an issue with the foreach loop only processing the last file in the array.

    Config:

    use File::Spec::Unix;

    $tracefile = '/tmp/check_celery.trace';
    $seekfilesdir = '/localfs/nagios';
    $protocolsdir = '/localfs/nagios';
    $scriptpath = '/services/nagios/check_logfiles/bin:/home/rmitchell/celery';

    $check_cmd = "/services/nagios/libexec/check_logfiles";
    $path = "/localfs/celery/";
    $check_cfg = "/tmp/check_celery.cfg";

    opendir(DIR, $path) or die "can not open $path\n";
    @files = grep { /\.log$/ } readdir(DIR);
    closedir(DIR);

    foreach (@files) {
      $logfile = "$path$_";
    }

    @searches = ({
      tag => 'celery_log',
      logfile => $logfile,
      rotation => 'bmwhpux',
      criticalpatterns => ['ERROR', 'error'],
      #warningpatterns => ['OVERTEMP_CRIT', 'Corrected ECC Error'],
      options => 'nologfilenocry, noprotocol',
    });

    trace:

    Thu Mar 3 19:50:01 2011: ==================== /localfs/celery/celeryd-mass-import-vm8.log ==================
    Thu Mar 3 19:50:01 2011: found seekfile /localfs/nagios/check_celery._localfs_celery_celeryd-mass-import-vm8.log.celery_log
    Thu Mar 3 19:50:01 2011: LS lastlogfile = /localfs/celery/celeryd-mass-import-vm8.log
    Thu Mar 3 19:50:01 2011: LS lastoffset = 845364 / lasttime = 1299192206 (Thu Mar 3 17:43:26 2011) / inode = 64768:24842043
    Thu Mar 3 19:50:01 2011: found private state $VAR1 = {
      'runcount' => 1,
      'lastruntime' => 0,
      'logfile' => '/localfs/celery/celeryd-mass-import-vm8.log'
    };
    Thu Mar 3 19:50:01 2011: Log file has the same modified time: Thu Mar 3 17:43:26 2011
    Thu Mar 3 19:50:01 2011: Log offset: 845364
    Thu Mar 3 19:50:01 2011: nothing to do
    Thu Mar 3 19:50:01 2011: keeping position 845364 and time 1299192206 (Thu Mar 3 17:43:26 2011) for inode 64768:24842043 in mind

    Sorry about the previous posts.

    Thanks, Bobby


  179. Robert Mitchell Says:
    March 4th, 2011 at 4:22

    lausser,

    I got this script working. It reads a list of log files from a single directory.

    Thank you, Bobby

    opendir(DIR, $path) or die "can not open $path\n";
    @files = grep { /\.log$/ } readdir(DIR);
    closedir(DIR);

    $tracefile = '/tmp/check_celery.trace';
    $seekfilesdir = '/localfs/nagios';
    $protocolsdir = '/localfs/nagios';
    $scriptpath = '/services/nagios/check_logfiles/bin:/home/rmitchell/celery';

    $check_cmd = "/services/nagios/libexec/check_logfiles";
    $check_cfg = "/tmp/check_celery.cfg";

    foreach (@files) {
      $logfile = "$path$_" if -f "$path$_";
      my %hash = (
        logfile => $logfile,
        tag => 'celery_log',
        rotation => 'bmwhpux',
        criticalpatterns => ['ERROR', 'error'],
        options => 'nologfilenocry, noprotocol',
      );
      push @search, \%hash;
    }

    @searches = @search;


  180. Holger Says:
    March 4th, 2011 at 12:24

    Hi,

    again a question about stickiness and how long Nagios would show a service as failed.

    Consider the following example: check_logfiles is used to determine whether we have 3 failed root login attempts w/o clearing “root login okay”. This could be well done with warningpatterns, warningthreshold and okpatterns.

    The service raises a warning in nagios as expected. Next time check_logfiles runs service is still okay. Well, this is what the sticky option is designed for. You keep the warning in nagios and ack (meaning I’m working on it).

    Once you have investigated the whole thing and you see everything is normal, you would want to get the service back to the OK state. The first option is to manually inject the okpattern into the logfile. Problem: what the heck was the corresponding ok pattern in the logfile? Second option: make check_logfiles forget about the stickiness of that service.

    For the latter option I would implement an additional arg for check_logfiles (like --resetstickyness) which in turn would call a "sub resetstickyness". This does something very similar to "sub rewind": load state, copy to new state and manipulate $self->{newstate}->{laststicked,servicestateid,serviceoutput}. The advantage is that check_logfiles knows best which seekfile to look at and how it is organized.

    From nagios host the resetstickyness function would be called via check_nrpe (and a corresponding special service definition in nrpe.conf)

    Would you treat this as a valid approach? Would you think about adding this to mainline (once we prove its usefulness)?

    As a side note: reality is a bit more complex, which comes from the interaction of Nagios (with states OK, Non-OK, ACKed) and an external application (with states OK, Non-OK, ACKed, CLOSEd)…


  181. Joe Says:
    March 8th, 2011 at 15:05

    Hi Gerhard, I have a problem with a special log constellation. I need to evaluate a log file where a process periodically (every few minutes) logs some internal metrics (e.g. the current number of messages in a JMS queue). So check_logfiles may find several new entries for the same metric since the last run, or even none, depending on when and at what frequency check_logfiles is called.

    I solved the "many entries" part by collecting the lines in the script part and evaluating them in a supersmart postscript, so only the last entries for a given metric decide whether an error condition is given or not. However, if there are no new entries, this approach would simply state that everything is OK, as the result of the previous run of check_logfiles is not known. Obviously this behavior is not correct and the script should return the state of the previous run.

    Is there any better way to solve this than to persist the state of the current run at the end of the postscript part and read the state of the last run in a prescript or at the beginning of the postscript part? Thanks in advance, Joe
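    Lacking a built-in feature, the persistence idea from the question could look roughly like this; the state-file path and the perfdata check are my assumptions, not a documented check_logfiles mechanism:

```perl
# Sketch: save this run's verdict; when no new metric lines were seen,
# fall back to the previous run's result instead of reporting OK.
$options = 'supersmartpostscript';
$postscript = sub {
  my $state = '/var/tmp/jms_metrics.laststate';   # hypothetical state file
  my $code  = $ENV{CHECK_LOGFILES_SERVICESTATEID};
  my $out   = $ENV{CHECK_LOGFILES_SERVICEOUTPUT};
  my $perf  = $ENV{CHECK_LOGFILES_SERVICEPERFDATA} || '';
  if ($perf =~ /_lines=0\b/) {
    # nothing new was logged: reuse the previous run's result if we have one
    if (open my $fh, '<', $state) {
      chomp(my @saved = <$fh>);
      close $fh;
      ($code, $out) = @saved if @saved == 2;
    }
  } elsif (open my $fh, '>', $state) {
    print $fh "$code\n$out\n";
    close $fh;
  }
  print $out;
  return $code;
};
```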


  182. mqjia Says:
    March 10th, 2011 at 6:16

    Use of uninitialized value in subroutine entry at /usr/local/nagios/libexec/check_logfiles line 1146. Bad arg length for Socket::inet_ntoa, length is 0, should be 4 at /usr/local/nagios/libexec/check_logfiles line 1146.


  183. Uwe Says:
    March 10th, 2011 at 13:05

    Hello lausser,

    I don’t get any further.

    I am just trying to run a vbs postscript after the logfile check is done. But instead I get some error messages in syslog. Here is my current cfg:

    $scriptpath = 'C:\WINDOWS\system32';
    $seekfilesdir = 'C:\\Program Files\\NSClient++\\plugins-scripts\\seek';
    $protocolsdir = 'C:\\Program Files\\NSClient++\\plugins-scripts\\protocol';

    @searches = ( {
      tag => "system",
      logfile => 'C:\\mylogfile.log',
      criticalpatterns => [ "SCOPE" ],
      warningpatterns => [ "CODA" ],
      options => 'script,protocol,nocount,nocase,allyoucaneat',
    }, );
    $postscript => 'cscript.exe';
    $postscriptparams => 'C:\Program Files\NSClient++\plugins-scripts\parse_logfiles.vbs NAGCORE 5667 Hostname "Passiv Checks" 1';

    The params should later be replaced by some macros, but for now I just want to get the check run.

    This is my error message in syslog: Mar 10 11:51:14 NAGCORE nsca[2895]: SERVICE CHECK -> Host Name: ‘Hostname’, Service Description: ‘Passiv Checks’, Return Code: ‘0’, Output: ‘Use of uninitialized value $script in concatenation (.) or string at script/check_logfiles line 1441, line 4.#015#012Use of uninitialized value $script in sprintf at script/check_logfiles line 1442, line 4.#015#012Use of uninitialized value $script in sprintf at script/check_logfiles line 1443, line 4.#015#012’C:\WINDOWS\system32\’ is not recognized as an internal or external command,#015#012operable program or batch file.#015#012Use of uninitialized value $script in concatenation (.) or string at script/check_logf’ Mar 10 11:51:14 NAGCORE nsca[2895]: End of connection…


    lausser Reply:

    Be careful with postscript and postscriptparams. These are normal Perl-variables. Don’t use ‘=>’ here.

    $postscript = 'cscript.exe';
    $postscriptparams = 'C:.....';
    You see the difference? With $seekfilesdir you used ‘=’, which is correct.


  184. meilon Says:
    March 10th, 2011 at 15:44

    Hello!

    Many thanks for the plugin; I am already using it for Linux and for Windows event logs.

    I would now also like to monitor the logfiles of our BlackBerry Enterprise Server, but I have not yet figured out how to handle the rotations.

    The folder structure looks like this: ……\Logs\\____.txt

    Concretely, a file can thus be named: 05BESRV191_ALRT_01_20110310_0001.txt

    What would the configuration for this have to look like?


    lausser Reply:

    @searches = ({
      tag => 'blackberry',
      type => 'rotating::uniform',
      logfile => '....\Logs\dummy.txt',
      rotation => '05BESRV191_ALRT_\d{2}_\d{8}_\d{4}\.txt',
      criticalpatterns => ......
    });
    If this 05BESRV191 part also changes, you have to work with \d or the like at that position as well.


  185. Uwe Says:
    March 10th, 2011 at 16:35

    Hello again,

    thanks for the info but that does not solve my problem. As far as I can see it is the same error msg.:

    SERVICE CHECK -> Host Name: ‘Hostname’, Service Description: ‘Passiv Checks’, Return Code: ‘0’, Output: ‘Use of uninitialized value $script in concatenation (.) or string at script/check_logfiles line 1441, line 4.#015#012Use of uninitialized value $script in sprintf at script/check_logfiles line 1442, line 4.#015#012Use of uninitialized value $script in sprintf at script/check_logfiles line 1443, line 4.#015#012’C:\WINDOWS\system32\’ is not recognized as an internal or external command,#015#012operable program or batch file.#015#012Use of uninitialized value $script in concatenation (.) or string at script/check_logf’


  186. Uwe Says:
    March 14th, 2011 at 15:43

    Hi,

    Your log monitor looks excellent. postgres uses a log rotation scheme which may not be covered by your script. You can define a pattern for the logfile name, e.g. log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'.

    postgres then creates a log file name which is different every time.

    The rotation method mod_log_rotate looked most promising, but I could not get it running.

    Do you have an idea how to use your monitor for this purpose?

    Alternatively, I can switch logging to syslog.

    Best Regards, Uwe


    lausser Reply:

    When you have a logfile which continuously changes its name (no distinction between current logfile and archived logfiles by different naming scheme), you need type rotating::uniform. In your case the correct syntax would be:

    tag => 'pg',
    type => 'rotating::uniform',
    logfile => '/var/log/postgres/dummy',
    rotation => 'postgresql\-\d{4}\-\d{2}\-\d{2}_\d{2}\d{2}\d{2}\.log',
    ...
    logfile in this case is a dummy file, its purpose is to define the directory where all the logfiles will be written to. (replace /var/log/postgres with your own log directory if necessary) The ‘dummy’ will be cut off. Then, in the directory check_logfiles will look at all files which match the rotation-pattern. Among these files it takes the one with the most current modification timestamp as the current logfile.


    Uwe Reply:

    @lausser, excellent. thanks. Uwe


    Uwe Reply:

    @Uwe, Hi Gerhard,

    I have to come back to you. I'm calling the script as shown below. The execution according to the trace file looks correct, but the script returns an UNKNOWN.

    For this execution I deleted all working files in /var/tmp/check_logfiles/.

    db24:~ # /usr/lib/nagios/plugins/check_logfiles --logfile=/var/lib/pgsql/data/pg_log/dummy --type='rotating::uniform' --rotation='postgresql-\d{4}-\d{2}-\d{2}_\d{2}\d{2}\d{2}\.log' --criticalpattern="LOG: temporary file"
    UNKNOWN - (1 unknown in check_logfiles.protocol-2011-03-15-08-47-59) - could not find logfile /var/lib/pgsql/data/pg_log/dummy |default_lines=1549927 default_warnings=0 default_criticals=0 default_unknowns=1
    db24:~ # echo $?
    3

    the trace output is below:

        Tue Mar 15 08:47:59 2011: archive /var/lib/pgsql/data/pg_log/postgresql-2011-03-07_000000.log matches (modified Mon Mar 7 23:58:08 2011 / accessed Thu Mar 10 11:29:08 2011 / inode 159613004 / inode changed Fri Mar 11 15:34:37 2011)
        Tue Mar 15 08:47:59 2011: archive /var/lib/pgsql/data/pg_log/postgresql-2011-03-07_000000.log was modified after Thu Jan 1 01:00:00 1970
        Tue Mar 15 08:47:59 2011: stat (/var/lib/pgsql/data/pg_log/dummy) failed, try access instead
        Tue Mar 15 08:47:59 2011: could not find logfile /var/lib/pgsql/data/pg_log/dummy
        Tue Mar 15 08:47:59 2011: first relevant files: postgresql-2011-03-07_000000.log, postgresql-2011-03-08_000000.log, postgresql-2011-03-09_000000.log, postgresql-2011-03-10_123822.log, postgresql-2011-03-11_000000.log, postgresql-2011-03-12_000000.log, postgresql-2011-03-13_000000.log, postgresql-2011-03-14_000000.log, postgresql-2011-03-14_115840.log, postgresql-2011-03-15_000000.log
        Tue Mar 15 08:47:59 2011: /var/lib/pgsql/data/pg_log/postgresql-2011-03-15_000000.log has fingerprint 64769:159613093:0
        Tue Mar 15 08:47:59 2011: /var/lib/pgsql/data/pg_log/postgresql-2011-03-14_115840.log has fingerprint 64769:159613091:20245019
        Tue Mar 15 08:47:59 2011: /var/lib/pgsql/data/pg_log/postgresql-2011-03-14_000000.log has fingerprint 64769:159613092:9728954
        Tue Mar 15 08:47:59 2011: /var/lib/pgsql/data/pg_log/postgresql-2011-03-13_000000.log has fingerprint 64769:159613088:21196224
        Tue Mar 15 08:47:59 2011: /var/lib/pgsql/data/pg_log/postgresql-2011-03-12_000000.log has fingerprint 64769:159613090:21196224
        Tue Mar 15 08:47:59 2011: /var/lib/pgsql/data/pg_log/postgresql-2011-03-11_000000.log has fingerprint 64769:159613089:20327172
        Tue Mar 15 08:47:59 2011: /var/lib/pgsql/data/pg_log/postgresql-2011-03-10_123822.log has fingerprint 64769:159613087:31060975
        Tue Mar 15 08:47:59 2011: /var/lib/pgsql/data/pg_log/postgresql-2011-03-09_000000.log has fingerprint 64769:159613002:28760577
        Tue Mar 15 08:47:59 2011: /var/lib/pgsql/data/pg_log/postgresql-2011-03-08_000000.log has fingerprint 64769:159612970:37402833
        Tue Mar 15 08:47:59 2011: /var/lib/pgsql/data/pg_log/postgresql-2011-03-07_000000.log has fingerprint 64769:159613004:241078354
        Tue Mar 15 08:47:59 2011: relevant files: postgresql-2011-03-07_000000.log, postgresql-2011-03-08_000000.log, postgresql-2011-03-09_000000.log, postgresql-2011-03-10_123822.log, postgresql-2011-03-11_000000.log, postgresql-2011-03-12_000000.log, postgresql-2011-03-13_000000.log, postgresql-2011-03-14_000000.log, postgresql-2011-03-14_115840.log, postgresql-2011-03-15_000000.log
        Tue Mar 15 08:47:59 2011: moving to position 0 in /var/lib/pgsql/data/pg_log/postgresql-2011-03-07_000000.log
        Tue Mar 15 08:48:07 2011: stopped reading at position 241078354
        Tue Mar 15 08:48:07 2011: moving to position 0 in /var/lib/pgsql/data/pg_log/postgresql-2011-03-08_000000.log
        Tue Mar 15 08:48:10 2011: stopped reading at position 37402833
        Tue Mar 15 08:48:10 2011: moving to position 0 in /var/lib/pgsql/data/pg_log/postgresql-2011-03-09_000000.log
        Tue Mar 15 08:48:12 2011: stopped reading at position 28760577
        Tue Mar 15 08:48:12 2011: moving to position 0 in /var/lib/pgsql/data/pg_log/postgresql-2011-03-10_123822.log
        Tue Mar 15 08:48:15 2011: stopped reading at position 31060975
        Tue Mar 15 08:48:15 2011: moving to position 0 in /var/lib/pgsql/data/pg_log/postgresql-2011-03-11_000000.log
        Tue Mar 15 08:48:17 2011: stopped reading at position 20327172
        Tue Mar 15 08:48:17 2011: moving to position 0 in /var/lib/pgsql/data/pg_log/postgresql-2011-03-12_000000.log
        Tue Mar 15 08:48:20 2011: stopped reading at position 21196224
        Tue Mar 15 08:48:20 2011: moving to position 0 in /var/lib/pgsql/data/pg_log/postgresql-2011-03-13_000000.log
        Tue Mar 15 08:48:22 2011: stopped reading at position 21196224
        Tue Mar 15 08:48:22 2011: moving to position 0 in /var/lib/pgsql/data/pg_log/postgresql-2011-03-14_000000.log
        Tue Mar 15 08:48:23 2011: stopped reading at position 9728954
        Tue Mar 15 08:48:23 2011: moving to position 0 in /var/lib/pgsql/data/pg_log/postgresql-2011-03-14_115840.log
        Tue Mar 15 08:48:24 2011: stopped reading at position 20245019
        Tue Mar 15 08:48:24 2011: moving to position 0 in /var/lib/pgsql/data/pg_log/postgresql-2011-03-15_000000.log
        Tue Mar 15 08:48:24 2011: stopped reading at position 0
        Tue Mar 15 08:48:24 2011: rotated logfiles examined but no current logfile found
        Tue Mar 15 08:48:24 2011: keeping position 0 and time 1300175304 (Tue Mar 15 08:48:24 2011) for inode 64769:159613093 in mind


    lausser Reply:

    @Uwe, I just saw that rotating::uniform does not work in commandline mode. (I never used it like that, but it's supposed to work.) I'll have a look at it this afternoon.

    Try it with a config file in the meantime.


    lausser Reply:

    I found it. It will be corrected in the next release. Meanwhile you can edit plugins-scripts/check_logfiles.pl

    if (my $cl = Nagios::CheckLogfiles->new({
          cfgfile => $commandline{config} ? $commandline{config} : undef,
          searches => [
              map {
                if (exists $commandline{type} && $commandline{type} eq 'rotating::uniform') {
                  $_->{type} = $commandline{type};
                } elsif (exists $commandline{type}) {
                  # "eventlog" or "eventlog:eventlog=application.......
    You see it? There's an extra if-statement where you check for rotating::uniform.


  187. Rahul Says:
    March 14th, 2011 at 20:32

    Hi,

    I have been using this for some time now and it's really helpful to me. Currently I am trying to use the "!" option. I want to raise an alert if some pattern is not found in the logfile. What I want is to alert if there is no entry in my logfile like the one below:

    Transaction Completed

    This should occur every day between 3 PM and 4 PM; otherwise raise an alert.

    Here is my configuration:

        {
            tag => 'Trans_notFound',
            logfile => '/var/tmp/transaction_log4j.log',
            criticalpatterns => [ '!Transaction Completed' ],
            options => 'noprotocol,nologfilenocry,sticky=900',
        },


    lausser Reply:

    Set the check_period of your service to 04:05-04:10 and set the sticky time to 28800; then, if the message was not found, you have a critical until lunchtime.

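Putting the advice above together with Rahul's search, a minimal sketch might look like this (the sticky time of 28800 seconds and the logfile path are taken from the thread; this is an illustration, not a tested configuration):

```perl
# Sketch: alert when 'Transaction Completed' is NOT found.
# The leading '!' negates the pattern; sticky=28800 keeps the
# critical state for 8 hours once it has been raised.
@searches = ({
    tag              => 'Trans_notFound',
    logfile          => '/var/tmp/transaction_log4j.log',
    criticalpatterns => [ '!Transaction Completed' ],
    options          => 'noprotocol,nologfilenocry,sticky=28800',
});
```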

  188. Rahul Says:
    March 15th, 2011 at 16:52

    Thanks Lausser!

    I did that, and instead of increasing the sticky time I removed it completely, but still no luck. Here is what happens: after 4:10 the next check gets scheduled at 4:05 the next day, and the service remains in 'Hard' state.

    I have marked this service as 'Volatile'. Max check attempts, normal check interval and retry check interval are all set to 1. Notification interval is '300'. What I want is for the service to return to 'OK' state after 4:10.

    Thanks, Rahul


  189. Niggo Says:
    March 16th, 2011 at 13:42

    Hi,

    is there perhaps a template somewhere with warning and critical patterns for messages, syslog and the like? It would be quite handy not to have to run into the errors first just to find out that they exist.

    Regards


    Rahul Reply:

    @Niggo,

    Regards. I am not good at German but could interpret your message through Google Translate. Thanks for your suggestion. My application sends some data to an external system and, once completed, puts this line into a specific log file. Nothing else is available for scraping except the given log file. Let me see if there is anything else I could do.

    Thanks, Rahul.


    Niggo Reply:

    @Rahul,

    Thank you for your fast answer, but I'm looking for some default configurations for the messages or syslog files, to catch kernel warnings, SMART errors, ssh failures, filesystem errors, and so on.


    Sandeep Reply:

    @Niggo, there is no default. You need to know what you want to monitor, then create patterns for each scenario. Want to monitor ssh? Create a config with the pattern "ssh".


    Niggo Reply:

    @Sandeep, sure, but there are so many possible errors and I don't want to trigger them all just to learn the strings they put into the logfile. I thought someone might have a list with the patterns or the error strings.


  190. Dave Says:
    March 16th, 2011 at 15:35

    I wanted to start monitoring nsclient.log for special errors, but I seem to only get ASCII junk back. Is there anything special around code pages or something else I need to set in my checklogfile.cfg to correct this? Thanks.


    lausser Reply:

    Try

    ....
        options = 'encoding=ucs-2,....


  191. Dave Says:
    March 16th, 2011 at 15:45

    Thanks, I answered my own question: options => 'encoding=ucs-2'

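For reference, a complete search using the encoding option might look like the following sketch (the logfile path and pattern are placeholders, not taken from the thread):

```perl
# Sketch: read a UCS-2 (UTF-16) encoded Windows logfile.
@searches = ({
    tag              => 'nsclient',
    logfile          => 'C:\Program Files\NSClient++\nsclient.log',
    criticalpatterns => [ 'ERROR' ],
    options          => 'encoding=ucs-2',
});
```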

  192. Pee Tee Says:
    March 24th, 2011 at 12:59

    Hello, first of all, big praise for the plugin. It is really great!!! We monitor a logfile that is created anew every day, with a name based on the current date. However, if no error occurred, the file is only created in the evening, while the monitoring is supposed to run regularly. So it can happen that the current logfile does not exist and the "rotation" gets scanned. On the first check of each day there is no matching seek file yet (since the filename contains the date), so all rotated files are scanned again and outdated error messages are produced. I tried to prevent this with the noallyoucaneat option, but it has no effect here. My question: is it possible to influence the name of the seek file in order to suppress the errors contained in old logfiles? Here is the configuration:

        $seekfilesdir = '/tmp';
        $protocolsdir = '/tmp';

        @searches = ({
            tag => 'FileNet_PE_elogyyyymmdd',
            logfile => 'elog$CL_DATE_YYYY$$CL_DATE_MM$$CL_DATE_DD$',
            archivedir => 'elogs',
            rotation => 'elog\d{8}',
            criticalpatterns => [ 'VW: process exiting with signal number 15 received.' ],
            criticalthreshold => '15',
            options => 'nosavethresholdcount,nologfilenocry',
        });

    Thanks in advance!


    lausser Reply:

    Use rotating::uniform when an application continuously writes new logfiles whose name format does not let you distinguish them from the archived, rotated files (and the current and the old logfiles live in the same directory).

    ....
    tag => 'FileNet_PE_elogyyyymmdd',
    type => 'rotating::uniform',
    logfile => '/pfad/zum/verzeichnis/dummy',
    rotation => 'elog\d{8}',
    ....
    That would work smoothly if old files were not moved into the elogs directory. Could the application perhaps be taught to keep all logfiles in one directory?


    Pee Tee Reply:

    @lausser, OK, they are actually all in the same directory. I had left out the full path, sorry! This is what it looks like:

        ...
        @searches = ({
            tag => 'FileNet_PE_elogyyyymmdd',
            logfile => '/fnsw/local/logs/elogs/dummy',
            archivedir => '/fnsw/local/logs/elogs',
            rotation => 'elog\d+',
            criticalpatterns => [ 'VW: process exiting with signal number 15 received.' ],
            criticalthreshold => '15',
            options => 'nosavethresholdcount,nologfilenocry',
        });
        ...


    lausser Reply:

    Then this would be a working solution, with which you no longer have to care about any date/time stamps.

    @searches = ({
            tag => 'FileNet_PE_elogyyyymmdd',
            logfile => '/fnsw/local/logs/elogs/dummy',
            rotation => 'elog\d+',
            criticalpatterns => [
                    'VW: process exiting with signal number 15 received.'
            ],
            criticalthreshold => '15',
            options => 'nosavethresholdcount',
    });
    Now all files in the elogs directory that match the rotation pattern are taken into account; the file with the newest modification date is treated as the current logfile and all the others as archives.


    Pee Tee Reply:

    Great!!! Thanks, it works!

    Best regards


    Pee Tee Reply:

    Unfortunately, old entries are also read here whenever the seek file, for whatever reason, no longer exists, which produces false alerts. How can that be prevented? 'noallyoucaneat' does not seem to work here.


  193. Michael Banck Says:
    March 30th, 2011 at 15:42

    Hi,

    is there a best practice to make sure a notification is sent out for every critical logfile line that is found? As far as I can tell (and according to my testing), Nagios will send out a notification only for the first line found and then set the service status to CRITICAL; further critical lines before the next OK will not be notified and might be overlooked.

    Can/should one do this via a small retry_check_interval, even though I believe setting max_check_attempts=1 is recommended?

    Best regards, Michael


  194. John P. Says:
    March 30th, 2011 at 23:06

    I have a daily job that writes to a log file. Every day it re-writes it.

    check_logfiles triggers on an error perfectly; however, that 'Critical Error' goes away after a few minutes. I need it to stay in 'Critical' state until someone does something: acknowledges it, or similar.

    Then the next day, I need it to start over from the top of the file and scan for errors again. Is this possible?


  195. John P Says:
    March 31st, 2011 at 21:50

    Hopefully, you are still answering questions. :)

    Simply put, when check_logfiles detects a string in a log, it needs to throw a critical and STAY critical until someone acknowledges it. Is this possible?


    lausser Reply:

    The sticky option. Look in the documentation, the examples, and several posts here.


  196. John P Says:
    April 1st, 2011 at 0:00

    Thanks for the quick response!

    Unfortunately, this doesn't answer the question. As far as I have gathered from reading, 'sticky' will just keep it at critical until the time is up, at which point it will go back to 'OK', barring any other errors.

    Here is my simple problem: I have a batch job that runs once a day, doing a bunch of Oracle work. All I'm searching for is the existence of any 'ORA-' strings. I need check_logfiles to throw a critical when this string (ORA-) is found and then stay in the critical state until someone acknowledges the error, OR until the log file gets rotated (i.e. the log gets re-written on the next day's run).

    As it stands now, the error (ORA-) gets detected, check_logfiles throws the error, and then 5 minutes later it goes back to 'OK', in spite of the fact that the batch failed utterly.


    lausser Reply:

    You have to use max_check_attempts=1 and is_volatile so Nagios sends a notification as soon as the service goes critical. The sticky option can be used to make the critical status permanent, either until a certain time has passed or until an okpattern appears in the logfile.

    Acknowledging an event in the Nagios GUI cannot make the service green. That would only be possible if clicking ack sent a line containing the okpattern to the Oracle logfile; only then would check_logfiles return OK the next time it runs.

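A sketch of the sticky/okpattern combination described above (the logfile path and the okpattern text are assumptions; the ORA- pattern comes from the question):

```perl
# Sketch: go critical on any ORA- message and stay critical until a
# line matching the okpattern shows up in the logfile.
@searches = ({
    tag              => 'oracle_batch',
    logfile          => '/var/log/batchjob.log',
    criticalpatterns => [ 'ORA-' ],
    okpatterns       => [ 'batch finished successfully' ],
    options          => 'sticky',
});
```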

  197. Hernan Fonseca Says:
    April 1st, 2011 at 15:32

    Hi lausser, I'm trying this to catch some event IDs in the application event log on Windows:

        @searches = ({
            options => 'eventlogformat="%w src:%s id:%i %m"',
            tag => 'evt_app',
            type => 'eventlog',
            eventlog => {
                eventlog => 'application',
                include => {
                    source => 'BizTalk Server 2006',
                    eventid => ['5429','5410','6912','6913','5753','10034','7184','5439','7221','5649','5773','5888','5777','5697','5652','5743','5740'],
                },
            },
            criticalpatterns => '.*',
        });

    The problem is that I want to catch all events with ID 5429, all events with ID 5410, all events with ID 6912, and so on, while keeping the same source => 'BizTalk Server 2006'.

    I tried the option "operation => 'or'", but this makes an OR between source and event ID, and I want an OR only between the event IDs. What would be the best way to do this, without putting everything into separate files, of course?

    Thanks in advance


  198. Thomas Says:
    April 5th, 2011 at 16:24

    Hello!

    Many thanks for all the work you have put, and are still putting, into the plugin.

    I am not a big fan of the sticky option, but I have not found any other way to keep errors, once found, from being overwritten with OK on the next check.

    Have I overlooked something?

    Regards, Thomas


    lausser Reply:

    No. Whether you like it or not, the sticky option exists for exactly this purpose.


    Thomas Reply:

    @lausser,

    Thanks for the quick reply. I combined your example 3 for sending individual hits via send_nsca with the query of the AIX error report. On the server side, check_mk with the "NSCA listener" waits for the messages. Unfortunately I do not get individual hits displayed, only a single one, and even that is overwritten again on the next run.

    Can you advise me on this as well?


    lausser Reply:

    Does the service in question perhaps have max_check_attempts > 1 and/or is_volatile=0? That would explain why messages get swallowed. The NSCA listener may also be able to write a debug file. On the check_logfiles side, 'touch /tmp/check_logfile.trace' creates a file into which the plugin writes everything it is doing. Please check in there whether all error lines are found and an event is sent for each of them.


    Thomas Reply:

    @lausser,

    The trace file shows that a data packet goes out for every hit, but the server still only displays the last one. Via "extra_service_conf" I set the values for max_check_attempts and is_volatile to 1. I hope I understood you correctly.


    Thomas Reply:

    @Thomas,

    nagios.log also shows me all the events that were sent.

    Thomas Reply:

    @Thomas,

    Many thanks for your help; the problem is solved. You just have to look for the events in the right place. Your hint about max_check_attempts and is_volatile fixed it.

  199. Benny Says:
    April 6th, 2011 at 14:37

    Hey folks,

    I’m having a rough time trying to match filenames for a custom in-house application. I have log files in a Windows folder named with a format of: “ISMM-DD-YYYY.log”, where MM-DD-YYYY is the current date. So, today’s log would be named “IS04-06-2011.log”. The logs are rotated daily.

    I would like to only check the current day’s log, but I’m not sure if that is possible.

    In an attempt to check them ALL just to test my config, I have:

        type => 'rotating::uniform',
        logfile => 'E:\Import Logs\foo',
        rotation => 'IS\d+-\d+-\d+\.log',
        options => 'allyoucaneat,nologfilenocry,noperfdata,noprotocol,report=long,sticky=180'

    I have other settings like tag and criticalpatterns, but I don't think they're relevant to the question.

    Is my rotation setting incorrect? Is there a way to only check today’s log?

    Thanks so much!

    Benny


    Benny Reply:

    @Benny, I guess I forgot to state the actual problem I'm seeing: I don't get any hits at all, even though there is one instance of the criticalpattern in a previous log file.

    Hence I'm wondering whether the file name pattern doesn't match, so that none of the logs are being checked.

    Benny

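One possible way to scan only the current day's file, instead of treating the directory as rotating, is to build the filename from the date macros used elsewhere on this page. The following is an unverified sketch (the criticalpattern is a placeholder):

```perl
# Sketch: the $CL_DATE_* macros expand at runtime, so this matches
# today's ISMM-DD-YYYY.log; nologfilenocry suppresses an alert while
# the file does not exist yet.
@searches = ({
    tag              => 'import_today',
    logfile          => 'E:\Import Logs\IS$CL_DATE_MM$-$CL_DATE_DD$-$CL_DATE_YYYY$.log',
    criticalpatterns => [ 'ERROR' ],
    options          => 'nologfilenocry',
});
```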

  200. Three must-have Nagios services for your Minecraft server | erschaffe.de Says:
    April 6th, 2011 at 18:56

    […] a wonderful plugin made by Gerhard Lausser: check_logfiles. It has some nice features. For example, you can specify several logfiles to be monitored by one […]

  201. Rainer Says:
    April 13th, 2011 at 10:09

    Hi,

    my config file looks like this:

        @searches = (
            {
                tag => 'soadw_serverlog',
                logfile => '/../server.log',
                criticalpatterns => ['ERROR'],
                rotation => 'server\.log\.\d{1}',
                options => 'criticalthreshold=10',
            },
        );
        $seekfilesdir = '/var/tmp/mon/check_logfiles/';
        $protocolsdir = '/var/tmp/mon/check_logfiles/';

    I assumed that criticalthreshold causes an alert if the number of 'ERROR' entries per time step is at least 10, but Icinga shows an alert at every appearance of an 'ERROR' entry.

    What is wrong – with the config or with my head ;-))


    lausser Reply:

    criticalthresholds: 'A number N, meaning that only every Nth hit is counted as an error.'

    There is nothing about time steps in there. The absolute number of hits is counted.

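As a concrete illustration of the every-Nth-hit counting, using the search from the question:

```perl
# With criticalthreshold 10, only every 10th 'ERROR' line is counted:
# 9 matches leave the state OK, 10 matches yield 1 counted critical,
# 20 matches yield 2 counted criticals.
@searches = ({
    tag              => 'soadw_serverlog',
    logfile          => '/../server.log',
    criticalpatterns => [ 'ERROR' ],
    rotation         => 'server\.log\.\d{1}',
    options          => 'criticalthreshold=10',
});
```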

  202. OMD: State Retention der nagios-Plugins in /var/tmp !??? | KenntWas.de - Technische Tipps Says:
    May 28th, 2011 at 12:36

    […] for caching files or as a statesdir. Examples of this are check_oracle_health and check_logfiles, which persistently 'remember' certain information between two invocations. The absolute […]

  203. Need Help regarding nagios plugin - Admins Goodies Says:
    August 18th, 2011 at 4:23

    […] i want to install nagios plugin check_logfiles […]

  204. Use Nagios to monitor a log file and send log details - Admins Goodies Says:
    February 14th, 2012 at 19:35

    […] It is listed on Nagios Exchange, but here is the direct link to the English version: http://labs.consol.de/lang/en/nagios/check_logfiles/ […]

  205. Supervision temps réel de fichier de log avec NSClient++ | Communauté Francophone de la Supervision Libre Says:
    December 2nd, 2012 at 18:54

    […] monitoring Windows logs can just as well be done with check_wmi_plus, CheckEventLog or CheckLogFile. Indeed, these 3 solutions largely do the job … but the search […]

  206. Tasslehoff Burrfoot » Blog Archive » Monitoraggio logs con Nagios Says:
    January 28th, 2013 at 20:02

    […] Monitoring logs is an often tedious but essential activity if you want to be certain that a service is working correctly; one way to make it all simpler and more automatic is to use good old Nagios together with the excellent check_logfiles plugin. […]

  207. Phát hiện đăng nhập root sai password với Nagios « MC's Blog Says:
    March 19th, 2013 at 4:27

    […] http://labs.consol.de/lang/en/nagios/check_logfiles/. This plugin searches for and matches a pattern or character string in log files and returns […]

  208. PHP异常和错误监控 Says:
    May 22nd, 2013 at 5:21

    […] To monitor this with Nagios you need to install a suitable monitoring plugin; here check_logfiles is used, a generic Nagios plugin for monitoring log files, written in Perl, with regular-expression matching support. […]

  209. Clarkseth Says:
    February 5th, 2014 at 14:05

    I have a problem with the result of a *.cfg file containing the $prescript parameter, run by check_logfiles.exe. My platform is MS Windows Server 2008 R2 64 bit. I have to check whether there are *.err files in a directory. To do this job, I wrote this PowerShell script:

        $mypath = "W:\nrpe\tmp\"
        $logfile = $args
        $logfile = Foreach-Object {$logfile -replace '\\', '' -replace "_", ""}
        $result = ls $args -Filter *.err | Measure-Object -Line | select -expand lines
        echo "$result file/s present with .*err string" >> $mypath$logfile

    I chose to process the $logfile parameter because I have to check several paths and want to use the same script.

    This is the cfg file:

        $scriptpath = 'C:\Windows\System32\WindowsPowerShell\v1.0';
        $seekfilesdir = 'W:\nrpe\tmp';

        $prescript = 'powershell.exe';
        $prescriptparams = '-File W:\nrpe\libexec\check_err_file.ps1 \\networkpath\FTP_Data\ExtraUE\Input';
        $options = 'supersmartprescript';

        $log = 'W:\nrpe\tmp\networkpath_FTP_Data_ExtraUE_Input';
        @searches = ( {
            tag => 'check_logfiles_test',
            type => 'simple',
            logfile => $log,
            criticalpatterns => ['!0 file/s present with .*err string'],
            options => 'count,noprotocol,noperfdata',
        } );

    The $log file is empty if I run check_logfiles:

    W:\nrpe\libexec\check_logfiles -f W:\nrpe\cfg\check_logfiles_test.cfg

    But if I run PowerShell manually, it works correctly:

        PS W:> W:\nrpe\libexec\check_err_file.ps1 \\networkpath\FTP_Data\ExtraUE\Input

    Content of W:\nrpe\tmp\networkpath_FTP_Data_ExtraUE_Input:

    1 file/s present with .*err string

    Do you know what could be the problem?


    Clarkseth Reply:

    here it is in a more readable format: http://stackoverflow.com/questions/21576679/nagios-prescript-doesnt-work


  210. Tim Bellen Says:
    February 17th, 2014 at 11:26

    I am having troubles with the protocol file.

    I have the following config

        @searches = (
            {
                tag => 'prot',
                logfile => '/opt/local/java/jboss/server/hes/log/localhost_access_log.$CL_DATE_YYYY$-$CL_DATE_MM$-$CL_DATE_DD$.log',
                rotation => 'SOLARIS',
                criticalpatterns => ['reminder.*OpCode=get'],
                options => 'protocol,nosavethresholdcount',
                criticalthreshold => 10
            }
        );

    I hoped that the protocol file would contain all the matching lines from the moment the critical threshold was reached.

    But it does not. It only shows a few lines, I guess one line per 10, which is the threshold.

    Is it possible to have it record all the matching lines?


  211. Francesco Says:
    February 26th, 2014 at 15:25

    Hi lausser, one question about the *.cfg file. If I need to know how many occurrences of the criticalpatterns and warningpatterns words were found, what do I have to do? Right now I have this search:

        @searches = (
            {
                tag => 'check_logfiles_Services_passive',
                type => 'virtual',
                logfile => $log,
                criticalpatterns => ['Stopped','Running','Paused'],
                warningpatterns => ['Auto','Disabled','Manual'],
                options => 'noprotocol,noperfdata,count',
            },
        );

    And the output is this:

        CRITICAL - (1 errors, 1 warnings) - H_Desktop:RunningHS_Repli:RunningThemes:AutoThemes:Stopped

    I would like to see CRITICAL - (3 errors, 1 warnings). What do I have to do?

    Thanks in advance


  212. stefan Says:
    March 10th, 2014 at 18:15

    Hi, thanks for the great plugin! I stumbled over a problem where my /tmp dir got spammed with protocol files because the checked logfile no longer existed. Is there a way to prevent this behaviour (other than setting the noprotocol option)? Thanks in advance.


  213. Dan Says:
    March 16th, 2014 at 12:15

    Hi…my log files contain messages from multiple clients. Have already figured out from previous examples that I can use $CL_TAG in a template to restrict the search to a specific client. However, I was then planning on using a postscript to send the check results back to nagios via NSCA, but I want to use the client that was specified in the actual search as the host that the result will appear against. My problem is that the content of $CL_TAG changes when it calls the postscript so not quite sure how I can pass the name of the client that was used in the search to my postscript.


  214. Krish Says:
    March 26th, 2014 at 10:33

    Hi,

    How can I print all matched lines in the alert email?

    If 25 errors are found, it should print all 25 lines in the alert email. How can I do that?

    Thanks Krish


  215. Mower Says:
    March 27th, 2014 at 16:49

    Hi

    Is there an option to output the matched file name in the warning/critical message?


  216. Miss_Knox Says:
    March 28th, 2014 at 21:48

    Hi lausser,

    When I execute the following command, I get an OK message:

        sh-3.2$ whoami
        nagios
        sh-3.2$ ./check_logfiles --config teste.cfg
        OK - no errors or warnings|knox_lines=2 knox_warnings=0 knox_criticals=0 knox_unknowns=0

    My config file (teste.cfg):

        sh-3.2$ cat teste.cfg
        @searches = ({
            tag => 'knox',
            logfile => '/var/log/secure',
            criticalpattern => 'Failed'
        });

    But when I grep this system log (secure) I get:

        sh-3.2$ cat /var/log/secure | grep Failed
        Mar 24 14:50:16 fan sshd[12764]: Failed password for invalid user…

    What is wrong?

    Help me please ^^

    Thanks


  217. Mark Thornber Says:
    April 2nd, 2014 at 17:08

    When is the protocol file closed? I am using $CHECK_LOGFILES_PROTOCOLFILE in a postscript to get at the lines selected from the logfile by the critical patterns. The postscript fails with:

        PROCESS_CSSI_CHANGES UNKNOWN: Internal error : Cannot open /tmp/check_tsc_cssi.protocol-2014-04-02-15-58-32 : No such file or directory at /opt/nagios/etc/process_cssi_changes line 219.


    Mark Thornber Reply:

    @Mark Thornber, it's OK, I hadn't allowed for there being nothing new in the logfiles and hence no protocol file. Doh!


  218. Dan Says:
    June 7th, 2014 at 16:32

    Hi…

    Am having problems getting the “sticky” option to work how I want it to. Here is a sample of one of my searches (using a template) in my config file (gpg13.cfg):

        {
            template => 'windows-pmc4',
            logfile => '/logs/$CL_TAG$-messages',
            rotation => 'loglog0gzlog1gz',
            options => 'sticky',
            criticalpatterns => [
                '^\d+\.2 ',                   # Critical event
                '#011Critical#011',           # Any CRITICAL event
                'Detected using Scan engine', # Virus detected
                'Detected with Scan Engine'   # Virus detected
            ]
        }

    and this is how I call the above search from nagios:

        ./check_logfiles --tag=test-server --config=gpg13.cfg --selectedsearches=windows-pmc4

    I used the EICAR file on my test-server to generate a virus alert and this was correctly picked up by the above search. Furthermore, because of the sticky option, the error was propagated on each successive run of the above search, which is what I want. I then generated a second virus alert and this was correctly picked up, which meant the search above now reported "2 events found" which, again, is what I expected. However, when the search next ran (and on all successive runs thereafter) it only reported the very last event found... it seemed to have discarded the first event! Because of the "sticky" option I expected it to carry on reporting the 2 events indefinitely. Isn't that how the "sticky" option is supposed to work, or have I misunderstood?


    Dan Reply:

    Managed to figure it out myself. You have to set the report option to "long" for successive errors to be included in the propagation.

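Dan's fix as a sketch, assuming report=long is set globally via $options as in other configurations in this thread (only the relevant parts of his search are shown):

```perl
# Sketch: sticky keeps the critical state; report=long makes the
# plugin repeat all accumulated hits instead of only the last one.
$options = 'report=long';
@searches = ({
    template         => 'windows-pmc4',
    logfile          => '/logs/$CL_TAG$-messages',
    criticalpatterns => [ 'Detected using Scan engine' ],
    options          => 'sticky',
});
```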

  219. mid Says:
    July 17th, 2014 at 14:07

    Hi,

    we are trying to check certain logfiles with this plugin, and so far that works. But we have the problem that on a second run the old status is not kept if, for example, no new line shows up in the logfile.

    #

    Run 1: the logfile contains an error string -> the status changes to "critical" or "warning".

    Run 2: the logfile only contains text that matches none of the defined STRINGs.

    #

    Unfortunately I am not very deep into this topic and am merely trying to help our trainees.

    System info: the check runs on Windows systems; the sticky function has already been tested; I also switched the type so that, if need be, the whole file is always scanned, but that is not helpful because the logfile grows quite a bit during the day.

    Can you help me with this?

    Best regards

    mid


  220. Chaitanya Reddy Says:
    July 31st, 2014 at 13:25

    I want to call the remote check_logfiles via ssh, not via nrpe. Can you give some examples of that? I get the following error when trying to run a manual check from the Nagios server:

        ssh -tt -p 20022 monitor@192.168.32.2 "sudo -n -u root /usr/lib64/nagios/plugins/check_logfiles --config test_logs.conf"
        UNKNOWN - can not load configuration file test_logs.conf

    I have updated the sudoers file as well:

        Cmnd_Alias CHECK_LOGFILES = /usr/lib64/nagios/plugins/check_logfiles --config test_logs.conf
        monitor ALL=(root) NOPASSWD: CHECK_LOGFILES


  221. check_logfile监控oracle的日志以及其他各种日志 | 记录生活和工作的那些事 Says:
    September 7th, 2014 at 12:17

    […] http://labs.consol.de/lang/de/nagios/check_logfiles/ […]

  222. marvin Says:
    September 10th, 2014 at 11:23

    Hi,

    I am just trying to put check_logfiles into operation, but I am not getting very far.

    I created a test.log which I check for "error". The file contains several lines, among them "error" twice.

    On the first run nothing is found and OK is returned. Same on the second run, and so on.

    The seek file also contains the corresponding offset up to which the file was searched (end of file).

    Now I manually append a "new line" to the end of test.log.

    If I now start the script, "error" is found twice. I do not quite understand why, since the search is supposed to continue where the last run stopped, isn't it?

    Here is my config file:

        $options = 'report=long';
        @searches = (
            {
                tag => 'test',
                logfile => '/var/log/test.log',
                criticalpatterns => [ 'error' ],
            },
        );


  223. mir Says:
    October 5th, 2014 at 17:05

    hi, I have many check_logfiles.protocol files in my tmp partition. How can I delete these files automatically?


    lausser Reply:

    Either use options=noprotocol in the search or use the global option protocolretention.

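Both variants as a sketch (the logfile path and the retention value of 7 days are arbitrary examples):

```perl
# Variant 1: write no protocol files for this search at all.
@searches = ({
    tag              => 'myapp',
    logfile          => '/var/log/myapp.log',
    criticalpatterns => [ 'ERROR' ],
    options          => 'noprotocol',
});

# Variant 2: keep protocol files, but let check_logfiles clean up
# those older than the retention period (in days).
$protocolsdir      = '/tmp';
$protocolretention = 7;
```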

  224. zaxxon Says:
    October 24th, 2014 at 15:09

    Hello, I searched the documentation, the comments here, and the examples etc., but I did not find a way to include environment variables in the definitions in the config. For example, I would like to include a Windows variable in the form %VAR% as part of the path to the log file to be checked. Thanks for a hint!

    Cheers Markus


    lausser Reply:

    $ENV{VAR}

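Because the config file is evaluated as Perl, %ENV is available directly in it; a sketch (LOGROOT is a made-up variable name standing in for the Windows %VAR%):

```perl
# Sketch: pull part of the logfile path from an environment variable.
# Use double quotes so Perl interpolates $ENV{...}.
@searches = ({
    tag              => 'app',
    logfile          => "$ENV{LOGROOT}\\application.log",
    criticalpatterns => [ 'ERROR' ],
});
```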
