check_db2_health

Posted on October 1st, 2009 by lausser

Description

check_db2_health is a plugin that monitors various parameters of a DB2 database.

Documentation

Commandline parameters

  • --database <DB-Name> The name of the database. (If it is catalogued locally, this is the only parameter you need. Otherwise you must specify database, hostname and port.)
  • --hostname <hostname> The database server.
  • --port <port> The port on which DB2 listens.
  • --username <username> The database user.
  • --password <password> The database password.
  • --mode <mode> Tells the plugin what to do. (See the list of possible values in the table below.)
  • --name <objectname> Limits the check to a specific database object (e.g. a tablespace or bufferpool). It also carries the custom SQL statement for --mode sql.
  • --name2 <string> With --mode sql, the statement itself appears in the output and the performance data. Use --name2 to specify a custom string instead.
  • --warning <range> The warning threshold.
  • --critical <range> The critical threshold.
  • --environment <variable>=<value> Passes an environment variable to the plugin. Can be used multiple times.
  • --method <connectmethod> Tells the plugin how to connect to the database. The only method implemented so far is "dbi", which is the default. (It means the plugin uses the Perl module DBD::DB2.)
  • --units <%|KB|MB|GB> With --mode sql, specifies a unit that will appear in the output and the performance data.

Calling the plugin with --mode and one of the following keywords tells it what to monitor.

Modes

Keyword | Meaning | Range (default warning, critical)
--- | --- | ---
connection-time | Measures how long it takes to connect and log in. | 0..n seconds (1, 5)
connected-users | Number of connected users. | 0..n (50, 100)
synchronous-read-percentage | Percentage of synchronous reads (SRP). | 0%..100% (90:, 80:)
asynchronous-write-percentage | Percentage of asynchronous writes (AWP). | 0%..100% (90:, 80:)
bufferpool-hitratio | Hit ratio in the buffer pools (can be limited to a specific pool with --name). | 0%..100% (98:, 90:)
bufferpool-data-hitratio | The same, but only for data pages. | 0%..100% (98:, 90:)
bufferpool-index-hitratio | The same, but only for index pages. | 0%..100% (98:, 90:)
index-usage | Percentage of SELECTs that use an index. | 0%..100% (98:, 90:)
deadlocks | Number of deadlocks per second. | 0..n (0, 1)
lock-waits | Number of lock requests per second that could not be satisfied. | 0..n (10, 100)
lock-waiting | Fraction of time spent waiting for locks. | 0%..100% (2%, 5%)
database-usage | Used space in a database. | 0%..100% (80%, 90%)
tablespace-usage | Used space in a tablespace. | 0%..100% (90%, 98%)
tablespace-free | Free space in a tablespace. In contrast to the previous mode, you can use units (MB, GB) for the thresholds. | 0%..100% (5:, 2:)
log-utilization | Used space in a database log. | 0%..100% (80, 90)
sql | Executes a custom SQL statement that returns a numerical value. The statement itself is passed as the argument to --name. A label for the performance data can be set with --name2. With --units you can add units (%, c, s, MB, GB, ...) to the output. If the SQL statement contains special characters, you can encode it first with --mode encode. | 0..n
list-databases | Outputs a list of all databases. |
list-tablespaces | Outputs a list of all tablespaces. |
list-bufferpools | Outputs a list of all bufferpools. |
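
As a rough illustration of the sql mode, the following sketch wraps the plugin call in a small shell function. The function name db2_sql_check, the CHECK_DB2_HEALTH variable and the thresholds 50 and 100 are assumptions made for this example. The query is the one suggested in the comments for counting applications. Connection details are expected to come from the local catalog or from environment variables.

```shell
# Path to the plugin; the default follows the install prefix described in this post.
CHECK_DB2_HEALTH=${CHECK_DB2_HEALTH:-/usr/local/nagios/libexec/check_db2_health}

# Hypothetical wrapper, not part of the plugin:
#   $1 = label for output and performance data (--name2)
#   $2 = SQL statement that returns a single numeric value (--name)
db2_sql_check() {
  "$CHECK_DB2_HEALTH" --mode sql --name "$2" --name2 "$1" \
    --warning 50 --critical 100
}

# Example call (needs a reachable database, so it is left commented out):
# db2_sql_check applications 'select count(*) from sysibmadm.applications'
```

Keeping the statement in single quotes stops the shell from expanding the asterisk in count(*).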

Thresholds are given as ranges, for example a plain number or a number followed by a colon:

"10" means "Alert, if > 10" and

"90:" means "Alert, if < 90"

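The two threshold forms quoted above can be sketched as a small shell function. This is only an illustration of the rule as stated here, not the plugin's own code; the name threshold_state is made up, and other range forms are not handled.

```shell
# Hypothetical helper, not part of check_db2_health: prints "alert" or "ok"
# for the two threshold forms "N" and "N:".
threshold_state() {
  value=$1
  range=$2
  case "$range" in
    *:)
      # "N:" form: alert if the value is below N
      limit=${range%:}
      awk -v v="$value" -v l="$limit" 'BEGIN { print ((v + 0 < l + 0) ? "alert" : "ok") }'
      ;;
    *)
      # "N" form: alert if the value is above N
      awk -v v="$value" -v l="$range" 'BEGIN { print ((v + 0 > l + 0) ? "alert" : "ok") }'
      ;;
  esac
}

threshold_state 1.61 1      # connection time of 1.61 s against a threshold of 1
threshold_state 53.60 98:   # hit ratio of 53.60% against a threshold of 98:
threshold_state 3 50        # 3 connected users against a threshold of 50
```

The three calls print alert, alert and ok, which matches the WARNING, CRITICAL and OK states in the examples on this page.
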
Preparation of the database

For the plugin to retrieve the necessary information from the database, an operating-system user "nagios" (in group nagios) is needed. It may already exist if the database server is monitored with check_nrpe or check_by_ssh.

The monitor switches must be turned on:

update dbm cfg using dft_mon_bufpool on
update dbm cfg using dft_mon_lock on
update dbm cfg using dft_mon_timestamp on

The nagios user (more precisely, the nagios group) is then given the necessary privileges:

db2inst1$ db2 update dbm cfg using sysmon_group nagios
db2inst1$ db2 grant select,update on table SYSTOOLS.STMG_DBSIZE_INFO to nagios
db2inst1$ db2stop; db2start

Examples

nagsrv$ check_db2_health --mode connection-time
WARNING - 1.61 seconds to connect as DB2INST1 | connection_time=1.6084;1;5
 
nagsrv$ check_db2_health --mode connected-users
OK - 3 connected users | connected_users=3;50;100
 
nagsrv$ check_db2_health --mode list-databases
TOOLSDB
OK - have fun
 
nagsrv$ check_db2_health --mode database-usage
OK - database usage is 31.29% | 'db_toolsdb_usage'=31.29%;80;90
 
nagsrv$ check_db2_health --mode tablespace-usage
CRITICAL - tbs TEMPSPACE1 usage is 100.00%, tbs TBSP32KTMP0000 usage is 100.00%, tbs TBSP32K0000 usage is 100.00%, tbs USERSPACE1 usage is 5.08%, tbs SYSTOOLSPACE usage is 1.86%, tbs SYSCATSPACE usage is 80.37% | 'tbs_userspace1_usage_pct'=5.08%;90;98 'tbs_userspace1_usage'=16MB;288;313;0;320 'tbs_tempspace1_usage_pct'=100.00%;90;98 'tbs_tempspace1_usage'=0MB;0;0;0;0 'tbs_tbsp32ktmp0000_usage_pct'=100.00%;90;98 'tbs_tbsp32ktmp0000_usage'=0MB;0;0;0;0 'tbs_tbsp32k0000_usage_pct'=100.00%;90;98 'tbs_tbsp32k0000_usage'=61MB;55;60;0;61 'tbs_systoolspace_usage_pct'=1.86%;90;98 'tbs_systoolspace_usage'=0MB;28;31;0;32 'tbs_syscatspace_usage_pct'=80.37%;90;98 'tbs_syscatspace_usage'=51MB;57;62;0;64
 
nagsrv$ check_db2_health --mode list-tablespaces
SYSCATSPACE
SYSTOOLSPACE
TBSP32K0000
TBSP32KTMP0000
TEMPSPACE1
USERSPACE1
OK - have fun
 
nagsrv$ check_db2_health --mode tablespace-usage --name SYSCATSPACE
OK - tbs SYSCATSPACE usage is 80.37% | 'tbs_syscatspace_usage_pct'=80.37%;90;98 'tbs_syscatspace_usage'=51MB;57;62;0;64
 
nagsrv$ check_db2_health --mode tablespace-free --name SYSCATSPACE
OK - tbs SYSCATSPACE has 19.63% free space left | 'tbs_syscatspace_free_pct'=19.63%;5:;2: 'tbs_syscatspace_free'=12MB;3.20:;1.28:;0;64.00
 
nagsrv$ check_db2_health --mode tablespace-free --name SYSCATSPACE --units MB
OK - tbs SYSCATSPACE has 12.55MB free space left | 'tbs_syscatspace_free_pct'=19.63%;7.81:;3.12: 'tbs_syscatspace_free'=12.55MB;5.00:;2.00:;0;64.00
 
nagsrv$ check_db2_health --mode tablespace-free --name SYSCATSPACE --units MB --warning 15: --critical 10:
WARNING - tbs SYSCATSPACE has 12.55MB free space left | 'tbs_syscatspace_free_pct'=19.63%;23.44:;15.62: 'tbs_syscatspace_free'=12.55MB;15.00:;10.00:;0;64.00
 
nagsrv$ check_db2_health --mode bufferpool-hitratio
CRITICAL - bufferpool IBMDEFAULTBP hitratio is 53.60%, bufferpool BP32K0000 hitratio is 100.00% | 'bp_ibmdefaultbp_hitratio'=53.60%;98:;90: 'bp_ibmdefaultbp_hitratio_now'=100.00% 'bp_bp32k0000_hitratio'=100.00%;98:;90: 'bp_bp32k0000_hitratio_now'=100.00%
 
nagsrv$ check_db2_health --mode list-bufferpools
BP32K0000
IBMDEFAULTBP
OK - have fun
 
nagsrv$ check_db2_health --mode bufferpool-hitratio --name IBMDEFAULTBP
CRITICAL - bufferpool IBMDEFAULTBP hitratio is 53.60% | 'bp_ibmdefaultbp_hitratio'=53.60%;98:;90: 'bp_ibmdefaultbp_hitratio_now'=100.00%
 
nagsrv$ check_db2_health --mode bufferpool-data-hitratio --name IBMDEFAULTBP
CRITICAL - bufferpool IBMDEFAULTBP data page hitratio is 64.35% | 'bp_ibmdefaultbp_hitratio'=64.35%;98:;90: 'bp_ibmdefaultbp_hitratio_now'=100.00%
 
nagsrv$ check_db2_health --mode bufferpool-index-hitratio --name IBMDEFAULTBP
CRITICAL - bufferpool IBMDEFAULTBP index hitratio is 38.89% | 'bp_ibmdefaultbp_hitratio'=38.89%;98:;90: 'bp_ibmdefaultbp_hitratio_now'=100.00%
 
nagsrv$ check_db2_health --mode index-usage
CRITICAL - index usage is 0.71% | index_usage=0.71%;98:;90:
 
nagsrv$ check_db2_health --mode synchronous-read-percentage
OK - synchronous read percentage is 100.00% | srp=100.00%;90:;80:
 
nagsrv$ check_db2_health --mode asynchronous-write-percentage
CRITICAL - asynchronous write percentage is 0.00% | awp=0.00%;90:;80:
 
nagsrv$ check_db2_health --mode deadlocks
OK - 0.000000 deadlocs / sec | deadlocks_per_sec=0.000000;0;1
 
nagsrv$ check_db2_health --mode lock-waits
OK - 0.000000 lock waits / sec | lock_waits_per_sec=0.000000;10;100
 
nagsrv$ check_db2_health --mode lock-waiting
OK - 0.000000% of the time was spent waiting for locks | lock_percent_waiting=0.000000%;2;5

Using environment variables

The parameters --hostname, --username, --password and --port can be omitted if the corresponding data are available via environment variables. Since version 3.x of Nagios, service definitions can have custom attributes, which can be used to specify login data. During plugin execution they are available as environment variables.

The environment variables are named:

  • NAGIOS__SERVICEDB2_HOST (_db2_host in the service definition)
  • NAGIOS__SERVICEDB2_USER (_db2_user in the service definition)
  • NAGIOS__SERVICEDB2_PASS (_db2_pass in the service definition)
  • NAGIOS__SERVICEDB2_PORT (_db2_port in the service definition)
  • NAGIOS__SERVICEDB2_DATABASE (_db2_database in the service definition)
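
The naming pattern in the list above can be spelled out in a few lines of shell. The helper custom_attr_to_env is hypothetical and only mirrors that list. The exported values are placeholders for a manual test outside Nagios.

```shell
# Hypothetical helper: derive the environment variable name from the name of a
# custom attribute in the service definition, following the list above.
custom_attr_to_env() {
  # drop the leading underscore and upper-case the rest
  name=$(printf '%s' "${1#_}" | tr '[:lower:]' '[:upper:]')
  printf 'NAGIOS__SERVICE%s\n' "$name"
}

custom_attr_to_env _db2_host   # prints NAGIOS__SERVICEDB2_HOST

# For a manual test the same variables can be exported by hand.
# All values below are placeholders.
export NAGIOS__SERVICEDB2_HOST=db2host.example.com
export NAGIOS__SERVICEDB2_PORT=50000
export NAGIOS__SERVICEDB2_USER=nagios
export NAGIOS__SERVICEDB2_PASS=secret
export NAGIOS__SERVICEDB2_DATABASE=TOOLSDB
# check_db2_health --mode connection-time
```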

Installation

This plugin requires the installation of the Perl-Module DBD::DB2.

After unpacking the tar archive you have to run ./configure. Running ./configure --help lists the possible options.

  • --prefix=BASEDIRECTORY The base directory of the Nagios installation (default: /usr/local/nagios). The final destination for check_db2_health will be the libexec subdirectory.
  • --with-nagios-user=SOMEUSER The owner of check_db2_health. (default: nagios)
  • --with-nagios-group=SOMEGROUP The group of check_db2_health. (default: nagios)
  • --with-perl=PATHTOPERL A non-standard perl interpreter. (default: perl found in PATH)
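
Put together, an installation might look roughly like the sketch below. The name of the unpacked directory and the make and make install steps are assumptions based on the usual configure-style workflow; the post itself only mentions ./configure. The steps are wrapped in a function so that nothing runs until it is called.

```shell
# Sketch of the install steps; call install_check_db2_health from the
# directory that contains the downloaded archive.
install_check_db2_health() {
  tar xzf check_db2_health-1.0.1.tar.gz || return 1
  # assumed name of the unpacked directory
  cd check_db2_health-1.0.1 || return 1
  ./configure --prefix=/usr/local/nagios \
    --with-nagios-user=nagios \
    --with-nagios-group=nagios || return 1
  # assumption: the usual build and install targets exist
  make && make install
}
```

The options shown are the documented defaults, so a plain ./configure should behave the same way.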

Download

check_db2_health-1.0.1.tar.gz

check_db2_health-1.0.1.shar.gz

Changelog

  • 2010-06-10 1.0.1 Bugfix in connected-users. Thanks Niko
  • 2009-10-01 1.0 First public release

Copyright

Gerhard Lausser

check_db2_health is released under the GNU General Public License (GPL).

Author

Gerhard Laußer (gerhard.lausser@consol.de) will gladly answer your questions.

24 Responses to “check_db2_health”

  1. Juan Says:
    January 8th, 2010 at 13:07

    What version of DB2 is this plugin compatible with?

    lausser Reply:

    I use it with 9.x, but you might try it with 8.x and tell me whether it works.

  2. Christoph Says:
    January 22nd, 2010 at 15:22

    Hello, the plugin is really great. But how do I query for a character value in an SQL statement: check_db2_health --mode sql --name 'select count(*) from sysibmadm.tbsp_utilization where tbsp_state'='NORMAL" -warning 1 -critical 1

    Use of uninitialized value $value in numeric gt (>) at ./check_db2_health line 1262

    And for a few modes (asynchronous-write-percentage) I get the message: CRITICAL - unable to aquire awp info. Regards, Christoph

  3. lausser Says:
    January 23rd, 2010 at 14:42

    Create a file /tmp/check_db2_health.trace, run the plugin, and watch in another window with "tail -f /tmp/check_db2_health.trace" what happens behind the scenes. Then please enter the SQL statements that show up in the trace file manually. They must return a meaningful result.

  4. Christoph Says:
    January 26th, 2010 at 10:52

    Thanks, with that I can trace it wonderfully.

  5. Giovanni Says:
    February 22nd, 2010 at 17:35

    Hello, I installed the check_db2_health plugin and it works fine from the command line. However, when I run it from Nagios, I get this error:

    "CRITICAL - cannot connect to MyHost. Total Environment allocation failure! Did you set up your DB2 client environment?"

    Below is the Nagios command definition:

    define command{
      command_name check_db2_health_connection_time
      command_line $USER1$/check_db2_health --hostname $HOSTADDRESS$ --database $_SERVICEDB2_DATABASE$ --port $_SERVICEDB2_PORT$ --username=$_SERVICEDB2_USER$ --password=$_SERVICEDB2_PASS$ --mode connection-time
    }

    The $_SERVICEDB2_xxxx$ macros have been defined in my template and service configuration files (as _db2_xxxx). I tried to force the DB2_HOME environment variable in the /etc/init.d/nagios script, but it doesn't work.

    Thanks in advance for any help. Giovanni

    lausser Reply:

    What about this? http://archive.netbsd.se/?ml=perl-dbi-users&a=2009-11&t=11921001 Does it solve your problem?

    Giovanni Reply:

    @lausser, I had mixed up .bash_profile and .bashrc, so the environment variables were being overwritten. Now it works perfectly. Thanks a lot, Giovanni

  6. Max Says:
    March 4th, 2010 at 17:34

    I get the following error:

    CRITICAL - cannot connect to 10.29.234.123. [IBM][CLI Driver] SQL30081N A communication error has been detected. Communication protocol being used: "TCP/IP". Communication API being used: "SOCKETS". Location where the error was detected: "10.29.234.123". Communication function detecting the error: "recv". Protocol specific error code(s): "", "", "0". SQLSTATE=08001

    I'm stuck... Regards

    lausser Reply:

    Have you checked everything? The IP address? The port? The password? SQLSTATE=08001 turns up countless times on Google and points to an error in these connection details.

    Max Reply:

    @lausser, yes, everything checked so far... since I don't use a catalog, I pass everything on the command line: /usr/local/nagios/libexec/check_db2_health --hostname 10.29.234.xxx --port 5900 --database xx --username name --password pass --mode database-usage

    lausser Reply:

    Try the following mini script. It helps to narrow down the error.

    use strict;
    use warnings;

    # fill in your parameters
    my $database = '';
    # if the database is not in the catalogue, add host + port
    my $hostname = '';
    my $port = '';

    my $username = '';
    my $password = '';

    # build the DSN: either a catalogued alias or a full connection string
    my $dsn = "DBI:DB2:";
    if (! $hostname) {
      # catalog tcpip node <host-nickname> remote <hostname> server <port>
      # catalog database <remote-db> as <local-nick> at node <host-nickname>
      $dsn .= $database;
    } else {
      $dsn .= sprintf "DATABASE=%s; ", $database;
      $dsn .= sprintf "HOSTNAME=%s; ", $hostname;
      $dsn .= sprintf "PORT=%d; ", $port;
      $dsn .= sprintf "PROTOCOL=TCPIP; ";
      $dsn .= sprintf "UID=%s; ", $username;
      $dsn .= sprintf "PWD=%s;", $password;
    }

    # try to connect and report either success or the driver's error message
    eval {
      require DBI;
      if (my $dbh = DBI->connect(
          $dsn,
          $username,
          $password,
          { RaiseError => 0, AutoCommit => 1, PrintError => 0 })) {
        printf "OK - connected\n";
      } else {
        die DBI::errstr();
      }
    };
    if ($@) {
      printf "%s\n", $@;
    }

    After that it will be clear whether connecting with the Perl module DBD::DB2 works at all.

    Max Reply:

    @lausser, thanks for the quick reply.

    So I ran the catalog commands, filled everything in in the script and ran it. I get the following:

    [IBM][CLI Driver] SQL30081N A communication error has been detected. Communication protocol being used: "TCP/IP". Communication API being used: "SOCKETS". Location where the error was detected: "10.29.234.123". Communication function detecting the error: "recv". Protocol specific error code(s): "", "", "0". SQLSTATE=08001

    Max Reply:

    @lausser,

    Here is the manual way: db2 => connect to c14 SQL30081N A communication error has been detected. Communication protocol being used: "TCP/IP". Communication API being used: "SOCKETS". Location where the error was detected: "10.29.234.123". Communication function detecting the error: "recv". Protocol specific error code(s): "", "", "0". SQLSTATE=08001

    Max Reply:

    @lausser, I also sometimes get error 104, as if the connection were being refused by the DB host...

    Max Reply:

    @Max, the problem lies in the configuration of the DB. I tried a different one and the login works perfectly.

  7. Max Says:
    March 9th, 2010 at 10:16

    Now I have a different problem: run by hand in the shell it works, but when Nagios executes the scripts I get: "Total Environment allocation failure! Did you set up your DB2 client environment?" I can't make sense of the article at http://archive.netbsd.se/?ml=perl-dbi-users&a=2009-11&t=11921001, though. As described in Mr. Laußer's book, I added the paths as exports to the nagios user's .bashrc and to the Nagios start script (init.d), but it doesn't help... Regards, Maximilian

    lausser Reply:

    I would check which DB2-related environment variables are present in the shell.

    env | grep -i db2

    It is probably enough to set LD_LIBRARY_PATH in the init script.

    Max Reply:

    @lausser,

    Hello Mr. Laußer, I solved it like this:

    I copied the home directory structure of the user dbainst1 over to nagios and then loaded the user profile in the init script. I had added LD_LIB... to the init script as described in your book, unfortunately without success, hence the approach described above... the main thing is that it works :)

  8. Andreas Says:
    May 13th, 2010 at 19:33

    Hello, thanks a lot for the great plugin. I already use quite a lot of it. I have two more questions.

    1. Which privilege is required for the mode "database-usage"? With a "normal" user in the SYSMON group I get the error "unable to aquire database info". As soon as the user is a member of the OS group DB2ADMNS, it works. Is there another way to solve this?

    2. In the mode "bufferpool-hitratio" I always get 100% back. Is there anything else to consider here? The monitor switches are turned on.

    Thanks in advance!

  9. Tim Says:
    May 28th, 2010 at 14:15

    Hello, nice plugin. I have a small problem, though: check_db2_health apparently does not execute "CALL GET_DBSIZE_INFO(?, ?, ?, 0)" successfully on any of my monitored databases (there is no new snapshot timestamp afterwards when I run SELECT * FROM SYSTOOLS.STMG_DBSIZE_INFO). As a result, the reported fill level stays the same even as the DB grows (despite periodically scheduled check_db2_health runs with mode database-usage). I tested with the following DB versions: 9.1, 9.5, 9.7, and with Perl v5.8.8, DBD-DB2-1.78, db2exc_971_LNX_x86_64. I carried out the preparations as you described. Does anything else need to be changed on the DB side? Best regards, Tim.

    lausser Reply:

    I can't explain that, though I must add that I am no DB2 expert. Was the database restarted after issuing the 4 update commands?

  10. Niko Says:
    June 9th, 2010 at 15:20

    Hi,

    We are also currently building DB2 monitoring with Nagios, and we use check_db2_health for it. I like the script. However, I have a few remarks:

    connected-users: this only reports the number of connections in the state 'Connected'. DB2 knows a few more states, though. See http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/topic/com.ibm.db2.luw.sql.rtn.doc/doc/r0022011.html?resultof=%22%73%79%73%69%62%6d%61%64%6d%2e%61%70%70%6c%69%63%61%74%69%6f%6e%73%22%20 The query SELECT COUNT(*) FROM sysibmadm.applications is closer to the mark.

    On the current problem: membership in SYSMON_GROUP is not sufficient to call "CALL GET_DBSIZE_INFO(...)". Further privileges need to be granted: grant select,update on table SYSTOOLS.STMG_DBSIZE_INFO to user

    Regards, Niko

    lausser Reply:

    Yes, the point about "connected" makes sense to me. I will publish a bugfix release and also extend the documentation regarding privileges. Thanks for the hint.