check_oracle_health

Posted on July 3rd, 2009 by admin

Description

check_oracle_health is a plugin to check various parameters of an Oracle database.

Documentation

Command line parameters

  • --connect= The database name
  • --user= The database user
  • --password= Password of the database user.
  • --connect= Alternative to the parameters above.
  • --connect=sysdba@ Login with / as sysdba (if the user who executes the plugin is privileged to do so)
  • --connect=/@token Login with the help of the Password Store (assumes --method=sqlplus)
  • --mode= Tells the plugin what it should do. See the list of possible values further down.
  • --tablespace= Limits the check to a single tablespace. If this parameter is omitted, all tablespaces are checked.
  • --datafile= Limits the check to a single datafile. If this parameter is omitted, all datafiles are checked.
  • --name= Limits the check to a single object (latch, enqueue, tablespace, datafile). If this parameter is omitted, all objects are checked. (This parameter can and should be used instead of --tablespace or --datafile. It serves to standardize the command line interface.)
  • --name2= If you use --mode=sql, the SQL statement appears in the output and in the performance values. With --name2 you can specify a string to be used as the label instead.
  • --regexp With this switch the value of the --name parameter is interpreted as a regular expression.
  • --warning= Values outside this range trigger a WARNING.
  • --critical= Values outside this range trigger a CRITICAL.
  • --absolute Without --absolute, values that increase over time are reported as the increase per second; with --absolute, the difference between the current and the last run is reported.
  • --runas= Runs the script under a different user. (Calls sudo internally: sudo -u)
  • --environment = Passes environment variables to the script. For example: --environment ORACLE_HOME=/u01/oracle. Multiple declarations are possible.
  • --method= Tells the plugin how it should connect to the database (dbi for using DBD::Oracle (default), sqlplus for using the sqlplus tool).
  • --units=<%|KB|MB|GB> Specifying units "beautifies" the output of mode=sql and simplifies the threshold values when using mode=tablespace-free.

Use the option --mode with one of the following keywords to tell the plugin which values it should determine and check.

Keyword | Description | Range (default warning, critical thresholds)
tnsping | Listener
connection-time | Time needed for connection establishment and login | 0..n seconds (1, 5)
connected-users | Number of users logged in to the database | 0..n (50, 100)
sga-data-buffer-hit-ratio | Hit ratio of the data buffer cache | 0%..100% (98:, 95:)
sga-library-cache-hit-ratio | Hit ratio of the library cache | 0%..100% (98:, 95:)
sga-dictionary-cache-hit-ratio | Hit ratio of the dictionary cache | 0%..100% (95:, 90:)
sga-latches-hit-ratio | Hit ratio of the latches | 0%..100% (98:, 95:)
sga-shared-pool-reloads | Reload rate in the shared pool | 0%..100% (1, 10)
sga-shared-pool-free | Free memory in the shared pool | 0%..100% (10:, 5:)
pga-in-memory-sort-ratio | Percentage of sorts performed in memory | 0%..100% (99:, 90:)
invalid-objects | Number of faulty objects, indexes, partitions
stale-statistics | Number of objects with stale optimizer statistics | n (10, 100)
tablespace-usage | Used space in the tablespace | 0%..100% (90, 98)
tablespace-free | Free space in the tablespace | 0%..100% (5:, 2:)
tablespace-fragmentation | Free Space Fragmentation Index | 100..1 (30:, 20:)
tablespace-io-balance | IO distribution among the datafiles of a tablespace | n (1.0, 2.0)
tablespace-remaining-time | Number of days remaining until a tablespace is 100% full. The growth rate is calculated from the values of the last 30 days. (A different period can be specified with the parameter --lookback.) | Days (90:, 30:)
tablespace-can-allocate-next | Checks whether there is enough free space in the tablespace for the next extent
datafile-io-traffic | Number of IO operations on datafiles per second | n/sec (1000, 5000)
soft-parse-ratio | Percentage of soft parses | 0%..100%
switch-interval | Interval between redo log file switches | 0..n seconds (600:, 60:)
retry-ratio | Retry rate in the redo log buffer | 0%..100% (1, 10)
redo-io-traffic | Redo log IO in MB/sec | n/sec (199, 200)
roll-header-contention | Rollback segment header contention | 0%..100% (1, 2)
roll-block-contention | Rollback segment block contention | 0%..100% (1, 2)
roll-hit-ratio | Rollback segment gets/waits ratio | 0%..100% (99:, 98:)
roll-extends | Rollback segment extends | n, n/sec (1, 100)
roll-wraps | Rollback segment wraps | n, n/sec (1, 100)
seg-top10-logical-reads | Number of user processes among the top 10 logical reads | n (1, 9)
seg-top10-physical-reads | Number of user processes among the top 10 physical reads | n (1, 9)
seg-top10-buffer-busy-waits | Number of user processes among the top 10 buffer busy waits | n (1, 9)
seg-top10-row-lock-waits | Number of user processes among the top 10 row lock waits | n (1, 9)
event-waits | Waits/sec of system events | n/sec (10, 100)
event-waiting | Percentage of the elapsed time an event has spent waiting | 0%..100% (0.1, 0.5)
enqueue-contention | Enqueue wait/request ratio | 0%..100% (1, 10)
enqueue-waiting | Percentage of the elapsed time since the last run an enqueue has spent waiting | 0%..100% (0.00033, 0.0033)
latch-contention | Latch misses/gets ratio. A latch name or latch number can be passed with --name (see list-latches). | 0%..100% (1, 2)
latch-waiting | Percentage of the elapsed time since the last run a latch has spent waiting | 0%..100% (0.1, 1)
sysstat | Changes/sec for any value from v$sysstat | n/sec (10, 10)
sql | Result of any SQL statement that returns a number. The statement itself is passed with the parameter --name. A label for the performance data output can be passed with the parameter --name2. | n (1, 5)

list-tablespaces | Prints a list of tablespaces
list-datafiles | Prints a list of datafiles
list-latches | Prints a list of latch names and latch numbers
list-enqueues | Prints a list of the enqueue names
list-events | Prints a list of the events from v$system_event. Besides event_number/event_id, a shortened form of the event name is printed. This can be used in Nagios service descriptions. Example: lo_fi_sw_co = log file switch completion
list-background-events | Prints a list of the background events
list-sysstats | Prints a list of system-wide statistics

 

Measurements that depend on a time interval can be evaluated in different ways. Calculating the result requires a start value, an end value and the time elapsed between these two values. Without further options, the start value is the value from the last plugin run. The elapsed time is normally the normal_check_interval of the corresponding service.

If the check result should be determined not by the increase per second but by the difference between two measured values, use the option --absolute. This is useful for Rollback Segment Wraps, which happen so rarely that their rate is nearly 0/sec. Nevertheless, you want to be alerted if the number of these events grows.
The threshold values should be chosen so that they can be reached within one retry_check_interval. Otherwise the service will return to the OK state after each SOFT;1.
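
The difference between the two evaluation styles can be sketched in a few lines of Python. This is only an illustration, not the plugin's code; the function name and the counter values are made up for this example.

```python
def measured_value(previous: float, current: float,
                   elapsed_seconds: float, absolute: bool = False) -> float:
    """Turn two counter readings into the value that is compared to the thresholds."""
    delta = current - previous
    if absolute:
        # Behaviour described for --absolute: plain difference between two runs.
        return delta
    # Default behaviour: increase per second since the last run.
    return delta / elapsed_seconds


# Hypothetical counter: 2 rollback segment wraps within a 300 second interval.
rate = measured_value(1000, 1002, 300)
diff = measured_value(1000, 1002, 300, absolute=True)
print(f"{rate:.4f} wraps/sec, {diff:g} wraps since the last run")
```

With these made-up numbers the rate is roughly 0.0067/sec, which would never reach a threshold of 1, while the absolute difference of 2 would.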

Please note that the thresholds must be specified according to the Nagios plug-in development guidelines.
"10" means "Alarm, if > 10" and
"90:" means "Alarm, if < 90"

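The two threshold forms can be made concrete with a small Python sketch. It is not the plugin's code: it is a simplified illustration that only handles the forms N, N: and N:M (the guidelines define further forms, which are ignored here), and the function names are made up for this example.

```python
def outside_range(value: float, threshold: str) -> bool:
    """Return True if value triggers an alarm for the given threshold string."""
    if ":" in threshold:
        low_text, high_text = threshold.split(":", 1)
        low = float(low_text) if low_text else 0.0
        high = float(high_text) if high_text else float("inf")
    else:
        # A bare number N stands for the range 0..N.
        low, high = 0.0, float(threshold)
    return value < low or value > high


def check_state(value: float, warning: str, critical: str) -> str:
    """Map a measured value to a Nagios state name (critical wins over warning)."""
    if outside_range(value, critical):
        return "CRITICAL"
    if outside_range(value, warning):
        return "WARNING"
    return "OK"


print(check_state(8.91, "10:", "5:"))   # values from the sga-shared-pool-free example
print(check_state(0.174, "1", "5"))     # values from the connection-time example
print(check_state(19.90, "1", "10"))    # values from the enqueue-contention example
```

With the values taken from the Examples section, the three calls yield WARNING, OK and CRITICAL, which matches the plugin output shown there.
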
Preparation of the database

In order to be able to collect the needed information from the database a database user with specific privileges is required:

CREATE user nagios IDENTIFIED BY oradbmon; 
GRANT CREATE session TO nagios;
GRANT SELECT any dictionary TO nagios;
GRANT SELECT ON V_$SYSSTAT TO nagios;
GRANT SELECT ON V_$INSTANCE TO nagios;
GRANT SELECT ON V_$LOG TO nagios;
GRANT SELECT ON SYS.DBA_DATA_FILES TO nagios;
GRANT SELECT ON SYS.DBA_FREE_SPACE TO nagios;
--
-- if somebody still uses Oracle 8.1.7...
GRANT SELECT ON sys.dba_tablespaces TO nagios;
GRANT SELECT ON dba_temp_files TO nagios;
GRANT SELECT ON sys.v_$Temp_extent_pool TO nagios;
GRANT SELECT ON sys.v_$TEMP_SPACE_HEADER  TO nagios;
GRANT SELECT ON sys.v_$session TO nagios;

 

Examples

 

nagios$ check_oracle_health --connect bba --mode tnsping
OK - connection established to bba.
 
nagios$ check_oracle_health --mode connection-time
OK - 0.17 seconds to connect  |
  connection_time=0.1740;1;5
 
nagios$ check_oracle_health --mode sga-data-buffer-hit-ratio
CRITICAL - SGA data buffer hit ratio 0.99%  |
  sga_data_buffer_hit_ratio=0.99%;98:;95:
 
nagios$ check_oracle_health --mode sga-library-cache-hit-ratio
OK - SGA library cache hit ratio 98.75%  |
  sga_library_cache_hit_ratio=98.75%;98:;95:
 
nagios$ check_oracle_health --mode sga-latches-hit-ratio
OK - SGA latches hit ratio 100.00%  |
  sga_latches_hit_ratio=100.00%;98:;95:
 
nagios$ check_oracle_health --mode sga-shared-pool-reloads
OK - SGA shared pool reloads 0.28%  |
  sga_shared_pool_reloads=0.28%;1;10
 
nagios$ check_oracle_health --mode sga-shared-pool-free
WARNING - SGA shared pool free 8.91%  |
  sga_shared_pool_free=8.91%;10:;5:
 
nagios$ check_oracle_health --mode pga-in-memory-sort-ratio
OK - PGA in-memory sort ratio 100.00%  |
  pga_in_memory_sort_ratio=100.00;99:;90:
 
nagios$ check_oracle_health --mode invalid-objects
OK - no invalid objects found  |
  invalid_ind_partitions=0 invalid_indexes=0
  invalid_objects=0 unrecoverable_datafiles=0
 
nagios$ check_oracle_health --mode switch-interval
OK - Last redo log file switch interval was 18 minutes |
    redo_log_file_switch_interval=1090s;600:;60:
 
nagios$ check_oracle_health --mode switch-interval --connect rac1
OK - Last redo log file switch interval was 32 minutes (thread 1)|
    redo_log_file_switch_interval=1938s;600:;60:
 
nagios$ check_oracle_health --mode tablespace-usage
CRITICAL - tbs SYSTEM usage is 99.33%
tbs SYSAUX usage is 93.73%
tbs USERS usage is 8.75%
tbs UNDOTBS1 usage is 6.65% | 'tbs_users_usage_pct'=8%;90;98
'tbs_users_usage'=0MB;4;4;0;5
'tbs_undotbs1_usage_pct'=6%;90;98
'tbs_undotbs1_usage'=11MB;153;166;0;170
'tbs_system_usage_pct'=99%;90;98
'tbs_system_usage'=695MB;630;686;0;700
'tbs_sysaux_usage_pct'=93%;90;98
'tbs_sysaux_usage'=802MB;770;839;0;856
 
nagios$ check_oracle_health --mode tablespace-usage \
    --tablespace USERS
OK - tbs USERS usage is 8.75% |
  'tbs_users_usage_pct'=8%;90;98
  'tbs_users_usage'=0MB;4;4;0;5
 
nagios$ check_oracle_health --mode tablespace-usage \
    --name USERS
OK - tbs USERS usage is 8.75% |
  'tbs_users_usage_pct'=8%;90;98
  'tbs_users_usage'=0MB;4;4;0;5
 
nagios$ check_oracle_health --mode tablespace-free \
    --name TEST
OK - tbs TEST has 97.91% free space left |
    'tbs_test_free_pct'=97.91%;5:;2:
    'tbs_test_free'=32083MB;1638.40:;655.36:;0.00;32767.98
 
nagios$ check_oracle_health --mode tablespace-free \
    --name TEST --units MB --warning 100: --critical 50:
OK - tbs TEST has 32083.61MB free space left |
    'tbs_test_free_pct'=97.91%;0.31:;0.15:
    'tbs_test_free'=32083.61MB;100.00:;50.00:;0;32767.98
 
nagios$ check_oracle_health --mode tablespace-free \
    --name TEST --warning 10: --critical 5:
OK - tbs TEST has 97.91% free space left |
    'tbs_test_free_pct'=97.91%;10:;5:
    'tbs_test_free'=32083MB;3276.80:;1638.40:;0.00;32767.98
 
nagios$ check_oracle_health --mode tablespace-remaining-time \
    --tablespace ARUSERS --lookback 7
WARNING - tablespace ARUSERS will be full in 78 days |
  'tbs_arusers_days_until_full'=78;90:;30:
 
nagios$ check_oracle_health --mode datafile-io-traffic \
  --datafile users01.dbf
WARNING - users01.dbf: 1049.83 IO Operations per Second |
  'dbf_users01.dbf_io_total_per_sec'=1049.83;1000;5000
 
nagios$ check_oracle_health --mode latch-contention \
  --name 214
OK - SGA latch library cache (214) contention 0.08% |
 'latch_214_contention'=0.08%;1;2
 'latch_214_sleep_share'=0.00% 'latch_214_gets'=49995
 
nagios$ check_oracle_health --mode latch-contention \
  --name 'library cache'
OK - SGA latch library cache (214) contention 0.08% |
 'latch_214_contention'=0.08%;1;2
 'latch_214_sleep_share'=0.00% 'latch_214_gets'=49937
 
nagios$ check_oracle_health --mode enqueue-contention --name TC
CRITICAL - enqueue TC: 19.90% of the requests must wait |
 'TC_contention'=19.90%;1;10
 'TC_requests'=2015 'TC_waits'=401
 
nagios$ check_oracle_health --mode latch-contention \
  --name 'messages'
OK - SGA latch messages (17) contention 0.02% |
 'latch_17_contention'=0.02%;1;2 'latch_17_gets'=4867
 
nagios$ check_oracle_health --mode latch-waiting \
  --name 'user lock'
OK - SGA latch user lock (205) sleeping 0.000841% of the time |
 'latch_205_sleep_share'=0.000841%
 
nagios$ check_oracle_health --mode event-waits \
  --name 'log file sync'
OK - log file sync : 1.839511 waits/sec |
 'log file sync_waits_per_sec'=1.839511;10;100
 
nagios$ check_oracle_health --mode event-waiting \
  --name 'Log file parallel write'
OK - log file parallel write waits 0.045843% of the time |
 'log file parallel write_percent_waited'=0.045843%;0.1;0.5
 
nagios$ check_oracle_health --mode sysstat \
  --name 'transaction rollbacks'
OK - 0.000003 transaction rollbacks/sec |
 'transaction rollbacks_per_sec'=0.000003;10;100
 'transaction rollbacks'=4
 
nagios$ check_oracle_health --mode sql \
  --name 'select count(*) from v$session' --name2 sessions
CRITICAL - sessions: 21 | 'sessions'=21;1;5
 
nagios$ check_oracle_health --mode sql \
  --name 'select 12 from dual' --name2 twelve --units MB
CRITICAL - twelve: 12MB | 'twelve'=12MB;1;5
 
nagios$ check_oracle_health --mode sql \
  --name 'select 200,300,1000 from dual' \
  --name2 'kaspar melchior balthasar' \
  --warning 180 --critical 500
WARNING - kaspar melchior balthasar: 200 300 1000 |
'kaspar'=200;180;500 'melchior'=300;; 'balthasar'=1000;;

 

Authentication

Example with –runas and an "external user"

There are two users in the database:

  • OPS$DBNAGIO IDENTIFIED EXTERNALLY
  • NAGIOS IDENTIFIED BY ‘DBMONI’

There are two unix users:

  • qqnagio with normal access.
  • dbnagio with /bin/false as login shell.

 

qqnagio$ check_oracle_health --mode=connection-time \
    --connect=nagios/dbmoni@BBA
OK - 0.21 seconds to connect as NAGIOS 
 
dbnagio$ check_oracle_health --mode=connection-time \
    --connect=BBA --runas=dbnagio \
    --environment ORACLE_HOME=$ORACLE_HOME
OK - 0.17 seconds to connect as OPS$DBNAGIO

The background for this example is the following scenario with a SAP-Server:

Only local connections to the database are allowed. The database isn’t reachable over the network. Logging in with username and password is not possible.

Only database-users that are authenticated through the operating system (OPS$-User) are allowed to connect.

These users are not allowed to connect via SSH. (Therefore /bin/false).

Because the Nagios user qqnagio is allowed to connect via SSH, it cannot be used as the database user. NRPE, which executes the plugin, nevertheless runs under the qqnagio account.

 

Use of environment variables

It is possible to omit --connect (and, if not needed, --user and --password) completely if you provide the corresponding values in environment variables. Since version 3.x, Nagios allows service definitions to be extended with your own attributes (custom object variables). These appear in the environment during the execution of the check command.

The environment variables are:

  • NAGIOS__SERVICEORACLE_SID (_oracle_sid in the service definition)
  • NAGIOS__SERVICEORACLE_USER (_oracle_user in the service definition)
  • NAGIOS__SERVICEORACLE_PASS (_oracle_pass in the service definition)
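
How such a fallback could look is sketched below in Python. The plugin itself is written in Perl, so this is not its implementation. The mapping of the three variables to --connect, --user and --password and the rule that a command line value wins over the environment are assumptions made for this illustration; the function name is hypothetical.

```python
import os
from typing import Mapping, Optional

# Assumed mapping of command line options to the environment variables listed above.
ENV_NAMES = {
    "connect": "NAGIOS__SERVICEORACLE_SID",
    "user": "NAGIOS__SERVICEORACLE_USER",
    "password": "NAGIOS__SERVICEORACLE_PASS",
}


def resolve_connection(cli: Mapping[str, Optional[str]],
                       env: Mapping[str, str] = os.environ) -> dict:
    """Take each setting from the command line, else from the environment."""
    resolved = {}
    for option, env_name in ENV_NAMES.items():
        value = cli.get(option)
        if value is None:
            # Option omitted on the command line: fall back to the environment.
            value = env.get(env_name)
        resolved[option] = value
    return resolved


example_env = {
    "NAGIOS__SERVICEORACLE_SID": "BBA",
    "NAGIOS__SERVICEORACLE_USER": "nagios",
}
print(resolve_connection({"user": "monitor"}, example_env))
```

In this example only the user is given explicitly, so the database name comes from the environment and the password stays unset.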

 

Installation

The installation of the perl-modules DBI and DBD::Oracle is required.

After unpacking the archive, run ./configure. With ./configure --help you can print the available options, which also show some default values for building the plugin.

  • --prefix=BASEDIRECTORY Specify a directory in which check_oracle_health should be stored. (default: /usr/local/nagios)
  • --with-nagios-user=SOMEUSER This user will be the owner of the check_oracle_health file. (default: nagios)
  • --with-nagios-group=SOMEGROUP The group of the check_oracle_health plugin. (default: nagios)
  • --with-perl=PATHTOPERL Specify the path to the perl interpreter you wish to use. (default: perl in PATH)

 

Download

check_oracle_health-1.6.4.tar.gz

check_oracle_health-1.6.4.shar.gz

Some versions of tar have problems with the long file names. In this case please unpack the shar package with cat check_oracle_health-xxx.shar.gz | gzip -d | sh

 

Changelog

  • 2010-06-10 1.6.4
    added checking of dba_registry to mode invalid-objects. Thanks Ovidiu Marcu
    speedup of tablespace-remaining-time. Thanks Steffen Poulsen
    switch-interval detects redo log timestamps in the future and reports critical
    method sqlplus now works with "(DESCRIPTION =(ADDRESS = (PROTOCOL = TCP"-like connectstrings
    new parameter --ident to show instance and database names in the output
    bugfix in tablespace-usage (temp tbs with multiple datafiles). Thanks Philipp Lemke
  • 2009-09-09 1.6.3
    tablespace-can-allocate-next was optimized.
    Illegal statefile names were fixed. Thanks Franky van Liedekerke.
    Bugfix in tablespace-usage under Oracle 8.1.x
    switch-interval now works more precisely. Thanks Naquada.
    Passwords don't show up in error messages any more. Thanks Jens Seiffert.
    Bugfix in mode sql. (Decimal values with .5 led to errors). Thanks Shane Jordan.
    Bugfix in sga-latches-hitratio, thresholds were ignored. Thanks Yannik Charton.
    The parameter --user is now --username (user still works)

  • 2009-04-05 1.6.2 Bugfix in tablespace-usage/free due to non-autoextensible TEMP-Tablespaces. (Thanks Daniel Graef)
  • 2009-03-27 1.6.1 --mode=tablespace-usage|free now recognizes offline tablespaces. (Thanks Daniel Graef)
  • 2009-03-11 1.6 Support for DBD::SQLRelay. Mode sql can print out multiple values (Thanks Juergen Lesny). Login as "sys" possible (Thanks Joerg Horchler). Bugfix when using warning/critical=0 (Thanks Danijel Tasov)
  • 2008-10-28 1.5.0.1 Bugfix due to , instead of . in decimal values. mode=sql output will be rounded to 2 places after the decimal point. Bugfix in mode=sga-shared-pool-free. (Thanks Birk Bohne)
  • 2008-10-15 1.5 New authentication methods password store and as sysdba. New mode tablespace-free. New parameter --units when using mode=sql and mode=tablespace-free. Mode switch-interval considers RAC (Thanks Harald Zahn).
  • 2008-09-19 1.4.2.1 New parameter --regexp complements --name. Bugfix in tablespace-usage (>100% when using resize datafile)
  • 2008-09-10 1.4.1 New mode tablespace-can-allocate-next, handling of locked accounts, timeout bugfix, encode, expired extents in UNDO tablespace are considered, bugfix regarding mode=sql and NULL values (Thanks Viktor Käfer), mode=top10* optimized.
  • 2008-07-09 1.4.0.1 Bugfixes (--name=0, --method=sqlplus), --invalid-objects and --stale-statistics now consider thresholds (Thanks Konrad Barck)
  • 2008-07-03 1.4 Statesdir is now /var/tmp/check_oracle_health, bugfixes in latch-contention, sysstat and roll-extends. Performance improvements.
  • 2008-07-01 1.3.1.1 Bugfix in method=sqlplus and os$user, bugfix in tablespace-usage when using temp tablespaces, better performance values for pga-in-memory-sort-ratio
  • 2008-06-26 1.3.1 Code cleanup, Bugfix in connected-users Thresholds
  • 2008-06-24 1.3 data-buffer/library/dictionary-cache-hitratio are now more precise, tablespace-usage considers autoextents, sqlplus, code cleaned up
  • 2008-06-20 1.2.7 Bugfixes in top10-x and pga-in-memory-sort. New mode sql. Unrecoverable datafiles removed from invalid-objects (will get its own mode later)
  • 2008-06-16 1.2.6.1 New modes sysstat list-sysstats
  • 2008-06-14 1.2.6 New modes event-waited event-waits list-events
  • 2008-06-11 1.2.5.1 internal changes
  • 2008-06-03 1.2.5 New modes latch-contention enqueue-contention enqueue-waiting connected-users list-latches list-enqueues
  • 2008-05-27 1.2.4.1 New modes list-tablespaces and list-datafiles (no Monitoringfunction)
  • 2008-05-27 1.2.4 New modes datafile-io-traffic and redo-io-traffic
  • 2008-05-25 1.2.3.1 stale-statistics now run under Oracle 9.x
  • 2008-05-25 1.2.3 New modes --roll-block-contention, --roll-hit-ratio, Bugfix in --switch-interval
  • 2008-05-23 1.2.2.1 Modes that require Oracle 10.x are disabled with Oracle 9.x/8.x
  • 2008-05-21 1.2.2 Bugfix in --environment
  • 2008-05-19 1.2.1 sga-buffer-cache-hit-ratio now shows percent (thx Maik Ihde), new parameters --runas --environment, support for externally authenticated users, Bugfix in tablespace-remaining-time
  • 2008-05-06 1.2 connection timeout handling, stale-statistics
  • 2008-05-02 1.1 tablespace-remaining-time, tablespace-io-balance
  • 2008-04-16 1.0 first public version

 

Copyright

2008 Gerhard Laußer

Check_oracle_health is published under the GNU General Public License. GPL

 

Author

Gerhard Laußer (gerhard.lausser@consol.de) gladly answers questions about this plugin.

 

Translation

Thanks to Christian Lauf there is finally an English translation of this page :-)

123 Responses to “check_oracle_health”

  1. Marco Says:
    October 7th, 2009 at 12:58

    Hello,

    I am currently testing your tool. I am very impressed with it because it saves me a lot of work. Unfortunately I have a small problem with mode sql:
    ./check_oracle_health --connect '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=172.16.102.103)(PORT=1521))(CONNECT_DATA=(SID=XYZ)))' --user nagios --password oradbmon09 --mode sql --name 'select count(*) from v$session' --name2 sessions --warning 100 --critical 150
    Result: WARNING - sessions: 21 | 'sessions'=21;20;30

    When integrated into Nagios, the only status information I get is "OK - sessions:". The warning should actually show up here as well.

    Script:

    define command{
        command_name check_oracle_per_sql
        command_line $USER1$/check_oracle_health --connect '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=$HOSTADDRESS$)(PORT=1521))(CONNECT_DATA=(SID=$ARG3$)))' --user $ARG1$ --password $ARG2$ --mode sql --name 'select count(*) from v\$session' --name2 sessions --warning 5 --critical 10
    }

    define service{
        use local-service
        host_name test2_ch
        service_description Count Open Sessions2
        check_command check_oracle_per_sql!nagios!abc!xyz
    }

    Performance data is not written either.

    The tablespace mode works for me.

    Regards, Marco


  2. lausser Says:
    October 8th, 2009 at 10:24

    Hello, this happens because Nagios is sensitive to the dollar sign in its config files. With --name 'select count(*) from v$$session' in the command definition it should work. An alternative is to encode SQL statements that contain special characters beforehand. This comes at the cost of readability, but you no longer have to worry about single/double quotes, escaping and so on.

    echo 'select count(*) from v$session' | check_oracle_health --mode encode
    select%20count%28%2A%29%20from%20v%24session

    command_line $USER1$/check_oracle_health ..... --name select%20count%28%2A%29%20from%20v%24session ....

    Gerhard


  3. Marco Says:
    October 9th, 2009 at 8:50

    Wonderful, it works.

    I have one more question. Could it be that no performance data is written for some checks, e.g. for sga-shared-pool-reloads?

    Marco


    lausser Reply:

    I cannot reproduce that.
    $ check_oracle_health --user nagios --password $ORAPW --connect NAPRAX --mode sga-shared-pool-reloads
    OK - SGA shared pool reload ratio 0.85% | sga_shared_pool_reload_ratio=0.85%;1;10
    $ check_oracle_health -V
    check_oracle_health (1.6.3)
    Are you using the latest version? Gerhard


  4. Marco Says:
    October 9th, 2009 at 9:29

    If I do not pass any thresholds, it writes nothing. If I specify some, performance data is written as well.


  5. Marco Says:
    October 9th, 2009 at 12:25

    I have the same version. In my case no performance data was written to "/usr/local/nagios/share/perfdata". After I specified the thresholds, it worked. Maybe it was my fault.

    Now I have yet another question: I want to display "sga-data-buffer-hit-ratio".
    Result: CRITICAL - SGA data buffer hit ratio 43.73% | sga_data_buffer_hit_ratio=43.73%;98:;95:
    In SQL*Plus I get the following:
    SELECT ROUND((1-(phy.value / (cur.value + con.value)))*100,2) "Cache Hit Ratio" FROM v$sysstat cur, v$sysstat con, v$sysstat phy WHERE cur.name = 'db block gets' AND con.name = 'consistent gets' AND phy.name = 'physical reads';

    Cache Hit Ratio

          86.36

    Am I misinterpreting something?


    lausser Reply:

    "db block gets" and the other values are counters that only ever increase. With your SQL statement you calculate the hit ratio over the entire uptime of the instance. Over time that value becomes very imprecise, or rather changes only very slowly. check_oracle_health uses the deltas of these counters (between the current and the previous run of the plugin) for the calculation. That way you always get a current value (the average over the check_interval).
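
    The effect can be reproduced with a short Python sketch. It is not the plugin's code; the counter readings are invented, and returning None when no reads happened in the interval is a choice made for this example only.

```python
def hit_ratio(db_block_gets, consistent_gets, physical_reads):
    """Buffer cache hit ratio in percent, using the formula from this thread."""
    logical_reads = db_block_gets + consistent_gets
    if logical_reads == 0:
        return None  # no reads at all, so no ratio can be computed
    return (logical_reads - physical_reads) / logical_reads * 100


# Invented v$sysstat readings from two consecutive plugin runs.
previous = {"db block gets": 1_000_000, "consistent gets": 9_000_000,
            "physical reads": 1_000_000}
current = {"db block gets": 1_001_000, "consistent gets": 9_009_000,
           "physical reads": 1_006_000}

# Ratio over the whole uptime of the instance (what the SQL statement computes).
lifetime = hit_ratio(current["db block gets"], current["consistent gets"],
                     current["physical reads"])

# Ratio over the last interval only (the delta approach described above).
delta = {name: current[name] - previous[name] for name in current}
interval = hit_ratio(delta["db block gets"], delta["consistent gets"],
                     delta["physical reads"])

print(f"lifetime: {lifetime:.2f}%  last interval: {interval:.2f}%")
```

    With these invented numbers the lifetime ratio still looks healthy at about 90%, while the ratio for the last interval has already dropped to 40%.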


  6. Marco Says:
    October 15th, 2009 at 15:04

    I have yet another question: can I also exclude some tablespaces with "tablespace-usage"? I want to monitor all tablespaces except, for example, sysaux and system. Is that possible?


    lausser Reply:

    The parameter --regexp causes the parameter --name (which is normally used to query one specific tablespace) to be interpreted as a regular expression. So if you phrase --name in such a way that the pattern matches all names except SYSTEM and SYSAUX, those two are not shown.

    … --name='^(?!(SYSTEM|SYSAUX))' --regexp

    Gerhard
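
    What the negative lookahead in this pattern does can be tried out with Python's re module, which supports the same construct as Perl. This is only an illustration: how exactly the plugin applies the pattern (anchoring, case sensitivity) is not shown here, and the name SYSTEM_OLD is made up.

```python
import re

# Same pattern as in the reply above: match names that do NOT start with SYSTEM or SYSAUX.
pattern = re.compile(r"^(?!(SYSTEM|SYSAUX))")


def select_names(names, regexp=pattern):
    """Keep only the names for which the regular expression matches."""
    return [name for name in names if regexp.search(name)]


tablespaces = ["SYSTEM", "SYSAUX", "USERS", "UNDOTBS1", "SYSTEM_OLD"]
print(select_names(tablespaces))
```

    Only USERS and UNDOTBS1 remain. Because the pattern has no end anchor, a name that merely begins with SYSTEM, such as the hypothetical SYSTEM_OLD, is excluded as well.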


  7. Manuel Says:
    October 18th, 2009 at 3:22

    Hello from Spain, where can I download check_oracle_health?

    Thanks


    lausser Reply:

    Scroll up about 80 cm to the heading "Download".


  8. guzik Says:
    October 28th, 2009 at 11:35

    Hi, I've got a problem with the check_oracle_health plugin. Status information in my Nagios: **ePN /usr/lib64/nagios/plugins/check_oracle_health: "Can't exec "/usr/sbin/p1.pl": Permission denied at (eval 1) line 5254,". A few of the checks work fine, the rest have this problem. From the console the script runs without problems. What can I do to fix the check services?


    lausser Reply:

    Hi, please add one line in the first 10 lines of the plugin:

    # nagios: -epn
    This prevents Nagios from executing check_oracle_health with the embedded Perl interpreter. I will add this to the next release of the plugin as the default. Gerhard


  9. Steffen Poulsen Says:
    November 9th, 2009 at 17:29

    I tried to run check_oracle_health at an install using perl, version 5.005_03 – and it barked at me:

    Use of reserved word "our" is deprecated at check_oracle_health.pl line 9.
    Bareword "our" not allowed while "strict subs" in use at check_oracle_health.pl line 9.
    Unquoted string "our" may clash with future reserved word at check_oracle_health.pl line 9.
    Array found where operator expected at check_oracle_health.pl line 9, at end of line
    (Do you need to predeclare our?)
    syntax error at check_oracle_health.pl line 9, near "our @ISA "
    Global symbol "@ISA" requires explicit package name at check_oracle_health.pl line 9.
    BEGIN not safe after errors--compilation aborted at check_oracle_health.pl line 80.

    We are very fond of your plugin and would like to use it at this install also. Is there by any chance a drop-in replacement for the "our @ISA" construct that would allow the check to run at this old install as well?

    Best regards, Steffen Poulsen


    lausser Reply:

    I think, this would require a major rewrite of the plugin. Can’t you run it on the Nagios server and check the database with a remote connection? Gerhard


  10. SweetBiene91 Says:
    November 11th, 2009 at 0:57

    hey ho, I'm so lonely, anyone want to chat or something


    lausser Reply:

    Try this: irc server: irc.irclink.net port: 6667 channel: #nagios


  11. Manfred Says:
    November 12th, 2009 at 18:17

    Is there an option (e.g. quiet) that outputs only the values that trigger a warning or critical? With more than 30 tablespaces (with --mode=tablespace-free) it is almost impossible to find the one that triggered the warning. In addition, the output in Nagios becomes very cluttered and far too long. I already tried to build an "if" into the source to suppress the output, but failed, because then the warnings themselves no longer appear. The original Nagios filesystem check, for example, also outputs only the filesystems that are in warning or critical state.


    lausser Reply:

    In that case I would recommend using check_multi. It also has the advantage that thresholds can be changed in the check_multi config file without having to restart Nagios. If you specify the tablespace names as labels, you get a concise output: 30 plugins checked, 1 critical (TBS_1), 1 warning (TBS_25), 0 unknown, 28 ok


  12. fsom Says:
    November 20th, 2009 at 15:54

    Great script! Everything works so far, only with mode=sql I am stuck (v1.6.3):
    ./check_oracle_health --connect=DB --user=xxxxxx --password=yyyyyy --mode=sql --name="select count(*) from v$session where status = 'ACTIVE'"

    Use of uninitialized value in numeric gt (>) at /usr/lib/nagios/plugins/check_oracle_health line 3615.
    Use of uninitialized value in numeric gt (>) at /usr/lib/nagios/plugins/check_oracle_health line 3616.
    OK - select count(*) from v where status = 'active':

    I get nothing back from the SQL statement. Am I doing something wrong? Thanks, fsom


    lausser Reply:

    You have to escape the dollar sign. Your statement: from v$session where… Output: from v where… To the shell, $session looks like a variable, and since that variable does not exist, the shell turns it into an empty string. Write …from v\$session… instead. If the SQL statement is more complicated and contains many such special characters, you can also encode it. To do so, call check_oracle_health with the parameter "--mode encode". It then reads from standard input. You type your statement (without having to worry about escaping special characters) and finish it with RETURN.

    $ check_oracle_health --mode encode
    select count(*) from v$session where status = 'ACTIVE'
    select%20count%28%2A%29%20from%20v%24session%20where%20status%20%3D%20%27ACTIVE%27
    
    As output you get the statement in encoded form, which you can now pass to --name without having to pay attention to dollar signs or any kind of quotes.
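
    The encoded strings shown in this thread look like ordinary percent-encoding. The following Python sketch is not the plugin's code and only assumes that this is the scheme in use; it reproduces the encoded statement from the example above with the standard library.

```python
from urllib.parse import quote, unquote


def encode_statement(statement: str) -> str:
    """Percent-encode every character except letters, digits and '_', '.', '-', '~'."""
    return quote(statement, safe="")


statement = "select count(*) from v$session where status = 'ACTIVE'"
encoded = encode_statement(statement)

print(encoded)
# Decoding returns the original statement unchanged.
print(unquote(encoded) == statement)
```

    The encoded form contains neither dollar signs, quotes nor spaces, which is why it can be placed in a Nagios command definition or on a shell command line without further escaping.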


  13. Bas de Klerk Says:
    November 28th, 2009 at 17:55

    Hi,

    thx for your great plugin. Saves me a lot of time!

    One small problem I'm having in version 1.6.3 is that the sga-data-buffer-hit-ratio sometimes drops to 0%… no clue why, but sometimes it does. If I calculate it by hand using the statement below, the values are fine. If you need any additional info please let me know. For now I've made a workaround using mode=sql

    Regards Bas

    SELECT ((P1.value + P2.value - P3.value) / (P1.value + P2.value))*100 ratio FROM v$sysstat P1, v$sysstat P2, v$sysstat P3 WHERE P1.name = 'db block gets' AND P2.name = 'consistent gets' AND P3.name = 'physical reads';


  14. lausser Says:
    November 28th, 2009 at 20:48

    I use the deltas (the difference to the counter value when check_oracle_health was run last time) for the calculation. E.g. the "physical reads" I use for the calculation is "value of physical reads now - value of physical reads approx. 5 minutes ago." This way the hit rate reflects the current state of the buffer cache. In your formula you use the counters which have increased since the database was started, so it's an average hit rate over the whole lifetime. But isn't it more interesting to get the current hit rate? When you get 0% sometimes, it actually means a hit rate of 0% (at least during the last check_interval). This is some kind of a "negative spike". But I understand the problem. I will introduce a parameter "--lookback" which takes a number of minutes as argument. This way, you can for example measure the hit rate during the last 30 minutes, which is pretty up to date, but gives you much smoother results.

    [Reply]

  15. Andreas Says:
    December 12th, 2009 at 8:44

    Hello, only about 50% of the checks work for me, e.g. TNSPING, CON.-TIME, CON.-USERS, invalid-objects. But for some checks, e.g. sga-data-buffer-hit-ratio, I get the following error message in Nagios: **ePN /usr/lib/nagios/plugins/check_oracle_health: printf() on closed filehandle STATE at (eval 1) line 3841,. I have already added “-epn”. On the command line the check works, though. Thanks!

    [Reply]

    lausser Reply:

    Could it be that you ran check_oracle_health on the command line as root? The plugin stores intermediate results in the directory /var/tmp/check_oracle_health, which is created automatically. If that directory belongs to root, a check_oracle_health process running under the Nagios account can no longer write into it. The error message points to this. A “chown -R nagios:nagios /var/tmp/check_oracle_health” should solve the problem.

    [Reply]

    Andreas Reply:

    @lausser, Great! That was exactly the cause. Thank you very much!

    [Reply]

  16. Erlon Says:
    December 14th, 2009 at 14:41

    Where can I find the download link?

    [Reply]

    lausser Reply:

    Scroll up until you see the topic “Download”

    [Reply]

  17. Erlon Says:
    December 14th, 2009 at 16:54

    But there is no topic “Download”!

    [Reply]

  18. Erlon Says:
    December 14th, 2009 at 20:41

    OK, I can see it now. I didn’t see it before because I was viewing the page in English, and in English this link doesn’t exist.

    [Reply]

  19. Don Seiler Says:
    January 8th, 2010 at 1:14

    Are there plans for ASM checks, such as disk group free space (v$asm_diskgroup.usable_file_mb)?

    [Reply]

    lausser Reply:

    No, I actually have no plans (mostly because I’m too occupied with other things). But if you look in the contribs subdirectory, you’ll find a description of how you can extend check_oracle_health with your own custom modes. You simply put the code (mostly the SQL statements) in a separate file which is sourced at runtime. Perhaps you want to play around with this and post the result. If it works, I will gladly add it to the core plugin.

    [Reply]

    Don Seiler Reply:

    @lausser, I’d love to do this if I have some time later. Thanks.

    [Reply]

  20. Millet JC Says:
    January 13th, 2010 at 11:29

    Hello All

    I have a small compilation error on a Solaris system. I’m not an expert, but I think it’s linked to my environment:

    ./configure works successfully.

    make gives me this error:

    Making all in plugins-scripts
    make: Fatal error: Don't know how to make target `Nagios/DBD/Oracle/Server/Instance/SGA/SharedPool/DictionaryCache.pm'
    Current working directory /tmp/check_oracle_health-1.6.3/plugins-scripts
    *** Error code 1
    The following command caused the error:
    failcom='exit 1'; \
    for f in x $MAKEFLAGS; do \
      case $f in \
        *=* | --[!k]*);; \
        *k*) failcom='fail=yes';; \
      esac; \
    done; \
    dot_seen=no; \
    target=`echo all-recursive | sed s/-recursive//`; \
    list='plugins-scripts t'; for subdir in $list; do \
      echo "Making $target in $subdir"; \
      if test "$subdir" = "."; then \
        dot_seen=yes; \
        local_target="$target-am"; \
      else \
        local_target="$target"; \
      fi; \
      (cd $subdir && make $local_target) \
      || eval $failcom; \
    done; \
    if test "$dot_seen" = "no"; then \
      make "$target-am" || exit 1; \
    fi; test -z "$fail"
    make: Fatal error: Command failed for target `all-recursive'

    [Reply]

    lausser Reply:

    Looks like your tar command does not support filenames which exceed 100 characters (I think SuSE has such a tar). Instead of the tar.gz, please download the shar.gz and unpack it with

    cat check_oracle_health-xxx.shar.gz | gzip -d | sh

    [Reply]

  21. Rascal Says:
    January 25th, 2010 at 20:44

    Hello, I’m not a database person, just a “monitor”, hence my question: is there a way to have the plugin keep the database connection open? The constant connecting and disconnecting makes the log files on the DB swell. Or does something need to be changed in the database configuration?

    [Reply]

  22. lausser Says:
    January 26th, 2010 at 13:13

    With http://sqlrelay.sourceforge.net/ you can run a proxy that keeps the connection open. That way the login messages in the log file go away.

    check_oracle_health --method sqlrelay --connect <proxy-ip>:<proxy-port> --username <proxy-user> --password <proxy-password> ...

    [Reply]

  23. Frank Says:
    February 15th, 2010 at 18:15

    Hello, on the command line the check works as user nagios. In Nagios itself I get the error message: ePN failed to compile /usr/lib/nagios/plugins/check_oracle_health “Missing right curly or square bracket at (eval 18) line 4193, at end of line syntax error at (eval 18) line 4200, at EOF at /usr/lib/nagios/p1.pl line 155”

    The line “# nagios: -epn” is already in the script.

    Could it be because Nagios is still at v2.9? Is there a way to get this running under that version?

    [Reply]

    lausser Reply:

    Selectively disabling ePN with -epn is only available from version 3 onwards. Unfortunately, that leaves only an upgrade to 3.x or doing without ePN entirely.

    [Reply]

  24. John Tomawski Says:
    February 16th, 2010 at 23:30

    Be sure to set the --environment flag when required. The flag can be used to set things such as TNS_ADMIN, etc.

    Hopefully this comment saves someone 2 hours… sigh

    e.g. --environment TNS_ADMIN='/usr/lib/oracle/bleh'

    [Reply]

  25. Aldo Says:
    February 19th, 2010 at 12:52

    When running the following command:

    ./check_oracle_health --connect REMOTE --username $ORAUSER --password $ORAPWD --mode tablespace-usage --tablespace USERS

    I get the following error message:

    Use of uninitialized value in split at /usr/lib/nagios/plugins/check_oracle_health line 3924.
    bumm
    Can't call method "execute" on an undefined value at /usr/lib/nagios/plugins/check_oracle_health line 4230.
    Can't use an undefined value as an ARRAY reference at /usr/lib/nagios/plugins/check_oracle_health line 4242.

    And now I’m clueless about what to do. Can you assist me with this one?

    Thanks in advance

    [Reply]

    lausser Reply:

    Did you give the necessary privileges to your ORAUSER?

    CREATE user nagios IDENTIFIED BY oradbmon; 
    GRANT CREATE session TO nagios;
    GRANT SELECT any dictionary TO nagios;
    GRANT SELECT ON V_$SYSSTAT TO nagios;
    GRANT SELECT ON V_$INSTANCE TO nagios;
    GRANT SELECT ON V_$LOG TO nagios;
    GRANT SELECT ON SYS.DBA_DATA_FILES TO nagios;
    GRANT SELECT ON SYS.DBA_FREE_SPACE TO nagios;

    You can also create an empty file /tmp/check_oracle_health.trace with the touch command. As long as this file exists, check_oracle_health will write debugging messages into it. You should see the SQL statements sent to the database server and the responses. Maybe this gives you an idea of what’s wrong.

    [Reply]

    Aldo Reply:

    @lausser, Hi Lausser, it works fantastic now! This is a great plugin.

    [Reply]

  26. Hans-Jürgen Says:
    February 22nd, 2010 at 11:03

    Hello,

    we have been using check_oracle_health for quite a while and are very happy with it. Many thanks for that. For the tablespaces with autoextend enabled, we would like to switch the monitoring from tablespace-usage to tablespace-can-allocate-next. Does that check both whether there is still enough space and whether MAX_EXTENT has already been reached?

    [Reply]

    lausser Reply:

    Hello, as far as I know max_extent is not looked at. If you create an empty file with touch /tmp/check_oracle_health.trace (writable by the Nagios user), the SQL statements that are issued and their results are logged there.

    [Reply]

    Günter Reply:

    @lausser, will there be a way in the future to exclude autoextend tablespaces from tablespace-usage?

    [Reply]

    Günter Reply:

    @Günter, never mind. I just saw in the trace file that autoextend tablespaces are taken into account, i.e. the maximum size is used.

    [Reply]

    lausser Reply:

    You can also exclude certain tablespaces with a regular expression:

    --name='^(?!(TABLESPACE1$)|(TABLESPACE2$)|(TABLESPACE3$))' --regexp

    means: everything except TABLESPACE1, TABLESPACE2, TABLESPACE3
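    The pattern relies on a negative lookahead. A quick way to see which names it lets through is to try it outside the plugin. The following is a Python sketch only (the plugin itself is written in Perl, and exactly how it applies the pattern internally is an assumption here); the list of tablespace names is invented:

```python
import re

# Same pattern as in the --name argument above.
EXCLUDE_THREE = re.compile(r"^(?!(TABLESPACE1$)|(TABLESPACE2$)|(TABLESPACE3$))")

# Invented tablespace names for the demonstration.
names = ["SYSTEM", "USERS", "TABLESPACE1", "TABLESPACE2", "TABLESPACE3", "TABLESPACE10"]

# A name is selected if the pattern matches at its start.
selected = [name for name in names if EXCLUDE_THREE.search(name)]
print(selected)
```

    The $ inside each alternative matters: without it, a name such as TABLESPACE10 would be excluded as well, because it merely starts with TABLESPACE1.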

    [Reply]

  27. angry_admin Says:
    February 24th, 2010 at 13:49

    http://ideas.nagios.org/a/dtd/22035-3955

    [Reply]

  28. Rik Says:
    February 26th, 2010 at 16:21

    Thank you, Gerhard, for an excellent plugin. Here is a tiny correction to the documentation on this page: --method accepts two arguments, dbi (not tns) or sqlplus. Or am I misinterpreting things?

    [Reply]

    lausser Reply:

    Thanks! “tns” was what I called it in a very early phase. Later it was replaced by the less misleading “dbi”.

    [Reply]

  29. Thomas Says:
    March 4th, 2010 at 11:57

    Hello,

    that all sounds very nice. I would like to try it out, but unfortunately I can’t find a download link anywhere (not even what is by now 1.20 m further up). Have I overlooked something?

    Thanks, Thomas

    [Reply]

    lausser Reply:

    You have probably landed on the English page, which does not exist (but on which the comments are displayed). The download link is on this page: http://labs.consol.de/lang/de/nagios/check_oracle_health/

    [Reply]

  30. Thomas Says:
    March 4th, 2010 at 17:27

    Hello,

    unfortunately I’m having difficulties installing the Oracle Instant Client. Does anyone know of a page that deals with this topic?

    Many thanks, Thomas

    [Reply]

    Max Reply:

    @Thomas, Hello Thomas, have a look here: http://samushka.blogspot.com/2009/04/installing-oracle-sqlplus-in-ubuntu.html

    [Reply]

  31. Steffen Poulsen Says:
    March 25th, 2010 at 18:09

    When using --mode=tablespace-remaining-time, our experience is that it is somewhat slow on some machines. For example, on the machine below it takes more than 60 seconds to process 34 tablespaces.

    Apparently each status file takes two seconds to process on this particular machine (some trace output pasted below), and as this machine has a new tablespace automatically added each week, this is not going to get any better by itself any time soon :-)

    We are aware that we could split the tablespace checking into separate checks and do each tablespace individually, but if you happen to have an idea for making this mode run a bit faster, so that all tablespaces could be checked inside a timeframe of, say, 60 seconds, that would be a clear number 1 :-)

    Best regards, Steffen Poulsen

    $ uname -a
    SunOS 5.10 Generic_141414-07 sun4v sparc SUNW,SPARC-Enterprise-T5220

    ./check_oracle_health --mode=tablespace-remaining-time --lookback=15 --warning=10: --critical=2: …

    Thu Mar 25 15:10:52 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:10:52 2010
    Thu Mar 25 15:10:52 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:10:52 2010
    Thu Mar 25 15:10:54 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:10:54 2010
    Thu Mar 25 15:10:54 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:10:54 2010
    Thu Mar 25 15:10:56 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:10:56 2010
    Thu Mar 25 15:10:56 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:10:56 2010
    Thu Mar 25 15:10:58 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:10:58 2010
    Thu Mar 25 15:10:58 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:10:58 2010
    Thu Mar 25 15:11:00 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:00 2010
    Thu Mar 25 15:11:00 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:00 2010
    Thu Mar 25 15:11:02 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:02 2010
    Thu Mar 25 15:11:02 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:02 2010
    Thu Mar 25 15:11:04 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:04 2010
    Thu Mar 25 15:11:04 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:04 2010
    Thu Mar 25 15:11:06 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:06 2010
    Thu Mar 25 15:11:06 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:06 2010
    Thu Mar 25 15:11:08 2010: loaded 5822 data sets from Thu Feb 25 11:00:46 2010 - Thu Mar 25 15:11:08 2010
    Thu Mar 25 15:11:08 2010: trimmed to 5822 data sets from Thu Feb 25 11:00:46 2010 - Thu Mar 25 15:11:08 2010
    Thu Mar 25 15:11:09 2010: loaded 3806 data sets from Thu Mar 4 11:00:53 2010 - Thu Mar 25 15:11:09 2010
    Thu Mar 25 15:11:09 2010: trimmed to 3806 data sets from Thu Mar 4 11:00:53 2010 - Thu Mar 25 15:11:09 2010
    Thu Mar 25 15:11:10 2010: loaded 1790 data sets from Thu Mar 11 11:00:59 2010 - Thu Mar 25 15:11:10 2010
    Thu Mar 25 15:11:10 2010: trimmed to 1790 data sets from Thu Mar 11 11:00:59 2010 - Thu Mar 25 15:11:10 2010
    Thu Mar 25 15:11:10 2010: loaded 5 data sets from Mon Mar 22 14:41:08 2010 - Thu Mar 25 15:11:10 2010
    Thu Mar 25 15:11:10 2010: trimmed to 5 data sets from Mon Mar 22 14:41:08 2010 - Thu Mar 25 15:11:10 2010
    Thu Mar 25 15:11:10 2010: no historical data found
    Thu Mar 25 15:11:11 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:11 2010
    Thu Mar 25 15:11:11 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:11 2010
    Thu Mar 25 15:11:13 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:13 2010
    Thu Mar 25 15:11:13 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:13 2010
    Thu Mar 25 15:11:15 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:15 2010
    Thu Mar 25 15:11:15 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:15 2010
    Thu Mar 25 15:11:17 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:17 2010
    Thu Mar 25 15:11:17 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:17 2010
    Thu Mar 25 15:11:19 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:19 2010
    Thu Mar 25 15:11:19 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:19 2010
    Thu Mar 25 15:11:21 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:21 2010
    Thu Mar 25 15:11:21 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:21 2010
    Thu Mar 25 15:11:23 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:23 2010
    Thu Mar 25 15:11:23 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:23 2010
    Thu Mar 25 15:11:25 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:25 2010
    Thu Mar 25 15:11:25 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:25 2010
    Thu Mar 25 15:11:27 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:27 2010
    Thu Mar 25 15:11:27 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:27 2010
    Thu Mar 25 15:11:29 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:29 2010
    Thu Mar 25 15:11:29 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:29 2010
    Thu Mar 25 15:11:31 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:31 2010
    Thu Mar 25 15:11:31 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:31 2010
    Thu Mar 25 15:11:33 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:33 2010
    Thu Mar 25 15:11:33 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:33 2010
    Thu Mar 25 15:11:35 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:35 2010
    Thu Mar 25 15:11:35 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:35 2010
    Thu Mar 25 15:11:37 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:37 2010
    Thu Mar 25 15:11:37 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:37 2010
    Thu Mar 25 15:11:39 2010: loaded 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:39 2010
    Thu Mar 25 15:11:39 2010: trimmed to 6419 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:39 2010
    Thu Mar 25 15:11:41 2010: loaded 6418 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:41 2010
    Thu Mar 25 15:11:41 2010: trimmed to 6418 data sets from Tue Feb 23 09:15:44 2010 - Thu Mar 25 15:11:41 2010
    Thu Mar 25 15:11:42 2010: found 2027 usable data sets since Wed Mar 10 15:11:42 2010
    Thu Mar 25 15:11:42 2010: found 2028 usable data sets since Wed Mar 10 15:11:42 2010
    Thu Mar 25 15:11:42 2010: found 2028 usable data sets since Wed Mar 10 15:11:42 2010
    Thu Mar 25 15:11:42 2010: found 2028 usable data sets since Wed Mar 10 15:11:42 2010
    Thu Mar 25 15:11:42 2010: found 2028 usable data sets since Wed Mar 10 15:11:42 2010
    Thu Mar 25 15:11:42 2010: found 2028 usable data sets since Wed Mar 10 15:11:42 2010
    Thu Mar 25 15:11:42 2010: found 2028 usable data sets since Wed Mar 10 15:11:42 2010
    Thu Mar 25 15:11:42 2010: found 2028 usable data sets since Wed Mar 10 15:11:42 2010
    Thu Mar 25 15:11:42 2010: found 2028 usable data sets since Wed Mar 10 15:11:42 2010
    Thu Mar 25 15:11:42 2010: found 2028 usable data sets since Wed Mar 10 15:11:42 2010
    Thu Mar 25 15:11:42 2010: found 2028 usable data sets since Wed Mar 10 15:11:42 2010
    Thu Mar 25 15:11:43 2010: found 2028 usable data sets since Wed Mar 10 15:11:43 2010
    Thu Mar 25 15:11:43 2010: found 2028 usable data sets since Wed Mar 10 15:11:43 2010
    Thu Mar 25 15:11:43 2010: found 2028 usable data sets since Wed Mar 10 15:11:43 2010
    Thu Mar 25 15:11:43 2010: found 2028 usable data sets since Wed Mar 10 15:11:43 2010
    Thu Mar 25 15:11:43 2010: found 2028 usable data sets since Wed Mar 10 15:11:43 2010
    Thu Mar 25 15:11:43 2010: found 1 usable data sets since Wed Mar 10 15:11:43 2010
    Thu Mar 25 15:11:43 2010: found 6 usable data sets since Wed Mar 10 15:11:43 2010
    Thu Mar 25 15:11:43 2010: found 1791 usable data sets since Wed Mar 10 15:11:43 2010
    Thu Mar 25 15:11:43 2010: found 2028 usable data sets since Wed Mar 10 15:11:43 2010
    Thu Mar 25 15:11:43 2010: found 2028 usable data sets since Wed Mar 10 15:11:43 2010
    Thu Mar 25 15:11:43 2010: found 2028 usable data sets since Wed Mar 10 15:11:43 2010
    Thu Mar 25 15:11:43 2010: found 2028 usable data sets since Wed Mar 10 15:11:43 2010
    Thu Mar 25 15:11:43 2010: found 2028 usable data sets since Wed Mar 10 15:11:43 2010
    Thu Mar 25 15:11:43 2010: found 2028 usable data sets since Wed Mar 10 15:11:43 2010
    Thu Mar 25 15:11:43 2010: found 2028 usable data sets since Wed Mar 10 15:11:43 2010
    Thu Mar 25 15:11:43 2010: found 2028 usable data sets since Wed Mar 10 15:11:43 2010
    Thu Mar 25 15:11:43 2010: found 2028 usable data sets since Wed Mar 10 15:11:43 2010
    Thu Mar 25 15:11:43 2010: found 2028 usable data sets since Wed Mar 10 15:11:43 2010
    Thu Mar 25 15:11:44 2010: DESTROY DBD::Oracle::Server::Database::Tablespace with handle null null

    [Reply]

    lausser Reply:

    You’re right. 2 seconds is quite long. In /var/tmp/check_oracle_health you should find several files named tablespace-remaining-time_*

    Please mail me one of these files. I’ll have a look at it.

    [Reply]

    Steffen Poulsen Reply:

    Thank you very much for the patch you sent us, run time is down from 65 to 11 seconds at this particular host now :-)

    [Reply]

    lausser Reply:

    You’re welcome. In case anybody else stumbles upon the same problem: I’ll release a version with this patch soon.

    [Reply]

    Khadija Reply:

    @Steffen Poulsen, can you please let me know which patch you mean, Steffen?

    Regards, Khadija

    [Reply]

    lausser Reply:

    I forgot to release this update. New version of check_oracle_health is coming asap.

    [Reply]

  32. Frank Says:
    April 14th, 2010 at 16:03

    Hello,

    We used this plugin (1.5) for some months now and everything worked fine, but since yesterday we receive the message “CRITICAL – connection could not be established within 60 seconds”. Nothing has changed on the plugin, nothing has changed on the network, nothing has changed on the machines nor on nagios/centreon?

    I don’t have a clue where to look to resolve this problem. Does it sound familiar to somebody?

    Regards, Frank

    [Reply]

    lausser Reply:

    can you connect with the sqlplus command? (executed on the Nagios server)

    [Reply]

  33. Steffen B Says:
    April 15th, 2010 at 9:41

    Hello,

    first of all, great praise to you, it’s a super plugin you have created. We use it for all Oracle monitoring of our customer systems.

    Since today, though, I have a problem where I’m stuck. The situation is as follows:

    One DB server with two databases on it, each with one schema, both at the same DB version 10.2.0.4 and with the same schema name.

    I want to use the plugin to determine the usage of the tablespace (--mode=tablespace-usage). The syntax on the command line is the same; only the TNS alias for the database changes. For one DB it works without problems, and for the other DB it shows me the following error:

    Use of uninitialized value in split at /usr/local/nagios/libexec/check_oracle_health line 3924.
    bumm
    Can't call method "execute" on an undefined value at /usr/local/nagios/libexec/check_oracle_health line 4230.
    Can't use an undefined value as an ARRAY reference at /usr/local/nagios/libexec/check_oracle_health line 4242.

    As recommended here earlier, I created the check_oracle_health.trace file, and the only thing I noticed is that a different SQL statement is issued. But why? The databases are installed identically and on the same server, so it can’t have anything to do with the DB or with the operating system, can it?

    I would be grateful for any help.

    [Reply]

    lausser Reply:

    “..that a different SQL statement is issued.” It would of course be helpful to be able to see these two different statements.

    [Reply]

  34. Frank Says:
    April 15th, 2010 at 10:10

    I think you’re right, it seems to be a problem with sqlplus.

    As root user I can connect with sqlplus, as Nagios user I cannot connect. I think we have to find out what is changed there…

    [Reply]

  35. Frank Says:
    April 15th, 2010 at 13:20

    We had to relocate the nagios server today, so we had to restart the server. Problem is now solved.

    [Reply]

  36. Hans Wolters Says:
    April 23rd, 2010 at 14:40

    Dear all,

    Great Plugin. Started to configure it this week and currently for some databases I already have nearly all of the checks possible with the default options.

    One question remains for me. If I have overlooked this in the documentation (yes, I can read German), then please let me know.

    Situation:

    We have several machines with more than one database per service id. Would it be possible to return the SID and database name with the return string for Nagios (given by the plugin written in Perl)? This would enable me to use short service descriptions on those machines and set up service entries with multiple databases/SIDs on one machine. Maybe even with a parameter, so people using only one database can skip the options.

    I could hack it into the source myself, but I can imagine I am not the only one who would like that feature.

    Kind regards,

    Hans Wolters

    [Reply]

    lausser Reply:

    Hi, i’ll have a look at it.

    [Reply]

  37. Geoff Sears Says:
    April 30th, 2010 at 23:53

    Hi. I’m having trouble making a connection as sysdba, though I understand this should be possible.

    Would you post an example of how to make it work?

    Thanks,

    -geoff

    [Reply]

    lausser Reply:

    check_oracle_health --connect sysdba@ …

    [Reply]

    Geoff Sears Reply:

    @lausser,

    That’s what I can not get to work. Works fine with connect=host:port//service or connect=//host:port/service

    But, If I use:

    connect=sysdba@host:port/service or connect=sysdba@//host:port/service

    results in ORA-12154: TNS:could not resolve the connect identifier specified (DBD ERROR: OCIServerAttach)

    I believe that getting a sysdba connection with DBI/DBD::Oracle requires setting a connection attribute ora_session_mode => ORA_SYSDBA ; just passing that string “sysdba@host:port/service” as the data source won’t do it.

    [Reply]

    Geoff Sears Reply:

    @Geoff Sears,

    ok, I finally sat down and read through the code: sysdba@… is only supported for sqlplus connections. I hacked it so tns connections are possible.

    [Reply]

    lausser Reply:

    Which version did you use? I looked into the source (in my git repository) and found (in the tns section)

          my $connecthash = { RaiseError => 0, AutoCommit => 0, PrintError => 0 };
          if ($self->{username} eq "sys" || $self->{username} eq "sysdba") {
            $connecthash = { RaiseError => 0, AutoCommit => 0, PrintError => 0,
                  #ora_session_mode => DBD::Oracle::ORA_SYSDBA
                  ora_session_mode => 0x0002  };
            $dsn = sprintf "DBI:Oracle:";
          }

    Isn’t that correct? What do your changes look like?

    [Reply]

  38. Thomas Says:
    May 6th, 2010 at 9:09

    I am running Nagios 1.2 and the service I have created with check_oracle_health won’t start. The service remains pending. When I run the check in the server’s shell it works perfectly. Might that be a problem with Nagios 1.x? Should I update to Nagios 2 or 3?

    [Reply]

    lausser Reply:

    I don’t think this has to do with the Nagios version. Can’t you force scheduling of the service through the service detail page? Upgrading to 3.x is a good idea anyway.

    [Reply]

  39. Björn Says:
    May 17th, 2010 at 11:54

    Hello,

    for some databases we are using your health check. One of the installations is using a Dataguard Environment. When we configure checks for the standby, we get a critical error, as no connection is allowed (“ORA-01033: ORACLE initialization or shutdown in progress”).

    For our other oracle monitors we excluded the ORA-01033 and give an OK-State with a comment (“OK – Login Denied, the DB is in Standby Mode – this Check only works for Primary DB’s “).

    Could you implement exception handling for ORA-01033 to allow the same Nagios config for the primary and the standby database?

    [Reply]

    lausser Reply:

    ORA-01033 can be a sign of serious problems, for example when a corrupted database was restarted (ORA-10567 et al. can be found in the alert log), hence ignoring this error message is not an option.

    [Reply]

    Michael Reply:

    @lausser, with Oracle Dataguard, tnsping is not working anymore. I’m getting the same error, “initialization or shutdown in progress”. In version 1.6.2 the mode tnsping was working for closed databases.

    [Reply]

    lausser Reply:

    How do you call the plugin and what’s the error message (in 1.6.2 and 1.6.4)?

    [Reply]

  40. Antonio Romero Says:
    May 25th, 2010 at 17:07

    Hi,

    I have installed check_oracle_health on my Nagios system in order to monitor several Oracle DBs. All works fine, except for one thing. When I ask the DB for the space used by a tablespace, the info that the plugin returns is different from the info that I get from an Oracle query in the Oracle Manager. Can you give me some help with this issue?

    Thank you in advance for your help!

    Toni.

    [Reply]

    lausser Reply:

    Oracle tools usually relate two values to each other: used space and allocated space. Now if you use autoallocation, the latter value may grow. When used:allocated is near 100%, autoallocation happens, allocated space suddenly grows and the used:allocated percentage drops. This means you could get an alert from Nagios because the critical threshold has been reached. Then, after the autoallocation, the usage drops below the threshold again. False alert. That’s why check_oracle_health calculates the usage percentage from used:max_allocatable.
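    A few invented numbers show the effect. The following Python sketch is only an illustration of the two ratios described above, not the plugin's SQL; the sizes and the helper name usage_percent are assumptions made for the example:

```python
def usage_percent(used_mb, total_mb):
    """Used space as a percentage of the given total."""
    return used_mb * 100 / total_mb


# Invented sizes in MB for one autoextending tablespace.
max_allocatable = 10_000
before = {"used": 950, "allocated": 1_000}   # shortly before the datafile extends
after = {"used": 960, "allocated": 2_000}    # shortly after the datafile extended

for label, snap in (("before", before), ("after", after)):
    vs_allocated = usage_percent(snap["used"], snap["allocated"])
    vs_max = usage_percent(snap["used"], max_allocatable)
    print(f"{label}: used/allocated = {vs_allocated:.1f}%  used/max = {vs_max:.1f}%")
```

    Measured against allocated space, the usage jumps from 95% down to 48% although almost nothing changed; measured against the maximum allocatable size, it moves from 9.5% to 9.6%. This also explains why the plugin's figure can differ from what other Oracle tools display.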

    [Reply]

  41. Thomas Says:
    May 26th, 2010 at 11:32

    Hello,

    first of all, thanks for the helpful plugin. However, I’m still having problems persuading it to work together with Nagios (3.2.0). I run the Nagios server on Ubuntu and have installed the Oracle Instant Client. The plugin works from the console without problems when started as user nagios. As a service in Nagios with the following command:

    command_line $USER1$/check_oracle_health --connect rebmasc.world --user dbo --password xxx --mode tnsping

    I always get the error message:

    cannot connect to rebmasc.world. ORA-12154: TNS:could not resolve the connect identifier specified (DBD ERROR: OCIServerAttach)

    I have set the variables ORACLE_HOME, TNS_ADMIN etc. correctly for everyone in bash.bashrc, and the DB also appears in the tnsnames.ora located there. As I said, there are no problems on the machine’s console.

    I have already tried various alternative commands (--environment; --method), but without success. I can’t find an error. What am I doing wrong?

    [Reply]

    lausser Reply:

    The environment variables have to be set in the Nagios init script. Files like .bashrc are not read at system startup.

    [Reply]

    Thomas Reply:

    @lausser, I then also found this hint in the Nagios forum yesterday. When I set the variables in /etc/init.d/nagios, everything works. Thanks for the hint.

    [Reply]

  42. Antonio Romero Says:
    May 28th, 2010 at 21:56

    Please Lausser, Can you answer my post above?, number 40.

    Thanks!

    [Reply]

  43. Dennis Says:
    June 8th, 2010 at 16:07

    Hello, is there a way to monitor the Flash Recovery Area, i.e. how full it is?

    Regards, Dennis

    [Reply]

    lausser Reply:

    No, that is not built in. But maybe something like this would help:

    --mode sql --name 'select max(percent_space_used) from v$flash_recovery_area_usage' --warning 80 --critical 90

    [Reply]

  44. Hamza Says:
    June 17th, 2010 at 12:15

    Hi there

    I love your check oracle plugin, it does everything I want.

    Is there any way to specify multiple DB names using some sort of delimiter?

    I currently have a setup as such.

    • In .profile I have export NAGIOS__SERVICEORACLE_SID=/usr/lib/oracle/11.2/client/network/admin/tnsnames.sh

    which basically does a cat of the tnsnames.ora and pulls out all the sids for me.

    What I would like to do is run check_oracle_health in this way:

    check_oracle_health --connect $NAGIOS__SERVICEORACLE_SID:$NAGIOS__SERVICEORACLE_SID --username nagios --password nagios --mode tnsping

    where the : is any delimiter that can separate multiple DB names.

    Please help.

    Thank you Hamza Maal

    [Reply]

    lausser Reply:

    That’s not possible. You can only check databases one at a time. If you want multiple checks inside one single service, you might want to give check_multi a try. http://www.my-plugin.de/wiki/projects/check_multi/discussion
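
    If the list of databases comes from tnsnames.ora anyway, one call per alias can be generated from it. A rough sketch, assuming the usual layout where every alias starts in column one and is followed by "="; tns_aliases and print_checks are hypothetical helper names, and the credentials are the ones from the question:

```shell
# Print the alias names defined in a tnsnames.ora file, one per line.
# Assumption: each alias starts in column one; comma-separated alias
# lists and other exotic layouts are not handled.
tns_aliases() {
    sed -n 's/^\([A-Za-z0-9_.-]\{1,\}\)[[:space:]]*=.*/\1/p' "$1"
}

# Emit one command line per alias instead of packing several
# databases into a single --connect.
print_checks() {
    tns_aliases "$1" | while read -r db; do
        printf 'check_oracle_health --connect %s --user nagios --password nagios --mode tnsping\n' "$db"
    done
}
```

    Something like print_checks "$TNS_ADMIN/tnsnames.ora" then lists the individual checks, which can become separate services or child checks of check_multi.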

    [Reply]

  45. Tim Says:
    June 17th, 2010 at 20:27

    I have an odd problem with this plugin. It works fine, but Nagios reports any response as a warning. From the command line, I’ll get:

    OK - 0.22 seconds to connect as MONITOR | connection_time=0.2193;3;8

    But Nagios shows the service in yellow and the log has:

    SERVICE ALERT: myhost;Oracle mySID Connect;WARNING;HARD;3;OK - 0.14 seconds to connect as MONITOR

    Why is it showing as an alert when the connect time is within the correct range?

    [Reply]

    lausser Reply:

    Strange… What about the thresholds? From your command line example I see you set --warning 3 --critical 8 (without these extra parameters the defaults would be 1 and 5). Did you also set thresholds in the service/command definition? When you get such a WARNING, please click on “Service Details” and look at the performance data. Which thresholds do you see there?

    [Reply]

    Tim Reply:

    @lausser,

    I think I added the warning/critical params just in case that might affect the display. The performance data looks like this:

    Current Status: WARNING (for 3d 22h 57m 46s)
    Status Information: OK - 0.23 seconds to connect as MONITOR
    Performance Data: connection_time=0.2339;3;8
    Current Attempt: 3/3 (HARD state)
    Last Check Time: 06-21-2010 13:06:33
    Check Type: ACTIVE

    Interestingly, I also set this up in Icinga and it does the same thing.

    [Reply]

    lausser Reply:

    Very strange…the last lines of the plugin are:

    printf "%s - %s", $ERRORCODES{$nagios_level}, $nagios_message;
    printf " | %s", $perfdata if $perfdata;
    printf "\n";
    exit $nagios_level;

    So if $ERRORCODES{$nagios_level} is “OK” (which is in the output), then the exit code $nagios_level must be 0. Can you reset the service to OK with “submit passive checkresult”? Did you see a warning from the first moment when you configured this service? Or has it been OK before?

    [Reply]

    Tim Reply:

    @lausser,

    I can send it an OK passive result and it will switch to “OK”, but usually changes right back to a yellow warning.

    I’ve tried enabling and disabling passive checks, event handling, but to no effect.

    One thing I did notice is that it almost always shows:

    Current Attempt: 3/3 (HARD state)

    As if maybe it didn’t pass the first 2 checks. Running from the command line I can submit it repeatedly and I get OK results each time. It’s an odd thing. After all of this testing, I think the script works fine; it appears to be more of a Nagios problem.

    [Reply]

    lausser Reply:

    Just to be absolutely sure, you can add an extra line at the end of the plugin:

    printf "%s - %s", $ERRORCODES{$nagios_level}, $nagios_message;
    printf " | %s", $perfdata if $perfdata;
    printf "\n";
    printf "i will definitively exit with %d\n", $nagios_level;
    exit $nagios_level;

    The level surely won’t change between the printf and the exit.
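
    Nagios takes the state from the exit code, not from the text, so it can also help to look at both side by side without editing the plugin. A minimal sketch of such a helper; the names state_name and run_check are made up for this example and are not part of the plugin:

```shell
# Map a plugin exit code to the state name Nagios derives from it.
# 3 is UNKNOWN; other values are outside the range Nagios expects.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Run any command, show its output, and report the exit code next to it.
run_check() {
    output=$("$@") && code=0 || code=$?
    printf '%s\n' "$output"
    printf 'exit code %d means %s\n' "$code" "$(state_name "$code")"
    return "$code"
}
```

    Used as run_check followed by exactly the command line from the Nagios command definition, run as user nagios, this shows whether the text and the exit code really disagree.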

  46. IT-COW | Icinga: Oracle-Datenbanken abfragen Says:
    June 19th, 2010 at 8:53

    [...] There is a plugin for Icinga/Nagios that allows querying the status of Oracle databases over the network. The tool is called oracle_check_health and, like check_logfiles, was developed by Mr. Lausser of the company ConSol. This is the project's homepage: Link. [...]

  47. Hamza Says:
    June 24th, 2010 at 18:00

    Hi

    I seem to be having some trouble setting the warning and critical thresholds for checking tablespace free.

    Could you please advise on the correct syntax for

    check_oracle_health -t 480 --connect db1 --username nagios --password nagios --mode tablespace-free --warning 85 --critical 90

    Please help.

    [Reply]

    lausser Reply:

    I assume you want a warning if less than 15% is free and a critical if less than 10% is free. Please use a trailing ‘:’, which is the correct syntax for ‘less than’ thresholds.

    --mode tablespace-free --warning 15: --critical 10:
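
    For illustration, the range notation from the plugin development guidelines can be sketched in a few lines of shell. This is not the plugin's own code; violates and check_state are hypothetical names, and only the common forms "N", "N:", "~:N", "A:B" and "@A:B" are handled:

```shell
# Succeed (exit 0) when VALUE violates RANGE:
#   "N"    alerts outside 0..N      "N:"   alerts below N
#   "~:N"  alerts above N           "@A:B" alerts inside A..B
violates() (
    value=$1 range=$2 inside=0
    case "$range" in @*) inside=1; range=${range#@} ;; esac
    case "$range" in
        *:*) low=${range%%:*}; high=${range#*:} ;;
        *)   low=0; high=$range ;;
    esac
    awk -v v="$value" -v lo="$low" -v hi="$high" -v inside="$inside" 'BEGIN {
        out = 0
        if (lo != "~" && lo != "" && v + 0 < lo + 0) out = 1
        if (hi != "" && v + 0 > hi + 0) out = 1
        if (inside) out = !out
        exit(out ? 0 : 1)
    }'
)

# Critical is tested first, then warning.
check_state() {
    if violates "$1" "$3"; then echo CRITICAL
    elif violates "$1" "$2"; then echo WARNING
    else echo OK
    fi
}

check_state 12 15: 10:     # 12% free: below 15 but not below 10
check_state 100 90: 80:    # a 100% hit ratio with "less than" thresholds
check_state 100 90 80      # the same value without the colon
```

    The three calls print WARNING, OK and CRITICAL. The last one shows why a plain --warning 90 --critical 80 turns a perfect value into an alert: without the colon, everything above the number is out of range.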

    [Reply]

  49. Rija Says:
    July 2nd, 2010 at 15:56

    Hello, I have a problem when I run the plugin from the command line with tablespace-io-balance to check the datafiles of all tablespaces. The output is CRITICAL - unable to aquire tablespace info. Could you help me, please?

    [Reply]

    lausser Reply:

    Do you see this message only with mode tablespace-io-balance? What about --mode list-tablespaces?

    Maybe you forgot to set the right privileges?

    CREATE user nagios IDENTIFIED BY oradbmon; 
    GRANT CREATE session TO nagios;
    GRANT SELECT any dictionary TO nagios;
    GRANT SELECT ON V_$SYSSTAT TO nagios;
    GRANT SELECT ON V_$INSTANCE TO nagios;
    GRANT SELECT ON V_$LOG TO nagios;
    GRANT SELECT ON SYS.DBA_DATA_FILES TO nagios;
    GRANT SELECT ON SYS.DBA_FREE_SPACE TO nagios;

    [Reply]

  50. Rija Says:
    July 2nd, 2010 at 16:38

    I see this message with tablespace-io-balance only. I’ve executed: check_oracle_health --connect SID --user nagios --password oradbmon --mode tablespace-io-balance. list-tablespaces works; the output gives the list and the message “OK - have fun” at the end. All privileges are OK for user nagios. Thank you for your help!

    [Reply]

    lausser Reply:

    Edit the plugin and search for “sub init_datafiles”, then search for “iobalance” and finally search for “datafileresults”. You should now have found the line

    my @datafileresults = $params{handle}->fetchall_array($sql, $params{selectname}, $params{selectname});

    Please change $params{selectname} to $params{tablespace} (2 times) and try again.

    [Reply]

    Rija Reply:

    @lausser,

    I followed your tips and now everything works. Thank you very much for your help.

    [Reply]

    Rija Reply:

    @lausser, Oops! Sorry, it doesn’t work for Oracle installed on a Windows machine; the same error message appears. Do you have another solution for that? Thank you.

    [Reply]

    lausser Reply:

    Strange… Unfortunately I don’t have a Windows DB server. Please execute the following statement with sqlplus:

    SELECT file_name, SUM(phyrds), SUM(phywrts)
    FROM dba_data_files, v$filestat
    WHERE tablespace_name = UPPER('USERS')
      AND file_id = file#
    GROUP BY tablespace_name, file_name

    [Reply]

    Rija Reply:

    @lausser, Hello! I ran “GRANT SELECT ON V_$filestat TO nagios;” and it works. Thank you… I have another problem: I’d like to change the default critical and warning levels when running sga-data-buffer-hit-ratio, sga-library-cache-hit-ratio or sga-dictionary-cache-hit-ratio, but the result is still reported as critical even when the value is 100%. I’ve executed the following command:

    check_oracle_health --connect SID --mode sga-data-buffer-hit-ratio --warning 80 --critical 90

    CRITICAL - SGA data buffer hit ratio 100.00% | sga_data_buffer_hit_ratio=100.00%;80;90

    [Reply]

    Rija Reply:

    @Rija, Sorry! The command is:

    check_oracle_health --connect SID --mode sga-data-buffer-hit-ratio --warning 90 --critical 80

    The same error message appears…

    lausser Reply:

    These are “less than” thresholds. According to the plugin developer guidelines, you must add a “:”. So “warn if less than 90” is written as --warning 90: and “critical if less than 80” as --critical 80:

  51. roger Says:
    July 2nd, 2010 at 18:38

    Is it normal that the seg_top_10 metrics include the PERFSTAT user? This is my top 10:

    PERFSTAT 2154 row lock waits 1
    PERFSTAT 1450 row lock waits 2
    PERFSTAT 572 row lock waits 3
    PERFSTAT 466 row lock waits 4
    PERFSTAT 446 row lock waits 5
    PERFSTAT 382 row lock waits 6
    PERFSTAT 350 row lock waits 7
    PERFSTAT 288 row lock waits 8
    PERFSTAT 246 row lock waits 9
    PERFSTAT 191 row lock waits 10

    [Reply]

    roger Reply:

    @roger, why are there different values for the same user?

    [Reply]

  52. Hamza Maal Says:
    July 14th, 2010 at 10:06

    Hi

    I am trying to run a SQL statement using --mode sql, but it does not seem to work. I have tried encoding it, but it still produces errors.

    Original statement:

    /usr/lib/nagios/plugins/check_oracle_health --connect mlc247 --username dbuser --password dbpass --mode sql SELECT TO_CHAR(NEXT_TIME, 'DD-MON-YYYY HH24:MI:SS') FROM V$ARCHIVED_LOG where sequence# = (select max(sequence#) from v$archived_log where applied = 'YES')

    After encoding:

    /usr/lib/nagios/plugins/check_oracle_health --connect mlc247 --username dbuser --password dbpass --mode sql SELECT%20TO%5FCHAR%28NEXT%5FTIME%2C%20%27DD%2DMON%2DYYYY%20HH24%3AMI%3ASS%27%29%20FROM%20V%24ARCHIVED%5FLOG%20where%20sequence%23%20%3D%20%28select%20max%28sequence%23%29%20from%20v%24archived%5Flog%20where%20applied%20%3D%20%27YES%27%29

    This is the error I get with the encoded statement:

    Use of uninitialized value $sql in sprintf at /usr/lib/nagios/plugins/check_oracle_health line 4194.
    Use of uninitialized value in subroutine entry at /usr/local/lib/perl/5.10.0/DBD/Oracle.pm line 284.
    Use of uninitialized value $value in numeric gt (>) at /usr/lib/nagios/plugins/check_oracle_health line 3615.
    Use of uninitialized value $value in numeric gt (>) at /usr/lib/nagios/plugins/check_oracle_health line 3616.
    Use of uninitialized value $params{"name2"} in split at /usr/lib/nagios/plugins/check_oracle_health line 3553.
    OK - :

    Any help would be much appreciated

    [Reply]

    lausser Reply:

    The statement has to be passed with --name:

    check_oracle_health .... --mode sql --name SELECT%20TO%5...
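
    The encoded string in the question looks like plain percent-encoding of every character that is not a letter or a digit. A small sketch that reproduces this form in the shell; percent_encode is a hypothetical helper name, it is meant for ASCII statements only, and whether the statement needs encoding at all, rather than just shell quoting, is not settled in this thread:

```shell
# Percent-encode every byte that is not an ASCII letter or digit.
percent_encode() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}            # everything after the first character
        c=${s%"$rest"}         # the first character itself
        case "$c" in
            [A-Za-z0-9]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

percent_encode "SELECT TO_CHAR(NEXT_TIME, 'DD-MON-YYYY HH24:MI:SS')"
```

    The call prints SELECT%20TO%5FCHAR%28NEXT%5FTIME%2C%20%27DD%2DMON%2DYYYY%20HH24%3AMI%3ASS%27%29, which matches the beginning of the encoded statement posted above.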

    [Reply]

  53. jhon Says:
    July 15th, 2010 at 22:11

    check_oracle_health --connect SID --mode sga-data-buffer-hit-ratio

    OK - SGA data buffer hit ratio 105.55%

    105.55! Why?

    [Reply]

    lausser Reply:

    I don’t know. I need more information. Look into the code. Find the statement which is used to fetch the data used for the calculation of the hit ratio, execute the statement manually, get the values involved in the calculation manually, post the result here.

    [Reply]

    jhon Reply:

    SUM(DECODE(NAME,'PHYSICALREADS',VALUE,0))               33942
    SUM(DECODE(NAME,'PHYSICALREADSDIRECT',VALUE,0))           155
    SUM(DECODE(NAME,'PHYSICALREADSDIRECT(LOB)',VALUE,0))   319842
    SUM(DECODE(NAME,'SESSIONLOGICALREADS',VALUE,0))       5623223

    Using @Marco's query, I get this result:

    SELECT ROUND((1-(phy.value / (cur.value + con.value)))*100,2) "Cache Hit Ratio"
    FROM v$sysstat cur, v$sysstat con, v$sysstat phy
    WHERE cur.name = 'db block gets'
      AND con.name = 'consistent gets'
      AND phy.name = 'physical reads'
    SQL> /

    Cache Hit Ratio

           99.4
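
    The numbers above can be checked by hand. The sketch below rests on two assumptions that are not confirmed anywhere in this thread: that "session logical reads" equals db block gets plus consistent gets, and that the plugin subtracts the two direct-read counters from the physical reads before dividing. The plugin's actual expression is not quoted here, and ratio is just a helper name for this example:

```shell
# Hit ratio in percent from physical and logical reads.
ratio() {
    awk -v phys="$1" -v logical="$2" 'BEGIN {
        printf "%.2f\n", (1 - phys / logical) * 100
    }'
}

# Plain ratio: physical reads against session logical reads.
ratio 33942 5623223

# Assumed plugin variant: direct reads subtracted from the physical reads.
ratio $((33942 - 155 - 319842)) 5623223
```

    The first call prints 99.40, in line with the 99.4 from the query above. The second prints 105.09, because the direct (lob) reads are larger than the physical reads and the numerator becomes negative. If the assumption holds, that would explain a value above 100%.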
    

    [Reply]

  54. JamesC Says:
    July 16th, 2010 at 22:23

    I’m having an odd issue, related to running the script as a non-root user. The output is correct, except there’s a printf() error included with the output.

    [nagios@server0224 ~]$ /usr/local/nagios/libexec/check_oracle_health --connect krusta_srv --username USER --password PASS --mode sga-data-buffer-hit-ratio --warning 95: --critical 90:
    printf() on closed filehandle STATE at /usr/local/nagios/libexec/check_oracle_health line 3828.
    OK - SGA data buffer hit ratio 99.99% | sga_data_buffer_hit_ratio=99.99%;95:;90:

    [Reply]

    lausser Reply:

    You ran the plugin as root. This led to the creation of /var/tmp/check_oracle_health and probably some files below this directory, all owned by root. These files are necessary to carry state information from one run to the next. Then you ran the plugin as non-root. Overwriting the state file(s) does not work because they’re owned by root. That’s why you see the error message. Homework:

    • chown -R nagios:nagios /var/tmp/check_oracle_health
    • write 100 times “i must not run plugins as root”
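
    Before and after the chown, the leftover files can be listed with find. A small sketch; foreign_state_files is a made-up helper name, and the default directory and user are the ones mentioned above:

```shell
# List state files that do not belong to the given user (default: nagios)
# and therefore cannot be overwritten by the plugin running as that user.
foreign_state_files() {
    dir=${1:-/var/tmp/check_oracle_health}
    user=${2:-nagios}
    [ -d "$dir" ] || return 0    # nothing to check yet
    find "$dir" ! -user "$user" -print
}
```

    An empty result from foreign_state_files means every state file already belongs to nagios.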

    [Reply]

  55. sdouce Says:
    July 20th, 2010 at 12:15

    Hi, I receive this kind of message and I don't understand it. I have many Nagios servers using the same distribution that work fine, but on this one I have this problem. Can you help?

    CRITICAL - cannot connect to ORACLE_TOTO. install_driver(Oracle) failed: Can't load '/usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/auto/DBD/Oracle/Oracle.so' for module DBD::Oracle: /usr/lib/oracle/10.2.0.4/client/lib/libocci.so.10.1: cannot restore segment prot after reloc: Permission denied at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 230. at (eval 14) line 3 Compilation failed in require at (eval 14) line 3. Perhaps a required shared library or dll isn't installed where expected at /usr/local/nagios/libexec/check_oracle_health line 4193

    [Reply]

    lausser Reply:

    Hi, looks like a broken installation of DBD::Oracle.

    [Reply]
