check_hpasm

Posted on July 15th, 2009 by lausser

Description

check_hpasm is a Nagios plugin that checks the hardware health of Hewlett-Packard ProLiant servers. It requires the hpasm package to be installed on the monitored server. The plugin checks the health of

  • Processors
  • Power supplies
  • Memory modules
  • Fans
  • CPU and board temperatures
  • RAID arrays (IDE and SAS only when using SNMP)

and alerts you if one of these components is faulty or operates outside its normal parameters.

 

Documentation

The plugin can operate in two modes:

  • Local. The plugin runs on the server that is to be checked. The hpasmcli command (from the hpasm.rpm package) must be installed there.
  • Remote. The plugin runs on the Nagios server and determines the status of the remote hardware by querying the remote server via SNMP. The hpasm package must be installed on the remote server.
nagios$ check_hpasm
OK - hardware working fine
nagios$ check_hpasm -H 10.0.73.30 -C public
OK - hardware working fine
nagios$ check_hpasm -H 10.0.73.30 -C public -P 1
OK - hardware working fine
nagios$ check_hpasm -H 10.0.73.30 -C public --snmpwalk /usr/bin/snmpwalk
OK - hardware working fine
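
In the calls above, the mode is selected by whether -H and -C are passed. A minimal sketch of a wrapper that builds the matching command line, assuming a hypothetical helper name hpasm_command and a fallback community of public (both are illustrative and not part of the plugin):

```shell
# Hypothetical helper: print the check_hpasm command line for local or remote mode.
# With no arguments it prints the local-mode call; with a host it prints the SNMP call.
hpasm_command() {
  host=${1:-}
  community=${2:-public}   # assumption: fall back to "public" if no community is given
  if [ -n "$host" ]; then
    printf 'check_hpasm -H %s -C %s\n' "$host" "$community"
  else
    printf 'check_hpasm\n'
  fi
}
```

The function only prints the command, so the result can be inspected before it is placed in a Nagios command definition or run by hand.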

Comparison of the two modes: local and remote.

Verbosity

For debugging purposes, the plugin can be called with the --verbose (or -v) option. It then outputs the detailed status of each checked component:

nagios$ check_hpasm -v
CRITICAL - dimm module 0:5 (module 5 @ cartridge 0) needs attention (degraded), System: 'proliant dl360 g5', S/N: '3UH841N09K', ROM: 'P58 08/03/2008'
checking cpus
cpu 0 is ok
cpu 1 is ok
checking power supplies
powersupply 1 is ok
powersupply 2 is ok
checking fans
fan 1 is present, speed is normal, pctmax is 50%, location is powerSupply, redundance is redundant, partner is 2
fan 2 is present, speed is normal, pctmax is 50%, location is cpu, redundance is redundant, partner is 3
fan 3 is present, speed is normal, pctmax is 50%, location is cpu, redundance is redundant, partner is 1
checking temperatures
1 ioBoard temperature is 42C (65 max)
2 ambient temperature is 18C (40 max)
3 cpu temperature is 30C (95 max)
4 cpu temperature is 30C (95 max)
5 powerSupply temperature is 29C (60 max)
checking memory
dimm module 0:1 (module 1 @ cartridge 0) is ok
dimm module 0:2 (module 2 @ cartridge 0) is ok
dimm module 0:3 (module 3 @ cartridge 0) is ok
dimm module 0:4 (module 4 @ cartridge 0) is ok
dimm module 0:5 (module 5 @ cartridge 0) needs attention (degraded)
dimm module 0:6 (module 6 @ cartridge 0) is ok
dimm module 0:7 (module 7 @ cartridge 0) is ok
dimm module 0:8 (module 8 @ cartridge 0) is ok
checking disk subsystem
da controller 0 in slot 0 is ok
controller accelerator is ok
controller accelerator battery is ok
logical drive 0:1 is ok (distribDataGuard)
physical drive 0:0 is ok
physical drive 0:1 is ok
physical drive 0:2 is ok
physical drive 0:3 is ok
physical drive 0:4 is ok
physical drive 0:5 is ok | fan_1=50% fan_2=50% fan_3=50% temp_1_ioBoard=42;65;65 temp_2_ambient=18;40;40 temp_3_cpu=30;95;95 temp_4_cpu=30;95;95 temp_5_powerSupply=29;60;60
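
On servers with many components the verbose listing gets long. A small filter can reduce it to the lines that report a problem, each prefixed with the section it belongs to. This is only a sketch: attention_lines is a hypothetical name, and the patterns assume the message wording seen in the sample outputs on this page (needs attention, is failed, is missing), which may differ between plugin versions.

```shell
# Hypothetical filter: read "check_hpasm -v" output on stdin and print only
# the component lines that report a problem, prefixed with their section header.
attention_lines() {
  awk '
    # remember the current "checking ..." section header
    /^checking / { section = $0; next }
    # the summary line before the first section is skipped (section is still empty)
    section != "" && /needs attention|is failed|is missing/ { print section ": " $0 }
  '
}
```

A possible call would be check_hpasm -v | attention_lines.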

--verbose (or -v) can be repeated several times or given a numerical argument. The maximum level is -vvv. At this level you will see a complete dump of all detected hardware components with all details.

nagios$ check_hpasm -vvv
...
[CPU_0]
cpqSeCpuSlot: 0
cpqSeCpuUnitIndex: 0
cpqSeCpuName: Intel Xeon
cpqSeCpuStatus: ok
info: cpu 0 is ok
 
[PS_1]
cpqHeFltTolPowerSupplyBay: 1
cpqHeFltTolPowerSupplyChassis: 0
cpqHeFltTolPowerSupplyPresent: present
cpqHeFltTolPowerSupplyCondition: ok
cpqHeFltTolPowerSupplyRedundant: redundant
info: powersupply 1 is ok
...
[FAN_1]
cpqHeFltTolFanChassis: 1
cpqHeFltTolFanIndex: 1
cpqHeFltTolFanLocale: powerSupply
cpqHeFltTolFanPresent: present
cpqHeFltTolFanType: spinDetect
cpqHeFltTolFanSpeed: normal
cpqHeFltTolFanRedundant: redundant
cpqHeFltTolFanRedundantPartner: 2
cpqHeFltTolFanCondition: ok
cpqHeFltTolFanHotPlug: nonHotPluggable
info: fan 1 is present, speed is normal, pctmax is 50%, location is powerSupply, redundance is redundant, partner is 2
...
[PHYSICAL_DRIVE]
cpqDaPhyDrvCntlrIndex: 0
cpqDaPhyDrvIndex: 4
cpqDaPhyDrvBay: 5
cpqDaPhyDrvBusNumber: 1
cpqDaPhyDrvSize: 1864
cpqDaPhyDrvStatus: ok
cpqDaPhyDrvCondition: ok
...

Blacklisting

If you want checks of failed or missing components to be skipped, so that the alerts they cause are suppressed, use the --blacklist option. With this option you give the plugin a list of items separated by / in the following format:

<type>:<nr>[,<nr>...][/<type>:<nr>[,<nr>...]]...

where <type> can take one of the following values:

cpu                                     c
powersupply                             p
fan                                     f
overall fan status                      ofs
temperature                             t
dimm                                    d
da controller                           daco
da controller accelerator               daac
da controller accelerator battery       daacb
da logical drive                        dald
da physical drive                       dapd
scsi controller                         scco
scsi logical drive                      scld
scsi physical drive                     scpd
fcal controller                         fcaco
fcal accelerator                        fcaac
fcal host controller                    fcahc
fcal host controller overall condition  fcahco
fcal logical drive                      fcald
fcal physical drive                     fcapd
fuse                                    fu
enclosure manager                       em
iml-event                               evt

The <nr> of a component can be found in the output of check_hpasm -v.

checking cpus
cpu 0 is ok                                                             | c:0
cpu 1 is ok                                                             | c:1
checking power supplies
powersupply 1 is ok                                                     | p:1
powersupply 2 is ok                                                     | p:2
checking fans
fan 1 is present, speed is normal, ....                                 | f:1
fan 2 is present, speed is normal, ....                                 | f:2
fan 3 is present, speed is normal, ....                                 | f:3
overall fan status: fan=ok, cpu=ok
checking temperatures
1 ioBoard temperature is 42C (65 max)                                   | t:1
2 ambient temperature is 18C (40 max)                                   | t:2
3 cpu temperature is 30C (95 max)                                       | t:3
4 cpu temperature is 30C (95 max)                                       | t:4
5 powerSupply temperature is 29C (60 max)                               | t:5
checking memory
dimm module 0:1 (module 1 @ cartridge 0) is ok                          | d:0:1
dimm module 0:2 (module 2 @ cartridge 0) is ok                          | d:0:2
dimm module 0:3 (module 3 @ cartridge 0) is ok                          | d:0:3
dimm module 0:4 (module 4 @ cartridge 0) is ok                          | d:0:4
dimm module 0:5 (module 5 @ cartridge 0) needs attention (degraded)     | d:0:5
dimm module 0:6 (module 6 @ cartridge 0) is ok                          | d:0:6
dimm module 0:7 (module 7 @ cartridge 0) is ok                          | d:0:7
dimm module 0:8 (module 8 @ cartridge 0) is ok                          | d:0:8
checking disk subsystem
da controller 3 in slot 0 is ok                                         | daco:3
controller accelerator is ok                                            | daac:3
controller accelerator battery is ok                                    | daacb:3
logical drive 3:1 is ok (mirroring)                                     | dald:3:1
logical drive 3:2 is ok (mirroring)                                     | dald:3:2
physical drive 3:0 is ok                                                | dapd:3:0
physical drive 3:1 is ok                                                | dapd:3:1
physical drive 3:2 is ok                                                | dapd:3:2
physical drive 3:3 is ok                                                | dapd:3:3
ide controller 0 in slot -1 is ok and unused                            | ideco:0
fcal controller 1:0 in box 1/slot 0 needs attention (degraded)          | fcaco:1:0
fcal accelerator in box 1/slot 0 is temp disabled                       | fcac:1:0
logical drive 1:1 is failed (advancedDataGuard)                         | fcald:1:1
physical drive 1:128 is failed                                          | fcapd:1:128
physical drive 1:129 is ok                                              | fcapd:1:129
physical drive 1:130 is failed                                          | fcapd:1:130
physical drive 1:131 is ok                                              | fcapd:1:131
physical drive 1:132 is failed                                          | fcapd:1:132
physical drive 1:133 is ok                                              | fcapd:1:133
physical drive 1:134 is ok                                              | fcapd:1:134
physical drive 1:135 is ok                                              | fcapd:1:135
physical drive 1:144 is ok                                              | fcapd:1:144
physical drive 1:145 is ok                                              | fcapd:1:145
physical drive 1:147 is unconfigured                                    | fcapd:1:147
fcal host controller 0 in slot 1 is ok                                  | fcahc:0
fcal host controller 1 in slot 1 is ok                                  | fcahc:1

Assuming you want to blacklist the failed memory module and the three failed hard disks (including the logical drive they belong to), you would write

d:0:5/fcapd:1:128,1:130,1:132/fcald:1:1

Alternatively, you can write this string into the first line of a file and give the filename as the argument to --blacklist.
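
When the list of blacklisted items grows, assembling the string by hand becomes error-prone. A minimal sketch of a helper that joins individual <type>:<nr> entries with the / separator described above; build_blacklist is a hypothetical name and not part of the plugin:

```shell
# Hypothetical helper: join blacklist entries with "/" as described in this section.
# The body runs in a subshell, so changing IFS does not affect the caller.
build_blacklist() (
  IFS=/
  printf '%s\n' "$*"
)
```

Calling build_blacklist d:0:5 fcapd:1:128,1:130,1:132 fcald:1:1 prints the string from the example above. Redirecting the output to a file (the file name is up to you) gives the one-line file that can be passed to --blacklist instead.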

Custom temperature thresholds

To override the system-default temperature thresholds, use the --customthresholds option.

nagios$ check_hpasm
...
1 cpu temperature is 45C (62 max)
2 cpu temperature is 56C (80 max)
3 ioBoard temperature is 38C (60 max)
4 cpu temperature is 59C (80 max)
5 powerSupply temperature is 31C (53 max)
...
 
nagios$ check_hpasm --customthresholds 1:70/5:65
...
1 cpu temperature is 45C (70 max)
2 cpu temperature is 56C (80 max)
3 ioBoard temperature is 38C (60 max)
4 cpu temperature is 59C (80 max)
5 powerSupply temperature is 31C (65 max)
...
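
Before overriding a threshold it can help to see how far each sensor currently is from its maximum. The following sketch computes that headroom from temperature lines in the format shown above. temp_headroom is a hypothetical name, and the parsing assumes Celsius output of the form "<nr> <location> temperature is <value>C (<max> max)".

```shell
# Hypothetical helper: read temperature lines on stdin and print
# "<nr>:<location> <headroom>", where headroom = max threshold - current value.
temp_headroom() {
  awk '$3 == "temperature" && $4 == "is" {
    cur = $5; sub(/C$/, "", cur)      # "45C" -> "45"
    max = $6; sub(/^[(]/, "", max)    # "(62" -> "62"
    print $1 ":" $2 " " (max - cur)
  }'
}
```

The sensor number printed before the colon is the same number that --customthresholds expects, as in 1:70/5:65.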

Performance data

With the --perfdata option you can switch on the output of performance data, if it was not already set as the default during installation. Should the perfdata string become too long, use --perfdata=short, which outputs a short form of the temperature tags (the location part is not shown).

nagios$ check_hpasm
OK - hardware working fine| fan_1=8%;0;0 fan_2=8%;0;0  fan_3=15%;0;0 fan_4=15%;0;0 fan_5=8%;0;0 fan_6=8%;0;0 fan_7=20%;0;0 fan_8=20%;0;0 'temp_1_processor_zone'=38;62;62 'temp_2_cpu#1'=37;73;73 'temp_3_i/o_zone'=49;68;68 'temp_4_cpu#2'=40;73;73 'temp_5_power_supply_bay'=36;44;44
 
nagios$ check_hpasm --perfdata short
OK - hardware working fine| fan_1=8%;0;0 fan_2=8%;0;0  fan_3=15%;0;0 fan_4=15%;0;0 fan_5=8%;0;0 fan_6=8%;0;0 fan_7=20%;0;0 fan_8=20%;0;0 'temp_1'=38;62;62 'temp_2'=37;73;73 'temp_3'=49;68;68 'temp_4'=40;73;73 'temp_5'=36;44;44
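
The performance data follows the usual Nagios layout of label=value;warn;crit after the | character. A rough sketch of how the labels and current values could be pulled out of such a line for a quick look; perfdata_values is a hypothetical name, and the splitting assumes labels without embedded spaces, as in the samples above.

```shell
# Hypothetical helper: read plugin output on stdin and print "label value"
# for every perfdata item found after the "|" separator.
perfdata_values() {
  sed -n 's/^[^|]*|//p' |          # keep only the part after the first "|"
    tr -s ' ' '\n' |               # one perfdata item per line
    awk -F= -v q="'" 'NF == 2 {
      label = $1
      gsub(q, "", label)           # strip the single quotes around a label
      split($2, parts, ";")        # value;warn;crit
      print label, parts[1]
    }'
}
```

A possible call would be check_hpasm --perfdata short | perfdata_values.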

Unknown memory status

With some BIOS releases hpasmcli doesn’t display the memory modules correctly. The command SHOW DIMM shows only a list of modules with status n/a, which is counted as a warning. With the --ignore-dimms option you can skip memory checking to avoid this warning without using a blacklist.

Non-redundant fans

If you see a warning because fans are not redundant, the reason might be that single fans were installed on purpose instead of pairs of fans. With --ignore-fan-redundancy you can suppress this warning. (See README).

Unfortunately, it is not possible to show the fan speed (or percent of max. speed) in SNMP mode. A substitute value of 50% is shown instead.

 

Installation

  • After unpacking the archive, call the ./configure command. Pay attention to the --with-noinst-level option, which defines the exit code of the plugin if no hpasm rpm is installed. With the --with-degrees option you tell the plugin whether you want temperature values displayed in Celsius or Fahrenheit. With the --enable-perfdata option you tell check_hpasm to add performance data to its output by default. If you don’t want to see type, serial number and BIOS release in the output, you can switch this off with --disable-hwinfo. With --enable-hpacucli you activate checking of RAID controllers.
  • Grab the hpasm package suitable for your Linux distribution and install it. See the list of links below for where to find it.
  • If you run check_hpasm (in local mode) as a non-root user, you will need sudo privileges that allow you to call /sbin/hpasmcli as root without providing a password.
  • Note: if you want to run check_hpasm under Debian with SNMP v3, you must install some additional packages: aptitude install libtie-encryptedhash-perl libdigest-hmac-perl (Thanks Tony Wolf)

 

Examples

More examples for different error conditions:

memory module failed:

nagios$ check_hpasm
CRITICAL - dimm module 2 @ cartridge 2 needs attention (dimm is degraded)
 
nagios$ check_hpasm -v
checking hpasmd process
System        :proliant dl580 g3
Serial No.    :GB8632FB7V
ROM version   :P38 04/28/2006
checking cpus
 cpu 0 is ok
 cpu 1 is ok
 cpu 2 is ok
 cpu 3 is ok
checking power supplies
 powersupply 1 is ok
 powersupply 2 is ok
checking fans
checking temperatures
 1 cpu#1 temparature is 36 (80 max)
 2 cpu#2 temparature is 34 (80 max)
 3 cpu#3 temparature is 33 (80 max)
 4 cpu#4 temparature is 37 (80 max)
 5 i/o_zone temparature is 32 (60 max)
 6 ambient temparature is 23 (40 max)
 7 system_bd temparature is 34 (60 max)
checking memory modules
 dimm 1@1 is ok
 dimm 2@1 is ok
 dimm 3@1 is ok
 dimm 4@1 is ok
 dimm 1@2 is ok
 dimm 2@2 is dimm is degraded
 dimm 3@2 is ok
 dimm 4@2 is ok
CRITICAL - dimm module 2 @ cartridge 2 needs attention (dimm is degraded)

power supply module failed:

nagios$ ./check_hpasm
CRITICAL - powersuply #2 needs attention (failed), powersuply #1 is not redundant
nagios$ ./check_hpasm -v
checking hpasmd process
System        :proliant dl580 g4
Serial No.    :GB8637M8TH
ROM version   :P59 09/08/2006
checking cpus
 cpu 0 is ok
 cpu 1 is ok
 cpu 2 is ok
 cpu 3 is ok
checking power supplies
 powersupply 1 is ok
 powersupply 2 is failed
checking fans
checking temperatures
 1 cpu#1 temparature is 42 (85 max)
 2 cpu#2 temparature is 46 (85 max)
 3 cpu#3 temparature is 44 (85 max)
 4 cpu#4 temparature is 44 (85 max)
 5 i/o_zone temparature is 39 (60 max)
 6 ambient temparature is 27 (40 max)
 7 system_bd temparature is 41 (60 max)
checking memory modules
 dimm 1@1 is ok
 dimm 2@1 is ok
 dimm 3@1 is ok
 dimm 4@1 is ok
 dimm 1@2 is ok
 dimm 2@2 is ok
 dimm 3@2 is ok
 dimm 4@2 is ok
 dimm 1@3 is ok
 dimm 2@3 is ok
 dimm 3@3 is ok
 dimm 4@3 is ok
 dimm 1@4 is ok
 dimm 2@4 is ok
CRITICAL - powersuply #2 needs attention (failed),  powersuply #1 is not redundant

power supply module pulled:

nagios$ ./check_hpasm
CRITICAL - powersuply #2 is missing, powersuply #1 is not redundant
nagios$ ./check_hpasm -v
checking hpasmd process
System        :proliant dl580 g4
Serial No.    :GB8637M8TH
ROM version   :P59 09/08/2006
checking cpus
 cpu 0 is ok
 cpu 1 is ok
 cpu 2 is ok
 cpu 3 is ok
checking power supplies
 powersupply 1 is ok
 powersupply 2 is n/a
checking fans
checking temperatures
 1 cpu#1 temparature is 42 (85 max)
 2 cpu#2 temparature is 46 (85 max)
 3 cpu#3 temparature is 44 (85 max)
 4 cpu#4 temparature is 44 (85 max)
 5 i/o_zone temparature is 39 (60 max)
 6 ambient temparature is 27 (40 max)
 7 system_bd temparature is 41 (60 max)
checking memory modules
 dimm 1@1 is ok
 dimm 2@1 is ok
 dimm 3@1 is ok
 dimm 4@1 is ok
 dimm 1@2 is ok
 dimm 2@2 is ok
 dimm 3@2 is ok
 dimm 4@2 is ok
 dimm 1@3 is ok
 dimm 2@3 is ok
 dimm 3@3 is ok
 dimm 4@3 is ok
 dimm 1@4 is ok
 dimm 2@4 is ok
CRITICAL - powersuply #2 is missing, powersuply #1 is not redundant

Hpasm daemon is not running:

nagios$ check_hpasm
CRITICAL - hpasmd needs to be started

Hpasm software is not installed:

OK - hardware working fine, at least i hope so because hpasm is not installed

 

Call to participate

Please run check_hpasm -v on as many different platforms as possible. Chances are you have a rare Proliant model whose components are not detected completely. You will then see instructions on how to report this to the author.

The following line appears frequently but can be considered harmless:

#0 SYSTEM_BD - -

I am always interested in test data. If you want to do me a favour, send me the output of

snmpwalk ... <ip-address> 1.3.6.1.4.1.232

or, if you are using the local variant, I’d like to see the output of the following script:

# Locate the HP command line tools
hpasmcli=$(which hpasmcli)
hpacucli=$(which hpacucli)
# Dump each hpasmcli section, prefixing every line with the section name
for i in server powersupply fans temp dimm iml
do
  "$hpasmcli" -s "show $i" | while read -r line
  do
    printf '%s %s\n' "$i" "$line"
  done
done
# Dump the array controller data if hpacucli is available
if [ -x "$hpacucli" ]; then
  for i in config status
  do
    "$hpacucli" ctrl all show $i | while read -r line
    do
      printf '%s %s\n' "$i" "$line"
    done
  done
fi

 

Download

check_hpasm-4.6.3.2.tar.gz

 

External links

 

Changelog

  • 4.6.3.2 2013-03-19
    - fix a bug in proliant/gen8/ilo temperature thresholds (Thanks Kai Benninghoff and Stephane Loeuillet)
  • 4.6.3.1 2013-01-10
    - fix a bug in da disk in local mode
    - fix a bug in overall_init proliant nics (Thanks Fanming Jen)
  • 4.6.3 2012-11-25
    - fix the problem with -99 degrees
    - fix the problem with binary zero EventUpdateTime
    - Proliant Gen8 should work now
  • 4.6.2.1 2012-11-09
    - some bugfixes in bladecenter temperatures (Thanks Thomas Reichel)
  • 4.6.2 2012-08-20
    - fix some bugs in snmpget where the system responded with undef values
  • 4.6.1 2012-08-14
    - fix a small bug in boottime
    - skip pagination in long "show iml" lists
    - make bulk requests if possible
  • 4.6 2012-06-07
    - output power consumption as performance data (only newer proliant models)
    - support older <=7 versions of hpacucli
    - add another error log: Uncorrectable Memory Error
    - raise the default timeout from 15 to 60 seconds
  • 4.5.3.1 2012-04-19
    - change the way --snmpwalk reads oids from a file
  • 4.5.3 2012-03-26
    - fix a bug in snmp-eventlogs
  • 4.5.2 2012-03-06
    - add another error log: Main Memory - Corrected Memory Error threshold exceeded
  • 4.5.1 2012-02
    - add another error log: 210 - Quick Path Interconnect (QPI) Link Degradation
    - remove watt percent for blade center power supply
    - make the snmp oid collection phase shorter for blade center
  • 4.5 2012-01-26
    - output power consumption perfdata for BladeCenters
    - correctly identify dl388g7 (Thanks lilei8)
  • 4.4 2011-12-16
    - add checks for power converters
    - add checks for nic teaming (experimental!!, must be enabled with --eval-nics)
    - fix a bug with invalid date/time from iml
    - fix a bug in blade enclosure manager verbose output
    - add msa2xxx storage sensors

  • 2011-10-14 4.3 add monitoring of IML events (Thanks Klaus) esp. "Memory initialization error… The OS may not have access to all of the memory installed in the system". This feature was sponsored by one of our customers. If it is useful for you and you want to thank them, buy a BMW.
  • 4.2.5 G2 series of X1660 storage systems are now correctly detected. (Thanks Andre Zaborowski), blacklisting for SAS controller & disks was added (Thanks Jewi)
  • 2011-08-09 4.2.4.1 dimm output of G7 hpasmcli (under Solaris) is now handled (Thanks Ron Waffle)
  • 2011-07-21 4.2.4 add a check for asr (Thanks Ingmar Verheij http://www.ingmarverheij.com/)
  • 2011-07-21 4.2.3 add a global temperature check when no temperature sensors are found, check power converters if no fault tolerant power supplies are found
  • 2011-04-17 4.2.2.1 fix a bug when a wrong --hostname was used (Thanks Wim Savenberg)
  • 2011-01-21 4.2.2 add support for msa500 and hpasmcli (Thanks Kalle Andersson)
  • 2010-10-18 4.2.1.1 X* Nas Storage is now detected correctly
  • 2010-10-01 4.2.1 added timeout handling, better hpacucli da controller handling, fix a bug in memory detection (0 dimms were shown) (Thanks Anthony Cano), better handling for failed and disabled controller batteries with warning only.
  • 2010-03-30 4.2 Bladesystems: Enclosure managers, fuses and temperatures are now queried (looks like the latter were not implemented by HP. At least I never saw temps in a snmpwalk), Proliant: blacklisting for SCSI controllers and disks (Thanks Marco Hill) and for Overall Fan Status (Thanks Thomas Jampen)
  • 2010-02-09 4.1.2 Bugfix in local mode if there are more than 1 logical drive (Thanks Trond Hasle).
  • 2009-01-07 4.1.1 More smart array types are detected in local mode (Thanks Trond Hasle).
  • 2009-12-07 4.1 Bugfix in powersupply-check with hpasmcli, Bladecenters show more details now.
  • 2009-12-04 4.0.1 Added --help, fixed a bug in celsius-fahrenheit-conversion, enhanced fan logic, added support for models with a hidden product string (cpqsinfo-mib-error)
  • 2009-11-30 4.0 Complete redesign of the code. Support for G6 models, new blacklist rules, verbose mode with detailed output of the hardware components. Support for HP BladeCenter (cpqRack-MIB) and HP Storage-Systems (cpqStorage MIB).
  • 2009-03-20 3.5 Support for SNMPv3, Bugfix for degraded dimms which were reported as missing, new parameter --port, support for MSA20, notice when /etc/sudoers is configured incorrectly. (Thanks Jeff the Riffer, matt at adicio.com)
  • 2009-02-06 3.1.1 Bugfix which removes Perl-Warnings (Thanks Bill Katz and Martin Hofmann)
  • 2009-01-23 3.1 support for ide and sas disks
  • 2008-12-05 3.0.7.1 Minor Bugfix. snmpwalk now uses -On
  • 2008-11-29 3.0.7 Bugfix in controller-blacklist. Using --snmpwalk you don’t need Net::SNMP.
  • 2008-10-30 3.0.6 Bugfix in --ignore-dimms
  • 2008-10-24 3.0.5 Shorter runtime thanks to fewer SNMP-Data (Thanks Yannick Gravel). New Option --ignore-fan-redundancy.
  • 2008-09-18 3.0.4 Rewrite of the SNMP Dimm code
  • 2008-09-11 3.0.3.2 -P is now optional (Bugfix)
  • 2008-09-10 3.0.3.1 -P bugfixes
  • 2008-09-10 3.0.3 Bugfix in snmpwalk cpqHeComponents. New Parameter --protocol (default: 2c)
  • 2008-07-31 3.0.1 Bugfix in customthreshold (Thanks TheCry)
  • 2008-07-28 3.0 SNMP (Thanks Matthias Flacke)
  • 2008-04-16 2.0.3.1 configure-Bug fixed. (--with-perl, --with-perfdata)
  • 2008-04-09 2.0.3 Blacklisting for Controllers. Dimm-Bug fixed.
  • 2008-02-11 2.0.2 empty cpu&fan sockets are now properly handled
  • 2008-02-08 2.0.1 multiline output for nagios 3.x
  • 2008-02-08 2.0 complete code redesign, integrated raid checking with hpacucli
  • 2008-01-18 1.6.2.2 Fixed misleading message under Debian 3.1
  • 2007-12-12 1.6.2.1 Bugfix. Fans were overlooked.
  • 2007-11-16 1.6.2 New option -i, output of model, biosrelease and serial number by default (Thanks Marcus Fleige).
  • 2007-11-07 1.6.1 Bugfix. Failed fans were possibly overlooked. Perfdata use single quotes.
  • 2007-07-27 1.6 Performance data.
  • 2007-06-14 1.5 New option supports user-defined temperature thresholds.
  • 2007-05-22 1.4 Support for hpasmxld and hpasmlited.
  • 2007-04-18 1.3 Added --with-degrees to configure. Added --blacklist
  • 2007-04-16 1.2 Added --with-noinst-level option to configure.
  • 2007-04-14 1.1 First published release.

 

Copyright

Gerhard Lausser

check_hpasm is released under the GNU General Public License (GPL).

Author

Gerhard Lausser (gerhard.lausser@consol.de) will gladly answer your questions.

183 Responses to “check_hpasm”

  1. Piotr Palka Says:
    October 30th, 2009 at 0:05

    Hi! I found a bug in the script: the first power supply is not recognized, because the script depends on an empty line between them.

    hpasmcli> show powersupply
    Power supply #1
            Present  : Yes
            Redundant: No
            Condition: FAILED
            Hotplug  : Supported
    Power supply #2
            Present  : Yes
            Redundant: No
            Condition: Ok
            Hotplug  : Supported

    lausser Reply:

    That’s indeed a severe bug. Thank you for bringing this to my notice. Gerhard

    Guenther Sommer Reply:

    @lausser, is there already a fix or workaround (patch) available? Can this be done soon? I would really need this (and can’t find where in the code it gets evaluated).

    Martin Reply:

    Hi Lausser,

    Do you have a fix for this bug yet? It’s a great script.

    lausser Reply:

    Have a look at the blog entry “check_hpasm Sneak Preview II”. This pre-release should handle it. Please try it and mail me immediately if you have problems. I wanted to release 4.0 within the next few days.

    Acid Reply:

    @lausser,

    Hi,

    I’m testing the 4.0.1 version on a dl360g4; the power supplies do not show up at all:

    OK - System: 'proliant dl360 g4p', S/N: 'CZJ64202ST', ROM: 'P54 07/16/2007', hardware working fine, da: 1 logical drives, 2 physical drives, cpu_0=ok fan_1=49% fan_2=49% temp_1=32 temp_2=37 temp_4=29 temp_5=23
    checking cpus
    cpu 0 is ok
    checking power supplies
    checking fans
    fan 1 is present, speed is normal, pctmax is 49%, location is processor_zone, redundance is notRedundant, partner is 0
    fan 2 is present, speed is normal, pctmax is 49%, location is system, redundance is notRedundant, partner is 0
    checking temperatures
    1 i/o_zone temperature is 32C (63 max)
    2 cpu#1 temperature is 37C (85 max)
    4 power_supply_bay temperature is 29C (48 max)
    5 system_bd temperature is 23C (41 max)
    checking memory
    dimm module 0:1 (module 1 @ cartridge 0) is ok
    dimm module 0:2 (module 2 @ cartridge 0) is ok
    dimm module 0:3 (module 3 @ cartridge 0) is ok
    dimm module 0:4 (module 4 @ cartridge 0) is ok
    checking disk subsystem
    da controller 1 in slot 0 is ok
    controller accelerator is ok
    controller accelerator battery is notPresent
    logical drive 1:1 is ok (raid 1)
    physical drive 1:0 is ok
    physical drive 1:1 is ok

    Acid Reply:

    @lausser,

    Here are the output of hpasmcli and the hp-health version:

    hpasmcli> show powersupply
    Power supply #1
    Present : Yes
    Redundant: Yes
    Condition: Ok
    Hotplug : Supported
    Power supply #2
    Present : Yes
    Redundant: Yes
    Condition: Ok
    Hotplug : Supported
    hpasmcli> quit
    [root@maribor ~]# rpm -qa | grep hp-health
    hp-health-8.3.2.2-1

    lausser Reply:

    Hi Acid, have a look at the script under “call to participate” above. Please mail me the output of that script.

  2. Monitor hardware-health on HP Proliant ML370 G3 with Nagios « Mozekoze Says:
    November 7th, 2009 at 22:50

    [...] For nagios you’ll need the check_hpasm plugin, found here. [...]

  3. Claudio Says:
    November 10th, 2009 at 11:22

    Using this plugin for years now and still like it! Great work! Thanks for continuing the plugin!

  4. Ovidiu Says:
    November 26th, 2009 at 8:02

    How do you use this script with windows servers?

    lausser Reply:

    You need the hp system management driver and agent packages (look at the links above). Then you can query the server with SNMP. (check_hpasm --hostname --community …)

  5. normes Says:
    November 30th, 2009 at 10:34

    I’m sorry, but the Windows HP software (your link above) couldn’t be installed on a Proliant DL360 G6. Now I’m unsure which components I have to install from HP:

    • HP Version Control Repository Manager
    • HP System Management Homepage for Windows
    • HP Version Control Agent for Windows
    • HP ProLiant Array Configuration Utility (CLI) for Windows
    • HP ProLiant Array Configuration Utility for Windows
    • HP Insight Management WBEM Providers for Windows Server 2003/2008
    • HP ProLiant Integrated Management Log Viewer for Windows
    • HP ProLiant Remote Monitor Service for Windows Server 2003/2008
    • HP Insight Diagnostics Online Edition for Windows Server 2003/2008
    • HP Insight Management Agents for Windows Server 2003/2008
    • HP NULL IPMI Controller Driver for Windows Server 2003
    • HP Insight Management WBEM Providers for Windows Server 2003/2008 Virtual Server Environment 4.1 Update1
    • HP ProLiant Array Diagnostics Utility for Windows
    • HP ProLiant Firmware Inventory Agent for System Center Configuration Manager 2007

    There are so many different packages… But no “Win2003 System Management Driver”. I’m already using the Windows integrated SNMP server, so I hope I can use that with the HP tools.

    Thanks,

    Norman
    

  6. Geir O. Høgberg Says:
    December 3rd, 2009 at 17:28

    Possible error regarding performance output. I see that it reports that -p is not a valid option anymore; I fixed that with --enable-perfdata. Also, I get no output when I run ./check_hpasm -h or --help :) Besides that, it works like a charm and seems to report well. We are looking into some of the new things it picked up to see if they’re correct.

    Thanks, Geir

  7. tex Says:
    December 4th, 2009 at 2:02

    This is in the blog part of the site, but I wanted to note it here: there is a bug when compiling to use Fahrenheit, so until fixed one may want to stick with Celsius.

  8. Xavier Capell Says:
    December 7th, 2009 at 19:13

    I am trying to blacklist an msa1000 controller but with no luck trying with the “-b” parameter. When I execute the following command I get the following output:

    check_hpasm -v -H hostname -c public

    ....
    ...
    msa1000 controller in box 1 slot 1 needs attention
    msa1000 controller in box 1 slot 2 needs attention
    ....

    I would like to blacklist these two entries. Is it possible? Which argument should I pass with the -b option?

    thanks

    lausser Reply:

    Looks like you are using the old 3.x version of check_hpasm. You can’t blacklist a msa with it. Please upgrade to 4.1 and post the output again.

  9. Peter R. Says:
    December 8th, 2009 at 12:40

    Hello, this is a really great tool, but unfortunately it doesn’t quite work on a 'proliant dl385 g2' with 'hp-health-8.3.0'.

    check_hpasm (4.1) says the fan is NOT redundant:

    fan 1 is present, speed is normal, pctmax is 50%, location is i/o_zone, redundance is notRedundant, partner is 0 …

    hpasmcli says the fan IS redundant:

    Fan  Location        Present Speed  of max  Redundant  Partner  Hot-pluggable
    ---  --------        ------- -----  ------  ---------  -------  -------------
    1    I/O_ZONE        Yes     NORMAL  50%     Yes        0        Yes

    … I’m more inclined to believe hpasmcli…

    Best regards, Peter

    lausser Reply:

    A bit further up, under “Call to participate”, there is a script. Please send me its output by mail.

    lausser Reply:

    Now I understand what happened there.

    fans Fan  Location        Present Speed  of max  Redundant  Partner  Hot-pluggable
    fans ---  --------        ------- -----  ------  ---------  -------  -------------
    fans #1   I/O_ZONE        Yes     NORMAL  50%     Yes        0        Yes
    fans #2   I/O_ZONE        Yes     NORMAL  50%     Yes        0        Yes
    
    Normally, for a redundant fan the “Partner” column should contain the number of the other fan with which it forms a redundant pair. The zero indicates that something is wrong. That need not be a physical problem, though; there are also plenty of firmware revisions that cannot be relied on 100%. That is why I programmed it so that in this case the fan is downgraded from “redundant” to “notRedundant”. This does not lead to an error, however, because Partner=0 is also shown when, for example, a dummy is installed in place of the second fan on single-CPU machines. That is not the case here, since you can see the fan speeds. Admittedly, the output of check_hpasm does not match that of hpasmcli, but the latter’s information is misleading too.

    I hope you can live with that.

  10. paul snoep Says:
    December 9th, 2009 at 13:22

    Hi,

    Great plugin; however, for some mysterious reason our disk array is not recognized by check_hpasm. We do get output when the commands are run from the command line. My Perl knowledge is too limited to debug this or find the cause. Can you help?

    Thanks

    hpacucli ctrl all show status

    Smart Array P400i in Slot 0 (Embedded)
    Controller Status: OK
    Cache Status: OK
    Battery/Capacitor Status: OK

    hpacucli ctrl all show config

    Smart Array P400i in Slot 0 (Embedded) (sn: PH8CMQ4778 )

    array A (SAS, Unused Space: 0 MB)

      logicaldrive 1 (341.7 GB, RAID 5, OK)

    physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 72 GB, OK)
    physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 72 GB, OK)
    physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 72 GB, OK)
    physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 72 GB, OK)
    physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 72 GB, OK)
    physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 72 GB, OK)

    lausser Reply:

    Hi Paul, scroll up to the section “call to participate” and you will find a small shellscript. Can you run it and mail me the output please?

    paul snoep Reply:

    @lausser,

    Hi,

    Below the requested output of the script.

    Thanks Paul

    server
    server System : ProLiant DL360 G5
    server Serial No. : CZJ902A7RF
    server ROM version : P58 05/18/2009
    server iLo present : Yes
    server Embedded NICs : 2
    server NIC1 MAC: 00:23:7d:a2:22:8e
    server NIC2 MAC: 00:23:7d:a2:22:96
    server
    server Processor: 0
    server Name : Intel Xeon
    server Stepping : 10
    server Speed : 3000 MHz
    server Bus : 1333 MHz
    server Core : 4
    server Thread : 4
    server Socket : 1
    server Level2 Cache : 12288 KBytes
    server Status : Ok
    server
    server Processor: 1
    server Name : Intel Xeon
    server Stepping : 10
    server Speed : 3000 MHz
    server Bus : 1333 MHz
    server Core : 4
    server Thread : 4
    server Socket : 2
    server Level2 Cache : 12288 KBytes
    server Status : Ok
    server
    server Processor total : 2
    server
    server Memory installed : 8192 MBytes
    server ECC supported : Yes
    server
    powersupply
    powersupply Power supply #1
    powersupply Present : Yes
    powersupply Redundant: Yes
    powersupply Condition: Ok
    powersupply Hotplug : Supported
    powersupply Power supply #2
    powersupply Present : Yes
    powersupply Redundant: Yes
    powersupply Condition: Ok
    powersupply Hotplug : Supported
    powersupply
    fans
    fans Fan  Location        Present Speed  of max  Redundant  Partner  Hot-pluggable
    fans ---  --------        ------- -----  ------  ---------  -------  -------------
    fans #1   POWERSUPPLY_BAY Yes     NORMAL  34%     Yes        0        No
    fans #2   CPU#2           Yes     NORMAL  29%     Yes        0        No
    fans #3   CPU#1           Yes     NORMAL  37%     Yes        0        No
    fans
    fans
    temp
    temp Sensor   Location          Temp       Threshold
    temp ------   --------          ----       ---------
    temp #1       I/O_ZONE          44C/111F   65C/149F
    temp #2       AMBIENT           20C/68F    40C/104F
    temp #3       CPU#1             30C/86F    95C/203F
    temp #4       CPU#1             30C/86F    95C/203F
    temp #5       POWER_SUPPLY_BAY  33C/91F    60C/140F
    temp #6       CPU#2             30C/86F    95C/203F
    temp #7       CPU#2             30C/86F    95C/203F
    temp
    temp
    dimm
    dimm DIMM Configuration
    dimm ------------------
    dimm Cartridge #: 0
    dimm Module #: 1
    dimm Present: Yes
    dimm Form Factor: fh
    dimm Memory Type: 14h
    dimm Size: 1024 MB
    dimm Speed: 667 MHz
    dimm Supports Lock Step: No
    dimm Configured for Lock Step: No
    dimm Status: Ok
    dimm
    dimm Cartridge #: 0
    dimm Module #: 2
    dimm Present: Yes
    dimm Form Factor: fh
    dimm Memory Type: 14h
    dimm Size: 1024 MB
    dimm Speed: 667 MHz
    dimm Supports Lock Step: No
    dimm Configured for Lock Step: No
    dimm Status: Ok
    dimm
    dimm Cartridge #: 0
    dimm Module #: 3
    dimm Present: Yes
    dimm Form Factor: fh
    dimm Memory Type: 14h
    dimm Size: 1024 MB
    dimm Speed: 667 MHz
    dimm Supports Lock Step: No
    dimm Configured for Lock Step: No
    dimm Status: Ok
    dimm
    dimm Cartridge #: 0
    dimm Module #: 4
    dimm Present: Yes
    dimm Form Factor: fh
    dimm Memory Type: 14h
    dimm Size: 1024 MB
    dimm Speed: 667 MHz
    dimm Supports Lock Step: No
    dimm Configured for Lock Step: No
    dimm Status: Ok
    dimm
    dimm Cartridge #: 0
    dimm Module #: 5
    dimm Present: Yes
    dimm Form Factor: fh
    dimm Memory Type: 14h
    dimm Size: 1024 MB
    dimm Speed: 667 MHz
    dimm Supports Lock Step: No
    dimm Configured for Lock Step: No
    dimm Status: Ok
    dimm
    dimm Cartridge #: 0
    dimm Module #: 6
    dimm Present: Yes
    dimm Form Factor: fh
    dimm Memory Type: 14h
    dimm Size: 1024 MB
    dimm Speed: 667 MHz
    dimm Supports Lock Step: No
    dimm Configured for Lock Step: No
    dimm Status: Ok
    dimm
    dimm Cartridge #: 0
    dimm Module #: 7
    dimm Present: Yes
    dimm Form Factor: fh
    dimm Memory Type: 14h
    dimm Size: 1024 MB
    dimm Speed: 667 MHz
    dimm Supports Lock Step: No
    dimm Configured for Lock Step: No
    dimm Status: Ok
    dimm
    dimm Cartridge #: 0
    dimm Module #: 8
    dimm Present: Yes
    dimm Form Factor: fh
    dimm Memory Type: 14h
    dimm Size: 1024 MB
    dimm Speed: 667 MHz
    dimm Supports Lock Step: No
    dimm Configured for Lock Step: No
    dimm Status: Ok
    dimm
    dimm
    config
    config Smart Array P400i in Slot 0 (Embedded) (sn: PH8CMQ4778 )
    config
    config array A (SAS, Unused Space: 0 MB)
    config
    config logicaldrive 1 (341.7 GB, RAID 5, OK)
    config
    config physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 72 GB, OK)
    config physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 72 GB, OK)
    config physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 72 GB, OK)
    config physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 72 GB, OK)
    config physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 72 GB, OK)
    config physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 72 GB, OK)
    config
    status
    status Smart Array P400i in Slot 0 (Embedded)
    status Controller Status: OK
    status Cache Status: OK
    status Battery/Capacitor Status: OK
    status
    status

    lausser Reply:

    Where is your hpacucli located? check_hpasm tries to find it in /usr/sbin/hpacucli and /usr/local/sbin/hpacucli. If it is unable to locate the command, the array check will be skipped. Maybe this is the cause.

    paul snoep Reply:

    @lausser, It’s in /usr/sbin as below.

    root@asnlnm001:~# ls -al /usr/sbin/hpacucli
    -rwxr-xr-x 1 root root 676 2009-07-10 19:16 /usr/sbin/hpacucli

    Tim Reply:

    @paul snoep, did you figure out a solution to this, I’m experiencing the same issue.

    Tim Reply:

    I got it. When I compiled it I hadn’t used --enable-hpacucli... FACEPALM

  11. Waruna Says:
    December 18th, 2009 at 10:55

    Hi all, my error is:

    CRITICAL - sudo must be configured with requiretty=no (man sudo), System: 'unknown', S/N: 'unknown', ROM: 'unknown'

    My configuration looks like this:

    /etc/nagios/localhost.cfg

    define service{
        use                  local-service
        host_name            adlive
        service_description  Check HP Hardware
        check_command        check_hpasm
    }

    /etc/nagios/commands.cfg

    define command{
        command_name  check_hpasm
        command_line  $USER1$/check_hpasm
    }

    And I added these lines to /etc/sudoers:

    Cmnd_Alias HPASM = /usr/sbin/hpacucli, /sbin/hpacucli, /usr/lib/nagios/plugins/check_hpasm

    nagios ALL = HPASM

    nagios ALL=(ALL) NOPASSWD: ALL

    I get this error in the Nagios web interface:

    CRITICAL - sudo must be configured with requiretty=no (man sudo), System: 'unknown', S/N: 'unknown', ROM: 'unknown'

    Please help me to correct it.

    Waruna Reply:

    @Waruna, I can run check_hpasm without a password like this:

    [root@abcd plugins]# su nagios
    sh-3.2$ sudo ./check_hpasm
    OK - System: 'proliant dl380 g5', S/N: 'SGA810XNVC', ROM: 'P56 08/03/2008', hardware working fine
    sh-3.2$

    but I get this error in the Nagios web interface:

    CRITICAL - sudo must be configured with requiretty=no (man sudo), System: 'unknown', S/N: 'unknown', ROM: 'unknown'

    Please help, thanks.

    lausser Reply:

    You need sudo-privileges for the hpasmcli command, not for the check_hpasm plugin.

    lausser Reply:

    Everything you need to do is mentioned in the error message. Read the sudo manpage, look for requiretty and set this parameter in your /etc/sudoers to ‘no’. A running Nagios process has no controlling tty, that’s why you need this setting.

    Waruna Reply:

    @lausser,

    Thank you.

    I commented out the “Defaults requiretty” entry in /etc/sudoers.

    Thanks for the help to my friend Lakmal and to lausser.
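
Whether an active requiretty setting is still present can be checked with a small shell sketch like the one below. The function name `requiretty_active` is hypothetical and the pattern is a simplified assumption about sudoers syntax; it does not replace `visudo -c` or `sudo -l`.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given sudoers-style file still
# contains an active "Defaults ... requiretty" setting. Commented-out lines
# and negated entries ("!requiretty") do not count.
requiretty_active() {
    grep -Eq '^[[:space:]]*Defaults[^#]*[^!#[:alnum:]_]requiretty' "$1"
}

# Example call (the path is an assumption; reading it usually needs root):
# requiretty_active /etc/sudoers && echo "requiretty is still set"
```

Instead of commenting the line out for everybody, sudoers also allows negating the flag for a single user, for example `Defaults:nagios !requiretty`; check the sudoers manual of your distribution before relying on it.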

  12. netrogue Says:
    January 18th, 2010 at 15:11

    Hi, can we monitor HP-UX with check_hpasm?

    lausser Reply:

    I never tried it. You surely cannot monitor HP-UX itself, only (if at all) the hardware of a server running HP-UX. The precondition is the presence of the CPQHLT-MIB. Try “snmpwalk … 1.3.6.1.4.1.232”. If you get a response, you might give check_hpasm a try.

  13. Stefan Says:
    February 1st, 2010 at 15:08

    Hello,

    first of all, thanks for the great work! I have discovered a small bug. The internal HP tools report a defective disk, but check_hpasm tells me that everything is OK.

    It can be seen here: http://pastie.org/804084

    So this is not exactly uncritical.

    Regards, Stefan

    lausser Reply:

    What does “check_hpasm … -vvv” say? Could I please get the output of “snmpwalk … 1.3.6.1.4.1.232” by mail?

    Stefan Reply:

    The mail has been sent.

    lausser Reply:

    Well, that is awkward. The data in the snmpwalk show 6 perfectly working disks. So who is right? Is a red LED lit on the disk? Can you restart /etc/init.d/hpasm and run “hpacucli rescan”? Do the two methods still disagree afterwards?

    Stefan Reply:

    The LED indicated that the disk was defective. I already replaced the disk on 1 February because this is a fairly critical system, so unfortunately I cannot say anything more about it.

    If I come across such a case again, I will get back to you.

    Thanks for the help.

  14. Peter Says:
    February 2nd, 2010 at 0:39

    It certainly looks like you have a fine add-on Nagios monitor for HP servers from the reviews. One issue that makes this add-on completely confusing for those of us that don’t have 200 HP servers is what parts of the HP System Management Software are we to download from HP and install on Windows servers. I see several people ask this question and all you guys do is provide a link to a drivers download page for a particular HP model. Now obviously the authors of this module know exactly what is required to be installed. Why not just give it up and give us a list? Most of us admins don’t have time to eat or take a crap, let alone go on a wild goose chase to make this thing work.

  15. lausser Says:
    February 2nd, 2010 at 15:57

    No, the author does not know exactly what is required to be installed. The author has also just a single HP under his desk which he bought from ebay. And it’s not even running Windows. So the only information i can offer is: “System Management Driver” + “Insight Management Agents”. To find the right software for a particular model was no problem for hundreds of users. Sorry, i spent months of my free time writing and maintaining this software for the sole purpose to give it away for free and help people. At least i had some fun writing the code. Anyone can take it and be happy with it.
    What this is not: a free all-inclusive no-worries allround-package. Sorry, if an admin has not the time to eat and sleep, this is not my problem. You ask no less from me than to spend my spare time or my worktime (which would mean betraying my employer) for free. This is not how Open Source works. Sorry for this rude reply.

  16. Claudio Says:
    February 5th, 2010 at 9:45

    @Peter: A real admin reads all documentations about a server he bought or is about to buy and therefore would understand the possibilities of monitoring with System Management software.

    lausser made a great check plugin for Nagios but it certainly won’t get you your coffee right at your desk or give you additional brain cells.

    I know it’s not always easy to be an admin, nobody says thanks when everything runs smoothly, but it’s our friggin job to THINK and read and learn and think even more.

  17. tex Says:
    February 12th, 2010 at 4:18

    We have some Proliant DL380 G6 units with the 8.30 HP tool set. We have found that the hpasmcli is broken in the following manner: hpasmcli -s “show dimms” fails with: *** glibc detected *** free(): invalid pointer: 0x08068ce4 ***

    but if one runs hpasmcli manually and then type the “show dimms” command, it works!

    I cannot find anyone seeing this same problem, our IT group may open a ticket with HP about this since we have the latest version of the tools as far as I can tell. The IT group regressed back all the way to 7.9 to fix this problem, but now I see that it is segfaulting most of the time, not reporting all the memory and not reporting the temperatures. So I am going to have them go back to 8.30 and blacklist the dimms for now.

    Obviously this isn’t a problem with check_hpasm, but have you ever seen a problem like this?

    thanks

  18. Benzke Says:
    February 24th, 2010 at 17:56

    Hi tex, i have exactly the same issues. I also have an open ticket with hp since the 1st October 09 concerning this issue… Our G6 servers are already in production use so this is extremely annoying. This has to be the worst hardware support i have ever experienced from any company… It was over two months writing forth and back until the folks at hp finally admitted it was a problem on their side and not with my OS (rhel4). According to hp rhel4 is verified for the G6 so it really makes me wonder if those guys did run any testing on that platform at all before releasing them to the public. Cheers, Benzke

    tex Reply:

    @Benzke, the people I work with at the South Pole just found a new release of the HP tools(8.4) which fixes this issue. Is dated from 3/8 and they say it fixes the problem….. cheers tex

  19. Chris Says:
    February 24th, 2010 at 22:24

    Can I monitor a Windows 2008 64-bit system with this plugin?

    Regards

    Chris

    lausser Reply:

    Yes, that should be no problem. Of course the appropriate HP management software has to be installed on the machine so that the hardware status can be queried via SNMP.

  20. Andy Says:
    February 25th, 2010 at 16:14

    Hello, I have a problem with a DL360 G5. ./check_hpasm -H “ProLiant DL360 G5” -v reports:

    Use of uninitialized value $romversion in pattern match (m//) at ./check_hpasm line 846.
    Use of uninitialized value $romversion in pattern match (m//) at ./check_hpasm line 846.
    Use of uninitialized value $Proliant::Hardware::Hpasm::Server::serial in sprintf at ./check_hpasm line 818.
    Use of uninitialized value $Proliant::Hardware::Hpasm::Server::rom in sprintf at ./check_hpasm line 819.
    Use of uninitialized value $Proliant::Hardware::Hpasm::Server::serial in sprintf at ./check_hpasm line 205.
    Use of uninitialized value $Proliant::Hardware::Hpasm::Server::rom in sprintf at ./check_hpasm line 205.
    OK - System: '', S/N: '', ROM: '', hardware working fine| System : Serial No. : ROM version :

    Can anything be done about this? Thanks, regards, Andy

    lausser Reply:

    For that I would need to see the output of

    snmpwalk ….. 1.3.6.1.4.1.232

    Could I get it by mail?

  21. Waruna Says:
    March 1st, 2010 at 10:04

    I am trying to use check_hpasm with a DL370 G6 running RHEL 4 U7. I installed the latest firmware upgrade (1/13/2010) from the HP site, but I get an error asking me to upgrade the firmware:

    [root@a_aa1 plugins]# ./check_hpasm
    *** glibc detected *** free(): invalid pointer: 0x00c02820 ***
    WARNING - status of all 0 dimms is n/a (please upgrade firmware), System: 'proliant dl370 g6', S/N: 'XXXXX', ROM: 'P63 01/13/2010'

    It works fine with a DL370 G6 running RHEL 5 U2:

    [root@a_app1 plugins]# ./check_hpasm
    OK - System: 'proliant dl370 g6', S/N: 'XXXXXXX', ROM: 'P63 01/13/2010', hardware working fine

    Please help me to use this plugin on RHEL 4 U7.

    lausser Reply:

    “*** glibc detected *** free(): invalid pointer…..” is not a check_hpasm message. It surely comes from the hpasmcli command (which is executed by check_hpasm).

    hpasmcli -s “show dimm”

    should bring you this error message. Only HP can tell you what’s wrong.

  22. Peter Andersson Says:
    March 5th, 2010 at 10:49

    Hi

    Thanks Gerhard for a great nagios plugin!

    I have written a blog entry on how to install the HP software, configure SNMP and configure Nagios to get it running. Take a peek at: http://www.it-slav.net/blogs/2010/03/02/monitor-hp-proliant-with-nagios-or-op5-monitor/

    lausser Reply:

    Hi Peter, i saw it yesterday and added a comment. :-)

  23. Rico Says:
    March 23rd, 2010 at 20:49

    Hi, I get the following error on one of my boxes:

    CRITICAL - fcal host controller 2 in slot 5 reports problems (ok), fcal host controller 3 in slot 5 reports problems (ok), System: proliant dl580 g4, S/N: xxxxxxxxxx, ROM: P59 08/10/2007

    But I cannot see any problems when I log in to the management page of the box. How should I deal with this?

    bye!

    lausser Reply:

    Please mail me the output of

    snmpwalk ….. 1.3.6.1.4.1.232

  24. Marco Hill Says:
    March 24th, 2010 at 17:37

    Hello,

    first of all a big thank-you for a world-class Nagios plugin. :) I have a quick question. I would like to blacklist a SCSI controller and a physical drive. Which type abbreviation do I have to use? I cannot find it in the list above. The -v output for the controller is:

    scsi controller in slot 4 is ok
    scsi controller in slot 5 needs attention
    physical drive 4:0 is failed

    Thanks

    Regards, Marco

    lausser Reply:

    I just noticed that blacklisting is not implemented at all for SCSI equipment. Could I please get test data by mail (described further up under “call to participate”)? I will then add it quickly.

    Marco Hill Reply:

    @lausser,

    The mail is on its way.

    Regards, Marco

    Marco Hill Reply:

    @lausser,

    One more small thing. Shouldn’t the snmpwalk command look like this?

    snmpwalk 1.3.6.1.4.1.232

    Above, the IP and 1.3.6.1.4.1.232 are swapped.

    Regards, Marco

    Marco Hill Reply:

    @Marco Hill,

    snmpwalk command ip 1.3.6.1.4.1.232

    lausser Reply:

    Right. I will have to correct that tomorrow.

  25. Mirko Says:
    March 26th, 2010 at 15:57

    Hello thanks for this awesome plugin!!! Is there any easy way to disable perf-data output for FANs?

    Since we use it in SNMP mode only, the never-changing value of 50% is not useful and could be suppressed.

    Thanks again Cheers Mirko

    lausser Reply:

    Find the following portion of code

      if ($self->{runtime}->{options}->{perfdata}) {
        $self->{runtime}->{plugin}->add_perfdata(
            label => sprintf('fan_%s', $self->{cpqHeFltTolFanIndex}),
            value => $self->{cpqHeFltTolFanPctMax},
            uom => '%',
        );
      }

    and comment out the five lines inside the if-clause.

  26. badoshi Says:
    March 29th, 2010 at 17:37

    Hi,

    This plugin is fantastic and works great with our vmware & red hat servers.

    Is it possible to use this with Solaris 10 x86 too? I have tried compiling and running, but get the following error:

    bash-3.00# /usr/local/nagios/libexec/check_hpasm
    ps: unknown output format: -o cmd
    usage: ps [ -aAdeflcjLPyZ ] [ -o format ] [ -t termlist ] [ -u userlist ] [ -U userlist ] [ -G grouplist ] [ -p proclist ] [ -g pgrplist ] [ -s sidlist ] [ -z zonelist ]
    'format' is one or more of: user ruser group rgroup uid ruid gid rgid pid ppid pgid sid taskid ctid pri opri pcpu pmem vsz rss osz nice class time etime stime zone zoneid f s c lwp nlwp psr tty addr wchan fname comm args projid project pset
    CRITICAL - hpasmd needs to be restarted, System: 'unknown', S/N: 'unknown', ROM: 'unknown'

    Could this be an issue between Solaris ‘ps’ command, and GNU ‘ps’?

    Thanks,

    lausser Reply:

    Yes, the problem is the “-ocmd” argument for the ps command. Can you please search the code for -ocmd and modify the matching line so it looks like

    if (open PS, "/bin/ps -e -oargs|") {
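
The change above swaps the `cmd` keyword, which Solaris `ps` rejects, for `args`, which appears in the Solaris format list quoted in the question and is also accepted by GNU `ps`. The same idea as a shell sketch; the function name `hpasmd_in_list` is hypothetical and the matching is deliberately simple, it is not the plugin's own logic.

```shell
#!/bin/sh
# Hypothetical helper: read a process list (one command line per row) on
# stdin and succeed if a process whose command name is hpasmd appears in it.
hpasmd_in_list() {
    awk '
        {
            n = split($1, parts, "/")   # drop a leading directory path, if any
            if (parts[n] == "hpasmd") found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Portable call: "args" works with both Solaris ps and GNU ps.
# ps -e -o args | hpasmd_in_list || echo "hpasmd does not seem to be running"
```

Comparing only the command name avoids matching unrelated processes that merely mention hpasmd in their arguments.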
    

  27. Tom Says:
    March 30th, 2010 at 8:05

    Hello and many thanks for the great plugin!

    We use it for several ProLiant DL380 servers. It works great with the G4, G5 and G6; only with our two G3 servers do I have a problem. For both servers the script produces the following output:

    nagios:~# /usr/lib/nagios/plugins/check_hpasm --blacklist f:1,3,8 --ignore-fan-redundancy --community foo --hostname bar

    CRITICAL - system fan overall status is failed, cpu fan overall status is failed, System: 'proliant dl380 g3', S/N: 'foobar', ROM: 'P29 09/15/2004' | fan_1=0% fan_2=50% fan_3=0% fan_4=50% fan_5=50% fan_6=50% fan_7=50% fan_8=0% temp_1_cpu=39;62;62 temp_2_cpu=41;73;73 temp_3_ioBoard=51;68;68 temp_5_powerSupply=36;55;55

    Is something actually broken on both of them, or does the script have a bug? I am using check_hpasm version 4.1.2.

    Many thanks and kind regards, Tom

    lausser Reply:

    Could I please get the output of

    snmpwalk -v 2c -c foo bar 1.3.6.1.4.1.232

    by mail? It is quite possible that something is broken, since no speed is shown for fans 1, 3 and 8. You could also call the plugin with -vv, which shows you more details.

  28. Grzegorz Says:
    April 8th, 2010 at 11:32

    I’m trying to install version 4.2 on my Red Hat EL 5.4 servers, but I get the following error:

    [root@monitor-prod check_hpasm-4.2]# ./configure --enable-perfdata --enable-hpacucli --enable-extendedinfo
    checking for a BSD-compatible install... /usr/bin/install -c
    checking whether build environment is sane... yes
    checking for a thread-safe mkdir -p... /bin/mkdir -p
    checking for gawk... gawk
    checking whether make sets $(MAKE)... yes
    checking how to create a pax tar archive... gnutar
    checking build system type... x86_64-unknown-linux-gnu
    checking host system type... x86_64-unknown-linux-gnu
    checking for a BSD-compatible install... /usr/bin/install -c
    checking whether make sets $(MAKE)... (cached) yes
    checking for gawk... (cached) gawk
    checking for sh... /bin/sh
    checking for perl... /usr/bin/perl
    configure: creating ./config.status
    config.status: creating Makefile
    config.status: creating plugins-scripts/Makefile
    config.status: creating plugins-scripts/subst
    --with-perl: /usr/bin/perl
    --with-nagios-user: nagios
    --with-nagios-group: nagios
    --with-noinst-level: unknown
    --with-degrees: unknown
    --enable-perfdata: yes
    --enable-extendedinfo: yes
    --enable-hwinfo: yes
    --enable-hpacucli: yes
    [root@monitor-prod check_hpasm-4.2]# make
    Making all in plugins-scripts
    make[1]: Entering directory `/root/check_hpasm-4.2/plugins-scripts'
    make[1]: *** No rule to make target `HP/BladeSystem/Component/CommonEnclosureSubsystem/ManagerSubsystem.pm', needed by `check_hpasm'. Stop.
    make[1]: Leaving directory `/root/check_hpasm-4.2/plugins-scripts'
    make: *** [all-recursive] Error 1

    I already have check_hpasm v. 3.5 installed and did not have any problems with that installation. I decided to upgrade because of the error with the “lost” power supply. The error message does not tell me anything. Do I need some more Perl modules installed?

    lausser Reply:

    Hi, i think your tar is not able to unpack files with a filename longer than ~100 characters. That’s why make doesn’t find the …..ManagerSubsystem.pm file. There is also a check_hpasm-4.2.shar.gz you can download. Please get it and unpack the contents with

    cat check_hpasm-4.2.shar.gz | gzip -d | sh
  29. Grzegorz Says:
    April 8th, 2010 at 12:09

    OK, i just avoided problem by removing ManagerSubsystem.pm part from “EXTRA_MODULES =” in plugins-scripts Makefile.

  30. Sebastien douce Says:
    April 16th, 2010 at 12:13

    Hello,

    first Thank you for your work !

    I encounter the following problem on one Linux server.

    When I execute check_hpasm locally:

    ./check_hpasm
    OK - System: 'proliant dl585 g2', S/N: 'GB8730NP6F', ROM: 'A07 02/27/2007', hardware working fine, da: 1 logical drives, 5 physical drives, cpu_0=ok cpu_1=ok cpu_2=ok cpu_3=ok ps_1=ok ps_2=ok fan_1=34% …etc

    But when I execute it from the Nagios poller I receive:

    CRITICAL - dimm module 0:17 (module 17 @ cartridge 0) needs attention (n/a), dimm module 0:18 (module 18 @ cartridge 0) needs attention (n/a), dimm module 0:25 (module 25 @ cartridge 0) needs attention (n/a), dimm module 0:26 (module 26 @ cartridge 0) needs attention (n/a)CRITICAL - dimm module 0:17 (module 17 @ cartridge 0) needs attention (n/a), dimm module 0:18 (module 18 @ cartridge 0) needs attention (n/a), dimm module 0:25 (module 25 @ cartridge 0) needs attention (n/a), dimm module 0:26 (module 26 @ cartridge 0) needs attention (n/a)

    The server is actually working fine and the HP agent answers well. Do you have any idea?

    lausser Reply:

    I never saw this behaviour. So you don’t use the SNMP method, but are executing check_hpasm locally on the server (where you have a hpasmcli command)? Are only Dimm modules shown as failed or also other components?

    Sebastien douce Reply:

    @lausser, hpasmcli works as well and shows no DIMM problems. I tried restarting snmp and hpasm and also rebooted the server. Locally check_hpasm works, but if I run check_hpasm -H 127.0.0.1 on the same machine, I get the same problem!

    I don’t know where SNMP has been corrupted. I have many identical servers and only this one causes it. Thanks.

    lausser Reply:

    can you send me the output of the following command please?

    snmpwalk .... 127.0.0.1 1.3.6.1.4.1.232
  31. ckpinguin Says:
    April 27th, 2010 at 9:43

    Your work is very much appreciated. I try to convince bosses to bring up some money ;-)

  32. hec Says:
    April 27th, 2010 at 15:22

    Hello lausser, we have the problem that the plugin sometimes runs into a timeout (WAN link...). Is there a way to tell the plugin that a timeout should not result in a Critical state, but only in a Warning?

    Thanks for the info.

    lausser Reply:

    Hi, I just noticed that the plugin itself does not do any timeout handling at all. So it is Nagios that detects the timeout and sets the error level. If need be, I would raise the default 60s with the service_check_timeout parameter.

  33. hec Says:
    April 27th, 2010 at 16:42

    Hi, yes, I have already done that, in some cases even to 90. However, when I grep through status.dat I find execution_time values of up to 170 seconds (!!), so the timeout is simply being ignored.

  34. ckpinguin Says:
    May 11th, 2010 at 11:31

    Is it possible, or sensible, to stop delivering performance data for components that are listed with --blacklist? We use DL380s here that every now and then deliver fantasy values for 3 sensors, so the pnp4nagios graphs do not look particularly great either.

    Many thanks for your work!

    lausser Reply:

    Could you please try something? Find the routine “sub add_perfdata” in the plugin and change the last line as follows:

    push (@{$self->{perfdata}}, $str) unless $self->{blacklisted};

  35. Jimmy liu Says:
    May 23rd, 2010 at 9:24

    Hi, what is wrong here? Please help me, thanks.

    [root@localhost libexec]# /usr/local/nagios/libexec/check_hpasm -H 192.168.0.231 -C public
    CRITICAL - could not find Net::SNMP module, wrong device

    lausser Reply:

    @Jimmy liu, you need to install the perl module Net::SNMP

    Jimmy liu Reply:

    @lausser,

    Thanks. But after I installed the Perl module Net::SNMP, I get another problem: “CRITICAL - snmpwalk returns no product name (cpqsinfo-mib), wrong device”. I have downloaded the “cpqsinfo-mib” file, but I have no idea what to do next. Please help me again, thanks a lot :)

    lausser Reply:

    Maybe you didn’t install the hpasm software on the HP. Executing

    snmpwalk -v 2c -c <community> <ip-of-hp-server>  1.3.6.1.4.1.232

    should output a lot of lines.

    Jimmy liu Reply:

    @lausser,

    Many thanks for your help.It’s ok now

  36. Nikolas Nunez Says:
    June 8th, 2010 at 8:01

    I have recently installed the plugin on several HP DL360 servers, but on at least two servers, when running the check_hpasm -v, the power supplies don’t show up.

    Any ideas

    lausser Reply:

    Please look at the “call to participate” section of the check_hpasm-website. You’ll find two ways to send me diagnostic info. Please run either the snmpwalk or the local script and forward me the output, so i can check what’s wrong.

    Nikolas Nunez Reply:

    @lausser,

    I run the script and the following is shown :

    server server System : ProLiant DL360 G4p server Serial No. : GB8633HERR server ROM version : P54 02/14/2006 server iLo present : Yes server Embedded NICs : 2 server NIC1 MAC: 00:18:71:e3:ae:da server NIC2 MAC: 00:18:71:e3:ae:d9 server server Processor: 0 server Name : Intel Xeon server Stepping : 10 server Speed : 3000 MHz server Bus : 800 MHz server Socket : 2 server Level2 Cache : 2048 KBytes server Status : Ok server server Processor: 1 server Name : Intel Xeon server Stepping : 10 server Speed : 3000 MHz server Bus : 800 MHz server Socket : 1 server Level2 Cache : 2048 KBytes server Status : Ok server server Processor total : 2 server server Memory installed : 4096 MBytes server ECC supported : Yes server powersupply powersupply Command NOT supported on this server at this time powersupply powersupply fans fans Command NOT supported on this server at this time fans fans temp temp Sensor Location Temp Threshold temp —— ——– —- ——— temp #0 SYSTEM_BD – - temp #1 I/O_ZONE 40C/104F 63C/145F temp #2 CPU#1 43C/109F 85C/185F temp #3 CPU#2 41C/105F 85C/185F temp #4 POWER_SUPPLY_BAY 31C/87F 48C/118F temp #5 SYSTEM_BD 27C/80F 41C/105F temp temp dimm dimm DIMM Configuration dimm —————— dimm Cartridge #: 0 dimm Module #: 1 dimm Present: Yes dimm Form Factor: 9h dimm Memory Type: 12h dimm Size: 1024 MB dimm Speed: 400 MHz dimm Status: Ok dimm dimm Cartridge #: 0 dimm Module #: 2 dimm Present: Yes dimm Form Factor: 9h dimm Memory Type: 12h dimm Size: 1024 MB dimm Speed: 400 MHz dimm Status: Ok dimm dimm Cartridge #: 0 dimm Module #: 3 dimm Present: Yes dimm Form Factor: 9h dimm Memory Type: 12h dimm Size: 1024 MB dimm Speed: 400 MHz dimm Status: Ok dimm dimm Cartridge #: 0 dimm Module #: 4 dimm Present: Yes dimm Form Factor: 9h dimm Memory Type: 12h dimm Size: 1024 MB dimm Speed: 400 MHz dimm Status: Ok dimm dimm config \config Smart Array 6i in Slot 0 (Embedded)\config \config array A (Parallel SCSI, Unused Space: 0 MB)\config \config \config logicaldrive 1 (136.7 GB, 
RAID 1, OK)\config \config physicaldrive 1:0 (port 1:id 0 , Parallel SCSI, 146.8 GB, OK)\config physicaldrive 1:1 (port 1:id 1 , Parallel SCSI, 146.8 GB, OK)\config \status \status Smart Array 6i in Slot 0 (Embedded)\status Controller Status: OK\status Cache Status: OK\status \status \

    It’s rather weird because I have other DL360 G4p that don’t have this problem.

    Nikolas Nunez Reply:

    @lausser,

    Please find below the output of the script. Furthermore I have run this command on another server with the same specs and the output does register the power supplies

    server
    server System : ProLiant DL360 G4p
    server Serial No. : GB8633HERR
    server ROM version : P54 02/14/2006
    server iLo present : Yes
    server Embedded NICs : 2
    server NIC1 MAC: 00:18:71:e3:ae:da
    server NIC2 MAC: 00:18:71:e3:ae:d9
    server
    server Processor: 0
    server Name : Intel Xeon
    server Stepping : 10
    server Speed : 3000 MHz
    server Bus : 800 MHz
    server Socket : 2
    server Level2 Cache : 2048 KBytes
    server Status : Ok
    server
    server Processor: 1
    server Name : Intel Xeon
    server Stepping : 10
    server Speed : 3000 MHz
    server Bus : 800 MHz
    server Socket : 1
    server Level2 Cache : 2048 KBytes
    server Status : Ok
    server
    server Processor total : 2
    server
    server Memory installed : 4096 MBytes
    server ECC supported : Yes
    server
    powersupply
    powersupply Command NOT supported on this server at this time
    powersupply
    powersupply
    fans
    fans Command NOT supported on this server at this time
    fans
    fans
    temp
    temp Sensor Location Temp Threshold
    temp ------ -------- ---- ---------
    temp #0 SYSTEM_BD - -
    temp #1 I/O_ZONE 40C/104F 63C/145F
    temp #2 CPU#1 42C/107F 85C/185F
    temp #3 CPU#2 42C/107F 85C/185F
    temp #4 POWER_SUPPLY_BAY 30C/86F 48C/118F
    temp #5 SYSTEM_BD 26C/78F 41C/105F
    temp
    temp
    dimm
    dimm DIMM Configuration
    dimm ------------------
    dimm Cartridge #: 0
    dimm Module #: 1
    dimm Present: Yes
    dimm Form Factor: 9h
    dimm Memory Type: 12h
    dimm Size: 1024 MB
    dimm Speed: 400 MHz
    dimm Status: Ok
    dimm
    dimm Cartridge #: 0
    dimm Module #: 2
    dimm Present: Yes
    dimm Form Factor: 9h
    dimm Memory Type: 12h
    dimm Size: 1024 MB
    dimm Speed: 400 MHz
    dimm Status: Ok
    dimm
    dimm Cartridge #: 0
    dimm Module #: 3
    dimm Present: Yes
    dimm Form Factor: 9h
    dimm Memory Type: 12h
    dimm Size: 1024 MB
    dimm Speed: 400 MHz
    dimm Status: Ok
    dimm
    dimm Cartridge #: 0
    dimm Module #: 4
    dimm Present: Yes
    dimm Form Factor: 9h
    dimm Memory Type: 12h
    dimm Size: 1024 MB
    dimm Speed: 400 MHz
    dimm Status: Ok
    dimm
    dimm
    config
    config Smart Array 6i in Slot 0 (Embedded)
    config
    config array A (Parallel SCSI, Unused Space: 0 MB)
    config
    config
    config logicaldrive 1 (136.7 GB, RAID 1, OK)
    config
    config physicaldrive 1:0 (port 1:id 0 , Parallel SCSI, 146.8 GB, OK)
    config physicaldrive 1:1 (port 1:id 1 , Parallel SCSI, 146.8 GB, OK)
    config
    status
    status Smart Array 6i in Slot 0 (Embedded)
    status Controller Status: OK
    status Cache Status: OK
    status
    status
    [root@mta2 plugins-scripts]#

    lausser Reply:

    @Nikolas Nunez, WordPress messed it up. Please send it by mail to gerhard.lausser@consol.de

    lausser Reply:

    @Nikolas, now I see it. Look at the output:

    powersupply Command NOT supported on this server at this time
    fans Command NOT supported on this server at this time
    So querying power supplies and fans is simply not supported on this type of machine (or maybe with this version of the hpasm software). You can see it with
    hpasmcli -s "show powersupply"
    hpasmcli -s "show fans"
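
    The same check can be scripted. The following is a minimal sketch in Python, assuming the text printed by each hpasmcli call has already been captured; the function name unsupported_commands and the dictionary layout are illustrative and not part of check_hpasm.

```python
def unsupported_commands(outputs):
    """Return the hpasmcli subcommands whose output says they are unsupported.

    `outputs` maps a subcommand such as "show fans" to the text that
    hpasmcli printed for it (a hypothetical, pre-captured structure).
    """
    marker = "command not supported on this server"
    return sorted(cmd for cmd, text in outputs.items() if marker in text.lower())


# Sample text taken from the diagnostic output quoted in this thread.
sample = {
    "show powersupply": "Command NOT supported on this server at this time",
    "show fans": "Command NOT supported on this server at this time",
    "show temp": "#1 I/O_ZONE 40C/104F 63C/145F",
}
print(unsupported_commands(sample))
```

    Any subcommand listed by this helper cannot be monitored through hpasmcli on that machine until the hpasm software itself supports it.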
  37. sak Says:
    June 8th, 2010 at 17:54

    hi lausser,

    First, thanks for this software. Second, I have a question about the fans. check_hpasm says the fans are notRedundant:

    fan 1 is present, speed is normal, pctmax is 25%, location is system, redundance is notRedundant, partner is 0
    fan 2 is present, speed is normal, pctmax is 25%, location is system, redundance is notRedundant, partner is 0
    fan 3 is present, speed is normal, pctmax is 47%, location is system, redundance is notRedundant, partner is 0
    fan 4 is present, speed is normal, pctmax is 47%, location is system, redundance is notRedundant, partner is 0

    but hpasmcli says they are redundant:

    hpasmcli> show fans
    Fan  Location  Present  Speed   of max  Redundant  Partner  Hot-pluggable
    ---  --------  -------  -----   ------  ---------  -------  -------------
    1    SYSTEM    Yes      NORMAL  25%     Yes        0        Yes
    2    SYSTEM    Yes      NORMAL  25%     Yes        0        Yes
    3    SYSTEM    Yes      NORMAL  47%     Yes        0        Yes
    4    SYSTEM    Yes      NORMAL  47%     Yes        0        Yes

    Is it a bug in check_hpasm that flips the boolean?

    lausser Reply:

    Please look at the posting above (Nikolas Nunez) and mail me the output of the test script mentioned there.

    Nikolas Nunez Reply:

    @lausser,

    The output of these commands is the same as in your report: the powersupply and fans commands are NOT supported. I have compared the hpasm package with the one on the other server and it’s a different version, so I’m trying to update hpasm to the same version.

    Will keep you posted.

    Nikolas Nunez Reply:

    @Nikolas Nunez,

    The issue seems to occur when the plugin communicates with the following hpasm package: hpasm-7.5.1-8.rhel4. I have since updated the PSP once again and rebooted the server, and all is fine.

    lausser Reply:

    I had a look at the fan-related code and found the comment “# cpqHeFltTolFanRedundantPartner=0: partner not avail”. I remember now that partner=0/redundant=yes actually means “not redundant”. It’s a bug in hpasm, which simply outputs incorrect information here. You have fans 1-4 in your system; fan 0 does not exist and thus cannot be a partner.
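
    The rule described above can be written down in a few lines. This is only an illustrative Python sketch of the interpretation, not the plugin’s actual Perl code; the function name effective_redundancy and the data layout are assumptions.

```python
def effective_redundancy(redundant, partner, known_fans):
    """Interpret the Redundant/Partner columns of one fan row.

    A fan only counts as redundant if hpasmcli says so AND the partner
    number refers to a fan that actually exists; partner 0 is treated
    as "no partner available".
    """
    has_partner = partner != 0 and partner in known_fans
    return "redundant" if redundant and has_partner else "notRedundant"


# Fans 1-4 as reported by hpasmcli in the comment above: Redundant=Yes, Partner=0.
fans = {1: (True, 0), 2: (True, 0), 3: (True, 0), 4: (True, 0)}
for number, (redundant, partner) in fans.items():
    print(number, effective_redundancy(redundant, partner, fans))
```

    With the values from the hpasmcli table above, every fan comes out as notRedundant, which matches what check_hpasm printed.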

  38. sak Says:
    June 9th, 2010 at 22:46

    hi gerhard,

    Doesn’t check_hpasm support NICs?

    lausser Reply:

    No, this is not supported. I would rather monitor interfaces at the operating system level.

  39. Nikolas Nunez Says:
    June 11th, 2010 at 12:08

    I have an old DL380 G2 for which the plugin reports: WARNING - status of all 6 dimms is n/a (please upgrade firmware). I thought that maybe the version of the PSP was too new for this server, so I downgraded to the version recommended by HP. I have run the script and the following is displayed:

    dimm DIMM Configuration
    dimm ------------------
    dimm Cartridge #: 0
    dimm Module #: 1
    dimm Present: Yes
    dimm Form Factor: 8h
    dimm Memory Type: 5h
    dimm Size: 128 MB
    dimm Speed: 133 MHz
    dimm Status: N/A
    dimm
    dimm Cartridge #: 0
    dimm Module #: 4
    dimm Present: Yes
    dimm Form Factor: 8h
    dimm Memory Type: 5h
    dimm Size: 128 MB
    dimm Speed: 133 MHz
    dimm Status: N/A
    dimm
    dimm Cartridge #: 0
    dimm Module #: 2
    dimm Present: Yes
    dimm Form Factor: 8h
    dimm Memory Type: 5h
    dimm Size: 128 MB
    dimm Speed: 133 MHz
    dimm Status: N/A
    dimm
    dimm Cartridge #: 0
    dimm Module #: 5
    dimm Present: Yes
    dimm Form Factor: 8h
    dimm Memory Type: 5h
    dimm Size: 128 MB
    dimm Speed: 133 MHz
    dimm Status: N/A
    dimm
    dimm Cartridge #: 0
    dimm Module #: 3
    dimm Present: Yes
    dimm Form Factor: 8h
    dimm Memory Type: 5h
    dimm Size: 256 MB
    dimm Speed: 133 MHz
    dimm Status: N/A
    dimm
    dimm Cartridge #: 0
    dimm Module #: 6
    dimm Present: Yes
    dimm Form Factor: 8h
    dimm Memory Type: 5h
    dimm Size: 256 MB
    dimm Speed: 133 MHz
    dimm Status: N/A

    Could you please advise me what the problem may be.

    lausser Reply:

    Well, bad luck. As you can see from the section “Unknown memory status” in the documentation above, there are cases where the memory status cannot be acquired (maybe it’s the BIOS, maybe the DIMMs don’t support a status at all, I don’t know). At least you can get rid of the error message with --ignore-dimms.
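
    To see quickly whether a machine falls into this category, the output of hpasmcli -s "show dimm" can be scanned for modules whose status is N/A. A minimal Python sketch, assuming the field layout shown in this thread (each module block ends with its Status line); the function names are illustrative and not part of check_hpasm.

```python
def dimm_statuses(show_dimm_text):
    """Collect (cartridge, module, status) tuples from 'show dimm' style text."""
    dimms, current = [], {}
    for line in show_dimm_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # separator or blank line, no "key: value" pair
        key, value = key.strip(), value.strip()
        current[key] = value
        if key == "Status":  # assumed to be the last field of each module block
            dimms.append((current.get("Cartridge #"), current.get("Module #"), value))
            current = {}
    return dimms


def all_status_unknown(dimms):
    """True if there is at least one module and every module reports N/A."""
    return bool(dimms) and all(status == "N/A" for _, _, status in dimms)


sample = """\
Cartridge #: 0
Module #: 1
Present: Yes
Size: 128 MB
Status: N/A

Cartridge #: 0
Module #: 4
Present: Yes
Size: 128 MB
Status: N/A
"""
print(dimm_statuses(sample), all_status_unknown(sample and dimm_statuses(sample)))
```

    If all_status_unknown returns True, the warning comes from hpasmcli itself not reporting a status, and --ignore-dimms is the way to silence it.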

    Nikolas Nunez Reply:

    Thanks, I hadn’t read this. One more question, sorry for this. I have run check_hpasm -v to get the component numbers so I can blacklist something, but the component numbers are not shown in the -v output. Am I doing something wrong?

    lausser Reply:

    Please post the output inside pre-tags

    Nikolas Nunez Reply:

    I have emailed the output, as the last time I sent it, it wasn’t clear.

    lausser Reply:

    checking disk subsystem
    da controller 1 in slot 1 is ok
    controller accelerator is not
    controller accelerator battery is notPresent
    da controller 2 in slot 0 is ok
    controller accelerator is ok
    controller accelerator battery is notPresent
    OK, now I understand. The controller accelerator (and battery) number is the same as that of the controller above (1, 2). It should work with --blacklist daac:1,2. I think blacklisting a controller accelerator also blacklists the accelerator battery. If not, use --blacklist daac:1,2/dacb:1,2
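
    When many hosts need such a blacklist, the argument can be generated instead of typed. The following Python sketch only reproduces the syntax visible in the example above (types separated by “/”, numbers by “,”); the helper name blacklist_argument is an assumption for illustration.

```python
def blacklist_argument(items):
    """Build a --blacklist value such as 'daac:1,2/dacb:1,2'.

    `items` maps a component type abbreviation to the component numbers
    to ignore; types are joined with '/', numbers with ','.
    """
    parts = []
    for kind, numbers in items.items():
        parts.append(f"{kind}:{','.join(str(n) for n in numbers)}")
    return "/".join(parts)


# Controller accelerators and accelerator batteries of controllers 1 and 2.
print("--blacklist", blacklist_argument({"daac": [1, 2], "dacb": [1, 2]}))
```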

    Nikolas Nunez Reply:

    Thanks for the information. I have applied the blacklist option, but I still have an alarm about the controller accelerator needing attention.

    Does the alarm then correspond to the controller accelerator and the controller accelerator battery being not present?

    How would I then be able to remove the alarm?

    lausser Reply:

    Please mail me the complete output from the diagnosis script. The one you sent me (serial GB8633…) had only one controller.

    Nikolas Nunez Reply:

    Hi,

    I emailed it to you before, the server S/N starts with ’7250′ and is a DL380 G2. Anyway I’ll forward it on again.

  40. Markus Bloch Says:
    June 11th, 2010 at 16:25

    Hello, great script. We use it for DL360 and DL380 servers from G3 to G6. In the past we detected defective RAM modules with check_hpasm and replaced them. One question: would it be possible to show the key specs of every checked component with -v? For example:

    [pre]
    dimm module 0:1 (module 1 @ cartridge 0, 1024MB 400MHz) is ok | d:0:1
    dimm module 0:2 (module 2 @ cartridge 0, 1024MB 400MHz) is ok | d:0:2
    dimm module 0:3 (module 3 @ cartridge 0, 1024MB 400MHz) is ok | d:0:3
    dimm module 0:4 (module 4 @ cartridge 0, 1024MB 400MHz) is ok | d:0:4
    dimm module 0:5 (module 5 @ cartridge 0, 512MB 400MHz) needs attention (degraded) | d:0:5
    dimm module 0:6 (module 6 @ cartridge 0, 512MB 400MHz) is ok | d:0:6
    dimm module 0:7 (module 7 @ cartridge 0, 512MB 400MHz) is ok | d:0:7
    dimm module 0:8 (module 8 @ cartridge 0, 512MB 400MHz) is ok
    [/pre]

    That way you can call HP support right away and have the serial number, the defective part, and the key specs for the replacement part on one screen (for hard disks, for example, that would be size and RPM).

    That would be really great. Keep it up!!

    Regards, Markus Bloch

  41. Mark Says:
    August 3rd, 2010 at 11:41

    I added the check but get the following result

    Return code of 255 is out of bounds

    I only get this result when added to the gui (in NagiosXI).

    When I run the check on the cmd line it works fine and I get the check results.

    lausser Reply:

    I have no idea. When a plugin runs on the commandline, my job is done.
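
    One way to narrow this down: Nagios only accepts plugin exit codes 0 to 3, so anything else is reported as “out of bounds”. The Python sketch below is a hypothetical helper (not part of Nagios or check_hpasm) that runs a command line and shows which state its exit code maps to; running it as the same user and with the same command line that the GUI uses may reveal the difference.

```python
import subprocess
import sys

# Exit codes defined for Nagios plugins.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def describe_exit_code(code):
    """Map a plugin exit code to its Nagios state, or flag it as invalid."""
    if code in STATES:
        return STATES[code]
    return f"out of bounds ({code})"


def run_and_describe(argv):
    """Run a plugin command line and describe its exit status and output."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return describe_exit_code(result.returncode), result.stdout.strip()


if __name__ == "__main__":
    # Stand-in for a failing plugin: a child process that exits with 255.
    print(run_and_describe([sys.executable, "-c", "raise SystemExit(255)"]))
```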

  42. Rene Says:
    August 18th, 2010 at 11:02

    Hi, first of all, great script! We use it for almost every server (> 700) in our WAN. Today we received our first bladecenter c7000. When checking this chassis with check_hpasm 4.2, the individual blades are recognized by name, but the status and power indication remain value_unknown:

    server blade 1:1:1 ‘TS008′ is present, status is value_unknown, powered is value_unknown

    This is the case for every blade. When checking the OID with snmpwalk it returns:

    snmpwalk -v 2c -c public 10.205.252.4 1.3.6.1.4.1.232.22.2.4.1.1.1.21

    CPQRACK-MIB::cpqRackServerBladeStatus = No Such Object available on this agent at this OID

    As far as we know, the firmware of the c7000 is the latest version, what could be the issue?

    Hope you can help.

    lausser Reply:

    The status is unknown, because it’s defined in the mib (and implemented in check_hpasm), but the bladecenter does not return a value. You are not the first to come up with this. I can only ask you to urge your HP representative to answer “why do bladecenters not return 1.3.6.1.4.1.232.22.2.4.1.1.1.[>21] ?”

  43. Andy Says:
    August 20th, 2010 at 0:07

    Hello!

    Unfortunately I have a small problem querying the hardware of an HP DL380 G6 (Xeon E5506). CentOS 5.5 64-bit is installed on the server and I am using hpasm version 4.2. When I run hp_asm I get the following error:

    UNKNOWN - insufficient rights to call /sbin/hpasmcli, System: 'unknown', S/N: 'unknown', ROM: 'unknown'

    and when I run /sbin/hpasmcli directly I get:

    read_buf FAILED

    ERROR: Failed to get SMBIOS system data. This does not seem to be a HP Proliant Server.
    ERROR: hpasmcli only runs on HP Proliant Servers.

    I hope you can help me!

    Thanks in advance!

    Kind regards

    lausser Reply:

    If hpasmcli prints a message like this, only HP itself can help.

    Andy Reply:

    @lausser,

    Hello!

    Thank you for your message! I was very surprised by HP support today: they recommended the current PSP, and with that it worked. ;-)

    Thanks a lot!

  44. Nathan Olney Says:
    August 22nd, 2010 at 13:51

    educate me.

    lausser Reply:

    That’s your parents’ job.

  45. wayne Says:
    August 27th, 2010 at 21:18

    Hi, I tried with snmpwalk -c public -v1 10.1.11.2 1.3.6.1.4.232

    and I got only an End of MIB message. I need to check the HP server from a non-HP Nagios server.

    lausser Reply:

    End of MIB means that you either did not install the hpasm software or it was not started.

  46. wayne Says:
    August 27th, 2010 at 22:18

    OK, I got it running after I installed the HP management agent. Now it is time to play!

  47. wayne Says:
    August 27th, 2010 at 22:23

    Is it possible to test SATA with SNMP? I have an HP AIO 1200 9TB storage system. I tried with SNMP and it responds with “OK - System: '', S/N: '', ROM: '', hardware working fine”, but in fact one drive has failed.

    lausser Reply:

    please mail me the output of

    snmpwalk ip-of-storage 1.3.6.1.4.1.232

  48. Roman Says:
    September 6th, 2010 at 14:46

    Hello, unfortunately I get the following error in Nagios: **ePN /usr/lib/nagios/plugins/check_hpasm: “Use of uninitialized value $romversion in pattern match (m//) at (eval 1) line 796,”

    This is with ESXi version 4.1.

    SNMPWALK= http://ifile.it/l9kimpy/snmpwalk.png

    If you have an idea how I could solve this, I would be very grateful.

    Regards

    lausser Reply:

    This seems to be caused by embedded Perl. Please disable it in nagios.cfg.

  49. Thomas Löscher Says:
    September 9th, 2010 at 8:49

    Hello,

    yesterday I jumped from version 3.5 to version 4.2. Great improvements (more detailed error messages). The documentation (or “--help”) should perhaps mention that “--perfdata=short|long” can be specified. “--perfdata” without a value does not work. Otherwise a great tool. Execution is a bit slow, but I think that is unfortunately a problem of hpasmcli/hpacucli. Many thanks for it.

    Thomas

  50. Anthony Says:
    September 13th, 2010 at 23:42

    Hello,

    When I run check_hpasm version 4.2 on a host that has a dimm error I get “status of all 0 dimms is n/a”. If I run the hpasmcli “show dimm” locally, the number of dimms (4) is displayed, but with status N/A. When I run check_hpasm in verbose mode, it shows that the memory check is bypassed (see below). I tried a previous version of check_hpasm, 3.1.1, and there the number of dimms is returned correctly. Can you please let me know how I can get the number of dimms returned correctly in the new version?

    ./check_hpasm -H 'ip' -C 'string' -v
    WARNING - status of all 0 dimms is n/a (please upgrade firmware), System: 'proliant dl380', S/N: 'SN', ROM: 'P17 12/13/1999'
    checking cpus
    cpu 0 is ok
    cpu 1 is ok
    checking power supplies
    powersupply 1 is ok
    powersupply 2 is ok
    checking fans
    overall fan status: system=ok, cpu=ok
    fan 1 is present, speed is normal, pctmax is 50%, location is cpu, redundance is notRedundant, partner is 0
    fan 2 is present, speed is normal, pctmax is 50%, location is powerSupply, redundance is notRedundant, partner is 0
    checking temperatures
    1 cpu temperature is 22C (58 max)
    2 cpu temperature is 18C (70 max)
    3 ioBoard temperature is 25C (62 max)
    4 cpu temperature is 18C (70 max)
    checking memory
    checking disk subsystem
    da controller 0 in slot 0 is ok
    controller accelerator is ok
    controller accelerator battery is notPresent
    logical drive 0:1 is ok (mirroring)
    physical drive 0:16 is ok
    physical drive 0:17 is ok
    ide controller 1 in slot 0 is ok

    lausser Reply:

    Please read the documentation web page. You will find a section ‘Call to participate’. I need more information.

  51. Aitor Says:
    September 17th, 2010 at 12:20

    Great plugin!! It gets all I want!

    Thanks!

  52. Jens G. Says:
    October 7th, 2010 at 12:47

    Hi,

    we have now received our first HP G7 machines. Unfortunately check_hpasm seems to have a major problem here and does not get any information. Even the model number and serial number are no longer found. Do you already have experience with the new G7 machines?

    Regards, Jens

    lausser Reply:

    Have a look at the section “Aufruf zum Mitmachen” (“Call to participate”) on http://labs.consol.de/lang/de/nagios/check_hpasm/. Please send me the snmpwalk by mail (the address is further down on that page).

  53. Jens G. Says:
    October 8th, 2010 at 14:43

    Hi,

    problem solved for now. My colleague had installed the wrong PSP version (v8.3). With PSP v8.6 the DL360 G7 machine works.

    If I find bugs with the G7 models, I will report them to you.

    Regards, Jens

  54. Nicole Says:
    October 14th, 2010 at 15:11

    Hello,

    I cannot build the new version on RHEL 5.5. It is not an HP server, but that did not matter before. I only want to run checks from this server. Does anyone else have this problem?

    ./configure --prefix=/etc/icinga --with-nagios-user=icinga --with-nagios-group=icinga --with-perl=/usr/bin/perl --with-noinst-level=critical

    [root@icinga01 check_hpasm-4.2.1]# make
    Making all in plugins-scripts
    make[1]: Entering directory `/usr/src/check_hpasm-4.2.1/plugins-scripts'
    make[1]: *** No rule to make target `HP/BladeSystem/Component/CommonEnclosureSubsystem/ManagerSubsystem.pm', needed by `check_hpasm'. Stop.
    make[1]: Leaving directory `/usr/src/check_hpasm-4.2.1/plugins-scripts'
    make: *** [all-recursive] Error 1

    Thanks! Regards, Nicole

    lausser Reply:

    This happens because on some distributions tar cannot unpack files whose name is longer than ?? characters, e.g. this one: HP/BladeSystem/Component/CommonEnclosureSubsystem/ManagerSubsystem.pm. Download check_hpasm…shar.gz and unpack it with cat check_hpasm….shar.gz | gzip -d | sh

  55. Nicole Says:
    October 14th, 2010 at 16:00

    OK, it works now. Thanks!

  56. Qk4l Says:
    October 27th, 2010 at 7:26

    Many thanks! Thank you very much! =)

  57. Steven Says:
    October 28th, 2010 at 13:54

    Hi

    The hpasm module has been divided into 3 components in the latest version of HP’s management software (8.6.x). The plugin works with version 8.2, but with version 8.6 of the HP agents the output of the plugin is “snmpwalk returns no product name (cpqsinfo-mib), wrong device”.

    Is there a workaround ?

    regards,

    Steven

    lausser Reply:

    please send me the output of

    snmpwalk .... ip_of_server 1.3.6.1.4.1.232
    and the output of
    snmpwalk .... ip_of_server

  58. Steven Says:
    October 28th, 2010 at 14:52

    Hi

    This is the output of both snmpwalks.

    SNMPv2-SMI::enterprises.232 = No more variables left in this MIB View (It is past the end of the MIB tree)

    lausser Reply:

    So this machine simply doesn’t speak SNMP. No chance for monitoring then.

    Steven Reply:

    @lausser, SNMP is working, now snmpwalk gives feedback.

    SNMPv2-MIB::sysDescr.0 = STRING: Linux arvhesx10 2.6.18-164.ESX #1 Fri Apr 16 14:57:03 PDT 2010 x86_64
    SNMPv2-MIB::sysObjectID.0 = OID: NET-SNMP-MIB::netSnmpAgentOIDs.10
    DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1528314) 4:14:43.14
    SNMPv2-MIB::sysContact.0 = STRING: it@ardo.be
    SNMPv2-MIB::sysName.0 = STRING: arvhesx10
    SNMPv2-MIB::sysLocation.0 = STRING: DCKelder
    SNMPv2-MIB::sysORLastChange.0 = Timeticks: (0) 0:00:00.00
    SNMPv2-MIB::sysORID.1 = OID: SNMPv2-MIB::snmpMIB
    SNMPv2-MIB::sysORID.2 = OID: TCP-MIB::tcpMIB
    SNMPv2-MIB::sysORID.3 = OID: IP-MIB::ip
    SNMPv2-MIB::sysORID.4 = OID: UDP-MIB::udpMIB
    SNMPv2-MIB::sysORID.5 = OID: SNMP-VIEW-BASED-ACM-MIB::vacmBasicGroup
    SNMPv2-MIB::sysORID.6 = OID: SNMP-FRAMEWORK-MIB::snmpFrameworkMIBCompliance
    SNMPv2-MIB::sysORID.7 = OID: SNMP-MPD-MIB::snmpMPDCompliance
    SNMPv2-MIB::sysORID.8 = OID: SNMP-USER-BASED-SM-MIB::usmMIBCompliance
    SNMPv2-MIB::sysORDescr.1 = STRING: The MIB module for SNMPv2 entities
    SNMPv2-MIB::sysORDescr.2 = STRING: The MIB module for managing TCP implementations
    SNMPv2-MIB::sysORDescr.3 = STRING: The MIB module for managing IP and ICMP implementations
    SNMPv2-MIB::sysORDescr.4 = STRING: The MIB module for managing UDP implementations
    SNMPv2-MIB::sysORDescr.5 = STRING: View-based Access Control Model for SNMP.
    SNMPv2-MIB::sysORDescr.6 = STRING: The SNMP Management Architecture MIB.
    SNMPv2-MIB::sysORDescr.7 = STRING: The MIB for Message Processing and Dispatching.
    SNMPv2-MIB::sysORDescr.8 = STRING: The management information definitions for the SNMP User-based Security Model.
    SNMPv2-MIB::sysORUpTime.1 = Timeticks: (0) 0:00:00.00
    SNMPv2-MIB::sysORUpTime.2 = Timeticks: (0) 0:00:00.00
    SNMPv2-MIB::sysORUpTime.3 = Timeticks: (0) 0:00:00.00
    SNMPv2-MIB::sysORUpTime.4 = Timeticks: (0) 0:00:00.00
    SNMPv2-MIB::sysORUpTime.5 = Timeticks: (0) 0:00:00.00
    SNMPv2-MIB::sysORUpTime.6 = Timeticks: (0) 0:00:00.00
    SNMPv2-MIB::sysORUpTime.7 = Timeticks: (0) 0:00:00.00
    SNMPv2-MIB::sysORUpTime.8 = Timeticks: (0) 0:00:00.00
    HOST-RESOURCES-MIB::hrSystemUptime.0 = Timeticks: (69330937) 8 days, 0:35:09.37
    HOST-RESOURCES-MIB::hrSystemUptime.0 = No more variables left in this MIB View (It is past the end of the MIB tree)

    lausser Reply:

    What about 1.3.6.1.4.1.232? Without this tree check_hpasm can’t work.

    Steven Reply:

    @lausser,

    SNMPv2-SMI::enterprises.232 = No more variables left in this MIB View (It is past the end of the MIB tree)

    lausser Reply:

    No 1.3.6.1.4.1.232 means no monitoring. Maybe the hpasm software was not installed correctly, or the SNMP daemon needs to be restarted. But as long as there is no result for 1.3.6.1.4.1.232, I can’t do anything.
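
    Whether the subtree is really empty can be checked from captured snmpwalk output. The Python sketch below is a hypothetical helper, not part of check_hpasm; it treats the three “nothing here” messages quoted in this comment thread as empty results and everything else with a value as data.

```python
# Messages that snmpwalk printed in this thread when the subtree had no data.
EMPTY_MARKERS = (
    "End of MIB",
    "No more variables left in this MIB View",
    "No Such Object available on this agent at this OID",
)


def subtree_has_data(snmpwalk_output):
    """Return True if the walk output contains at least one real value line."""
    for line in snmpwalk_output.splitlines():
        line = line.strip()
        if not line:
            continue
        if any(marker in line for marker in EMPTY_MARKERS):
            continue
        if " = " in line:
            return True
    return False


empty = ("SNMPv2-SMI::enterprises.232 = No more variables left in this "
         "MIB View (It is past the end of the MIB tree)")
print(subtree_has_data(empty))
```

    If the helper reports no data for a walk of 1.3.6.1.4.1.232, the problem lies with the agent or its access configuration on the monitored server, not with the plugin.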

  59. Steven Says:
    October 28th, 2010 at 15:48

    Hi,

    I checked the HP agents via the System Management Homepage and the agents seem to be working fine. I guess that HP changed something in the hpasm software. We have ESX servers with 8.2 where it works; 8.6 doesn’t work on any of the servers where we updated the HP management software.

    Regards Steven

  60. Steven Says:
    October 28th, 2010 at 16:32

    Hi

    Found the error: the install script of the HP management agents added “rwcommunity ****** 127.0.0.1” at the top of the /etc/snmp/snmpd.conf file. I changed the IP address 127.0.0.1 to that of the Nagios server and now it’s working. Anyway, thanks for the great plugin and support.
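
    This kind of restriction is easy to overlook. As a rough aid, the Python sketch below scans the text of an snmpd.conf for rocommunity/rwcommunity lines whose source is the loopback address. The function name and the sample community names and addresses are made up for illustration; only the directive layout (directive, community, optional source) follows Net-SNMP’s snmpd.conf format.

```python
def localhost_only_communities(snmpd_conf_text):
    """List (directive, source) pairs whose source is the loopback address."""
    hits = []
    for raw in snmpd_conf_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 3 and fields[0] in ("rocommunity", "rwcommunity"):
            if fields[2] in ("127.0.0.1", "localhost"):
                # The community string itself is deliberately not returned.
                hits.append((fields[0], fields[2]))
    return hits


# Illustrative file content; community names and addresses are invented.
sample_conf = """\
rwcommunity secret 127.0.0.1
rocommunity public 10.0.0.5
# rwcommunity old 127.0.0.1
"""
print(localhost_only_communities(sample_conf))
```

    A hit means that this community only answers queries from the server itself, so a remote Nagios host needs its own entry or a changed source address.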

    Steven

  61. mx Says:
    November 3rd, 2010 at 8:54

    Hi !

    Sorry for my bad English :( (Russian)

    os: CentOS 5.5 x86_64
    plugins: check_hpasm-4.2.1.1.tar.gz

    ./check_hpasm
    WARNING - status of all 6 dimms is n/a (please upgrade firmware), System: 'proliant dl180 g6', S/N: 'CZJ0360L0S', ROM: 'O20 08/17/2010' | fan_1=55% fan_2=55% fan_3=59% fan_4=53% temp_1_memory_bd=25;87;87 …….

    hpasmcli -s "show dimm"
    …
    Cartridge #: 0
    Processor #: 2
    Module #: 6
    Present: Yes
    Form Factor: fh
    Memory Type: 5h
    Size: 4096 MB
    Speed: 1333 MHz
    Status: N/A
    …

    Do I really need to upgrade the firmware?

    Thanks!

    lausser Reply:

    I can’t tell you what to do. Upgrading firmware is my only idea. Please ask your HP representative how to get rid of the ‘N/A’ in hpasmcli.

  62. rb Says:
    November 6th, 2010 at 9:42

    Hi, thanks for the plugin. But it’s not working in my environment. Nagios is installed on a VM (centos 54).

    When I run snmpwalk on my Nagios server, it works fine (the target is a Windows 2008 DL360 G6 with the PSP installed):

    snmpwalk -c private -v1 1.3.6.1.4.1.232

    but check_hpasm returns an error:

    [root@nagios libexec]# ./check_hpasm -H -C private
    CRITICAL - could not find Net::SNMP module, wrong device
    [root@nagios libexec]# ./check_hpasm -H -C public
    CRITICAL - could not find Net::SNMP module, wrong device

    The perl snmp module is installed on the nagios host:

    rpm -qa

    net-snmp-perl-5.3.2.2-9.el5_5.1

    I added “dlmod cmaX /usr/lib64/libcmaX64.so” in /etc/snmp/snmp.conf

    ll /usr/lib64/libcmaX64.so

    lrwxrwxrwx 1 root root 16 Nov 6 08:10 /usr/lib64/libcmaX64.so -> ./libcmaX64.so.1

    but still not working …

    Any idea? Thx a lot

    lausser Reply:

    First make sure that the Net::SNMP perl module is really usable.

    perl -e 'use Net::SNMP;'
    must not show an error message. Have a look at the first line of your check_hpasm file, where you’ll find something like #!/usr/bin/perl. Make sure you use this path when you run the test above.
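
    The two steps (read the interpreter from the first line, then test the module with exactly that interpreter) can be combined. A minimal Python sketch, with the function name perl_module_test_command chosen for illustration only:

```python
def perl_module_test_command(script_text, module="Net::SNMP"):
    """Build the argv for testing `module` with the script's own interpreter."""
    first_line = script_text.splitlines()[0] if script_text else ""
    if not first_line.startswith("#!"):
        raise ValueError("script has no shebang line")
    parts = first_line[2:].split()
    if not parts:
        raise ValueError("shebang line names no interpreter")
    # parts[0] is the interpreter path; options such as -w are ignored here.
    return [parts[0], "-e", f"use {module};"]


print(perl_module_test_command("#!/usr/bin/perl -w\nuse strict;\n"))
```

    The returned list can be passed to subprocess.run; a non-zero exit status then means that this particular Perl cannot load the module, even if another Perl on the system can.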
  63. rb Says:
    November 7th, 2010 at 12:37

    perl -e … outputs “Can’t locate Net/SNMP.pm in @INC…..

    So I found http://nagios.manubulon.com/faq.html#FAQ1 and installed Net::SNMP via CPAN.

    check_hpasm then worked on the command line, but not in Nagios.

    There was a wrong path in commands.cfg: “command_line $USER1$/custom/libexe/check_hpasm -H $HOSTAD”

    I changed it to $USER1$/check_hpasm and now it’s working fine in Nagios ;)

    THANKS!

  64. hec Says:
    November 12th, 2010 at 13:50

    Hello lausser, many thanks for adding the timeout handler.

    One more suggestion: how about querying the NIC settings?

  65. Monitoring health (SMART data, temperatures etc.) of many remote computers Says:
    November 22nd, 2010 at 22:57

    [...] and have the passive checks query those tools. check_openmanage does that for dell servers and check_hpasm for HP hardware. With those tools you monitor all hardware in the servers (except if you add other [...]

  66. TimE Says:
    December 30th, 2010 at 12:53

    Hi, is it possible to show alerts for a specific hardware device, for example show only memory, disks, cpu, etc.? I know you can use blacklists but with things like disks you have to add every possible combination to exclude the disks. Thanks Tim

    lausser Reply:

    No, it’s not possible to pick one category.

  67. Ciro Iriarte Says:
    February 1st, 2011 at 1:34

    I got it running, but the plugins needs about 90 seconds to finish, is this the expected behavior?.

    lausser Reply:

    It’s not intended; it’s just the time your machine needs to send the information needed by the plugin. 90 seconds is long. It usually takes less than 10 seconds.

    Ciro Iriarte Reply:

    @Ciro Iriarte,

    s/plugins/plugin/g

  68. Jason Says:
    February 17th, 2011 at 2:11

    Great plugin! One of the best. Have a bit of trouble blacklisting a physical drive, I need to blacklist physical drive 1i:1:2. I have tried dapd:1:2, dapd:1:1:2, and dapd:1i:1:2. Blacklisting other components works (tested removing a power supply and running -b p:2 and it worked). Output shows “physical drive 1i:1:2 is failed” and it is directly attached. For kicks, I tried scpd with the same variations and no luck. Is there something weird about how it is parsing/looking for the “1i” in the beginning? Am I missing something?

    lausser Reply:

    Hi, can you mail me the output of “snmpwalk … 1.3.6.1.4.1.232″ of your machine please? Gerhard

  69. Marco Kohn Says:
    February 24th, 2011 at 15:36

    Hi,

    Great plugin. I’ve blacklisted some temperature values, and with -v they are marked as blacklisted. The problem is that the values are still present in the performance data. I would like to separate some checks, such as memory temperature, and when I check disks, the temperature perf-data does not really belong there. Am I making a mistake?

  70. hec Says:
    February 25th, 2011 at 11:14

    Hello lausser, a question about the plugin, prompted by a current case:

    We need more information (e.g. disk model/size) so that we can pass it on directly to HP, for example. Is it planned to extend the plugin in this direction? If permitted, I would gladly try my luck and post the change here…

    Regards, sven

    lausser Reply:

    That is not planned, and I generally do not consider it useful to overload the output with technical details.

    hec Reply:

    OK, thanks for the info…

    … not even showing details only in case of an error? :) …I’ll be quiet now…

  71. Grant Says:
    March 2nd, 2011 at 7:01

    We have this plugin working for several HP server models:
    - DL360 G6
    - DL360 G2
    - DL380 G5
    - ML370 G3

    However, we own a number of HP DL380 G4 servers, and I can’t get it to work correctly on this model (despite installing the same Agent, etc). Does anyone have the plugin working for this particular server running Server 2003 X86? If so, any ideas for me?

  72. Carl Lennart Says:
    March 11th, 2011 at 13:24

    Hello mate, thanks for cool script!

    I’m wondering about an output I got which stated that “another hpasmdcli is running”. How is this handled? Does it stop the check, or is it just a reminder?

    Br Lennart

    lausser Reply:

    It stops the check. check_hpasm calls the hpasmcli script to acquire hardware information. If you get this message, it means that somebody else is running hpasmcli and probably forgot to exit from it (it has a prompt). But there can only be one hpasmcli at a time, so your check_hpasm is prematurely aborted.

  73. Jan Hakala Says:
    March 14th, 2011 at 17:10

    Hi,

    I really like this script, but I have found some minor bugs. I have had two different machines whose hard drives failed, and in both cases the script reported the wrong physical drive.

    The first case is an HP ML370 G5 with an SA P400 in slot 1 (internal, on the motherboard). The script says:

    CRITICAL - physical drive 2:7 is failed, da controller 2 in slot 1 needs attention, logical drive 2:1 is recovering, System: ‘proliant ml370 g5

    The failed drive is actually Port: 2I, Box: 1, Bay: 1.

    The second case is a DL380 G5 with a P800 card and an MSA70 box attached:

    SERVICE ALERT: STO-OA01;HP Hardware;CRITICAL;HARD;3;CRITICAL - physical drive 2:23 is degraded, da controller 2 in slot 3 needs attention, System: ‘proliant dl380 g5

    In reality it was physical drive 16 that was failing.

    The third thing is an HP DL360 G6 with an LSI adapter (SAS 3000 Series): I can’t blacklist this device with the blacklist switch. Maybe SAS devices cannot be blacklisted yet?

    Kind Regards Jan

    lausser Reply:

    Section “Call to participate”

  74. baq Says:
    March 17th, 2011 at 19:38

    Hi,

    /usr/lib64/nagios/libexec/check_hpasm

    CRITICAL – hpasmd needs to be restarted, System: ‘unknown’, S/N: ‘unknown’, ROM: ‘unknown’

    lausser Reply:

    So what?

    baq Reply:

    @lausser, I don’t know how to resolve this problem.

    baq Reply:

    I have not installed the hpasm daemon. Where can I download it for Debian 6?

    lausser Reply:

    I have no idea. You must ask HP. I doubt Debian is certified at all.

    baq Reply:

    I have installed hpasm, but this is what happens when I start the daemon (Debian 6, amd86_64):

    Starting Proliant System Health Monitor (hpasmd): [ SUCCESS ]
    Starting Foundation Agents (cmafdtn): cmathreshd cmahostd cmapeerd
    Starting Threshold agent (cmathreshd): [ SUCCESS ]
    Starting Host agent (cmahostd): [ SUCCESS ]
    Starting SNMP Peer (cmapeerd): [ SUCCESS ]
    Starting Server Agents (cmasvr): cmastdeqd cmahealthd cmaperfd cpqriisd cmasm2d cmarackd
    Starting Standard Equipment agent (cmastdeqd): [ SUCCESS ]
    Starting Health agent (cmahealthd): [ SUCCESS ]
    Starting Performance agent (cmaperfd): [ SUCCESS ]
    cpqriisd requires hp_ilo. [ SUCCESS ]
    Starting RIB agent (cmasm2d): [ SUCCESS ]
    cpqriisd requires hp_ilo. [ SUCCESS ]
    Starting Rack agent (cmarackd): [ SUCCESS ]
    Starting Storage Agents (cmastor): cmaeventd cmaidad cmafcad cmaided cmascsid cmasasd
    Starting Storage Event Logger (cmaeventd): [ SUCCESS ]
    Starting IDA agent (cmaidad): [ SUCCESS ]
    Starting FCA agent (cmafcad): [ SUCCESS ]
    Starting IDE agent (cmaided): [ SUCCESS ]
    FATAL: Module sg not found.
    Starting SCSI agent (cmascsid): [ SUCCESS ]
    Starting SAS agent (cmasasd): [ SUCCESS ]
    Starting NIC Agents (cmanic): All agents
    Starting NIC Agent Daemon (cmanicd): Unable to determine if cmanic successfully started

    The binary "/opt/compaq/hpasmd/bin/hpasmxld" depends on "linux-vdso.so.1".
    The binary "/opt/compaq/hpasmd/bin/hpasmxld" depends on "/usr/lib/libhpasmintrfc64.so.2".
    The binary "/opt/compaq/hpasmd/bin/hpasmxld" depends on "/lib/libc.so.6".
    The binary "/sbin/hpbootcfg" depends on "linux-vdso.so.1".
    The binary "/sbin/hpbootcfg" depends on "/usr/lib/libhpasmintrfc64.so.2".
    The binary "/sbin/hpbootcfg" depends on "/usr/lib/libhpev64.so.1".
    The binary "/sbin/hpbootcfg" depends on "/lib/libc.so.6".
    hpasm: Server Management is not fully enabled
    touch: cannot touch `/var/lock/subsys/hpasm': No such file or directory

  75. Zyth Says:
    March 18th, 2011 at 17:01

    First off, thanks for a brilliant plugin.

    I have a (potentially) stupid question that I hope someone can answer, though: how do you run check_hpasm in local mode through NSclient on a Windows box, without installing Perl?

    lausser Reply:

    You just can’t do that.

  76. Daniel Says:
    March 24th, 2011 at 3:12

    Hi, Gerhard!

    First I want to thank you for the work you’ve done with this plugin for Nagios which interacts with both hpacucli and hpasmcli. It has been very useful for me.

    I have been using it for some time with DL380 G5 and DL380 G6 servers without problems.

    However, I wanted to report some differences in the output that I have observed when using the plugin with DL180 G6 servers. I’m surprised that without “-v” it does not display information about the RAID. This doesn’t happen with any of the DL380 series models I have. What could be the difference?

    There also seems to be a problem retrieving information on the DIMMs.

    root@ss09:~# /usr/local/nagios/libexec/check_hpasm
    WARNING - status of all 2 dimms is n/a (please upgrade firmware), System: 'proliant dl180 g6', S/N: 'MXQ03906M6', ROM: 'O20 08/17/2010'

    This is the output of the script in “call to participate”:

    root@ss09:~# ./gerhard.sh
    server
    server System : ProLiant DL180 G6
    server Serial No. : MXQ03906M6
    server ROM version : O20 08/17/2010
    server iLo present : No
    server Embedded NICs : 2
    server NIC1 MAC: d4:85:64:53:f1:7c
    server NIC2 MAC: d4:85:64:53:f1:7d
    server
    server Processor: 0
    server Name : Intel Xeon
    server Stepping : 2
    server Speed : 2400 MHz
    server Bus : 532 MHz
    server Core : 4
    server Thread : 8
    server Socket : 2
    server Level2 Cache : 1024 KBytes
    server Status : Ok
    server
    server Processor: 1
    server Name : Intel Xeon
    server Stepping : 2
    server Speed : 2400 MHz
    server Bus : 532 MHz
    server Core : 4
    server Thread : 8
    server Socket : 1
    server Level3 Cache : 12288 KBytes
    server Status : Ok
    server
    server Processor total : 2
    server
    server Memory installed : 8192 MBytes
    server ECC supported : Yes
    server
    powersupply
    powersupply Power supply #1
    powersupply Present : Yes
    powersupply Redundant: Yes
    powersupply Condition: Ok
    powersupply Hotplug : Not supported
    powersupply Power supply #2
    powersupply Present : Yes
    powersupply Redundant: Yes
    powersupply Condition: Ok
    powersupply Hotplug : Not supported
    powersupply
    fans
    fans Fan Location Present Speed of max Redundant Partner Hot-pluggable
    fans --- -------- ------- ----- ------ --------- ------- -------------
    fans #1 SYSTEM Yes NORMAL 55% N/A N/A No
    fans #2 SYSTEM Yes NORMAL 55% N/A N/A No
    fans #3 SYSTEM Yes NORMAL 61% N/A N/A No
    fans #4 SYSTEM Yes NORMAL 53% N/A N/A No
    fans
    fans
    temp
    temp Sensor Location Temp Threshold
    temp ------ -------- ---- ---------
    temp #1 MEMORY_BD 24C/75F 100C/212F
    temp #2 MEMORY_BD - 100C/212F
    temp #3 MEMORY_BD 24C/75F 100C/212F
    temp #4 MEMORY_BD - 100C/212F
    temp #5 MEMORY_BD - 100C/212F
    temp #6 MEMORY_BD - 100C/212F
    temp #7 MEMORY_BD 40C/104F 100C/212F
    temp #8 MEMORY_BD - 100C/212F
    temp #9 MEMORY_BD - 100C/212F
    temp #10 MEMORY_BD - 100C/212F
    temp #11 MEMORY_BD - 100C/212F
    temp #12 MEMORY_BD - 100C/212F
    temp #13 MEMORY_BD - 100C/212F
    temp #14 MEMORY_BD 40C/104F 100C/212F
    temp #15 SYSTEM_BD - 100C/212F
    temp #16 SYSTEM_BD - 100C/212F
    temp #17 AMBIENT 20C/68F 100C/212F
    temp #18 AMBIENT 27C/80F 100C/212F
    temp #19 SYSTEM_BD 17C/62F 60C/140F
    temp #20 SYSTEM_BD 29C/84F 100C/212F
    temp #21 SYSTEM_BD 24C/75F 100C/212F
    temp #22 SYSTEM_BD 24C/75F 100C/212F
    temp #23 SYSTEM_BD 24C/75F 100C/212F
    temp #24 SYSTEM_BD 22C/71F 100C/212F
    temp #25 SYSTEM_BD 22C/71F 100C/212F
    temp #26 SYSTEM_BD 21C/69F 100C/212F
    temp #27 SYSTEM_BD 21C/69F 100C/212F
    temp #28 SYSTEM_BD 24C/75F 100C/212F
    temp #29 SYSTEM_BD 23C/73F 100C/212F
    temp #30 SYSTEM_BD 24C/75F 100C/212F
    temp #31 SYSTEM_BD 27C/80F 100C/212F
    temp #32 SYSTEM_BD 27C/80F 100C/212F
    temp #33 SYSTEM_BD 20C/68F 100C/212F
    temp #34 SYSTEM_BD 21C/69F 100C/212F
    temp #35 SYSTEM_BD 50C/122F 120C/248F
    temp
    temp
    dimm
    dimm Cartridge #: 0
    dimm Processor #: 1
    dimm Module #: 2
    dimm Present: Yes
    dimm Form Factor: fh
    dimm Memory Type: 5h
    dimm Size: 4096 MB
    dimm Speed: 1333 MHz
    dimm Status: N/A
    dimm
    dimm Cartridge #: 0
    dimm Processor #: 1
    dimm Module #: 4
    dimm Present: Yes
    dimm Form Factor: fh
    dimm Memory Type: 5h
    dimm Size: 4096 MB
    dimm Speed: 1333 MHz
    dimm Status: N/A
    dimm
    dimm
    config
    config Smart Array P410 in Slot 1 (sn: PACCRID103409RW)
    config
    config array A (SATA, Unused Space: 0 MB)
    config
    config logicaldrive 1 (5.5 TB, RAID 5, OK)
    config
    config physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 1TB, OK)
    config physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 1TB, OK)
    config physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 1TB, OK)
    config physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 1TB, OK)
    config physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SATA, 1TB, OK)
    config physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SATA, 1TB, OK)
    config physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SATA, 1TB, OK)
    config physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SATA, 1TB, OK, spare)
    config
    status
    status Smart Array P410 in Slot 1
    status Controller Status: OK
    status Cache Status: OK
    status Battery/Capacitor Status: OK
    status
    status
    root@ss09:~#

    Thanks in advance for your reply.

    Regards, Daniel

  77. Daniel Says:
    March 24th, 2011 at 3:29

    Hello again, Gerhard!

    Since the comment was improperly formatted, I have sent it to you by mail.

    Regards, Daniel

  78. Ingmar Verheij – The dutch IT guy » Monitor “HP Proliant Server health” on “Citrix XenServer” with Nagios » Ingmar Verheij - The dutch IT guy Says:
    July 8th, 2011 at 11:07

    [...] Plugins can be found at Nagios Exchange, this is where I found the check_hpasm plugin (direct link). Unfortunately this plugin does not check the ASR status. In this article I will describe how [...]

  79. Monitor HP Proliant with Nagios or Op5 Monitor | An It-Slave in the digital saltmine Says:
    December 8th, 2011 at 23:02

    [...] check_hpasm can be downloaded from Console [...]

  80. Installing Shinken? It's easy! | Communauté Francophone de la Supervision Libre Says:
    February 2nd, 2012 at 14:51

    [...] check_hpasm [...]

  81. check_hpasm now supports Proliant Gen8 – ConSol* Labs Says:
    November 26th, 2012 at 1:17

    [...] A customer with a support contract sponsored the time, and I was finally able to take a closer look at what was going on. The uninitialized error occurs when the event log of a Proliant is read and the value of EventUpdateTime consists of binary zeros instead of a date value. This appears to be a bug in the HP firmware. I now simply reconstruct the timestamp from the times of neighboring events. And what about the sub-zero temperatures? The sensors in question deliver realistic temperature readings, but they are not used to shut down the server or throttle its performance when a threshold is exceeded. They belong to the new "sea of sensors" feature, which builds a 3D map of the temperature distribution inside the server and even across the whole data center. It helps to identify overloaded machines and to distribute the workload more evenly. For check_hpasm this simply means that the readings of such sensors are now output only as performance data. They are no longer compared against thresholds. The new release of check_hpasm is 4.6.3 and can be found in the usual place: http://labs.consol.de/nagios/check_hpasm [...]