check_nwc_health

Posted on September 24th, 2012 by lausser

Note: Hands-on course

On 19-21 May and 6-8 October 2014, the three-day beginners' training "Open Source Monitoring mit Nagios - Der Praxiskurs" takes place again at the ConSol* Academy in Munich. More information is available at http://www.consol.de/open-source-monitoring/schulungen/open-source-monitoring-praxiskurs/
During the three days you will also be in close contact with ConSol*'s monitoring team. Questions that go beyond the course content will gladly be answered over a shared meal, over a beer in the evening, or whenever you poke your nose into one of our offices.

Description

check_nwc_health is a plugin for Nagios, Shinken and Icinga that monitors network components. It can query interface statistics, hardware (CPU, memory, fans, power supply modules etc.), firewall policies, HSRP, load balancer pools, and processor and memory usage.

Communication with the devices is handled via SNMP; versions 1, 2c and 3 are supported.

So far, the following network components, firewalls, SAN switches and load balancers can be monitored with it:

Cisco IOS, Cisco Nexus, F5 BIG-IP, CheckPoint Firewall1, Juniper NetScreen, HP Procurve, Nortel, Brocade 4100/4900, EMC DS 4700, EMC DS 24, Allied Telesyn, Blue Coat SG600, Cisco Wireless Lan Controller 5500, Brocade ICX6610-24-HPOE, NX-OS, FOUNDRY-SN-AGENT-MIB, FRITZ!BOX 7390, FRITZ!DECT 200, Juniper IVE, Pulse-Gateway MAG4610, Cisco IronPort AsyncOS, Foundry….

However, not every mode is available for every device type.
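Since not every mode works on every device, one way to find out what a given device supports is simply to try several modes in a row. The following is a minimal sketch, not part of the plugin: the function name check_modes and the PLUGIN variable are hypothetical, and it only uses the options --hostname, --community and --mode that appear elsewhere on this page.

```shell
# Path to the plugin; override PLUGIN if it is installed somewhere else.
# (PLUGIN and check_modes are illustrative names, not part of check_nwc_health.)
PLUGIN="${PLUGIN:-./check_nwc_health}"

# check_modes HOST COMMUNITY MODE...
# Runs the plugin once per mode and prints whatever each call reports.
check_modes() {
    host=$1
    community=$2
    shift 2
    for mode in "$@"; do
        "$PLUGIN" --hostname "$host" --community "$community" --mode "$mode"
    done
}
```

A call such as `check_modes 192.0.2.10 public uptime cpu-load memory-usage hardware-health` (placeholder address and community) would then show, mode by mode, which queries the device answers.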

There is no detailed manual; I don't have the time for that. Trying things out and reading the code will get you there too. (I won't accept the argument "but I don't have time".)

If you have a few euros to spare, you can also have me read the manual to you. I am giving a talk at this year's monitoring conference in Nuremberg:

Monitoring network components with check_nwc_health (DE)

The components of a corporate IT landscape can only work together and communicate with the outside world if the network works. Switches, routers, firewalls and load balancers form the backbone of networked systems and are therefore primary targets for monitoring. Until now there was a separate plugin for every make and every type of query. As a result, Nagios installations used more than ten plugins, each with its own command-line syntax, of course. check_nwc_health was written to put an end to this madness. Its goal is to bundle all requirements for monitoring the most common network components in a single plugin.
By now it is used successfully in several environments, each with thousands of network nodes (Cisco, Juniper, HP, CheckPoint, F5, Brocade, Bluecoat and many more), and the list of features keeps growing.
Gerhard Laußer shows how network monitoring based on check_nwc_health can be set up with little effort and how the plugin can be extended for special requirements with a few lines of code.

 

Download

check_nwc_health-3.0i.tar.gz

 

Changelog

  • 2014-01-24 2.6.5
    - add mode --check-config, which finds unsaved configs (cisco only) (Thanks Simon Meggle)
  • 2014-01-18 2.6.4.3
    - bugfix in uptime (Thanks Finn Christiansen)
  • 2014-01-15 2.6.4.2
    - add http connection checks for bluecoat sg
  • 2014-01-14 2.6.4
    - add cisco ccm
  • 2014-01-11 2.6.3.1
    - support more SecureOS devices (i bought Juniper SSG5)
    - bugfix in upnp-detection
  • 2013-12-21 2.6.3
    - output number of sessions for f5 bigip load balancer pools
    - deal with obviously wrong values from devices (20000% cpu usage)
    - foundry server load balancing
    - bugfix in interface-* for Juniper IVE
    - filter hsrp groups by name
  • 2013-11-08
    - added support for role based login for Fritz Boxes (available since
      FRITZ!OS 5.50). Use --community for password, --username for username if
      role based security is switched on.
  • 2013-11-09
    - bugfix for fritzbox
  • 2013-11-08 2.6.1
    - hardware-health for Checkpoint Firewall-1
  • 2013-11-07 2.6
    - finished bgp-peer-status (focus on as numbers with --name2)
    - admin down with --interface-status can have any level with --mitigation
  • 2013-10-31 2.5.4.1
    - add Fujitsu Intelligent Blade Panel 30/12
  • 2013-10-30 2.5.4
    - add bgp
  • 2013-10-01 2.5.3
    - detect more brocade devices
  • 2013-09-26 2.5.2.1
    - suppress double output for html f5 pool members
  • 2013-09-25 2.5.2
    - add html output for f5 pool members
  • 2013-09-18 2.5.1.2
    - remove a leftover Data::Dumper (Thanks Frank Belau)
  • 2013-09-17 2.5.1.1
    - bugfix in lsmpi_io
  • 2013-09-11 2.5.1
    - set a 100% threshold for lsmpi_io memory pools of Cisco ASR (Thanks James Clark & Perun)
  • 2013-09-10 2.5
    - implemented offline mode with --snmpwalk & --offline
  • 2013-09-03 2.4
    - add Cisco IronPort AsyncOS
  • 2013-08-27 2.3
    - add Juniper IVE (ex. Pulse-Gateway MAG4610)
    - add count-connections for cisco asa
  • 2013-07-11 2.2
    - add memory-usage for checkpoint
    - add detection for cpx
  • 2013-07-09 2.1.1
    - skip non-interface files in /sys/class/net for servertype linuxlocal (Thanks Sven Nierlein)
    - better error handling on unwritable statefiles/dirs
  • 2013-06-12 2.1
    - add "--servertype ifmib" so you can use "--mode interface" with every kind of ifmib-capable device
  • 2013-06-01 2.0
    - added FRITZ!DECT 200 smart plugs
  • 2013-05-27 1.9.8.1
    - bugfix for the bugfix in commandline Options (Thanks Webspace Mario)
  • 2013-05-23 1.9.8
    - add Brocade Communications Systems, Inc. ICX6610-24-HPOE, IronWare
    - bugfix in commandline options (Thanks TheCry)
  • 2013-04-20 1.9.7.3
    - fixed a bug in snmpwalk simulation and savestate
  • 2013-04-08 1.9.7.2
    - bugfix, interfaces were shown twice in list-interfaces
  • 2013-03-29 1.9.7.1
    - bugfix in link-aggregation-availability
  • 2013-03-25 1.9.7
    - added link-aggregation-availability
  • 2013-03-19 1.9.6
    - fixed a bug in interface-*
    - speedup in interface-* (with --name and 64bit)
    - added a hostname/community hash to statefiles
  • 2013-03-12 1.9.5.1
    - bugfix in interface-usage, snmp bulk walks and a long list of interfaces
  • 2013-02-24 1.9.5
    - add mode interface-availability
  • 2013-02-11 1.9.4
    - add Cisco Wireless LAN Controller 5500
  • 2013-02-11 1.9.3.1
    - fixed a bug in statefiles with uppercase directory names. (Thanks Matthias Gallinger)
  • 2013-02-10 1.9.3
    - add blue coat sg600
  • 2013-02-02 1.9.2
    - removed my static ip from FRITZ!BOX interface-usage (Thanks Stef)
  • 2013-01- 1.9.1
    - fixed a bug in FRITZ!BOX uptime (Thanks Lars Urban)
  • 2013-01-21 1.9
    - add uptime, cpu-load, memory-usage and interface-usage for AVM FRITZ!Box 7390 with Firmware 84.05.50
  • 2013-01-13 1.8
    - add cpu & memory check for juniper netscreen
  • 2013-01-12 1.7.1
    - add a name caching mechanism for f5 bigip pools
  • 2013-01-08 1.7
    - add f5 bigip pool completeness
    - add member info for failed f5 pools
  • 2012-12-10 1.6
    - add checkpoint firewall-1
  • 2012-11-23 1.5
    - add 64bit interfaces
  • 2012-09-26 1.4.9.1
    - fix a bug in uptime
  • 2012-09-24 1.4.9
    - add hp procurve cpu-load and memory-usage
    - fix a bug in cisco memory perfdata
  • 2012-08-28 1.4.8
    - add hp procurve hardware
  • 2012-08-21 1.4.7.1
    - fix a bug in servertype locallinux, interfaces and --name (Thanks Simon Meggle)
  • 2012-08-21 1.4.7
    - add f5 bigip
    - bugfix in mode uptime
  • 2012-08-10 1.4.6
    - add mode uptime
  • 2012-08-10 1.4.5.2
    - fix a bug in statefilesdir creation under omd
  • 2012-08-02 1.4.5.1
    - add more hardware info for EMC-DS24M2 (McData Sphereon 4500)
  • 2012-07-31 1.4.5
    - add UCD-MIB for SecureOS (McAfee Sidewinder)
  • 2012-07-31 1.4.4
    - add fibre alliance mib sensor table for MeOS/DS-4700M
  • 2012-07-20 1.4.3.1
    - add the index to interface names, if interfaces all have the same name
    - first experiments with MeOS
  • 2012-07-12 1.4.3
    - fix a bug in the role parameter for hsrp
    - fix a temperature index where ios doesn’t set the counter itself
    - add mib2-interface-modes to brocade fabos
  • 2012-07-05 1.4.2
    - add mode encode for interface names with ‘ or "
  • 2012-07-05 1.4.1
    - add --ifspeedin, --ifspeedout, --ifspeed (used for asymmetric mpls)
  • 2012-06-22 1.4
    - add linux local interfaces (interface-usage/errors only) with --servertype linuxlocal
    - add mode walk
    - rename brocade300 -> fabos
  • 2012-04-23 1.3
    - add mode list-interfaces-detail (Cisco only)
    - add brocade300 (hardware-health,memory-usage,cpu-load only)
  • 2012-03-29 1.2
    - add support for Cisco Nexus (cpu, mem)
    - add Nexus sensors
    - add Allied Telesyn (only interfaces so far)
  • 2012-03-19 1.1.1.1
    - bugfix in list-hsrp-groups
    - --units KBi/MBi/GBi for interface-usage
  • 2012-02-22 1.1.1
    - add mode hsrp-failover (Thanks Munich)
  • 2012-02-22 1.1
    - add mode hsrp-state (Thanks Munich)
  • 2012-01-05
    - some more debug messages
  • 2012-01-04 1.0
    - Nortel devices are recognized. (only interfaces can be queried by now)

23 Responses to “check_nwc_health”

  1. Monitoring einer FRITZ!Box 7390 mit check_nwc_health – ConSol* Labs Says:
    January 22nd, 2013 at 3:40

    [...] is version 1.9, which makes the following queries possible [...]

  2. Bjorn Frostberg Says:
    January 27th, 2013 at 11:06

    Hi,

    Any plans to add hardware support for JunOS and Juniper Netscreen?

    OIDs etc are available here, in this plugin:

    http://exchange.nagios.org/directory/Plugins/Hardware/Network-Gear/Cisco/Check-various-hardware-environmental-sensors/details

    Also, is the Cisco XE HW check complete? It is a lot faster than above script.

    Regards, Bjorn

  3. Allesfresser check_nwc_health kriegt nicht genug – ConSol* Labs Says:
    February 10th, 2013 at 17:05

    [...] The plugin is available here… [...]

  4. F.A.N : Sortie de la version 2.4 | Communauté Francophone de la Supervision Libre Says:
    March 17th, 2013 at 12:15

    [...] well-known check_nwc_health, provided by Consol.de, to query a multitude of devices [...]

  5. Wolfgang Says:
    March 25th, 2013 at 17:48

    Is the Allied Telesyn AT-8350GB supported? The script says: CRITICAL - unknown device(AT-8350GB), wrong device

    lausser Reply:

    Hi, in plugins-scripts/NWC/Device.pm, search for AlliedTelesyn and insert:

    } elsif ($self->{productname} =~ /AT-\d+GB/i) {
      bless $self, 'NWC::AlliedTelesyn';
      $self->debug('using NWC::AlliedTelesyn');
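To see why that pattern catches the AT-8350GB, the matching step can be tried outside the plugin. This is only a shell analogue of the Perl condition in the reply, not the plugin's own code: the function name device_class is hypothetical, and `[0-9]+` stands in for Perl's `\d+`.

```shell
# device_class PRODUCTNAME
# Prints the class a product name would map to under the pattern from the
# reply (case-insensitive "AT-<digits>GB"); illustrative only.
device_class() {
    if printf '%s\n' "$1" | grep -Eqi 'AT-[0-9]+GB'; then
        echo 'NWC::AlliedTelesyn'
    else
        echo 'unknown device'
    fi
}

device_class 'AT-8350GB'
```

Any product name that does not contain such a token falls through to the "unknown device" branch, which corresponds to the "wrong device" message quoted in the question.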

  6. Ferdi Says:
    April 24th, 2013 at 15:14

    I’ve tried your plugin for Nagios but I ran into some problems after compiling. I’m using Nagios XI 2012v1.6 on CentOS 6.3 (x64) <- virtual machine from Nagios.

    I’ve compiled the script with

    ./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-perl=/usr/bin/perl
    make
    make install

    When I execute the plugin (as root) against an HP Procurve switch I get an error:

    [root@SV14808 libexec]# ./check_nwc_health --hostname AC07887
    Use of uninitialized value in pattern match (m//) at ./check_nwc_health line 14348.
    Use of uninitialized value in string eq at ./check_nwc_health line 14352.
    [... the same warning for line 14352 repeated many times ...]
    Use of uninitialized value in string eq at ./check_nwc_health line 14358.
    Use of uninitialized value in printf at ./check_nwc_health line 14360.
    UNKNOWN - mode check_nwc_health $Revision: 1.9.7.3 $ [http://labs.consol.de/nagios/check_nwc_health]

    Am I missing something? I hope to hear from you.

    Ferdi Reply:

    @Ferdi, Solved it. Had to use --mode.. :)

  7. Ferdi Says:
    May 3rd, 2013 at 15:58

    Are there plans to support HP Procurve 5800 switches? Everything works well except for those switches in our infrastructure.

    lausser Reply:

    Currently there are no such plans. New Hardware will be implemented every time there is a customer’s request, so in the future it can’t be ruled out.

  8. cark Says:
    May 29th, 2013 at 14:29

    I get an error with Allied Telesis:

    Can't locate object method "new" via package "NWC::AlliedTelesyn::Component::EnvironmentalSubsystem" (perhaps you forgot to load "NWC::AlliedTelesyn::Component::EnvironmentalSubsystem"?) at /usr/lib64/nagios/plugins/check_nwc_health line 3891.

    Can you help out with this error?

    lausser Reply:

    Hi, environmental checks for Allied Telesis were not implemented. Support customers may order this feature, so perhaps in the future it will be available.

  9. Perun Says:
    June 11th, 2013 at 8:38

    Is it possible to suppress a check? For example, on the Cisco ASR the memory-usage test always reports 99% for lsmpi_io_usage, which is a false positive... Can this be excluded somehow?

    lausser Reply:

    @Perun, hi, this is fixed in version 2.5.1. This kind of memory pool automatically gets thresholds of 100%.

  10. Peter Says:
    June 24th, 2013 at 10:12

    Hello everyone,

    First off, I'm pretty much a Linux noob. I have been working with the Pi for a while now and have dug my way through various forums, and recently Nagios as well. I'm just despairing at installing the plugin: I'm trying it with aptitude, but somehow it won't work.

    How did you do it? It would be really cool if someone could help me.

    Best regards,

    Peter

  11. cmdrhenner Says:
    June 26th, 2013 at 10:33

    Hi,

    I have two Checkpoint 4200 devices to check. I get the following error message:

    root@srv-nagios:/tmp/check_nwc_health-2.0/plugins-scripts# ./check_nwc_health --hostname XXX.XX.XX.XX --community TEST --mode ha-role
    CRITICAL - unknown device(Linux xx-xxx-xx 2.6.18-92cp #1 SMP Sun Feb 10 22:17:51 IST 2013 i686), wrong device

    Changes in Device.pm:

    elsif ($self->{productname} =~ /Linux xx-xxx/i) { bless $self, 'NWC::CheckPoint'; $self->debug('using NWC::CheckPoint');

    New error message after compiling:

    Deep recursion on subroutine "NWC::CheckPoint::init" at ./check_nwc_health line 8486.
    UNKNOWN - check_nwc_health timed out after 15 seconds

    Can you please give me a hint as to what might be causing this?

  12. Peter Jonkers Says:
    June 28th, 2013 at 15:01

    great plugin to start with :)

    We have an HP 3COM 5500G switch and I am getting the following error:

    3Com Switch 5500G-EI 24-Port Software Version 3Com OS V3.03.02s168), wrong device

    How can I resolve this issue?

    thx

    Peter Jonkers

  13. Ton Says:
    July 26th, 2013 at 14:08

    As my colleague Ferdi already asked, are there any plans to implement support for the HP A5800 series? We use your plugin for all the other switches and it is working fine.

  14. HiSPeed Says:
    July 28th, 2013 at 1:11

    Hi,

    Thanks for the (probably) great plugin (i.e. it seems to be done well from what I’ve skimmed of the source), yet there is one little flaw that keeps me from using any function other than retrieving the uptime on my Fritz!Box 7360 SL (i.e. the "1und1 Homeserver 50000").

    root@myplace:/mynagioscontribdir# ./check_nwc_health --verbose --hostname fbox-ip --port 49000 --mode uptime
    I am a upnp
    OK - device is up since 1194 minutes | 'uptime'=1194.28;15:;5:
    root@myplace:/mynagioscontribdir# ./check_nwc_health --verbose --hostname fbox-ip --port 49000 --mode list-interfaces
    I am a upnp
    Deep recursion on subroutine "UPNP::AVM::init" at ./check_nwc_health line 1101.
    ^C
    root@myplace:/mynagioscontribdir# 
    

    This occurs with the 2.0 package from your site as well as the current github “master” package (today’s date being 2013-07-27).

    If you have a fix for this, or if I can help you fix this annoying bummer of a bug, just drop me a line what you need me to do.

    Regards, HiSPeed

  15. Wayne Says:
    August 29th, 2013 at 15:19

    I’m working on getting Nagios set up for my company and I’m using check_nwc_health to monitor errors on our HP Procurve 2626 switches. Here’s the command I’m using:

    $USER1$/check_nwc_health --hostname $HOSTADDRESS$ --community public --mode interface-errors --lookback 900 --critical 2000 $ARG1$

    It seems that --lookback isn’t having any effect: the check goes to WARNING and CRITICAL even though there aren’t any new errors, just some old ones. Is there something I’m doing wrong?

  16. Waso Says:
    August 30th, 2013 at 10:20

    I get CRITICAL usage on a ProCurve 2910al-48G (example): CRITICAL - interface 2 usage is in:82.30% (9.81MB/s) out:888.49% (105.92MB/s)

    But the whole local network runs at 100/1000T, and interface 2 is down:

    sw10-2910-48-data# sh int 2

    Status and Counters - Port Counters for port 2

    Name :
    MAC Address : c09134-d727be
    Link Status : Down

    lausser Reply:

    @Waso, Should be fixed with 2.5
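The percentages in the report above are what one gets when the quoted rates are divided by the capacity of a 100 Mbit/s link. The following arithmetic sketch only reproduces those numbers; it is not taken from the plugin. It assumes that "MB/s" in the output means 2^20 bytes per second and that the speed used for the calculation was 100 Mbit/s, neither of which is stated in the comment, and the function name usage_percent is hypothetical.

```shell
# usage_percent RATE_MIB_PER_S LINK_MBIT_PER_S
# Prints the rate as a percentage of the link capacity.
# Assumption: the rate is in MiB/s (2^20 bytes), the link speed in Mbit/s (10^6 bits).
usage_percent() {
    awk -v rate="$1" -v mbit="$2" 'BEGIN {
        capacity = mbit * 1000000 / 8 / 1048576   # link capacity in MiB/s
        printf "%.1f\n", rate / capacity * 100
    }'
}

usage_percent 9.81 100     # inbound rate from the report
usage_percent 105.92 100   # outbound rate from the report
```

The two calls print 82.3 and 888.5, which agrees with the reported 82.30% and 888.49% up to the rounding of the rates. Under these assumptions the outbound rate is more than a 100 Mbit/s port can carry, so either the rate or the speed used in the calculation was wrong for this interface.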

  17. Nagios belching out Lava – ConSol* Labs Says:
    December 2nd, 2013 at 13:53

    […] can monitor the Dect!200 with check_nwc_health which beside the switched state also measures the energy consumption and current power. But in our […]