Mod Gearman

Posted on September 13th, 2010 by sven

What is Mod-Gearman

Mod-Gearman is an easy way of distributing active Nagios checks across your network and increasing Nagios scalability. It can even help to reduce the load on a single Nagios host, because it is much smaller and more efficient in executing checks.

It consists of three parts:

  • There is a NEB module which resides in the Nagios core and adds servicechecks, hostchecks and eventhandlers to a Gearman queue.

  • The counterpart is one or more worker clients executing the checks. Workers can be configured to only run checks for specific host- or servicegroups.

  • And you need at least one Gearman Job Server running.

  • See the common scenarios for some examples.

Presentations

Features

  • Reduce load of your central Nagios machine

  • Make Nagios scalable up to thousands of checks per second

  • Easy distributed setups without configuration overhead

  • Real loadbalancing across all workers

  • Real failover for redundant workers

  • Embedded Perl support for very fast execution of perl scripts

  • Fast transport of passive check results with included tools like send_gearman and send_multi

Download

Support

Changelog

The changelog is available on github.

How does it work

When the Mod-Gearman broker module is loaded, it intercepts all servicechecks, hostchecks and eventhandler events. Eventhandlers are sent to a generic eventhandler queue. Checks for hosts which are in one of the specified hostgroups are sent into a separate hostgroup queue. All non-matching hosts are sent to a generic host queue. Checks for services are first checked against the list of servicegroups, then against the hostgroups, and if none matches they are sent into a generic service queue. The NEB module starts a single thread which monitors the check_results queue, where all results come in.
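
The routing rules above can be sketched in a few lines of Python. This is only an illustration and not the code of the NEB module: the function names are made up, and taking the first configured group that matches is an assumption. The queue names follow the pattern shown in the queue tables on this page.

```python
def host_queue(hostgroups_of_host, configured_hostgroups):
    """Pick the queue for a host check (illustrative helper, not the real API)."""
    for group in configured_hostgroups:
        if group in hostgroups_of_host:
            return "hostgroup_" + group
    # No configured hostgroup matched: use the generic host queue.
    return "host"


def service_queue(servicegroups_of_service, hostgroups_of_host,
                  configured_servicegroups, configured_hostgroups):
    """Pick the queue for a service check: servicegroups first, then hostgroups."""
    for group in configured_servicegroups:
        if group in servicegroups_of_service:
            return "servicegroup_" + group
    for group in configured_hostgroups:
        if group in hostgroups_of_host:
            return "hostgroup_" + group
    # Nothing matched: use the generic service queue.
    return "service"


print(service_queue({"selenium"}, {"japan"}, ["selenium"], ["japan"]))
print(host_queue({"europe"}, ["japan"]))
```

With hostgroups=japan and servicegroups=selenium this yields the same queue names as the second example table further down.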

Figure: mod_gearman architecture

A simple example queue would look like:

+---------------+------------------+--------------+--------------+
| Queue Name    | Worker Available | Jobs Waiting | Jobs Running |
+---------------+------------------+--------------+--------------+
| check_results | 1                | 0            | 0            |
| eventhandler  | 50               | 0            | 0            |
| host          | 50               | 0            | 1            |
| service       | 50               | 0            | 13           |
+---------------+------------------+--------------+--------------+

There is one queue for the results and two for the checks plus the eventhandler queue.

The workflow is simple:

  1. Nagios wants to execute a service check.

  2. The check is intercepted by the Mod-Gearman neb module.

  3. Mod-Gearman puts the job into the service queue.

  4. A worker grabs the job and puts the result into the check_results queue.

  5. Mod-Gearman grabs the result job and puts the result onto the check result list.

  6. The Nagios reaper reads all checks from the result list and updates hosts and services.
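
The six steps can be played through with two in-memory queues. The following toy simulation only mirrors the order of the steps; it does not talk to a Gearman job server, and all function names as well as the job layout are invented for this sketch.

```python
import queue

service_jobs = queue.Queue()    # stands in for the "service" queue
check_results = queue.Queue()   # stands in for the "check_results" queue
check_result_list = []          # stands in for the result list of the core


def neb_intercept(host, service, command):
    """Steps 2 and 3: intercept the check and put it into the service queue."""
    service_jobs.put({"host": host, "service": service, "command": command})


def worker_run_once(execute):
    """Step 4: grab one job, execute it and put the result back."""
    job = service_jobs.get()
    return_code, output = execute(job["command"])
    check_results.put({**job, "return_code": return_code, "output": output})


def neb_collect_once():
    """Step 5: move one result onto the check result list."""
    check_result_list.append(check_results.get())


def fake_plugin(command):
    """Placeholder for a real plugin execution."""
    return 0, "OK - " + command


neb_intercept("web01", "http", "check_http")   # step 1 happens inside Nagios
worker_run_once(fake_plugin)
neb_collect_once()
print(check_result_list)                       # step 6: the reaper reads this list
```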

You can set some host- or servicegroups for special workers. This example uses a separate hostgroup for Japan and a separate servicegroup for resource-intensive Selenium checks.

It would look like this:

+-----------------------+------------------+--------------+--------------+
| Queue Name            | Worker Available | Jobs Waiting | Jobs Running |
+-----------------------+------------------+--------------+--------------+
| check_results         | 1                | 0            | 0            |
| eventhandler          | 50               | 0            | 0            |
| host                  | 50               | 0            | 1            |
| hostgroup_japan       | 3                | 1            | 3            |
| service               | 50               | 0            | 13           |
| servicegroup_selenium | 2                | 0            | 2            |
+-----------------------+------------------+--------------+--------------+

You still have the generic queues and in addition there are two queues for the specific groups.

The worker processes take jobs from the queues and put the results back into the check_results queue, from which the NEB module takes them and passes them back into the Nagios core. A worker can work on one or more queues. So you could start one worker which only handles the hostgroup_japan queue, one worker for the Selenium checks and one worker which covers the other queues. There can be more than one worker on each queue to share the load.

Common Scenarios

Load Balancing

The easiest variant is simple load balancing. For example, if your single Nagios box just cannot handle the load, you could add a worker in the same network (or even on the same host) to reduce the load on the Nagios box. To do so, just enable hosts, services and eventhandler on the server and the worker.

Pro:

  • reduced load on your monitoring box

Contra:

  • no failover

Distributed Monitoring

If your checks have to be run from different network segments, you can use the hostgroups (or servicegroups) option to define a hostgroup for a specific worker. The general hosts and services queues are disabled for this worker, and only the hosts and services from the given hostgroup will be processed.

Pro:

  • reduced load on your monitoring box

  • ability to access remote networks

Contra:

  • no failover

Distributed Monitoring with Load Balancing

Your distributed setup can easily be extended to a load-balanced setup by just adding more workers with the same config.

Pro:

  • reduced load on your monitoring box

  • ability to access remote networks

  • automatic failover and load balancing for workers

Contra:

  • no failover for the master

NSCA Replacement

If you just want to replace a current NSCA solution, you could load the Mod-Gearman NEB module and disable all distribution features. The core can still receive passive results sent via send_gearman / send_multi. Make sure you use the same encryption settings as the NEB module, or enable the accept_clear_results option; otherwise your core won’t be able to process the results.

Pro:

  • easy to setup in existing environments

Distributed Setup With Remote Scheduler

In case your network is unstable, you need a GUI view from the remote location, or there is any other reason which makes a remote core unavoidable, you may want this setup. This setup consists of two independent Mod-Gearman setups, and the slave workers just send their results to the master via the dupserver option. The master's object configuration must contain all slave services and hosts. The configuration sync is not part of Mod-Gearman.

Pro:

  • independent from network outages

  • local view

Contra:

  • more complex setup

  • requires configuration sync

Gearman Proxy

Sometimes you may need to reverse the direction of the initial connection attempt. Usually the worker and the NEB module open the initial connection, so they need access to the gearmand port. In cases where no direct connection is possible, use an SSH tunnel or the Gearman proxy. The Gearman proxy just puts jobs from one gearmand into another gearmand and vice versa.

Just copy gearman_proxy.pl from the contrib or share directory and adjust the first few lines to match your needs.

Pro:

  • changes direction of initial connection setup

  • buffers network outages

Contra:

  • two more daemons to monitor and maintain

Installation

OMD

Using OMD is probably the easiest way of installing and using Mod-Gearman. You just have to run omd config and set Mod-Gearman to on, or set it directly with omd config set.

OMD is available for Debian, Ubuntu, Centos/Redhat and SLES.

OMD[test]:~$ omd config set MOD_GEARMAN on
Note
Mod-Gearman is included in OMD since version 0.48.

Debian / Ubuntu

It is strongly recommended to use the official packages or the unofficial packages, which include Debian Squeeze and various Ubuntu packages.

Centos/Redhat

The easy and proper way is to build RPM packages. The following steps assume CentOS 5.7. Other releases may have different versions but should behave similarly.

Note
Use the prebuilt packages if available.

Build/install Gearmand rpms

#> yum install autoconf automake libtool boost141-devel boost141-program-options
#> cd /tmp
#> wget http://launchpad.net/gearmand/trunk/0.33/+download/gearmand-0.33.tar.gz
#> tar zxf gearmand-0.33.tar.gz
#> ln -s gearmand-0.33/support/gearmand.init /tmp/gearmand.init
#> vi gearmand-0.33/support/gearmand.spec
   change lines 9 and 25 from:
   Requires: sqlite, libevent >= 1.4, boost-program-options >=  1.39
   to:
   Requires: sqlite, libevent >= 1.4, boost141-program-options >=  1.39
#> tar cfz gearmand-0.33.tar.gz gearmand-0.33
#> LIBRARY_PATH=/usr/lib64/boost141:/usr/lib/boost141 \
   LD_LIBRARY_PATH=/usr/lib64/boost141:/usr/lib/boost141 \
   CPATH=/usr/include/boost141 \
   rpmbuild -tb gearmand-0.33.tar.gz
#> yum --nogpgcheck install /usr/src/redhat/RPMS/*/gearmand*-0.33-1*.rpm
Note
The link to gearmand.init is a workaround, otherwise the build will fail. It may not be necessary for future gearman versions.

Build/install Mod-Gearman rpms

#> wget http://labs.consol.de/wp-content/uploads/2010/09/mod_gearman-1.2.0.tar.gz
#> rpmbuild -tb mod_gearman-1.2.0.tar.gz
#> yum --nogpgcheck install /usr/src/redhat/RPMS/*/mod_gearman-1.2.0-1.*.rpm

Finally start and check your installation

#> /etc/init.d/gearmand start
#> /etc/init.d/mod_gearman_worker start
#> gearman_top

From Source

Note
Source installation should be avoided if possible. Prebuilt packages are way easier to maintain.

Prerequisites:

  • gcc / g++

  • autoconf / automake / autoheader

  • libtool

  • libgearman (>= 0.14)

Download the tarball and perform the following steps:

#> ./configure
#> make
#> make install

Then add the mod_gearman.o to your Nagios installation and add a broker line to your nagios.cfg:

broker_module=.../mod_gearman.o server=localhost:4730 eventhandler=yes services=yes hosts=yes

See Configuration for details on all parameters.

The next step is to start one or more workers. You may use the same configuration file as for the NEB module.

./mod_gearman_worker --server=localhost:4730 --services --hosts

or use the supplied init script.

Note
Make sure you have started your Gearmand job server. Usually it can be started with
/usr/sbin/gearmand -t 10 -j 0

or a supplied init script (extras/gearmand-init). Command line arguments have changed in recent gearman versions and you should now use something like:

/usr/sbin/gearmand --threads=10 --job-retries=0

Patch Nagios

Note
The needed patch is already applied to Nagios 3.2.2. Use the patch if you have an older version.

It is not possible to distribute eventhandlers with Nagios versions prior to 3.2.2. If you want to use an older version, just apply the patch from the ./extras/patches directory to your Nagios sources and build Nagios again. You only need to replace the Nagios binary; nothing else has changed. If you plan to distribute only host and service checks, no patch is needed.

Configuration

Nagios Core

A sample broker in your nagios.cfg could look like:

broker_module=/usr/local/share/nagios/mod_gearman.o keyfile=/usr/local/share/nagios/secret.txt server=localhost eventhandler=yes hosts=yes services=yes

See the following list for a detailed explanation of available options:

Common Options

Shared options for worker and the NEB module:

config

include config from this file. Options are the same as described here. include is an alias for config.

config=/etc/nagios3/mod_gm_worker.conf
debug

use debug to increase the verbosity of the module. Possible values are:

  • 0 – only errors

  • 1-4 – debug verbosity

  • 5 – trace and all gearman related logs are going to stdout

Default is 0.

debug=1
logmode

set way of logging. Possible values are:

  • automatic – logfile when a logfile is specified. stdout when no logfile is given. stdout for tools.

  • stdout – just print all messages to stdout

  • syslog – use syslog for all log events

  • file – use logfile

  • core – use Nagios internal logging (not thread safe! Use with care)

Default is automatic.

logmode=automatic
logfile

Path to the logfile.

logfile=/path/to/log.file
server

Sets the address of your gearman job server. Can be specified more than once to add more servers. Mod-Gearman uses the first server available.

server=localhost:4730,remote_host:4730
eventhandler

defines if the module should distribute execution of eventhandlers.

eventhandler=yes
services

defines if the module should distribute execution of service checks.

services=yes
hosts

defines if the module should distribute execution of host checks.

hosts=yes
hostgroups

Sets a list of hostgroups which will go into separate queues.

hostgroups=name1,name2,name3
servicegroups

Sets a list of servicegroups which will go into separate queues.

servicegroups=name1,name2,name3
encryption

Enables or disables encryption. It is strongly advised not to disable encryption; otherwise anybody will be able to inject packets into your workers. Encryption is enabled by default and you have to explicitly disable it. When using encryption, you will either have to specify a shared password with key=... or a keyfile with keyfile=.... Default is On.

encryption=yes
key

A shared password which will be used for encryption of data packets. Should be at least 8 bytes long. Maximum length is 32 characters.

key=secret
keyfile

The shared password will be read from this file. Use either key or keyfile. Only the first 32 characters from the first line will be used. Whitespace to the right will be trimmed.

keyfile=/path/to/secret.file
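
The keyfile rules described above (first line only, whitespace on the right trimmed, at most 32 characters) could look like this in Python. This is a sketch of the described behaviour, not the actual implementation; the function name is made up and the order of trimming before truncating is an assumption.

```python
MAX_KEY_LENGTH = 32  # maximum key length stated above


def read_key(path):
    """Return the shared key from a keyfile (illustrative helper)."""
    with open(path, "r", encoding="utf-8") as handle:
        first_line = handle.readline()   # only the first line is used
    # Assumption: trailing whitespace is trimmed before the key is cut to 32 characters.
    return first_line.rstrip()[:MAX_KEY_LENGTH]
```

A file containing a 40 character line would therefore result in a 32 character key, and everything after the first line is ignored.
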
use_uniq_jobs

Using uniq keys prevents the gearman queues from filling up when there is no worker. However, gearmand seems to have problems with the uniq key and sometimes jobs get stuck in the queue. Set this option to off when you run into problems with stuck jobs, but make sure your workers are running. Default is On.

use_uniq_jobs=on

Server Options

Additional options for the NEB module only:

localhostgroups

sets a list of hostgroups which will not be executed by gearman. They are just passed through.

localhostgroups=name1,name2,name3
localservicegroups

sets a list of servicegroups which will not be executed by gearman. They are just passed through.

localservicegroups=name1,name2,name3
queue_custom_variable

Can be used to define the target queue by a custom variable in addition to host/servicegroups. When set, for example, to WORKER, you can define a _WORKER custom variable for your hosts and services to directly set the worker queue. The host queue is inherited unless overwritten by a service custom variable. Set the value of your custom variable to local to bypass Mod-Gearman (same behaviour as in localhostgroups/localservicegroups).

queue_custom_variable=WORKER
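
The lookup order described above (service variable first, then the inherited host variable, with local as a bypass) can be sketched as follows. The function name and the returned tuples are invented for this illustration, and comparing variable names in upper case is an assumption.

```python
def resolve_target(variable, host_vars, service_vars=None):
    """Resolve the target of a check from custom variables (illustrative only).

    Returns ("queue", name), ("local", None) or ("groups", None).
    """
    name = "_" + variable.upper()            # e.g. WORKER -> _WORKER
    service_vars = service_vars or {}
    # A service custom variable overrides the value inherited from the host.
    value = service_vars.get(name, host_vars.get(name))
    if value is None:
        return ("groups", None)              # fall back to host/servicegroup matching
    if value == "local":
        return ("local", None)               # bypass Mod-Gearman
    return ("queue", value)


host = {"_WORKER": "hostgroup_test"}
print(resolve_target("WORKER", host))
print(resolve_target("WORKER", host, {"_WORKER": "local"}))
```
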
do_hostchecks

Set this to no if you want Mod-Gearman to only take care of servicechecks. No hostchecks will be processed by Mod-Gearman. Use this option to disable hostchecks and still have the possibility to use hostgroups for easy configuration of your services. If set to yes, you still have to define which hostchecks should be processed by either using hosts or the hostgroups option. Default: yes

do_hostchecks=yes
result_workers

Number of result worker threads. Usually one is enough. You may increase the value if your result queue is not processed fast enough.

result_workers=3
perfdata

Defines if the module should distribute perfdata to gearman.

perfdata=yes
Note
Processing of perfdata is not part of mod_gearman. You will need an additional worker for handling performance data, for example PNP4Nagios. Performance data is just written to the gearman queue.
perfdata_mode

In overwrite mode there will be only a single job for each host or service when putting performance data onto the perfdata queue. In append mode perfdata will be stored as long as there is memory left. Setting this to overwrite helps to prevent the perfdata queue from getting too big. Monitor your perfdata carefully when using the append mode. Possible values are:

  • 1 – overwrite

  • 2 – append

Default is 1.

perfdata_mode=1
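
The difference between the two modes can be shown with a small model of the queue. This only illustrates the effect on the queue size; how Mod-Gearman and gearmand actually replace or keep jobs is not modelled here, and the class is purely hypothetical.

```python
OVERWRITE, APPEND = 1, 2   # the values of perfdata_mode listed above


class PerfdataQueue:
    """Toy model of the perfdata queue (not the real data structure)."""

    def __init__(self, mode=OVERWRITE):
        self.mode = mode
        self.jobs = {}   # (host, service) -> list of perfdata strings

    def put(self, host, service, perfdata):
        key = (host, service)
        if self.mode == OVERWRITE:
            self.jobs[key] = [perfdata]                   # keep only the newest job
        else:
            self.jobs.setdefault(key, []).append(perfdata)  # keep every job

    def size(self):
        return sum(len(entries) for entries in self.jobs.values())


for mode in (OVERWRITE, APPEND):
    q = PerfdataQueue(mode)
    q.put("web01", "http", "time=0.1s")
    q.put("web01", "http", "time=0.2s")
    print(mode, q.size())
```

Without a consumer the overwrite queue stays at one job per host or service, while the append queue keeps growing.
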
result_queue

sets the result queue. Necessary when putting jobs from several Nagios instances onto the same gearman queues. Default: check_results

result_queue=check_results_nagios1
orphan_host_checks

The Mod-Gearman NEB module will submit a fake result for orphaned host checks with a message saying there is no worker running for this queue. Use this option to get better reporting results, otherwise your hosts will keep their last state as long as there is no worker running. Default is yes.

orphan_host_checks=yes
orphan_service_checks

Same as orphan_host_checks but for services. Default is yes.

orphan_service_checks=yes
accept_clear_results

When enabled, the NEB module will accept unencrypted results too. This is quite useful if you have lots of passive checks and make use of send_gearman/send_multi where you would have to spread the shared key to all clients using these tools. Default is no.

accept_clear_results=yes

Worker Options

Additional options for worker:

identifier

Identifier for this worker. Will be used for the worker_identifier queue for status requests. You may want to change it if you are using more than one worker on a single host. Defaults to the current hostname.

identifier=hostname_test
pidfile

Path to the pidfile.

pidfile=/path/to/pid.file
job_timeout

Default job timeout in seconds. Currently this value is only used for eventhandler. The worker will use the values from the core for host and service checks. Default: 60

job_timeout=60
max-age

Threshold for discarding too old jobs. When a new job is older than this amount of seconds it will not be executed and just discarded. This will result in a message like “(Could Not Start Check In Time)”. Possible reasons for this are time differences between core and worker (use NTP!) or the smart rescheduler of the core which should be disabled. Set to zero to disable this check. Default: 0

max-age=600
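
The age check can be sketched like this. The function and its parameter names are hypothetical; only the rule itself (discard jobs older than max-age seconds, 0 disables the check) is taken from the description above.

```python
import time


def should_execute(job_start_time, max_age, now=None):
    """Return False if the job is older than max_age seconds (illustrative only)."""
    if max_age == 0:
        return True                      # 0 disables the check
    if now is None:
        now = time.time()
    # A clock difference between core and worker shows up here as a wrong age.
    return (now - job_start_time) <= max_age


print(should_execute(1000, 600, now=1500))   # 500 seconds old
print(should_execute(1000, 600, now=1700))   # 700 seconds old
```

The second call shows why time differences between core and worker matter: a job which is only a few seconds old looks much older if the worker clock runs ahead.
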
min-worker

Minimum number of worker processes which should run at any time. Default: 1

min-worker=1
max-worker

Maximum number of worker processes which should run at any time. You may set this equal to min-worker setting to disable dynamic starting of workers. When setting this to 1, all services from this worker will be executed one after another. Default: 20

max-worker=20
spawn-rate

Defines the rate of spawned worker per second as long as there are jobs waiting. Default: 1

spawn-rate=1
load_limit1

Set a limit based on the 1min load average. When exceeding the load limit, no new worker will be started until the current load is below the limit. No limit will be used when set to 0. Default: no limit

load_limit1=0
load_limit5

Set a limit based on the 5min load average. See load_limit1 for details. Default: no limit

load_limit5=0
load_limit15

Set a limit based on the 15min load average. See load_limit1 for details. Default: no limit

load_limit15=0
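
The three load limits can be combined into a single check as sketched below. The function name is made up, os.getloadavg() is only available on Unix-like systems, and treating a load exactly at the limit as "not below" is an assumption.

```python
import os


def may_spawn_worker(limits, loadavg=None):
    """limits is (load_limit1, load_limit5, load_limit15); 0 disables a limit."""
    if loadavg is None:
        loadavg = os.getloadavg()        # (1min, 5min, 15min) load average
    # A new worker may start only if every enabled limit is above the current load.
    return all(limit == 0 or load < limit
               for limit, load in zip(limits, loadavg))


print(may_spawn_worker((4, 0, 0), loadavg=(5.2, 1.0, 1.0)))
print(may_spawn_worker((4, 0, 0), loadavg=(3.9, 9.0, 9.0)))
```
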
idle-timeout

Time in seconds after which an idling worker exits. This parameter controls how fast your waiting workers will exit if there are no jobs waiting. Set to 0 to disable the idle timeout. Default: 10

idle-timeout=30
max-jobs

Controls the number of jobs a worker will do before it exits. Use this to control how fast the number of workers will go down after high load times. Disabled when set to 0. Default: 1000

max-jobs=500
fork_on_exec

Use this option to disable an extra fork for each plugin execution. Disabling this option will reduce the load on the worker host, but may cause trouble with unclean plugins. Default: no

fork_on_exec=no
dupserver

Sets the address of a gearman job server where duplicated results will be sent to. Can be specified more than once to add more servers. Useful for duplicating results for a reporting installation or remote GUI.

dupserver=logserver:4730,logserver2:4730
show_error_output

Use this option to show stderr output of plugins too. When set to no, only stdout will be displayed. Default is yes.

show_error_output=yes
timeout_return

Defines the return code for timed out checks. Accepted return codes are 0 (Ok), 1 (Warning), 2 (Critical) and 3 (Unknown). Default: 2

timeout_return=2
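
What timeout_return means for a single plugin run can be sketched with the Python standard library. This is not how the worker is implemented; the function name and the timeout message are placeholders.

```python
import subprocess


def run_plugin(argv, timeout, timeout_return=2):
    """Run a plugin and map a timeout to timeout_return (illustrative only)."""
    try:
        completed = subprocess.run(argv, capture_output=True, text=True,
                                   timeout=timeout)
    except subprocess.TimeoutExpired:
        # The plugin took too long: report the configured return code instead.
        return timeout_return, "(plugin timed out)"   # placeholder message
    return completed.returncode, completed.stdout.strip()
```

With timeout_return=3 a hanging plugin would be reported as Unknown instead of Critical.
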
dup_results_are_passive

Use this option to set whether the duplicate result sent to the dupserver will be passive or active. Default is yes (passive).

dup_results_are_passive=yes
debug-result

When enabled, the hostname of the executing worker will be put in front of the plugin output. This may help with debugging your plugin results. Default is off.

debug-result=yes
enable_embedded_perl

When embedded perl has been compiled in, you can use this switch to enable or disable the embedded perl interpreter. See Embedded Perl for details on EPN.

enable_embedded_perl=on
use_embedded_perl_implicitly

Default value used when the perl script does not have a “nagios: +epn” or “nagios: -epn” set. Perl scripts not written for epn support usually fail with epn, so it’s better to set the default to off.

use_embedded_perl_implicitly=off
use_perl_cache

Cache compiled perl scripts. This makes the worker process a little bit bigger but makes execution of perl scripts even faster. When turned off, Mod-Gearman will still use the embedded perl interpreter, but will not cache the compiled script.

use_perl_cache=on
workaround_rc_25

Duplicate jobs from gearmand sometimes result in exit code 25 of plugins, because they are executed twice and get killed because of using the same resource. Sending results (when the exit code is 25) will be skipped with this enabled. Only needed if you experience problems with plugins exiting with exit code 25 randomly. Default is off.

workaround_rc_25=off

Queue Names

You may want to watch your gearman server job queue. The shipped gearman_top does this. It polls the gearman server every second and displays the current queue statistics.

+-----------------------+--------+-------+-------+---------+
| Name                  | Worker | Avail | Queue | Running |
+-----------------------+--------+-------+-------+---------+
| check_results         | 1      | 1     | 0     | 0       |
| host                  | 3      | 3     | 0     | 0       |
| service               | 3      | 3     | 0     | 0       |
| eventhandler          | 3      | 3     | 0     | 0       |
| servicegroup_jmx4perl | 3      | 3     | 0     | 0       |
| hostgroup_japan       | 3      | 3     | 0     | 0       |
+-----------------------+--------+-------+-------+---------+
check_results

This queue is monitored by the NEB module to fetch results from the workers. You don’t need an extra worker for this queue. The number of result workers can be set to a maximum of 256, but usually one is enough. One worker is capable of processing several thousand results per second.

host

This is the queue for generic host checks, used if you enable host checks with the hosts=yes switch. Before a host goes into this queue, it is checked whether any of the local groups or a separate hostgroup matches. If nothing matches, this queue is used.

service

This is the queue for generic service checks, used if you enable service checks with the services=yes switch. Before a service goes into this queue, it is checked against the local host- and servicegroups. Then the normal host- and servicegroups are checked, and if none matches, this queue is used.

hostgroup_<name>

This queue is created for every hostgroup which has been defined by the hostgroups=… option. Make sure you have at least one worker for every hostgroup you specify. Start the worker with --hostgroups=... to work on hostgroup queues. Note that this queue may also contain service checks if the hostgroup of a service matches.

servicegroup_<name>

This queue is created for every servicegroup which has been defined by the servicegroups=... option.

eventhandler

This is the generic queue for all eventhandlers. Make sure you have a worker for this queue if you have eventhandlers enabled. Start the worker with --events to work on this queue.

perfdata

This is the generic queue for all performance data. It is created and used if you switch on perfdata=yes. Performance data cannot be processed by the gearman worker itself. You will need PNP4Nagios for that.

Performance

While the main motivation was to ease distributed configuration, this plugin also helps to spread the load across multiple workers. Throughput is mainly limited by the number of jobs a single Nagios instance can put onto the Gearman job server. Keep the Gearman job server close to the Nagios box. Best practice is to put both on the same machine. Both processes will utilize one core. Some testing with my workstation (Dual Core 2.50GHz) and two worker boxes gave me the following results. I used a sample Nagios installation with 20,000 services at a 1 minute interval and a sample plugin which returns just a single line of output. I got over 300 service checks per second, which means you could easily set up 100,000 services at a 5 minute interval with a single Nagios box. The number of worker boxes depends on your check types.
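
The step from the benchmark to the 5 minute estimate is plain arithmetic, spelled out here as a short calculation. It only uses the numbers from the paragraph above; the helper name is made up.

```python
def checks_per_second(services, interval_seconds):
    """Average check rate needed to run every service once per interval."""
    return services / interval_seconds


benchmark = checks_per_second(20_000, 60)          # 20,000 services, 1 minute interval
five_minute = checks_per_second(100_000, 5 * 60)   # 100,000 services, 5 minute interval

print(round(benchmark, 1), round(five_minute, 1))  # prints "333.3 333.3"
```

Both setups need the same average rate of about 333 checks per second, which is why the benchmark result carries over to the larger installation.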

Figure: mod_gearman performance

See this article about benchmarks with Nagios3, Nagios4 and Mod-Gearman.

Exports

Exports export data structures from the Nagios core as JSON data. For each configurable event one job will be created. At the moment, the only useful event type is the logdata event which allows you to create a json data job for every logged line. This can be very useful for external reporting tools.

exports

Set the queue name to create the jobs in. The return code will be sent back to the core (Not all callbacks support return codes). Callbacks are a list of callbacks for which you want to export json data.

export=<queue>:<returncode>:<callback>[,<callback>,...]
export=log_queue:1:NEBCALLBACK_LOG_DATA
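
To make the three parts of the export value easier to see, here is a small parser for the format shown above. It is only a sketch for illustration (the function name is invented) and it does not validate the callback names.

```python
def parse_export(value):
    """Split '<queue>:<returncode>:<callback>[,<callback>,...]' into its parts."""
    queue_name, return_code, callbacks = value.split(":", 2)
    return {
        "queue": queue_name,
        "return_code": int(return_code),
        "callbacks": [name.strip() for name in callbacks.split(",") if name.strip()],
    }


print(parse_export("log_queue:1:NEBCALLBACK_LOG_DATA"))
```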

Embedded Perl

Since 1.2.0 Mod-Gearman has builtin embedded Perl support which means generally a big performance boost when you have lots of perl plugins.

To enable embedded Perl you need to run configure with --enable-embedded-perl:

  ./configure --enable-embedded-perl otheroptions...

The --with-perlcache configure option has been replaced by the runtime configuration option use_perl_cache.

Note
Not all perl plugins support EPN. You can fix them, add # nagios: -epn in the first 10 lines of the script or set use_embedded_perl_implicitly=off so all scripts without the explicit tag are run without embedded Perl.

The default configuration of Mod-Gearman enables embedded Perl, but only uses it for Perl scripts which explicitly set # nagios: +epn. This is a very safe way of using embedded Perl but you probably miss some plugins which do not set the header and still would run with EPN. You may want to use the mini_epn from your Nagios installation to verify if a plugin works with EPN or not.
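
The decision described above can be sketched as follows. This is a simplified model and not the worker code: the function name and the regular expression are assumptions, only the "first 10 lines" rule and the two options are taken from this page.

```python
import re

# Matches "# nagios: +epn" or "# nagios: -epn" (the exact matching rules are an assumption).
EPN_TAG = re.compile(r"#\s*nagios:\s*([+-])epn")


def use_embedded_perl(script_lines, enable_embedded_perl=True,
                      use_embedded_perl_implicitly=False):
    """Decide whether a Perl plugin should run inside the embedded interpreter."""
    if not enable_embedded_perl:
        return False
    for line in script_lines[:10]:           # only the first 10 lines are checked
        match = EPN_TAG.search(line)
        if match:
            return match.group(1) == "+"      # an explicit tag always wins
    return use_embedded_perl_implicitly       # no tag: use the configured default


print(use_embedded_perl(["#!/usr/bin/perl", "# nagios: +epn"]))
print(use_embedded_perl(["#!/usr/bin/perl", "use strict;"]))
```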

General EPN documentation is valid for Mod-Gearman as well:

Note
Mod-Gearman does not fix all of the memory leaks introduced with Nagios and Embedded Perl, but it moves the leaks away from the core. And they do not affect Mod-Gearman at all, as they are only in the preforked worker processes which will be restarted automatically from time to time (see max-jobs).

How To

How to Monitor Job Server and Worker

Use the supplied check_gearman to monitor your workers and job server. Workers have their own queue for status requests.

%> ./check_gearman -H <job server hostname> -q worker_<worker hostname> -t 10 -s check
check_gearman OK - localhost has 10 worker and is working on 1 jobs|worker=10 running=1 total_jobs_done=1508

This will send a test job to the given job server and the worker will respond with some statistical data.

Job server can be monitored with:

%> ./check_gearman -H localhost -t 20
check_gearman OK - 6 jobs running and 0 jobs waiting.|check_results=0;0;1;10;100 host=0;0;9;10;100 service=0;6;9;10;100

How to Submit Passive Checks

You can use send_gearman to submit active and passive check results to a gearman job server, where they will be processed just like the result of a finished check.

%> ./send_gearman --server=<job server> --encryption=no --host="<hostname>" --service="<service>" --message="message"

How to build send_gearman.exe

After installing strawberry perl, you need to install the PAR::Packer module and run pp:

  pp -z 9 -M Class::Load::XS -M Moose -M Nagios::Passive::Base -M Params::Validate::XS -o send_gearman.exe send_gearman.pl

Or just use the prebuilt one from labs.consol.de: send_gearman.exe.

How to Submit check_multi Results

check_multi is a plugin which executes multiple child checks. See more details about the feed_passive mode at: www.my-plugin.de

You can pass such child checks to Nagios via the mod_gearman neb module:

%> check_multi -f multi.cmd -r 256 | ./send_multi --server=<job server> --encryption=no --host="<hostname>" --service="<service>"

If you want to use only check_multi and no other workers, you can achieve this with the following neb module settings:

broker_module=/usr/local/share/nagios/mod_gearman.o server=localhost encryption=no eventhandler=no hosts=no services=no hostgroups=does_not_exist
Note
encryption is not necessary if you both run the check_multi checks and the Nagios check_results queue on the same server.

How to Set Queue by Custom Variable

Set queue_custom_variable=WORKER in your Mod-Gearman NEB configuration. Then adjust your Nagios host/service configuration and add the custom variable:

  define host {
    ...
    _WORKER    hostgroup_test
  }

The test hostgroup does not have to exist, it is a virtual queue name which is used by the worker.

Adjust your Mod-Gearman worker configuration and put test in the hostgroups attribute. From then on, the worker will work on all jobs in the hostgroup_test queue.

What About Notifications

It is currently not possible to distribute notifications via Mod-Gearman. The Nagios core would have to be patched to support this, and I think it’s not very useful at all. So don’t expect this feature to be implemented in the near future.

Supported Dependencies

Note
Mod-Gearman works best with libgearman/gearmand 0.33 and Nagios 3.2.3. If in doubt, use these versions.

Lib-Gearman

Mod-Gearman has successfully been tested on the following Gearmand Versions. It is recommended to always use the latest listed version of libgearman.

Nagios

Mod-Gearman works best since version 3.2.2 up to the latest stable Nagios 3.5.1. Nagios 4 is not fully tested yet, but there is a preview version available here http://mod-gearman.org/download/v1.4.0nagios4/ or in the nagios4 branch of the source tree.

Naemon

Mod-Gearman works on the Naemon core as well with the same remarks as Nagios 4.

Icinga

To be clear, Icinga is not the recommended platform for Mod-Gearman and not supported in any way. However, people have reported it works with Icinga 1.2.0, 1.8 and 1.10.1 but it seems like some statistics are not updated.

Hints

  • Make sure you have at least one worker for every queue. You should monitor that (check_gearman).

  • Add Logfile checks for your gearmand server and mod_gearman worker.

  • Make sure all gearman checks are in local groups. Gearman self checks should not be monitored through gearman.

  • Checks which write directly to the Nagios command file (ex.: check_mk) have to run on a local worker or have to be excluded by the localservicegroups.

  • Keep the gearmand server close to Nagios for better performance.

  • If you have some checks which should not run in parallel, just set up a single worker with --max-worker=1 and they will be executed one after another. For example for CPU-intensive checks with Selenium.

  • Make sure all your workers have the Nagios plugins available under the same path. Otherwise they can’t be found by the worker.
