Initial tarball release, based on check_afsspace 1.16, check_bos 1.7,
check_rxdebug 1.11, and check_udebug 1.3.
+ Rename check_afsspace to check_afs_space, check_bos to check_afs_bos,
+ check_rxdebug to check_afs_rxdebug, and check_udebug to
+ check_afs_udebug for more consistent naming and easier identification
+ of the AFS Nagios probes.
+
Add check_afs_quotas, which monitors AFS volumes for quota usage,
either for specific volumes or for all volumes on a particular server
(and optionally partition). Based on a script by Steve Rader.
- Support checking a single partition in check_afsspace and print more
+ Support checking a single partition in check_afs_space and print more
verbose information about total, used, and free space in that mode.
Format partition sizes using Number::Format if available. Based on
work by Steve Rader.
If the salvager is running (such as when started manually with bos
- salvage), check_bos now reports a warning stating that, rather than a
- critical error showing the auxiliary status line. Reported by Steve
- Rader.
+ salvage), check_afs_bos now reports a warning stating that, rather
+ than a critical error showing the auxiliary status line. Reported by
+ Steve Rader.
Print an UNKNOWN status on standard output on syntax errors in all
scripts, rather than reporting the problem only to standard error.
Check that the host to check was specified and report a syntax error
if it wasn't. Thanks, Tobias Wolter.
- Ignore "bos: running unauthenticated" in check_bos, since bos status
- is always run unauthenticated.
+ Ignore "bos: running unauthenticated" in check_afs_bos, since bos
+ status is always run unauthenticated.
- Add support for reporting warnings in check_bos and report a warning
- if there is inappropriate access on server directories. Patch from
- Steve Rader.
+ Add support for reporting warnings in check_afs_bos and report a
+ warning if there is inappropriate access on server directories. Patch
+ from Steve Rader.
- If check_bos is successful, report the number of instances running
+ If check_afs_bos is successful, report the number of instances running
normally. Patch from Steve Rader.
Look for rxdebug in /usr/sbin and /usr/local/sbin since OpenAFS
installs it into sbindir by default.
- Report the database version as extra information in check_udebug if
- there are no errors or warnings.
+ Report the database version as extra information in check_afs_udebug
+ if there are no errors or warnings.
afs-monitor provides Nagios-compatible probe scripts that can be used to
monitor AFS servers. It contains five scripts: check_afs_quotas, which
- monitors AFS volumes for quota usage; check_afsspace, which monitors
- file server partitions for disk usage; check_bos, which monitors any
+ monitors AFS volumes for quota usage; check_afs_space, which monitors
+ file server partitions for disk usage; check_afs_bos, which monitors any
bosserver-managed set of processes for problems reported by bos;
- check_rxdebug, which monitors AFS fileservers for connections waiting
- for a thread; and check_udebug, which monitors Ubik services (such as
- vlserver and ptserver) for replication and quorum problems.
+ check_afs_rxdebug, which monitors AFS fileservers for connections
+ waiting for a thread; and check_afs_udebug, which monitors Ubik services
+ (such as vlserver and ptserver) for replication and quorum problems.
DESCRIPTION
server or server partition for quota usage and reports errors or
warnings if the used space is over a configurable threshold.
- check_afsspace uses vos partinfo to check the available space on each
+ check_afs_space uses vos partinfo to check the available space on each
partition on a file server. It reports a critical error if the
percentage used is above a configurable threshold (90% by default) and a
warning if it is above a lower configurable threshold (85% by default).
- check_bos runs bos status on a file server or volume location server and
- scans the output, making sure that all commands are running normally and
- the file server isn't salvaging. If it sees any output it doesn't
+ check_afs_bos runs bos status on a file server or volume location server
+ and scans the output, making sure that all commands are running normally
+ and the file server isn't salvaging. If it sees any output it doesn't
expect from bos status, it reports that output in an alert.
- check_rxdebug runs rxdebug against a file server and looks for any
+ check_afs_rxdebug runs rxdebug against a file server and looks for any
client connections that are in the state "waiting for a thread." This
indicates client connections that are blocked waiting for a file server
thread. We've found this to be a reliable test for detecting serious
and a warning if it is above a lower configurable threshold (2 by
default).
- check_udebug runs udebug against a ubik service (vlserver, ptserver,
+ check_afs_udebug runs udebug against a ubik service (vlserver, ptserver,
kaserver, or buserver) and makes sure that it is in a reasonable state.
It checked to be sure that there is a sync site for the service, and
when there is, that the sync site believes that the recovery state is 1f
These scripts were written by Xueshan Feng, Neil Crellin, Quanah
Gibson-Mount, and Russ Allbery and are currently maintained by Russ
- Allbery.
+ Allbery. Many modifications to the scripts were based on work by Steve
+ Rader.
REQUIREMENTS
bos, rxdebug, and udebug) and expect them to be in either /usr/bin or in
/usr/local/bin.
- check_afs_quotas and check_afsspace will use Number::Format, if
+ check_afs_quotas and check_afs_space will use Number::Format, if
available, to format sizes with IEC 60027 prefixes.
INSTALLATION
You can then use the scripts directly with commands such as:
- check_afsspace -H afs1
+ check_afs_space -H afs1
To use the scripts in a Nagios probe, configure a command such as:
define command {
- command_name check_afsspace
- command_line /path/to/install/check_afsspace -H $HOSTADDRESS$
+ command_name check_afs_space
+ command_line /path/to/install/check_afs_space -H $HOSTADDRESS$
}
changing the path to the script to wherever you installed the scripts.
--- /dev/null
+#!/usr/bin/perl -w
+our $VERSION = '@VERSION@ @DATE@';
+#
+# check_afs_bos -- Monitor AFS bos output for problems in Nagios.
+#
+# Given an AFS server (file or VLDB), runs bos status against it. Checks to
+# see if there is a communication failure, and also checks to see if anything
+# in the output looks unusual or wrong. If either of these conditions is
+# true, prints that information to STDOUT. Suitable for being run inside
+# Nagios.
+#
+# Written by Russ Allbery <rra@stanford.edu>
+# Based on an earlier script by Neil Crellin <neilc@stanford.edu>
+# Copyright 2003, 2004, 2010 Board of Trustees, Leland Stanford Jr. University
+#
+# This program is free software; you may redistribute it and/or modify it
+# under the same terms as Perl itself.
+
+##############################################################################
+# Modules and declarations
+##############################################################################
+
+require 5.006;
+
+use strict;
+
+use Getopt::Long qw(GetOptions);
+
+##############################################################################
+# Site configuration
+##############################################################################
+
+# The full path to bos. Make sure that this is on local disk so that
+# monitoring doesn't have an AFS dependency.
+our ($BOS) = grep { -x $_ } qw(/usr/bin/bos /usr/local/bin/bos);
+$BOS ||= '/usr/bin/bos';
+
+# The default timeout in seconds (implemented by alarm) for bos.
+our $TIMEOUT = 10;
+
+# The list of regular expressions matching expected output. You may need to
+# customize this for what you're running at your site. Any output from bos
+# that doesn't match one of these regular expressions or the warning regular
+# expressions below will throw a critical error.
+our @OKAY = (
+ qr/^\s*$/,
+ qr/^bos: running unauthenticated$/,
+ qr/^Instance\ \S+,\ \(type\ is\ \S+\)(\ has\ core\ file,)?
+ \ currently\ running\ normally\.$/x,
+ qr/^\s*Auxiliary status is: file server running\.$/,
+ qr/^\s*Process last started at /,
+ qr/^\s*Last exit at /,
+ qr/^\s*Last error exit at /,
+ qr/^\s*Command \d+ is /
+);
+
+# The list of regular expressions that match output that should produce a
+# warning. You may need to customize this for what you expect at your site.
+our @WARNINGS = (
+ qr/^\s*Bosserver reports inappropriate access on server directories/
+);
+
+##############################################################################
+# Implementation
+##############################################################################
+
+# Report a syntax error and exit. We do this via stdout in order to satisfy
+# the Nagios plugin output requirements, but also report a more conventional
+# error via stderr in case people are calling this outside of Nagios.
+sub syntax {
+ print "BOS UNKNOWN - ", join ('', @_), "\n";
+ warn "$0: ", join ('', @_), "\n";
+ exit 3;
+}
+
+# Parse command line options.
+my ($help, $host, $version);
+Getopt::Long::config ('bundling', 'no_ignore_case');
+GetOptions ('H|hostname=s' => \$host,
+ 'h|help' => \$help,
+ 't|timeout=i' => \$TIMEOUT,
+ 'V|version' => \$version)
+ or syntax ("invalid option");
+if ($help) {
+ print "Feeding myself to perldoc, please wait....\n";
+ exec ('perldoc', '-t', $0) or die "Cannot exec perldoc: $!\n";
+} elsif ($version) {
+ my $version = $VERSION;
+ print "check_afs_bos $version\n";
+ exit 0;
+}
+syntax ("extra arguments on command line") if @ARGV;
+syntax ("host to check not specified") unless (defined $host);
+
+# Set up the alarm.
+$SIG{ALRM} = sub {
+ print "BOS CRITICAL - network timeout after $TIMEOUT seconds\n";
+ exit 2;
+};
+alarm ($TIMEOUT);
+
+# Collect the bos output into a variable.
+unless (open (BOS, "$BOS status $host -noauth -long 2>&1 |")) {
+ print "BOS UNKNOWN - cannot run bos\n";
+ exit 3;
+}
+my @bos = <BOS>;
+close BOS;
+
+# Make sure that bos was successful. Note that it generally does return
+# success even if it can't contact the bos server.
+if ($? != 0) {
+ print "BOS CRITICAL - bos status failed\n";
+ exit 2;
+}
+
+# Scan the output. If we see anything that we don't expect, immediately
+# report it as a fatal error.
+my $count = 0;
+my $prev_line = '';
+for my $line (@bos) {
+ my $okay = 0;
+ my $warn = 0;
+ for my $regex (@OKAY) {
+ if ($line =~ /$regex/) {
+ $okay = 1;
+ last;
+ }
+ }
+ for my $regex (@WARNINGS) {
+ if ($line =~ /$regex/) {
+ $warn = 1;
+ last;
+ }
+ }
+ unless ($okay || $warn) {
+ $line =~ s/^\s+//;
+ $line =~ s/\s+$//;
+ if ($prev_line =~ /^Instance salvage,/ && $line =~ /running now/) {
+ print "BOS WARNING - salvage is running\n";
+ exit 1;
+ } else {
+ print "BOS CRITICAL - $line\n";
+ exit 2;
+ }
+ }
+ if ($warn) {
+ $line =~ s/^\s+//;
+ $line =~ s/\s+$//;
+ print "BOS WARNING - $line\n";
+ exit 1;
+ }
+ $count++ if ($line =~ /currently running normally\.$/);
+ $prev_line = $line;
+}
+if ($count == 1) {
+ print "BOS OK - one process running normally\n";
+} else {
+ print "BOS OK - $count processes running normally\n";
+}
+exit 0;
+
+##############################################################################
+# Documentation
+##############################################################################
+
+=head1 NAME
+
+check_afs_bos - Monitor AFS bos output for problems in Nagios
+
+=head1 SYNOPSIS
+
+B<check_afs_bos> [B<-hV>] [B<-t> I<timeout>] B<-H> I<host>
+
+=head1 DESCRIPTION
+
+B<check_afs_bos> is a Nagios plugin for querying the AFS bosserver for
+process status and reporting an alert if there are any unexpected lines in
+the bos output. The acceptable lines of output from B<bos> are configured
+at the top of this script; they should be generally suitable for most
+sites, but may require some customization.
+
+B<check_afs_bos> will always print out a single line of output. If there
+is a line that isn't matched by any regexes identifying acceptable lines,
+it will output the first non-matching line prefixed by C<BOS CRITICAL>.
+If the salvager is running (such as when started by C<bos salvage>) or
+other warnings are found, it will print that warning information prefixed
+by C<BOS WARNING>. Otherwise, it will output C<BOS OK>. Note that this
+monitoring may not catch such things as a service being constantly
+restarted if it happens to be up and running normally each time the probe
+runs; it doesn't pay any attention to the last start time, the last error
+exit status, the presence of core files, and the like. It mostly just
+looks for the "running normally" part of the B<bos> output and makes sure
+the auxiliary status is "file server running" for a file server
+process.
+
+=head1 OPTIONS
+
+=over 4
+
+=item B<-H> I<host>, B<--hostname>=I<host>
+
+The AFS server whose B<bos> status B<check_afs_bos> should check. This
+option is required.
+
+=item B<-h>, B<--help>
+
+Print out this documentation (which is done simply by feeding the script
+to C<perldoc -t>).
+
+=item B<-t> I<timeout>, B<--timeout>=I<timeout>
+
+Change the timeout for the B<bos> command. The default timeout is 10
+seconds.
+
+=item B<-V>, B<--version>
+
+Print out the version of B<check_afs_bos> and quit.
+
+=back
+
+=head1 EXIT STATUS
+
+B<check_afs_bos> follows the standard Nagios exit status requirements.
+This means that it will exit with status 0 if there are no problems, with
+status 1 if the salvager is running or another warning is reported, or with
+status 2 if a critical problem is detected. For other errors, such as
+invalid syntax, B<check_afs_bos> will exit with status 3.
+
+=head1 BUGS
+
+The standard B<-v> verbose Nagios plugin option is not supported. It
+should display the complete bos status output.
+
+The usage message for invalid options and for the B<-h> option doesn't
+conform to Nagios standards.
+
+=head1 CAVEATS
+
+This script does not use the Nagios util library or any of the defaults
+that it provides, which makes it somewhat deficient as a Nagios plugin.
+This is intentional, though, since this script can be used with other
+monitoring systems as well. It's not clear what a good solution to this
+would be.
+
+=head1 SEE ALSO
+
+This script is part of the afs-monitor package, which includes various AFS
+monitoring plugins for Nagios. It is available from the AFS monitoring
+tools page at L<http://www.eyrie.org/~eagle/software/afs-monitor/>.
+
+=head1 AUTHORS
+
+The original idea behind this script was from Neil Crellin. Russ Allbery
+<rra@stanford.edu> updated it to work with Nagios and stripped out some
+rather neat but now unnecessary code to look for any changes in the bos
+output, instead just scanning it for acceptable lines.
+
+=head1 COPYRIGHT AND LICENSE
+
+Copyright 2003, 2004, 2010 Board of Trustees, Leland Stanford Jr.
+University.
+
+This program is free software; you may redistribute it and/or modify it
+under the same terms as Perl itself.
+
+=cut
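Reviewer sketch (not part of the patch): the line classification that check_afs_bos documents above can be exercised without a live bosserver by factoring it into a function. The helper name classify_bos_output is hypothetical, the pattern lists are a trimmed copy of @OKAY and @WARNINGS from the script, and the sample input lines are made up for illustration rather than captured from a real server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trimmed copies of the @OKAY and @WARNINGS lists from check_afs_bos.
my @OKAY = (
    qr/^\s*$/,
    qr/^bos: running unauthenticated$/,
    qr/^Instance \S+, \(type is \S+\)( has core file,)? currently running normally\.$/,
    qr/^\s*Auxiliary status is: file server running\.$/,
);
my @WARNINGS = (
    qr/^\s*Bosserver reports inappropriate access on server directories/,
);

# Hypothetical helper: given lines of bos status output, return the Nagios
# exit status and the single line of output the plugin would print.
sub classify_bos_output {
    my @lines = @_;
    my ($count, $prev) = (0, '');
    for my $line (@lines) {
        chomp $line;
        (my $trimmed = $line) =~ s/^\s+|\s+$//g;

        # A warning pattern wins over everything else for this line.
        if (grep { $line =~ $_ } @WARNINGS) {
            return (1, "BOS WARNING - $trimmed");
        }

        # Anything unexpected is critical, except the auxiliary status line
        # that follows a running salvage instance.
        unless (grep { $line =~ $_ } @OKAY) {
            if ($prev =~ /^Instance salvage,/ && $line =~ /running now/) {
                return (1, 'BOS WARNING - salvage is running');
            }
            return (2, "BOS CRITICAL - $trimmed");
        }
        $count++ if $line =~ /currently running normally\.$/;
        $prev = $line;
    }
    my $what = $count == 1 ? 'one process' : "$count processes";
    return (0, "BOS OK - $what running normally");
}

# Illustrative input only.
my ($status, $message) = classify_bos_output (
    "Instance fs, (type is fs) currently running normally.\n",
    "    Auxiliary status is: file server running.\n",
);
print "$message (exit status $status)\n";
```

Keeping the classification separate from the open/alarm plumbing would also make it easier to adjust the pattern lists for a site and check the result against saved bos output.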
--- /dev/null
+#!/usr/bin/perl -w
+our $VERSION = '@VERSION@ @DATE@';
+#
+# check_afs_rxdebug -- Nagios AFS server check for waiting connections.
+#
+# Expects a file server with the -H option and runs rxdebug against that file
+# server, looking for any connections that are waiting for a thread. Exits
+# with status 1 if there are two or more connections in that state (a
+# warning) and with status 2 if there are eight or more connections in that
+# state. The thresholds can be overridden from the command line.
+#
+# Written by Quanah Gibson-Mount based on work by Neil Crellin
+# Updated by Russ Allbery <rra@stanford.edu>
+# Copyright 2003, 2004, 2005, 2010
+# Board of Trustees, Leland Stanford Jr. University
+#
+# This program is free software; you may redistribute it and/or modify it
+# under the same terms as Perl itself.
+
+##############################################################################
+# Modules and declarations
+##############################################################################
+
+require 5.006;
+
+use strict;
+
+use Getopt::Long qw(GetOptions);
+
+##############################################################################
+# Site configuration
+##############################################################################
+
+# The default count of blocked connections at which to warn or send a critical
+# alert. These can be overridden with the -w and -c command-line options.
+our $WARNINGS = 2;
+our $CRITICAL = 8;
+
+# The default timeout in seconds (implemented by alarm) for rxdebug.
+our $TIMEOUT = 60;
+
+# The full path to rxdebug. Make sure that this is on local disk so that
+# monitoring doesn't have an AFS dependency.
+our ($RXDEBUG) = grep { -x $_ }
+ qw(/usr/bin/rxdebug /usr/sbin/rxdebug /usr/local/bin/rxdebug
+ /usr/local/sbin/rxdebug);
+$RXDEBUG ||= '/usr/bin/rxdebug';
+
+##############################################################################
+# Implementation
+##############################################################################
+
+# Report a syntax error and exit. We do this via stdout in order to satisfy
+# the Nagios plugin output requirements, but also report a more conventional
+# error via stderr in case people are calling this outside of Nagios.
+sub syntax {
+ print "AFS UNKNOWN - ", join ('', @_), "\n";
+ warn "$0: ", join ('', @_), "\n";
+ exit 3;
+}
+
+# Parse command line options.
+my ($help, $host, $version);
+Getopt::Long::config ('bundling', 'no_ignore_case');
+GetOptions ('c|critical=i' => \$CRITICAL,
+ 'H|hostname=s' => \$host,
+ 'h|help' => \$help,
+ 't|timeout=i' => \$TIMEOUT,
+ 'V|version' => \$version,
+ 'w|warning=i' => \$WARNINGS)
+ or syntax ("invalid option");
+if ($help) {
+ print "Feeding myself to perldoc, please wait....\n";
+ exec ('perldoc', '-t', $0) or die "Cannot exec perldoc: $!\n";
+} elsif ($version) {
+ my $version = $VERSION;
+ print "check_afs_rxdebug $version\n";
+ exit 0;
+}
+syntax ("extra arguments on command line") if @ARGV;
+syntax ("host to check not specified") unless (defined $host);
+if ($WARNINGS > $CRITICAL) {
+ syntax ("warning level $WARNINGS greater than critical level $CRITICAL");
+}
+
+# Set up the alarm.
+$SIG{ALRM} = sub {
+ print "AFS CRITICAL - network timeout after $TIMEOUT seconds\n";
+ exit 2;
+};
+alarm ($TIMEOUT);
+
+# Run rxdebug and parse the output to find the number of calls waiting for a
+# thread.
+unless (open (RXDEBUG, "$RXDEBUG $host -noconn |")) {
+ print "AFS UNKNOWN - cannot run rxdebug\n";
+ warn "$0: cannot run rxdebug\n";
+ exit 3;
+}
+my $blocked;
+while (<RXDEBUG>) {
+ if (/^(\d+) calls waiting for a thread/) {
+ $blocked = $1;
+ last;
+ }
+}
+close RXDEBUG;
+if ($? != 0) {
+ print "AFS CRITICAL - cannot contact server\n";
+ exit 2;
+}
+unless (defined $blocked) {
+ print "AFS CRITICAL - cannot parse rxdebug output\n";
+ exit 2;
+}
+
+# Check the connection count against our limits and make sure that it's okay.
+if ($blocked >= $CRITICAL) {
+ print "AFS CRITICAL - $blocked blocked connections\n";
+ exit 2;
+} elsif ($blocked >= $WARNINGS) {
+ print "AFS WARNING - $blocked blocked connections\n";
+ exit 1;
+} else {
+ print "AFS OK - $blocked blocked connections\n";
+ exit 0;
+}
+
+##############################################################################
+# Documentation
+##############################################################################
+
+=head1 NAME
+
+check_afs_rxdebug - Check AFS servers for blocked connections in Nagios
+
+=head1 SYNOPSIS
+
+B<check_afs_rxdebug> [B<-hV>] [B<-c> I<threshold>] [B<-w> I<threshold>]
+ [B<-t> I<timeout>] B<-H> I<host>
+
+=head1 DESCRIPTION
+
+B<check_afs_rxdebug> is a Nagios plugin for checking AFS file servers to
+see if there are client connections waiting for a free thread. If there
+are more than a few of these, AFS performance tends to be very slow; this
+is a fairly reliable way to catch overloaded file servers. By default,
+B<check_afs_rxdebug> returns a critical error if there are eight or more
+connections waiting for a free thread and a warning if there are two or
+more. These thresholds can be changed with the B<-c> and B<-w> options.
+
+B<check_afs_rxdebug> will always print out a single line of output
+including the number of blocked connections, displaying whether this is
+critical, a warning, or okay.
+
+=head1 OPTIONS
+
+=over 4
+
+=item B<-c> I<threshold>, B<--critical>=I<threshold>
+
+Change the critical blocked connection count threshold to I<threshold>,
+which should be an integer. The default is 8.
+
+=item B<-H> I<host>, B<--hostname>=I<host>
+
+The AFS file server whose connections B<check_afs_rxdebug> should check.
+This option is required.
+
+=item B<-h>, B<--help>
+
+Print out this documentation (which is done simply by feeding the script
+to C<perldoc -t>).
+
+=item B<-t> I<timeout>, B<--timeout>=I<timeout>
+
+Change the timeout for the B<rxdebug> command. The default timeout is 60
+seconds.
+
+=item B<-V>, B<--version>
+
+Print out the version of B<check_afs_rxdebug> and quit.
+
+=item B<-w> I<threshold>, B<--warning>=I<threshold>
+
+Change the warning blocked connection threshold to I<threshold>, which
+should be an integer. The default is 2.
+
+=back
+
+=head1 EXIT STATUS
+
+B<check_afs_rxdebug> follows the standard Nagios exit status requirements.
+This means that it will exit with status 0 if there are no problems, with
+status 1 if there is a warning, and with status 2 if there is a critical
+problem. For other errors, such as invalid syntax, B<check_afs_rxdebug>
+will exit with status 3.
+
+=head1 BUGS
+
+The standard B<-v> verbose Nagios plugin option is not supported, although
+it's not entirely clear what it would add.
+
+The usage message for invalid options and for the B<-h> option doesn't
+conform to Nagios standards.
+
+=head1 CAVEATS
+
+This script does not use the Nagios util library or any of the defaults
+that it provides, which makes it somewhat deficient as a Nagios plugin.
+This is intentional, though, since this script can be used with other
+monitoring systems as well. It's not clear what a good solution to this
+would be.
+
+=head1 SEE ALSO
+
+This script is part of the afs-monitor package, which includes various AFS
+monitoring plugins for Nagios. It is available from the AFS monitoring
+tools page at L<http://www.eyrie.org/~eagle/software/afs-monitor/>.
+
+=head1 AUTHORS
+
+The original idea behind this script was from Neil Crellin. It was
+updated by Quanah Gibson-Mount to work with Nagios, and then further
+updated by Russ Allbery <rra@stanford.edu> to support more standard
+options and to use a more uniform coding style.
+
+=head1 COPYRIGHT AND LICENSE
+
+Copyright 2003, 2004, 2005, 2010 Board of Trustees, Leland Stanford
+Jr. University.
+
+This program is free software; you may redistribute it and/or modify it
+under the same terms as Perl itself.
+
+=cut
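Reviewer sketch (not part of the patch): the threshold handling in check_afs_rxdebug compares with >=, so a count equal to a threshold already triggers that level. The following minimal sketch isolates the parsing and comparison steps so that this boundary behavior is easy to see. The helper names parse_blocked and blocked_status are hypothetical, and the sample rxdebug lines are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: find the "calls waiting for a thread" count in lines
# of rxdebug -noconn output. Returns undef if no such line is present.
sub parse_blocked {
    for my $line (@_) {
        return $1 if $line =~ /^(\d+) calls waiting for a thread/;
    }
    return;
}

# Hypothetical helper mirroring the final comparison in check_afs_rxdebug.
# The defaults match $WARNINGS and $CRITICAL in the script.
sub blocked_status {
    my ($blocked, $warning, $critical) = @_;
    $warning  = 2 unless defined $warning;
    $critical = 8 unless defined $critical;
    if ($blocked >= $critical) {
        return (2, "AFS CRITICAL - $blocked blocked connections");
    } elsif ($blocked >= $warning) {
        return (1, "AFS WARNING - $blocked blocked connections");
    }
    return (0, "AFS OK - $blocked blocked connections");
}

# Illustrative input only.
my $blocked = parse_blocked (
    "Trying 192.0.2.10 (port 7000):\n",
    "3 calls waiting for a thread\n",
);
my ($status, $message) = blocked_status ($blocked);
print "$message (exit status $status)\n";
```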
--- /dev/null
+#!/usr/bin/perl -w
+our $VERSION = '@VERSION@ @DATE@';
+#
+# check_afs_space -- Monitor AFS disk space usage under Nagios.
+#
+# Expects a host with the -H option and checks the partition usage with vos
+# partinfo. Exits with status 1 if the used space is at or above a warning
+# percentage and with status 2 if the used space is at or above a critical
+# percentage (this works with the Nagios check architecture).
+#
+# Written by Susan Feng <sfeng@stanford.edu>
+# Updated by Russ Allbery <rra@stanford.edu>
+# Copyright 2003, 2004, 2010 Board of Trustees, Leland Stanford Jr. University
+#
+# This program is free software; you may redistribute it and/or modify it
+# under the same terms as Perl itself.
+
+##############################################################################
+# Modules and declarations
+##############################################################################
+
+require 5.006;
+
+use strict;
+
+use Getopt::Long qw(GetOptions);
+
+# Use Number::Format if it's available, but don't require it.
+our $FORMAT = 0;
+eval {
+ require Number::Format;
+ Number::Format->import ('format_bytes');
+};
+unless ($@) {
+ $FORMAT = 1;
+}
+
+##############################################################################
+# Site configuration
+##############################################################################
+
+# The default percentage full at which to warn and at which to send a critical
+# alert. These can be overridden with the -w and -c command-line options.
+our $WARNINGS = 85;
+our $CRITICAL = 90;
+
+# The default timeout in seconds (implemented by alarm) for vos partinfo.
+our $TIMEOUT = 300;
+
+# The full path to vos. Make sure that this is on local disk so that
+# monitoring doesn't have an AFS dependency.
+our ($VOS) = grep { -x $_ } qw(/usr/bin/vos /usr/local/bin/vos);
+$VOS ||= '/usr/bin/vos';
+
+##############################################################################
+# Implementation
+##############################################################################
+
+# Report a syntax error and exit. We do this via stdout in order to satisfy
+# the Nagios plugin output requirements, but also report a more conventional
+# error via stderr in case people are calling this outside of Nagios.
+sub syntax {
+ print "AFS UNKNOWN - ", join ('', @_), "\n";
+ warn "$0: ", join ('', @_), "\n";
+ exit 3;
+}
+
+# Parse command line options.
+my ($help, $host, $partition, $version);
+Getopt::Long::config ('bundling', 'no_ignore_case');
+GetOptions ('c|critical=i' => \$CRITICAL,
+ 'H|hostname=s' => \$host,
+ 'h|help' => \$help,
+ 'p|partition=s' => \$partition,
+ 't|timeout=i' => \$TIMEOUT,
+ 'V|version' => \$version,
+ 'w|warning=i' => \$WARNINGS)
+ or syntax ("invalid option");
+if ($help) {
+ print "Feeding myself to perldoc, please wait....\n";
+ exec ('perldoc', '-t', $0) or die "Cannot exec perldoc: $!\n";
+} elsif ($version) {
+ my $version = $VERSION;
+ print "check_afs_space $version\n";
+ exit 0;
+}
+syntax ("extra arguments on command line") if @ARGV;
+syntax ("host to check not specified") unless (defined $host);
+if ($WARNINGS > $CRITICAL) {
+ syntax ("warning level $WARNINGS greater than critical level $CRITICAL");
+}
+if ($partition) {
+ $partition = "/vicep$partition" if length ($partition) <= 2;
+ $partition = "/$partition" if $partition !~ m%^/%;
+}
+
+# Set up the alarm.
+$SIG{ALRM} = sub {
+ print "AFS CRITICAL - network timeout after $TIMEOUT seconds\n";
+ exit 2;
+};
+alarm ($TIMEOUT);
+
+# Get the partinfo information and calculate the percentage free for each
+# partition. Accumulate critical messages in @critical and warnings in
+# @warnings. Accumulate all percentages in @all.
+my (@critical, @warnings, @all);
+my $command = "$VOS partinfo -server '$host'";
+$command .= " -partition $partition" if defined ($partition);
+my @data = `$command 2> /dev/null`;
+if ($? != 0) {
+ print "AFS CRITICAL - cannot contact server\n";
+ exit 2;
+}
+$partition .= ':' if defined ($partition);
+for (@data) {
+ my ($part, $free, $total) = (split)[4,5,11];
+ next if (defined ($partition) and $part ne $partition);
+ my $percent = int ((($total - $free) / $total) * 100);
+ my $used = $total - $free;
+ if ($FORMAT) {
+ $total = format_bytes ($total, mode => 'iec');
+ $free = format_bytes ($free, mode => 'iec');
+ $used = format_bytes ($used, mode => 'iec');
+ }
+ my $summary;
+ if ($partition) {
+ $summary = "$part$percent% used"
+ . " ($total total, $used used, $free free)";
+ } else {
+ $summary = "$part$percent% (free $free)";
+ }
+ if ($percent >= $CRITICAL) {
+ push (@critical, $summary);
+ } elsif ($percent >= $WARNINGS) {
+ push (@warnings, $summary);
+ }
+ if ($partition) {
+ push (@all, $summary);
+ } else {
+ push (@all, "$part$percent%");
+ }
+}
+unless (@all) {
+ print "AFS CRITICAL - no partition found\n";
+ exit 2;
+}
+
+# Exit with the appropriate error messages.
+if (@critical) {
+ print "AFS CRITICAL - @critical\n";
+ exit 2;
+} elsif (@warnings) {
+ print "AFS WARNING - @warnings\n";
+ exit 1;
+} else {
+ print "AFS OK - @all\n";
+ exit 0;
+}
+
+##############################################################################
+# Documentation
+##############################################################################
+
+=head1 NAME
+
+check_afs_space - Monitor AFS disk space usage under Nagios
+
+=head1 SYNOPSIS
+
+B<check_afs_space> [B<-hV>] [B<-c> I<threshold>] [B<-w> I<threshold>]
+ [B<-p> I<partition>] [B<-t> I<timeout>] B<-H> I<host>
+
+=head1 DESCRIPTION
+
+B<check_afs_space> is a Nagios plugin for checking free space on AFS server
+partitions. It uses C<vos partinfo> to obtain the free space on the
+partitions on an AFS server and will return an alert if the percentage of
+used space reaches a threshold. By default, it returns a critical error
+if the used space is 90% or more and a warning if it is 85% or more
+(changeable with the B<-c> and B<-w> options).
+
+If C<vos partinfo> doesn't return within the timeout, B<check_afs_space>
+will return a critical error. The default timeout is 300 seconds,
+changeable with the B<-t> option.
+
+B<check_afs_space> will always print out a single line of output, giving
+the critical errors if any, otherwise giving the warnings if any,
+otherwise listing in an abbreviated form the percentage of used space for
+all partitions.
+
+The check can be limited to a single partition by specifying that
+partition with the B<-p> option. In this case, more verbose information
+about the total, used, and free space is given in the one line of output.
+
+=head1 OPTIONS
+
+=over 4
+
+=item B<-c> I<threshold>, B<--critical>=I<threshold>
+
+Change the critical percentage threshold to I<threshold>, which should be
+an integer percentage. The default is 90.
+
+=item B<-H> I<host>, B<--hostname>=I<host>
+
+The AFS file server whose free space B<check_afs_space> should check. This
+option is required.
+
+=item B<-h>, B<--help>
+
+Print out this documentation (which is done simply by feeding the script
+to C<perldoc -t>).
+
+=item B<-p> I<partition>, B<--partition>=I<partition>
+
+Limit the results to the specified partition. The partition can be given
+as the partition letter (C<a>, for example) or the full partition name
+(C</vicepa>), with or without the leading slash. If this option is given,
+only that partition will be checked and more verbose information about
+total, used, and free space will be printed.
+
+=item B<-t> I<timeout>, B<--timeout>=I<timeout>
+
+Change the timeout for the C<vos partinfo> command. The default timeout
+is 300 seconds.
+
+=item B<-V>, B<--version>
+
+Print out the version of B<check_afs_space> and quit.
+
+=item B<-w> I<threshold>, B<--warning>=I<threshold>
+
+Change the warning percentage threshold to I<threshold>, which should be
+an integer percentage. The default is 85.
+
+=back
+
+=head1 EXIT STATUS
+
+B<check_afs_space> follows the standard Nagios exit status requirements.
+This means that it will exit with status 0 if there are no problems, with
+status 2 if there is at least one critical partition for that server, and
+with status 1 if there are no critical partitions but at least one warning
+partition. For other errors, such as invalid syntax, B<check_afs_space>
+will exit with status 3.
+
+=head1 BUGS
+
+The standard B<-v> verbose Nagios plugin option is not supported and
+should be. (For example, under B<-vv> we would want to show the actual
+total, free, and used byte counts, not just the percentages.)
+
+The usage message for invalid options and for the B<-h> option doesn't
+conform to Nagios standards.
+
+=head1 CAVEATS
+
+This script does not use the Nagios util library or any of the defaults
+that it provides, which makes it somewhat deficient as a Nagios plugin.
+This is intentional, though, since this script can be used with other
+monitoring systems as well. It's not clear what a good solution to this
+would be.
+
+=head1 SEE ALSO
+
+vos(1)
+
+This script is part of the afs-monitor package, which includes various AFS
+monitoring plugins for Nagios. It is available from the AFS monitoring
+tools page at L<http://www.eyrie.org/~eagle/software/afs-monitor/>.
+
+=head1 AUTHORS
+
+Originally written by Susan Feng for use with mon. Updated by Quanah
+Gibson-Mount to work with Nagios, and then further updated by Russ Allbery
+<rra@stanford.edu> to support more standard options and to use a more
+uniform coding style. Support for checking a single partition based on
+work by Steve Rader.
+
+=head1 COPYRIGHT AND LICENSE
+
+Copyright 2003, 2004, 2010 Board of Trustees, Leland Stanford Jr.
+University.
+
+This program is free software; you may redistribute it and/or modify it
+under the same terms as Perl itself.
+
+=cut
--- /dev/null
+#!/usr/bin/perl -w
+our $VERSION = '@VERSION@ @DATE@';
+#
+# check_afs_udebug -- Check AFS database servers using udebug for Nagios.
+#
+# Takes a hostname and a port number and checks the udebug output for that
+# host and port. Reports an error if the recovery state is not 1f on the sync
+# site (ensuring that it considers all of the other servers up-to-date) or if
+# any of the servers don't believe there is a sync site.
+#
+# Written by Russ Allbery <rra@stanford.edu>
+# Copyright 2004, 2010 Board of Trustees, Leland Stanford Jr. University
+#
+# This program is free software; you may redistribute it and/or modify it
+# under the same terms as Perl itself.
+
+##############################################################################
+# Modules and declarations
+##############################################################################
+
+require 5.006;
+
+use strict;
+
+use Getopt::Long qw(GetOptions);
+
+##############################################################################
+# Site configuration
+##############################################################################
+
+# The default timeout in seconds (implemented by alarm) for udebug.
+our $TIMEOUT = 10;
+
+# The full path to udebug. Make sure that this is on local disk so that
+# monitoring doesn't have an AFS dependency.
+our ($UDEBUG) = grep { -x $_ } qw(/usr/bin/udebug /usr/local/bin/udebug);
+$UDEBUG ||= '/usr/bin/udebug';
+
+##############################################################################
+# Implementation
+##############################################################################
+
+# Report a syntax error and exit. We do this via stdout in order to satisfy
+# the Nagios plugin output requirements, but also report a more conventional
+# error via stderr in case people are calling this outside of Nagios.
+sub syntax {
+ print "UBIK UNKNOWN - ", join ('', @_), "\n";
+ warn "$0: ", join ('', @_), "\n";
+ exit 3;
+}
+
+# Parse command line options.
+my ($help, $host, $port, $version);
+Getopt::Long::config ('bundling', 'no_ignore_case');
+GetOptions ('H|hostname=s' => \$host,
+ 'h|help' => \$help,
+ 'p|port=i' => \$port,
+ 't|timeout=i' => \$TIMEOUT,
+ 'V|version' => \$version)
+ or syntax ("invalid option");
+if ($help) {
+ print "Feeding myself to perldoc, please wait....\n";
+ exec ('perldoc', '-t', $0) or die "Cannot exec perldoc: $!\n";
+} elsif ($version) {
+ my $version = $VERSION;
+ print "check_afs_udebug $version\n";
+ exit 0;
+}
+syntax ("extra arguments on command line") if @ARGV;
+syntax ("host to check not specified") unless (defined $host);
+syntax ("port to check not specified") unless (defined $port);
+
+# Set up the alarm.
+$SIG{ALRM} = sub {
+ print "UBIK CRITICAL - network timeout after $TIMEOUT seconds\n";
+ exit 2;
+};
+alarm ($TIMEOUT);
+
+# Run udebug and parse the output. We're looking for three things: whether
+# this host claims to be the sync site, whether its recovery state is 1f
+# (required if it is the sync site), and whether a sync host is set (required
+# if it is not). We also record the local db version for the OK message.
+unless (open (UDEBUG, "$UDEBUG $host $port |")) {
+ print "UBIK UNKNOWN - cannot run udebug\n";
+ warn "$0: cannot run udebug\n";
+ exit 3;
+}
+my ($issync, $recovery, $synchost, $db);
+while (<UDEBUG>) {
+ $issync = 1 if /^I am sync site /;
+ $recovery = 1 if /^Recovery state 1f/;
+ $synchost = 1 if /^Sync host \d+(\.\d+){3} was set /;
+ if (/Local db version is (\d+\.\d+)/) {
+ $db = "db version $1";
+ }
+}
+close UDEBUG;
+if ($? != 0) {
+ print "UBIK CRITICAL - udebug failed\n";
+ exit 2;
+}
+
+# Check the results.
+if ($issync && !$recovery) {
+ print "UBIK CRITICAL - recovery state not 1f\n";
+ exit 2;
+} elsif (!$issync && !$synchost) {
+ print "UBIK CRITICAL - no sync site\n";
+ exit 2;
+} else {
+ print "UBIK OK", (defined $db ? " - $db" : ''), "\n";
+ exit 0;
+}
+
+##############################################################################
+# Documentation
+##############################################################################
+
+=head1 NAME
+
+check_afs_udebug - Check AFS database servers using udebug for Nagios
+
+=head1 SYNOPSIS
+
+B<check_afs_udebug> [B<-hV>] [B<-t> I<timeout>] B<-H> I<host> B<-p> I<port>
+
+=head1 DESCRIPTION
+
+B<check_afs_udebug> is a Nagios plugin for checking AFS database servers
+to make sure the Ubik replication between the database servers is running
+correctly. B<udebug> is used to connect to the specified port on the
+specified server. The port should generally be one of 7002 (ptserver),
+7003 (vlserver), or 7004 (kaserver). The resulting output is checked to
+make sure that the recovery state is 1f if that server is the sync site,
+or that a sync site is known if that server doesn't claim to be the sync
+site.
+
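+B<check_afs_udebug> will always print out a single line of output. That
+line will be C<UBIK OK> followed by the local database version if everything
+is fine, C<UBIK CRITICAL> followed by an error message if a problem is
+detected, or C<UBIK UNKNOWN> followed by an error message for a syntax
+error.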
+
+=head1 OPTIONS
+
+=over 4
+
+=item B<-H> I<host>, B<--hostname>=I<host>
+
+The AFS database server whose Ubik status B<check_afs_udebug> should
+check. This option is required.
+
+=item B<-h>, B<--help>
+
+Print out this documentation (which is done simply by feeding the script
+to C<perldoc -t>).
+
+=item B<-p> I<port>, B<--port>=I<port>
+
+The port to connect to on the AFS database server. This should generally
+be one of 7002 (ptserver), 7003 (vlserver), or 7004 (kaserver). This
+option is required.
+
+=item B<-t> I<timeout>, B<--timeout>=I<timeout>
+
+Change the timeout for the B<udebug> command. The default timeout is 10
+seconds.
+
+=item B<-V>, B<--version>
+
+Print out the version of B<check_afs_udebug> and quit.
+
+=back
+
+=head1 EXIT STATUS
+
+B<check_afs_udebug> follows the standard Nagios exit status requirements.
+This means that it will exit with status 0 if there are no problems or
+with status 2 if there are critical problems. For other errors, such as
+invalid syntax, B<check_afs_udebug> will exit with status 3.
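+
+=head1 EXAMPLES
+
+As an illustrative sketch (the server name C<afsdb1.example.org> is a
+placeholder), the following checks the vlserver on one database server:
+
+    check_afs_udebug -H afsdb1.example.org -p 7003
+
+Under Nagios, the same check could be wired up with a command definition
+along these lines, assuming the plugin is installed in the directory that
+the C<$USER1$> macro points to and that the port is passed as the first
+service argument:
+
+    define command {
+        command_name    check_afs_udebug
+        command_line    $USER1$/check_afs_udebug -H $HOSTADDRESS$ -p $ARG1$
+    }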
+
+=head1 BUGS
+
+The standard B<-v> verbose Nagios plugin option is not supported. It
+should print out the full B<udebug> output.
+
+The usage message for invalid options and for the B<-h> option doesn't
+conform to Nagios standards.
+
+=head1 CAVEATS
+
+This script does not use the Nagios util library or any of the defaults
+that it provides, which makes it somewhat deficient as a Nagios plugin.
+This is intentional, though, since this script can be used with other
+monitoring systems as well. It's not clear what a good solution to this
+would be.
+
+=head1 SEE ALSO
+
+This script is part of the afs-monitor package, which includes various AFS
+monitoring plugins for Nagios. It is available from the AFS monitoring
+tools page at L<http://www.eyrie.org/~eagle/software/afs-monitor/>.
+
+=head1 AUTHORS
+
+Russ Allbery <rra@stanford.edu>
+
+=head1 COPYRIGHT AND LICENSE
+
+Copyright 2004, 2010 Board of Trustees, Leland Stanford Jr. University.
+
+This program is free software; you may redistribute it and/or modify it
+under the same terms as Perl itself.
+
+=cut
+++ /dev/null
-#!/usr/bin/perl -w
-our $VERSION = '@VERSION@ @DATE@';
-#
-# check_afsspace -- Monitor AFS disk space usage under Nagios.
-#
-# Written by Susan Feng <sfeng@stanford.edu>
-# Updated by Russ Allbery <rra@stanford.edu>
-# Copyright 2003, 2004, 2010 Board of Trustees, Leland Stanford Jr. University
-#
-# This program is free software; you may redistribute it and/or modify it
-# under the same terms as Perl itself.
-#
-# Expects a host with the -H option and checks the partition usage with vos
-# partinfo. Exits with status 1 if the free space is below a warning
-# percentage and with status 2 if the free space is above a critical
-# percentage (this works with the Nagios check architecture).
-
-##############################################################################
-# Modules and declarations
-##############################################################################
-
-require 5.006;
-
-use strict;
-
-use Getopt::Long qw(GetOptions);
-
-# Use Number::Format if it's available, but don't require it.
-our $FORMAT = 0;
-eval {
- require Number::Format;
- Number::Format->import ('format_bytes');
-};
-unless ($@) {
- $FORMAT = 1;
-}
-
-##############################################################################
-# Site configuration
-##############################################################################
-
-# The default percentage full at which to warn and at which to send a critical
-# alert. These can be overridden with the -w and -c command-line options.
-our $WARNINGS = 85;
-our $CRITICAL = 90;
-
-# The default timeout in seconds (implemented by alarm) for vos partinfo.
-our $TIMEOUT = 300;
-
-# The full path to vos. Make sure that this is on local disk so that
-# monitoring doesn't have an AFS dependency.
-our ($VOS) = grep { -x $_ } qw(/usr/bin/vos /usr/local/bin/vos);
-$VOS ||= '/usr/bin/vos';
-
-##############################################################################
-# Implementation
-##############################################################################
-
-# Report a syntax error and exit. We do this via stdout in order to satisfy
-# the Nagios plugin output requirements, but also report a more conventional
-# error via stderr in case people are calling this outside of Nagios.
-sub syntax {
- print "AFS UNKNOWN - ", join ('', @_), "\n";
- warn "$0: ", join ('', @_), "\n";
- exit 3;
-}
-
-# Parse command line options.
-my ($help, $host, $partition, $version);
-Getopt::Long::config ('bundling', 'no_ignore_case');
-GetOptions ('c|critical=i' => \$CRITICAL,
- 'H|hostname=s' => \$host,
- 'h|help' => \$help,
- 'p|partition=s' => \$partition,
- 't|timeout=i' => \$TIMEOUT,
- 'V|version' => \$version,
- 'w|warning=i' => \$WARNINGS)
- or syntax ("invalid option");
-if ($help) {
- print "Feeding myself to perldoc, please wait....\n";
- exec ('perldoc', '-t', $0) or die "Cannot fork: $!\n";
-} elsif ($version) {
- my $version = $VERSION;
- print "check_afsspace $version\n";
- exit 0;
-}
-syntax ("extra arguments on command line") if @ARGV;
-syntax ("host to check not specified") unless (defined $host);
-if ($WARNINGS > $CRITICAL) {
- syntax ("warning level $WARNINGS greater than critical level $CRITICAL");
-}
-if ($partition) {
- $partition = "/vicep$partition" if length ($partition) <= 2;
- $partition = "/$partition" if $partition !~ m%^/%;
-}
-
-# Set up the alarm.
-$SIG{ALRM} = sub {
- print "AFS CRITICAL - network timeout after $TIMEOUT seconds\n";
- exit 2;
-};
-alarm ($TIMEOUT);
-
-# Get the partinfo information and calculate the percentage free for each
-# partition. Accumulate critical messages in @critical and warnings in
-# @warnings. Accumulate all percentages in @all.
-my (@critical, @warnings, @all);
-my $command = "$VOS partinfo -server '$host'";
-$command .= " -partition $partition" if defined ($partition);
-my @data = `$command 2> /dev/null`;
-if ($? != 0) {
- print "AFS CRITICAL - cannot contact server\n";
- exit 2;
-}
-$partition .= ':';
-for (@data) {
- my ($part, $free, $total) = (split)[4,5,11];
- next if (defined ($partition) and $part ne $partition);
- my $percent = int ((($total - $free) / $total) * 100);
- my $used = $total - $free;
- if ($FORMAT) {
- $total = format_bytes ($total, mode => 'iec');
- $free = format_bytes ($free, mode => 'iec');
- $used = format_bytes ($used, mode => 'iec');
- }
- my $summary;
- if ($partition) {
- $summary = "$part$percent% used"
- . " ($total total, $used used, $free free)";
- } else {
- $summary = "$part$percent% (free $free)";
- }
- if ($percent >= $CRITICAL) {
- push (@critical, $summary);
- } elsif ($percent >= $WARNINGS) {
- push (@warnings, $summary);
- }
- if ($partition) {
- push (@all, $summary);
- } else {
- push (@all, "$part$percent%");
- }
-}
-unless (@all) {
- print "AFS CRITICAL - no partition found\n";
- exit 2;
-}
-
-# Exit with the appropriate error messages.
-if (@critical) {
- print "AFS CRITICAL - @critical\n";
- exit 2;
-} elsif (@warnings) {
- print "AFS WARNING - @warnings\n";
- exit 1;
-} else {
- print "AFS OK - @all\n";
- exit 0;
-}
-
-##############################################################################
-# Documentation
-##############################################################################
-
-=head1 NAME
-
-check_afsspace - Monitor AFS disk space usage under Nagios
-
-=head1 SYNOPSIS
-
-B<check_afsspace> [B<-hV>] [B<-c> I<threshold>] [B<-w> I<threshold>]
- [B<-p> I<partition>] [B<-t> I<timeout>] B<-H> I<host>
-
-=head1 DESCRIPTION
-
-B<check_afsspace> is a Nagios plugin for checking free space on AFS server
-partitions. It uses C<vos partinfo> to obtain the free space on the
-partitions on an AFS server and will return an alert if the percentage of
-used space exceeds a threshold. By default, it returns a critical error
-if the used space is over 90% and a warning if it is over 85% (changaable
-with the B<-c> and B<-w> options).
-
-If C<vos partinfo> doesn't return within the timeout, B<check_afsspace>
-will return a critical error. The default timeout is 300 seconds,
-changeable with the B<-t> option.
-
-B<check_afsspace> will always print out a single line of output, giving
-the critical errors if any, otherwise giving the warnings if any,
-otherwise listing in an abbreviated form the percentage free space for all
-partitions.
-
-The check can be limited to a single partition by specifying that
-partition with the B<-p> option. In this case, more verbose information
-about the total, used, and free space is given in the one line of output.
-
-=head1 OPTIONS
-
-=over 4
-
-=item B<-c> I<threshold>, B<--critical>=I<threshold>
-
-Change the critical percentage threshold to I<threshold>, which should be
-an integer percentage. The default is 90.
-
-=item B<-H> I<host>, B<--hostname>=I<host>
-
-The AFS file server whose free space B<check_afsspace> should check. This
-option is required.
-
-=item B<-h>, B<--help>
-
-Print out this documentation (which is done simply by feeding the script
-to C<perldoc -t>).
-
-=item B<-p> I<partition>, B<--partition>=I<partition>
-
-Limit the results to the specified partition. The partition can be given
-as the partition letter (C<a>, for example) or the full partition name
-(C</vicepa>), with or without the leading slash. If this option is given,
-only that partition will be checked and more verbose information about
-total, used, and free space will be printed.
-
-=item B<-t> I<timeout>, B<--timeout>=I<timeout>
-
-Change the timeout for the C<vos partinfo> command. The default timeout
-is 300 seconds.
-
-=item B<-V>, B<--version>
-
-Print out the version of B<check_afsspace> and quit.
-
-=item B<-w> I<threshold>, B<--warning>=I<threshold>
-
-Change the warning percentage threshold to I<threshold>, which should be
-an integer percentage. The default is 85.
-
-=back
-
-=head1 EXIT STATUS
-
-B<check_afsspace> follows the standard Nagios exit status requirements.
-This means that it will exit with status 0 if there are no problems, with
-status 2 if there is at least one critical partition for that server, and
-with status 1 if there are no critical partitions but at least one warning
-partition. For other errors, such as invalid syntax, B<check_afsspace>
-will exit with status 3.
-
-=head1 BUGS
-
-The standard B<-v> verbose Nagios plugin option is not supported and
-should be. (For example, under B<-vv> we would want to show the actual
-total, free, and used byte counts, not just the percentages.)
-
-The usage message for invalid options and for the B<-h> option doesn't
-conform to Nagios standards.
-
-=head1 CAVEATS
-
-This script does not use the Nagios util library or any of the defaults
-that it provides, which makes it somewhat deficient as a Nagios plugin.
-This is intentional, though, since this script can be used with other
-monitoring systems as well. It's not clear what a good solution to this
-would be.
-
-=head1 SEE ALSO
-
-vos(1)
-
-This script is part of the afs-monitor package, which includes various AFS
-monitoring plugins for Nagios. It is available from the AFS monitoring
-tools page at L<http://www.eyrie.org/~eagle/software/afs-monitor/>.
-
-=head1 AUTHORS
-
-Originally written by Susan Feng for use with mon. Updated by Quanah
-Gibson-Mount to work with Nagios, and then further updated by Russ Allbery
-<rra@stanford.edu> to support more standard options and to use a more
-uniform coding style. Support for checking a single partition based on
-work by Steve Rader.
-
-=head1 COPYRIGHT AND LICENSE
-
-Copyright 2003, 2004, 2010 Board of Trustees, Leland Stanford Jr.
-University.
-
-This program is free software; you may redistribute it and/or modify it
-under the same terms as Perl itself.
-
-=cut
+++ /dev/null
-#!/usr/bin/perl -w
-our $VERSION = '@VERSION@ @DATE@';
-#
-# check_bos -- Monitor AFS bos output for problems in Nagios.
-#
-# Written by Russ Allbery <rra@stanford.edu>
-# Based on an earlier script by Neil Crellin <neilc@stanford.edu>
-# Copyright 2003, 2004, 2010 Board of Trustees, Leland Stanford Jr. University
-#
-# This program is free software; you may redistribute it and/or modify it
-# under the same terms as Perl itself.
-#
-# Given an AFS server (file or VLDB), runs bos status on each one. Checks to
-# see if there is a communication failure, and also checks to see if anything
-# in the output looks unusual or wrong. If either of these conditions are
-# true, print that information to STDOUT. Suitable for being run inside
-# Nagios.
-
-##############################################################################
-# Modules and declarations
-##############################################################################
-
-require 5.006;
-
-use strict;
-
-use Getopt::Long qw(GetOptions);
-
-##############################################################################
-# Site configuration
-##############################################################################
-
-# The full path to bos. Make sure that this is on local disk so that
-# monitoring doesn't have an AFS dependency.
-our ($BOS) = grep { -x $_ } qw(/usr/bin/bos /usr/local/bin/bos);
-$BOS ||= '/usr/bin/bos';
-
-# The default timeout in seconds (implemented by alarm) for rxdebug.
-our $TIMEOUT = 10;
-
-# The list of regular expressions matching expected output. You may need to
-# customize this for what you're running at your site. Any output from bos
-# that doesn't match one of these regular expressions or the warning regular
-# expressions below will throw a critical error.
-our @OKAY = (
- qr/^\s*$/,
- qr/^bos: running unauthenticated$/,
- qr/^Instance\ \S+,\ \(type\ is\ \S+\)(\ has\ core\ file,)?
- \ currently\ running\ normally\.$/x,
- qr/^\s*Auxiliary status is: file server running\.$/,
- qr/^\s*Process last started at /,
- qr/^\s*Last exit at /,
- qr/^\s*Last error exit at /,
- qr/^\s*Command \d+ is /
-);
-
-# The list of regular expressions that match output that should produce a
-# warning. You may need to customize this for what you expect at your site.
-our @WARNINGS = (
- qr/^\s*Bosserver reports inappropriate access on server directories/
-);
-
-##############################################################################
-# Implementation
-##############################################################################
-
-# Report a syntax error and exit. We do this via stdout in order to satisfy
-# the Nagios plugin output requirements, but also report a more conventional
-# error via stderr in case people are calling this outside of Nagios.
-sub syntax {
- print "BOS UNKNOWN - ", join ('', @_), "\n";
- warn "$0: ", join ('', @_), "\n";
- exit 3;
-}
-
-# Parse command line options.
-my ($help, $host, $version);
-Getopt::Long::config ('bundling', 'no_ignore_case');
-GetOptions ('H|hostname=s' => \$host,
- 'h|help' => \$help,
- 't|timeout=i' => \$TIMEOUT,
- 'V|version' => \$version)
- or syntax ("invalid option");
-if ($help) {
- print "Feeding myself to perldoc, please wait....\n";
- exec ('perldoc', '-t', $0) or die "Cannot fork: $!\n";
-} elsif ($version) {
- my $version = $VERSION;
- print "check_bos $version\n";
- exit 0;
-}
-syntax ("extra arguments on command line") if @ARGV;
-syntax ("host to check not specified") unless (defined $host);
-
-# Set up the alarm.
-$SIG{ALRM} = sub {
- print "BOS CRITICAL - network timeout after $TIMEOUT seconds\n";
- exit 2;
-};
-alarm ($TIMEOUT);
-
-# Collect the bos output into a variable.
-unless (open (BOS, "$BOS status $host -noauth -long 2>&1 |")) {
- print "BOS UNKNOWN - cannot run bos\n";
- exit 3;
-}
-my @bos = <BOS>;
-close BOS;
-
-# Make sure that bos was successful. Note that it generally does return
-# success even if it can't contact the bos server.
-if ($? != 0) {
- print "BOS CRITICAL - bos status failed\n";
- exit 2;
-}
-
-# Scan the output. If we see anything that we don't expect, immediately
-# report it as a fatal error.
-my $count = 0;
-my $prev_line = '';
-for my $line (@bos) {
- my $okay = 0;
- my $warn = 0;
- for my $regex (@OKAY) {
- if ($line =~ /$regex/) {
- $okay = 1;
- last;
- }
- }
- for my $regex (@WARNINGS) {
- if ($line =~ /$regex/) {
- $warn = 1;
- last;
- }
- }
- unless ($okay || $warn) {
- $line =~ s/^\s+//;
- $line =~ s/\s+$//;
- if ($prev_line =~ /^Instance salvage,/ && $line =~ /running now/) {
- print "BOS WARNING - salvage is running\n";
- exit 1;
- } else {
- print "BOS CRITICAL - $line\n";
- exit 2;
- }
- }
- if ($warn) {
- $line =~ s/^\s+//;
- $line =~ s/\s+$//;
- print "BOS WARNING - $line\n";
- exit 1;
- }
- $count++ if ($line =~ /currently running normally\.$/);
- $prev_line = $line;
-}
-if ($count == 1) {
- print "BOS OK - one process running normally\n";
-} else {
- print "BOS OK - $count processes running normally\n";
-}
-exit 0;
-
-##############################################################################
-# Documentation
-##############################################################################
-
-=head1 NAME
-
-check_bos - Monitor AFS bos output for problems in Nagios
-
-=head1 SYNOPSIS
-
-B<check_bos> [B<-hV>] [B<-t> I<timeout>] B<-H> I<host>
-
-=head1 DESCRIPTION
-
-B<check_bos> is a Nagios plugin for querying the AFS bosserver for process
-status and reporting an alert if there are any unexpected lines in the bos
-output. The acceptable lines of output from B<bos> are configured at the
-top of this script; they should be generally suitable for most sites, but
-may require some customization.
-
-B<check_bos> will always print out a single line of output. If there is a
-line that isn't matched by any regexes identifying acceptable lines, it
-will output the first non-matching line prefixed by C<BOS CRITICAL>. If
-the salvager is running (such as when started by C<bos salvage>) or other
-warnings are found, it will print that warning information prefixed by
-C<BOS WARNING>. Otherwise, it will output C<BOS OK>. Note that this
-monitoring may not catch such things as a service being constantly
-restarted if it happens to be up and running normally each time the probe
-runs; it doesn't pay any attention to the last start time, the last error
-exit status, the presence of core files, and the like. It mostly just
-looks for the "running normally" part of the B<bos> output and makes sure
-the auxilliary status is also "running normally" for a file server
-process.
-
-=head1 OPTIONS
-
-=over 4
-
-=item B<-H> I<host>, B<--hostname>=I<host>
-
-The AFS server whose B<bos> status B<check_bos> should check. This option
-is required.
-
-=item B<-h>, B<--help>
-
-Print out this documentation (which is done simply by feeding the script
-to C<perldoc -t>).
-
-=item B<-t> I<timeout>, B<--timeout>=I<timeout>
-
-Change the timeout for the B<bos> command. The default timeout is 10
-seconds.
-
-=item B<-V>, B<--version>
-
-Print out the version of B<check_bos> and quit.
-
-=back
-
-=head1 EXIT STATUS
-
-B<check_bos> follows the standard Nagios exit status requirements. This
-means that it will exit with status 0 if there are no problems, with
-status 1 if the salvager is running, or with status 2 if there is a
-problem detected. For other errors, such as invalid syntax, B<check_bos>
-will exit with status 3.
-
-=head1 BUGS
-
-The standard B<-v> verbose Nagios plugin option is not supported. It
-should display the complete bos status output.
-
-The usage message for invalid options and for the B<-h> option doesn't
-conform to Nagios standards.
-
-=head1 CAVEATS
-
-This script does not use the Nagios util library or any of the defaults
-that it provides, which makes it somewhat deficient as a Nagios plugin.
-This is intentional, though, since this script can be used with other
-monitoring systems as well. It's not clear what a good solution to this
-would be.
-
-=head1 SEE ALSO
-
-This script is part of the afs-monitor package, which includes various AFS
-monitoring plugins for Nagios. It is available from the AFS monitoring
-tools page at L<http://www.eyrie.org/~eagle/software/afs-monitor/>.
-
-=head1 AUTHORS
-
-The original idea behind this script was from Neil Crellin. Russ Allbery
-<rra@stanford.edu> updated it to work with Nagios and stripped out some
-rather neat but now unnecessary code to look for any changes in the bos
-output, instead just scanning it for acceptable lines.
-
-=head1 COPYRIGHT AND LICENSE
-
-Copyright 2003, 2004, 2010 Board of Trustees, Leland Stanford Jr.
-University.
-
-This program is free software; you may redistribute it and/or modify it
-under the same terms as Perl itself.
-
-=cut
+++ /dev/null
-#!/usr/bin/perl -w
-our $VERSION = '@VERSION@ @DATE@';
-#
-# check_rxdebug -- Nagios AFS server check for waiting connections.
-#
-# Written by Quanah Gibson-Mount based on work by Neil Crellin
-# Updated by Russ Allbery <rra@stanford.edu>
-# Copyright 2003, 2004, 2005, 2010
-# Board of Trustees, Leland Stanford Jr. University
-#
-# This program is free software; you may redistribute it and/or modify it
-# under the same terms as Perl itself.
-#
-# Expects a file server with the -H option and runs rxdebug against that file
-# server, looking for any connections that are waiting for a thread. Exits
-# with status 1 if there are more than two connections in that state (a
-# warning) and with status 2 if there are more than eight connections in that
-# state. The thresholds can be overridden from the command line.
-
-##############################################################################
-# Modules and declarations
-##############################################################################
-
-require 5.006;
-
-use strict;
-
-use Getopt::Long qw(GetOptions);
-
-##############################################################################
-# Site configuration
-##############################################################################
-
-# The default count of blocked connections at which to warn or send a critical
-# alert. These can be overridden with the -w and -c command-line options.
-our $WARNINGS = 2;
-our $CRITICAL = 8;
-
-# The default timeout in seconds (implemented by alarm) for rxdebug.
-our $TIMEOUT = 60;
-
-# The full path to rxdebug. Make sure that this is on local disk so that
-# monitoring doesn't have an AFS dependency.
-our ($RXDEBUG) = grep { -x $_ }
- qw(/usr/bin/rxdebug /usr/sbin/rxdebug /usr/local/bin/rxdebug
- /usr/local/sbin/rxdebug);
-$RXDEBUG ||= '/usr/bin/rxdebug';
-
-##############################################################################
-# Implementation
-##############################################################################
-
-# Report a syntax error and exit. We do this via stdout in order to satisfy
-# the Nagios plugin output requirements, but also report a more conventional
-# error via stderr in case people are calling this outside of Nagios.
-sub syntax {
- print "AFS UNKNOWN - ", join ('', @_), "\n";
- warn "$0: ", join ('', @_), "\n";
- exit 3;
-}
-
-# Parse command line options.
-my ($help, $host, $version);
-Getopt::Long::config ('bundling', 'no_ignore_case');
-GetOptions ('c|critical=i' => \$CRITICAL,
- 'H|hostname=s' => \$host,
- 'h|help' => \$help,
- 't|timeout=i' => \$TIMEOUT,
- 'V|version' => \$version,
- 'w|warning=i' => \$WARNINGS)
- or syntax ("invalid option");
-if ($help) {
- print "Feeding myself to perldoc, please wait....\n";
- exec ('perldoc', '-t', $0) or die "Cannot fork: $!\n";
-} elsif ($version) {
- my $version = $VERSION;
- print "check_rxdebug $version\n";
- exit 0;
-}
-syntax ("extra arguments on command line") if @ARGV;
-syntax ("host to check not specified") unless (defined $host);
-if ($WARNINGS > $CRITICAL) {
- syntax ("warning level $WARNINGS greater than critical level $CRITICAL");
-}
-
-# Set up the alarm.
-$SIG{ALRM} = sub {
- print "AFS CRITICAL - network timeout after $TIMEOUT seconds\n";
- exit 2;
-};
-alarm ($TIMEOUT);
-
-# Run rxdebug and parse the output to find the number of calls waiting for a
-# thread.
-unless (open (RXDEBUG, "$RXDEBUG $host -noconn |")) {
- warn "$0: cannot run rxdebug\n";
- exit 3;
-}
-my $blocked;
-while (<RXDEBUG>) {
- if (/^(\d+) calls waiting for a thread/) {
- $blocked = $1;
- last;
- }
-}
-close RXDEBUG;
-if ($? != 0) {
- print "AFS CRITICAL - cannot contact server\n";
- exit 2;
-}
-unless (defined $blocked) {
- print "AFS CRITICAL - cannot parse rxdebug output\n";
- exit 2;
-}
-
-# Check the connection count against our limits and make sure that it's okay.
-if ($blocked >= $CRITICAL) {
- print "AFS CRITICAL - $blocked blocked connections\n";
- exit 2;
-} elsif ($blocked >= $WARNINGS) {
- print "AFS WARNING - $blocked blocked connections\n";
- exit 1;
-} else {
- print "AFS OK - $blocked blocked connections\n";
- exit 0;
-}
-
-##############################################################################
-# Documentation
-##############################################################################
-
-=head1 NAME
-
-check_rxdebug - Check AFS servers for blocked connections in Nagios
-
-=head1 SYNOPSIS
-
-B<check_rxdebug> [B<-hV>] [B<-c> I<threshold>] [B<-w> I<threshold>]
- [B<-t> I<timeout>] B<-H> I<host>
-
-=head1 DESCRIPTION
-
-B<check_rxdebug> is a Nagios plugin for checking AFS file servers to see
-if there are client connections waiting for a free thread. If there are
-more than a few of these, AFS performance tends to be very slow; this is a
-fairly reliable way to catch overloaded file servers. By default,
-B<check_rxdebug> returns a critical error if there are more than eight
-connections waiting for a free thread and a warning if there are more than
-two. These thresholds can be changed with the B<-c> and B<-w> options.
-
-B<check_rxdebug> will always print out a single line of output including
-the number of blocked connections, displaying whether this is critical, a
-warning, or okay.
-
-=head1 OPTIONS
-
-=over 4
-
-=item B<-c> I<threshold>, B<--critical>=I<threshold>
-
-Change the critical blocked connection count threshold to I<threshold>,
-which should be an integer. The default is 8.
-
-=item B<-H> I<host>, B<--hostname>=I<host>
-
-The AFS file server whose connections B<check_rxdebug> should check. This
-option is required.
-
-=item B<-h>, B<--help>
-
-Print out this documentation (which is done simply by feeding the script
-to C<perldoc -t>).
-
-=item B<-t> I<timeout>, B<--timeout>=I<timeout>
-
-Change the timeout for the B<rxdebug> command. The default timeout is 60
-seconds.
-
-=item B<-V>, B<--version>
-
-Print out the version of B<check_rxdebug> and quit.
-
-=item B<-w> I<threshold>, B<--warning>=I<threshold>
-
-Change the warning blocked connection threshold to I<threshold>, which
-should be an integer. The default is 2.
-
-=back
-
-=head1 EXIT STATUS
-
-B<check_rxdebug> follows the standard Nagios exit status requirements.
-This means that it will exit with status 0 if there are no problems, with
-status 1 if there is a warning, and with status 2 if there is a critical
-problem. For other errors, such as invalid syntax, B<check_rxdebug> will
-exit with status 3.
-
-=head1 BUGS
-
-The standard B<-v> verbose Nagios plugin option is not supported, although
-it's not entirely clear what it would add.
-
-The usage message for invalid options and for the B<-h> option doesn't
-conform to Nagios standards.
-
-=head1 CAVEATS
-
-This script does not use the Nagios util library or any of the defaults
-that it provides, which makes it somewhat deficient as a Nagios plugin.
-This is intentional, though, since this script can be used with other
-monitoring systems as well. It's not clear what a good solution to this
-would be.
-
-=head1 SEE ALSO
-
-This script is part of the afs-monitor package, which includes various AFS
-monitoring plugins for Nagios. It is available from the AFS monitoring
-tools page at L<http://www.eyrie.org/~eagle/software/afs-monitor/>.
-
-=head1 AUTHORS
-
-The original idea behind this script was from Neil Crellin. It was
-updated by Quanah Gibson-Mount to work with Nagios, and then further
-updated by Russ Allbery <rra@stanford.edu> to support more standard
-options and to use a more uniform coding style.
-
-=head1 COPYRIGHT AND LICENSE
-
-Copyright 2003, 2004, 2005, 2010 Board of Trustees, Leland Stanford
-Jr. University.
-
-This program is free software; you may redistribute it and/or modify it
-under the same terms as Perl itself.
-
-=cut
+++ /dev/null
-#!/usr/bin/perl -w
-our $VERSION = '@VERSION@ @DATE@';
-#
-# check_udebug -- Check AFS database servers using udebug for Nagios.
-#
-# Written by Russ Allbery <rra@stanford.edu>
-# Copyright 2004, 2010 Board of Trustees, Leland Stanford Jr. University
-#
-# This program is free software; you may redistribute it and/or modify it
-# under the same terms as Perl itself.
-#
-# Takes a hostname and a port number and checks the udebug output for that
-# host and port. Reports an error if the recovery state is not 1f on the sync
-# site (ensuring that it considers all of the other servers up-to-date) or if
-# any of the servers don't believe there is a sync site.
-
-##############################################################################
-# Modules and declarations
-##############################################################################
-
-require 5.006;
-
-use strict;
-
-use Getopt::Long qw(GetOptions);
-
-##############################################################################
-# Site configuration
-##############################################################################
-
-# The default timeout in seconds (implemented by alarm) for udebug.
-our $TIMEOUT = 10;
-
-# The full path to udebug. Make sure that this is on local disk so that
-# monitoring doesn't have an AFS dependency.
-our ($UDEBUG) = grep { -x $_ } qw(/usr/bin/udebug /usr/local/bin/udebug);
-$UDEBUG ||= '/usr/bin/udebug';
-
-##############################################################################
-# Implementation
-##############################################################################
-
-# Report a syntax error and exit. We do this via stdout in order to satisfy
-# the Nagios plugin output requirements, but also report a more conventional
-# error via stderr in case people are calling this outside of Nagios.
-sub syntax {
- print "UBIK UNKNOWN - ", join ('', @_), "\n";
- warn "$0: ", join ('', @_), "\n";
- exit 3;
-}
-
-# Parse command line options.
-my ($help, $host, $port, $version);
-Getopt::Long::config ('bundling', 'no_ignore_case');
-GetOptions ('H|hostname=s' => \$host,
- 'h|help' => \$help,
- 'p|port=i' => \$port,
- 't|timeout=i' => \$TIMEOUT,
- 'V|version' => \$version)
- or syntax ("invalid option");
-if ($help) {
- print "Feeding myself to perldoc, please wait....\n";
- exec ('perldoc', '-t', $0) or die "Cannot exec perldoc: $!\n";
-} elsif ($version) {
- my $version = $VERSION;
- print "check_udebug $version\n";
- exit 0;
-}
-syntax ("extra arguments on command line") if @ARGV;
-syntax ("host to check not specified") unless (defined $host);
-syntax ("port to check not specified") unless (defined $port);
-
-# Set up the alarm.
-$SIG{ALRM} = sub {
- print "UBIK CRITICAL - network timeout after $TIMEOUT seconds\n";
- exit 2;
-};
-alarm ($TIMEOUT);
-
-# Run udebug and parse the output. We're looking for three things: whether
-# this host claims to be the sync site, whether the recovery state is 1f
-# (required if it is the sync site), and whether a sync host is defined
-# (required otherwise). We also save the local database version to report.
-unless (open (UDEBUG, "$UDEBUG $host $port |")) {
- print "UBIK UNKNOWN - cannot run udebug\n";
- warn "$0: cannot run udebug: $!\n";
- exit 3;
-}
-my ($issync, $recovery, $synchost, $db);
-while (<UDEBUG>) {
- $issync = 1 if /^I am sync site /;
- $recovery = 1 if /^Recovery state 1f/;
- $synchost = 1 if /^Sync host \d+(\.\d+){3} was set /;
- if (/Local db version is (\d+\.\d+)/) {
- $db = "db version $1";
- }
-}
-close UDEBUG;
-if ($? != 0) {
- print "UBIK CRITICAL - udebug failed\n";
- exit 2;
-}
-
-# Check the results.
-if ($issync && !$recovery) {
- print "UBIK CRITICAL - recovery state not 1f\n";
- exit 2;
-} elsif (!$issync && !$synchost) {
- print "UBIK CRITICAL - no sync site\n";
- exit 2;
-} else {
- print "UBIK OK", (defined $db ? " - $db" : ""), "\n";
- exit 0;
-}
-
-##############################################################################
-# Documentation
-##############################################################################
-
-=head1 NAME
-
-check_udebug - Check AFS database servers using udebug for Nagios
-
-=head1 SYNOPSIS
-
-B<check_udebug> [B<-hV>] [B<-t> I<timeout>] B<-H> I<host> B<-p> I<port>
-
-=head1 DESCRIPTION
-
-B<check_udebug> is a Nagios plugin for checking AFS database servers to
-make sure the Ubik replication between the database servers is running
-correctly. B<udebug> is used to connect to the specified port on the
-specified server. The port should generally be one of 7002 (ptserver),
-7003 (vlserver), or 7004 (kaserver). The resulting output is checked to
-make sure that the recovery state is 1f if that server is the sync site,
-or that a sync site is known if that server doesn't claim to be the sync
-site.
-
-B<check_udebug> will always print out a single line of output. That line
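-will be C<UBIK OK - > followed by the local database version if everything
-is fine, C<UBIK CRITICAL - > followed by an error message if a problem is
-found, or C<UBIK UNKNOWN - > followed by an error message on a syntax error.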
-
-=head1 OPTIONS
-
-=over 4
-
-=item B<-H> I<host>, B<--hostname>=I<host>
-
-The AFS database server whose Ubik status B<check_udebug> should check.
-This option is required.
-
-=item B<-h>, B<--help>
-
-Print out this documentation (which is done simply by feeding the script
-to C<perldoc -t>).
-
-=item B<-p> I<port>, B<--port>=I<port>
-
-The port to connect to on the AFS database server. This should generally
-be one of 7002 (ptserver), 7003 (vlserver), or 7004 (kaserver). This
-option is required.
-
-=item B<-t> I<timeout>, B<--timeout>=I<timeout>
-
-Change the timeout for the B<udebug> command. The default timeout is 10
-seconds.
-
-=item B<-V>, B<--version>
-
-Print out the version of B<check_udebug> and quit.
-
-=back
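-
-=head1 EXAMPLES
-
-A minimal sketch of checking the vlserver Ubik database on a database
-server with the hypothetical name C<afsdb1.example.org>:
-
-    check_udebug -H afsdb1.example.org -p 7003
-
-On success, the output has the form C<UBIK OK - db version> followed by the
-local database version reported by B<udebug>.
-
-One way to wire this into Nagios is a command definition along the
-following lines. The command name and the plugin path here are
-assumptions and will vary by site; C<$HOSTADDRESS$> is the standard Nagios
-macro for the address of the host being checked.
-
-    define command {
-        command_name  check_afs_vldb
-        command_line  /usr/local/nagios/libexec/check_udebug -H $HOSTADDRESS$ -p 7003
-    }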
-
-=head1 EXIT STATUS
-
-B<check_udebug> follows the standard Nagios exit status requirements.
-This means that it will exit with status 0 if there are no problems or
-with status 2 if there are critical problems. For other errors, such as
-invalid syntax, B<check_udebug> will exit with status 3.
-
-=head1 BUGS
-
-The standard B<-v> verbose Nagios plugin option is not supported. It
-should print out the full B<udebug> output.
-
-The usage message for invalid options and for the B<-h> option doesn't
-conform to Nagios standards.
-
-=head1 CAVEATS
-
-This script does not use the Nagios util library or any of the defaults
-that it provides, which makes it somewhat deficient as a Nagios plugin.
-This is intentional, though, since this script can be used with other
-monitoring systems as well. It's not clear what a good solution to this
-would be.
-
-=head1 SEE ALSO
-
-This script is part of the afs-monitor package, which includes various AFS
-monitoring plugins for Nagios. It is available from the AFS monitoring
-tools page at L<http://www.eyrie.org/~eagle/software/afs-monitor/>.
-
-=head1 AUTHORS
-
-Russ Allbery <rra@stanford.edu>
-
-=head1 COPYRIGHT AND LICENSE
-
-Copyright 2004, 2010 Board of Trustees, Leland Stanford Jr. University.
-
-This program is free software; you may redistribute it and/or modify it
-under the same terms as Perl itself.
-
-=cut