From 06c628e25046354e8daa02ccccf2913229cb2351 Mon Sep 17 00:00:00 2001 From: Russ Allbery Date: Wed, 10 Nov 2010 16:07:42 -0800 Subject: [PATCH] Rename all scripts to start with check_afs Rename check_afsspace to check_afs_space, check_bos to check_afs_bos, check_rxdebug to check_afs_rxdebug, and check_udebug to check_afs_udebug for more consistent naming and easier identification of the AFS Nagios probes. --- NEWS | 29 ++++++++------- README | 33 ++++++++--------- check_bos => check_afs_bos | 58 +++++++++++++++--------------- check_rxdebug => check_afs_rxdebug | 48 ++++++++++++------------- check_afsspace => check_afs_space | 32 ++++++++--------- check_udebug => check_afs_udebug | 38 ++++++++++---------- 6 files changed, 122 insertions(+), 116 deletions(-) rename check_bos => check_afs_bos (84%) rename check_rxdebug => check_afs_rxdebug (85%) rename check_afsspace => check_afs_space (93%) rename check_udebug => check_afs_udebug (87%) diff --git a/NEWS b/NEWS index da60513..be8a1f3 100644 --- a/NEWS +++ b/NEWS @@ -5,37 +5,42 @@ afs-monitor 2.0 (unreleased) Initial tarball release, based on check_afsspace 1.16, check_bos 1.7, check_rxdebug 1.11, and check_udebug 1.3. + Rename check_afsspace to check_afs_space, check_bos to check_afs_bos, + check_rxdebug to check_afs_rxdebug, and check_udebug to + check_afs_udebug for more consistent naming and easier identification + of the AFS Nagios probes. + Add check_afs_quotas, which monitors AFS volumes for quota usage, either for specific volumes or for all volumes on a particular server (and optionally partition). Based on a script by Steve Rader. - Support checking a single partition in check_afsspace and print more + Support checking a single partition in check_afs_space and print more verbose information about total, used, and free space in that mode. Format partition sizes using Number::Format if available. Based on work by Steve Rader. 
If the salvager is running (such as when started manually with bos - salvage), check_bos now reports a warning stating that, rather than a - critical error showing the auxiliary status line. Reported by Steve - Rader. + salvage), check_afs_bos now reports a warning stating that, rather + than a critical error showing the auxiliary status line. Reported by + Steve Rader. Print an UNKNOWN status on standard output on syntax errors in all scripts, rather than reporting the problem only to standard error. Check that the host to check was specified and report a syntax error if it wasn't. Thanks, Tobias Wolter. - Ignore "bos: running unauthenticated" in check_bos, since bos status - is always run unauthenticated. + Ignore "bos: running unauthenticated" in check_afs_bos, since bos + status is always run unauthenticated. - Add support for reporting warnings in check_bos and report a warning - if there is inappropriate access on server directories. Patch from - Steve Rader. + Add support for reporting warnings in check_afs_bos and report a + warning if there is inappropriate access on server directories. Patch + from Steve Rader. - If check_bos is successful, report the number of instances running + If check_afs_bos is successful, report the number of instances running normally. Patch from Steve Rader. Look for rxdebug in /usr/sbin and /usr/local/sbin since OpenAFS installs it into sbindir by default. - Report the database version as extra information in check_udebug if - there are no errors or warnings. + Report the database version as extra information in check_afs_udebug + if there are no errors or warnings. diff --git a/README b/README index 2a5aed7..957e00b 100644 --- a/README +++ b/README @@ -11,12 +11,12 @@ BLURB afs-monitor provides Nagios-compatible probe scripts that can be used to monitor AFS servers.
It contains five scripts: check_afs_quotas, which - monitors AFS volumes for quota usage; check_afsspace, which monitors - file server partitions for disk usage; check_bos, which monitors any + monitors AFS volumes for quota usage; check_afs_space, which monitors + file server partitions for disk usage; check_afs_bos, which monitors any bosserver-managed set of processes for problems reported by bos; - check_rxdebug, which monitors AFS fileservers for connections waiting - for a thread; and check_udebug, which monitors Ubik services (such as - vlserver and ptserver) for replication and quorum problems. + check_afs_rxdebug, which monitors AFS fileservers for connections + waiting for a thread; and check_afs_udebug, which monitors Ubik services + (such as vlserver and ptserver) for replication and quorum problems. DESCRIPTION @@ -33,17 +33,17 @@ DESCRIPTION server or server partition for quota usage and reports errors or warnings if the used space is over a configurable threshold. - check_afsspace uses vos partinfo to check the available space on each + check_afs_space uses vos partinfo to check the available space on each partition on a file server. It reports a critical error if the percentage used is above a configurable threshold (90% by default) and a warning if it is above a lower configurable threshold (85% by default). - check_bos runs bos status on a file server or volume location server and - scans the output, making sure that all commands are running normally and - the file server isn't salvaging. If it sees any output it doesn't + check_afs_bos runs bos status on a file server or volume location server + and scans the output, making sure that all commands are running normally + and the file server isn't salvaging. If it sees any output it doesn't expect from bos status, it reports that output in an alert. 
- check_rxdebug runs rxdebug against a file server and looks for any + check_afs_rxdebug runs rxdebug against a file server and looks for any client connections that are in the state "waiting for a thread." This indicates client connections that are blocked waiting for a file server thread. We've found this to be a reliable test for detecting serious @@ -52,7 +52,7 @@ DESCRIPTION and a warning if it is above a lower configurable threshold (2 by default). - check_udebug runs udebug against a ubik service (vlserver, ptserver, + check_afs_udebug runs udebug against a ubik service (vlserver, ptserver, kaserver, or buserver) and makes sure that it is in a reasonable state. It checks to be sure that there is a sync site for the service, and when there is, that the sync site believes that the recovery state is 1f @@ -61,7 +61,8 @@ DESCRIPTION These scripts were written by Xueshan Feng, Neil Crellin, Quanah Gibson-Mount, and Russ Allbery and are currently maintained by Russ - Allbery. + Allbery. Many modifications to the scripts were based on work by Steve + Rader. REQUIREMENTS @@ -70,7 +71,7 @@ REQUIREMENTS bos, rxdebug, and udebug) and expect them to be in either /usr/bin or in /usr/local/bin. - check_afs_quotas and check_afsspace will use Number::Format, if + check_afs_quotas and check_afs_space will use Number::Format, if available, to format sizes with IEC 60027 prefixes. INSTALLATION @@ -84,13 +85,13 @@ INSTALLATION You can then use the scripts directly with commands such as: - check_afsspace -H afs1 + check_afs_space -H afs1 To use the scripts in a Nagios probe, configure a command such as: define command { - command_name check_afsspace - command_line /path/to/install/check_afsspace -H $HOSTADDRESS$ + command_name check_afs_space + command_line /path/to/install/check_afs_space -H $HOSTADDRESS$ } changing the path to the script to wherever you installed the scripts.
diff --git a/check_bos b/check_afs_bos similarity index 84% rename from check_bos rename to check_afs_bos index 70f4533..43be08c 100755 --- a/check_bos +++ b/check_afs_bos @@ -1,7 +1,13 @@ #!/usr/bin/perl -w our $VERSION = '@VERSION@ @DATE@'; # -# check_bos -- Monitor AFS bos output for problems in Nagios. +# check_afs_bos -- Monitor AFS bos output for problems in Nagios. +# +# Given an AFS server (file or VLDB), runs bos status on each one. Checks to +# see if there is a communication failure, and also checks to see if anything +# in the output looks unusual or wrong. If either of these conditions are +# true, print that information to STDOUT. Suitable for being run inside +# Nagios. # # Written by Russ Allbery # Based on an earlier script by Neil Crellin @@ -9,12 +15,6 @@ our $VERSION = '@VERSION@ @DATE@'; # # This program is free software; you may redistribute it and/or modify it # under the same terms as Perl itself. -# -# Given an AFS server (file or VLDB), runs bos status on each one. Checks to -# see if there is a communication failure, and also checks to see if anything -# in the output looks unusual or wrong. If either of these conditions are -# true, print that information to STDOUT. Suitable for being run inside -# Nagios. 
############################################################################## # Modules and declarations @@ -86,7 +86,7 @@ if ($help) { exec ('perldoc', '-t', $0) or die "Cannot fork: $!\n"; } elsif ($version) { my $version = $VERSION; - print "check_bos $version\n"; + print "check_afs_bos $version\n"; exit 0; } syntax ("extra arguments on command line") if @ARGV; @@ -166,26 +166,26 @@ exit 0; =head1 NAME -check_bos - Monitor AFS bos output for problems in Nagios +check_afs_bos - Monitor AFS bos output for problems in Nagios =head1 SYNOPSIS -B [B<-hV>] [B<-t> I] B<-H> I +B [B<-hV>] [B<-t> I] B<-H> I =head1 DESCRIPTION -B is a Nagios plugin for querying the AFS bosserver for process -status and reporting an alert if there are any unexpected lines in the bos -output. The acceptable lines of output from B are configured at the -top of this script; they should be generally suitable for most sites, but -may require some customization. - -B will always print out a single line of output. If there is a -line that isn't matched by any regexes identifying acceptable lines, it -will output the first non-matching line prefixed by C. If -the salvager is running (such as when started by C) or other -warnings are found, it will print that warning information prefixed by -C. Otherwise, it will output C. Note that this +B is a Nagios plugin for querying the AFS bosserver for +process status and reporting an alert if there are any unexpected lines in +the bos output. The acceptable lines of output from B are configured +at the top of this script; they should be generally suitable for most +sites, but may require some customization. + +B will always print out a single line of output. If there +is a line that isn't matched by any regexes identifying acceptable lines, +it will output the first non-matching line prefixed by C. +If the salvager is running (such as when started by C) or +other warnings are found, it will print that warning information prefixed +by C. 
Otherwise, it will output C. Note that this monitoring may not catch such things as a service being constantly restarted if it happens to be up and running normally each time the probe runs; it doesn't pay any attention to the last start time, the last error @@ -200,8 +200,8 @@ process. =item B<-H> I, B<--hostname>=I -The AFS server whose B status B should check. This option -is required. +The AFS server whose B status B should check. This +option is required. =item B<-h>, B<--help> @@ -215,17 +215,17 @@ seconds. =item B<-V>, B<--version> -Print out the version of B and quit. +Print out the version of B and quit. =back =head1 EXIT STATUS -B follows the standard Nagios exit status requirements. This -means that it will exit with status 0 if there are no problems, with +B follows the standard Nagios exit status requirements. +This means that it will exit with status 0 if there are no problems, with status 1 if the salvager is running, or with status 2 if there is a -problem detected. For other errors, such as invalid syntax, B -will exit with status 3. +problem detected. For other errors, such as invalid syntax, +B will exit with status 3. =head1 BUGS diff --git a/check_rxdebug b/check_afs_rxdebug similarity index 85% rename from check_rxdebug rename to check_afs_rxdebug index 86d388e..6f55b4b 100755 --- a/check_rxdebug +++ b/check_afs_rxdebug @@ -1,7 +1,13 @@ #!/usr/bin/perl -w our $VERSION = '@VERSION@ @DATE@'; # -# check_rxdebug -- Nagios AFS server check for waiting connections. +# check_afs_rxdebug -- Nagios AFS server check for waiting connections. +# +# Expects a file server with the -H option and runs rxdebug against that file +# server, looking for any connections that are waiting for a thread. Exits +# with status 1 if there are more than two connections in that state (a +# warning) and with status 2 if there are more than eight connections in that +# state. The thresholds can be overridden from the command line. 
# # Written by Quanah Gibson-Mount based on work by Neil Crellin # Updated by Russ Allbery @@ -10,12 +16,6 @@ our $VERSION = '@VERSION@ @DATE@'; # # This program is free software; you may redistribute it and/or modify it # under the same terms as Perl itself. -# -# Expects a file server with the -H option and runs rxdebug against that file -# server, looking for any connections that are waiting for a thread. Exits -# with status 1 if there are more than two connections in that state (a -# warning) and with status 2 if there are more than eight connections in that -# state. The thresholds can be overridden from the command line. ############################################################################## # Modules and declarations @@ -74,7 +74,7 @@ if ($help) { exec ('perldoc', '-t', $0) or die "Cannot fork: $!\n"; } elsif ($version) { my $version = $VERSION; - print "check_rxdebug $version\n"; + print "check_afs_rxdebug $version\n"; exit 0; } syntax ("extra arguments on command line") if @ARGV; @@ -131,26 +131,26 @@ if ($blocked >= $CRITICAL) { =head1 NAME -check_rxdebug - Check AFS servers for blocked connections in Nagios +check_afs_rxdebug - Check AFS servers for blocked connections in Nagios =head1 SYNOPSIS -B [B<-hV>] [B<-c> I] [B<-w> I] +B [B<-hV>] [B<-c> I] [B<-w> I] [B<-t> I] B<-H> I =head1 DESCRIPTION -B is a Nagios plugin for checking AFS file servers to see -if there are client connections waiting for a free thread. If there are -more than a few of these, AFS performance tends to be very slow; this is a -fairly reliable way to catch overloaded file servers. By default, -B returns a critical error if there are more than eight +B is a Nagios plugin for checking AFS file servers to +see if there are client connections waiting for a free thread. If there +are more than a few of these, AFS performance tends to be very slow; this +is a fairly reliable way to catch overloaded file servers. 
By default, +B returns a critical error if there are more than eight connections waiting for a free thread and a warning if there are more than two. These thresholds can be changed with the B<-c> and B<-w> options. -B will always print out a single line of output including -the number of blocked connections, displaying whether this is critical, a -warning, or okay. +B will always print out a single line of output +including the number of blocked connections, displaying whether this is +critical, a warning, or okay. =head1 OPTIONS @@ -163,8 +163,8 @@ which should be an integer. The default is 8. =item B<-H> I, B<--hostname>=I -The AFS file server whose connections B should check. This -option is required. +The AFS file server whose connections B should check. +This option is required. =item B<-h>, B<--help> @@ -178,7 +178,7 @@ seconds. =item B<-V>, B<--version> -Print out the version of B and quit. +Print out the version of B and quit. =item B<-w> I, B<--warning>=I @@ -189,11 +189,11 @@ should be an integer. The default is 2. =head1 EXIT STATUS -B follows the standard Nagios exit status requirements. +B follows the standard Nagios exit status requirements. This means that it will exit with status 0 if there are no problems, with status 1 if there is a warning, and with status 2 if there is a critical -problem. For other errors, such as invalid syntax, B will -exit with status 3. +problem. For other errors, such as invalid syntax, B +will exit with status 3. =head1 BUGS diff --git a/check_afsspace b/check_afs_space similarity index 93% rename from check_afsspace rename to check_afs_space index 091cf50..1b602aa 100755 --- a/check_afsspace +++ b/check_afs_space @@ -1,7 +1,12 @@ #!/usr/bin/perl -w our $VERSION = '@VERSION@ @DATE@'; # -# check_afsspace -- Monitor AFS disk space usage under Nagios. +# check_afs_space -- Monitor AFS disk space usage under Nagios. +# +# Expects a host with the -H option and checks the partition usage with vos +# partinfo. 
Exits with status 1 if the free space is below a warning +# percentage and with status 2 if the free space is above a critical +# percentage (this works with the Nagios check architecture). # # Written by Susan Feng # Updated by Russ Allbery @@ -9,11 +14,6 @@ our $VERSION = '@VERSION@ @DATE@'; # # This program is free software; you may redistribute it and/or modify it # under the same terms as Perl itself. -# -# Expects a host with the -H option and checks the partition usage with vos -# partinfo. Exits with status 1 if the free space is below a warning -# percentage and with status 2 if the free space is above a critical -# percentage (this works with the Nagios check architecture). ############################################################################## # Modules and declarations @@ -81,7 +81,7 @@ if ($help) { exec ('perldoc', '-t', $0) or die "Cannot fork: $!\n"; } elsif ($version) { my $version = $VERSION; - print "check_afsspace $version\n"; + print "check_afs_space $version\n"; exit 0; } syntax ("extra arguments on command line") if @ARGV; @@ -164,27 +164,27 @@ if (@critical) { =head1 NAME -check_afsspace - Monitor AFS disk space usage under Nagios +check_afs_space - Monitor AFS disk space usage under Nagios =head1 SYNOPSIS -B [B<-hV>] [B<-c> I] [B<-w> I] +B [B<-hV>] [B<-c> I] [B<-w> I] [B<-p> I] [B<-t> I] B<-H> I =head1 DESCRIPTION -B is a Nagios plugin for checking free space on AFS server +B is a Nagios plugin for checking free space on AFS server partitions. It uses C to obtain the free space on the partitions on an AFS server and will return an alert if the percentage of used space exceeds a threshold. By default, it returns a critical error if the used space is over 90% and a warning if it is over 85% (changaable with the B<-c> and B<-w> options). -If C doesn't return within the timeout, B +If C doesn't return within the timeout, B will return a critical error. The default timeout is 300 seconds, changeable with the B<-t> option. 
-B will always print out a single line of output, giving +B will always print out a single line of output, giving the critical errors if any, otherwise giving the warnings if any, otherwise listing in an abbreviated form the percentage free space for all partitions. @@ -204,7 +204,7 @@ an integer percentage. The default is 90. =item B<-H> I, B<--hostname>=I -The AFS file server whose free space B should check. This +The AFS file server whose free space B should check. This option is required. =item B<-h>, B<--help> @@ -227,7 +227,7 @@ is 300 seconds. =item B<-V>, B<--version> -Print out the version of B and quit. +Print out the version of B and quit. =item B<-w> I, B<--warning>=I @@ -238,11 +238,11 @@ an integer percentage. The default is 85. =head1 EXIT STATUS -B follows the standard Nagios exit status requirements. +B follows the standard Nagios exit status requirements. This means that it will exit with status 0 if there are no problems, with status 2 if there is at least one critical partition for that server, and with status 1 if there are no critical partitions but at least one warning -partition. For other errors, such as invalid syntax, B +partition. For other errors, such as invalid syntax, B will exit with status 3. =head1 BUGS diff --git a/check_udebug b/check_afs_udebug similarity index 87% rename from check_udebug rename to check_afs_udebug index c8df336..a48ba76 100755 --- a/check_udebug +++ b/check_afs_udebug @@ -1,18 +1,18 @@ #!/usr/bin/perl -w our $VERSION = '@VERSION@ @DATE@'; # -# check_udebug -- Check AFS database servers using udebug for Nagios. +# check_afs_udebug -- Check AFS database servers using udebug for Nagios. +# +# Takes a hostname and a port number and checks the udebug output for that +# host and port. Reports an error if the recovery state is not 1f on the sync +# site (ensuring that it considers all of the other servers up-to-date) or if +# any of the servers don't believe there is a sync site. 
# # Written by Russ Allbery # Copyright 2004, 2010 Board of Trustees, Leland Stanford Jr. University # # This program is free software; you may redistribute it and/or modify it # under the same terms as Perl itself. -# -# Takes a hostname and a port number and checks the udebug output for that -# host and port. Reports an error if the recovery state is not 1f on the sync -# site (ensuring that it considers all of the other servers up-to-date) or if -# any of the servers don't believe there is a sync site. ############################################################################## # Modules and declarations @@ -63,7 +63,7 @@ if ($help) { exec ('perldoc', '-t', $0) or die "Cannot fork: $!\n"; } elsif ($version) { my $version = $VERSION; - print "check_udebug $version\n"; + print "check_afs_udebug $version\n"; exit 0; } syntax ("extra arguments on command line") if @ARGV; @@ -118,16 +118,16 @@ if ($issync && !$recovery) { =head1 NAME -check_udebug - Check AFS servers for blocked connections in Nagios +check_afs_udebug - Check AFS servers for blocked connections in Nagios =head1 SYNOPSIS -B [B<-hV>] [B<-t> I] B<-H> I B<-p> I +B [B<-hV>] [B<-t> I] B<-H> I B<-p> I =head1 DESCRIPTION -B is a Nagios plugin for checking AFS database servers to -make sure the Ubik replication between the database servers is running +B is a Nagios plugin for checking AFS database servers +to make sure the Ubik replication between the database servers is running correctly. B is used to connect to the specified port on the specified server. The port should generally be one of 7002 (ptserver), 7003 (vlserver), or 7004 (kaserver). The resulting output is checked to @@ -135,9 +135,9 @@ make sure that the recovery state is 1f if that server is the sync site, or that a sync site is known if that server doesn't claim to be the sync site. -B will always print out a single line of output. That line -will be C if everything is fine, or C followed -by an error message otherwise. 
+B will always print out a single line of output. That +line will be C if everything is fine, or C +followed by an error message otherwise. =head1 OPTIONS @@ -145,8 +145,8 @@ by an error message otherwise. =item B<-H> I, B<--hostname>=I -The AFS database server whose Ubik status B should check. -This option is required. +The AFS database server whose Ubik status B should +check. This option is required. =item B<-h>, B<--help> @@ -166,16 +166,16 @@ seconds. =item B<-V>, B<--version> -Print out the version of B and quit. +Print out the version of B and quit. =back =head1 EXIT STATUS -B follows the standard Nagios exit status requirements. +B follows the standard Nagios exit status requirements. This means that it will exit with status 0 if there are no problems or with status 2 if there are critical problems. For other errors, such as -invalid syntax, B will exit with status 3. +invalid syntax, B will exit with status 3. =head1 BUGS -- 2.39.5