psmon - Process Table Monitoring Script
$Id: psmon,v 1.39 2005/12/30 13:26:23 nicolaw Exp $
Syntax: psmon [--help] [--version] [--dryrun] [--daemon] [--cron]
[--conf=filename] [--user=user] [--nouser]
[--adminemail=emailaddress] [--verbose]
--help Display this help
--version Display full version information
--dryrun Dry run (do not actually kill or spawn any processes)
--daemon Spawn in to background daemon
--cron Disables 'already running' errors with the --daemon option
--conf=str Specify alternative config filename
--user=str Only scan the process table for processes running as str
--nouser Force scanning for all users when not run as superuser
--adminemail=str Force all notification emails to be sent to str
--verbose Output more verbose information
Single user account crontab operation:
MAILTO="nicolaw@cpan.org"
HOME=/home/nicolaw
USER=nicolaw
*/5 * * * * psmon --daemon --cron --conf=$HOME/etc/psmon.conf --user=$USER --adminemail=$MAILTO
Regular system-wide call from cron every 10 minutes to ensure that psmon is still running as a daemon:
0,10,20,30,40,50 * * * * psmon --daemon --cron
Only check processes during working office hours:
* 9-17 * * * psmon
This script monitors the process table using Proc::ProcessTable, and
will respawn or kill processes based on a set of rules defined in an
Apache style configuration file.
Processes will be respawned if a spawn command is defined for a process,
and no occurrences of that process are running. If the --user command line
option is specified, then the process will only be spawned if no instances
are running as the specified userid.
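The respawn rule above can be pictured with a minimal Python sketch. This is not psmon's own Perl code: the `Proc` record and the `needs_spawn` helper are hypothetical names used only for illustration, and psmon itself reads the process table through Proc::ProcessTable.

```python
from collections import namedtuple

# Hypothetical process record, for illustration only.
Proc = namedtuple("Proc", ["pid", "name", "user"])

def needs_spawn(name, spawncmd, table, user=None):
    """Return True when a respawn is due for the process called name.

    A respawn is due only if a spawn command is defined and no matching
    process is running. When user is given (as with --user), only
    processes owned by that user count as running instances.
    """
    if not spawncmd:
        return False
    for proc in table:
        if proc.name == name and (user is None or proc.user == user):
            return False
    return True

table = [Proc(812, "sshd", "root"), Proc(4021, "httpd", "apache")]
print(needs_spawn("sshd", "/sbin/service sshd start", table))
print(needs_spawn("sshd", "/sbin/service sshd start", table, user="nicolaw"))
print(needs_spawn("crond", None, table))
```

The second call shows the effect of --user: an sshd owned by root does not count as a running instance for the user nicolaw.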
Processes can be killed off if they have been running for too long,
use too much CPU or memory resources, or have too many concurrent
versions running. Exceptions can be made to kill rulesets using the
PIDFile and LastSafePID directives.
If a PID file is declared for a process, psmon will never kill the
process ID that is contained within the PID file. This is useful if,
for example, you have a script which spawns hundreds of child processes
which you may need to kill automatically, but you do not want to kill
the parent process.
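As a rough illustration of how the kill rules and the PID file exception interact, here is a small Python sketch. It assumes a simplified rule set; the `Proc` record, the `kill_candidates` helper and the rule dictionary layout are hypothetical and do not mirror psmon's internal data structures.

```python
from collections import namedtuple

# Hypothetical process record, for illustration only.
# age is the running time in seconds.
Proc = namedtuple("Proc", ["pid", "name", "pctcpu", "pctmem", "age"])

def kill_candidates(procs, rules, pidfile_pid=None):
    """Return the PIDs of processes that break at least one rule.

    procs       -- running processes that share one process name
    rules       -- dict with optional keys: ttl, pctcpu, pctmem, instances
    pidfile_pid -- PID read from the PID file; it is never returned
    """
    too_many = "instances" in rules and len(procs) > rules["instances"]
    doomed = []
    for p in procs:
        if p.pid == pidfile_pid:
            continue  # the PID held in the PID file is always spared
        if (too_many
                or ("ttl" in rules and p.age > rules["ttl"])
                or ("pctcpu" in rules and p.pctcpu > rules["pctcpu"])
                or ("pctmem" in rules and p.pctmem > rules["pctmem"])):
            doomed.append(p.pid)
    return doomed

procs = [
    Proc(500, "worker", 1.0, 2.0, 90000),   # long-running parent
    Proc(501, "worker", 95.0, 1.0, 10),     # child hogging the CPU
    Proc(502, "worker", 3.0, 1.0, 10),      # well-behaved child
]
print(kill_candidates(procs, {"pctcpu": 90}, pidfile_pid=500))
print(kill_candidates(procs, {"ttl": 3600}, pidfile_pid=500))
print(kill_candidates(procs, {"instances": 1}, pidfile_pid=500))
```

With the parent's PID in the PID file, the TTL rule spares the parent even though it has been running far longer than an hour.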
Any actions performed will be logged to the DAEMON syslog facility by default.
There is also optional support for sending notification emails to an
administrator on a global or per-rule basis.
- --help
-
Display this help.
- --version
-
Display full version information.
- --dryrun
-
Execute a dry run (do not actually kill or spawn any processes).
- --daemon
-
Spawn in to background daemon.
- --cron
-
Disables already running warnings when trying to launch as another daemon.
- --conf=filename
-
Specify alternative config filename. The configuration file defaults
to /etc/psmon.conf when running as superuser, or ~/etc/psmon.conf when
running as a non-superuser.
- --user=user
-
Only scan the process table for processes running under this username.
- --nouser
-
Force scanning for all users when not run as superuser. By default psmon
will only scan processes belonging to the current user for non-superusers.
- --adminemail=emailaddress
-
Force all notification emails to be sent to this email address. This
option will override all AdminEmail directives within the configuration
file.
- --verbose
-
Output more verbose information.
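The defaults described for --conf, --user and --nouser can be summarised in a short Python sketch. The function names are hypothetical, and the precedence of --user over --nouser when both are given is an assumption made for the example, not something this manual states.

```python
def default_config_path(uid, home):
    """Config file used when --conf is not given."""
    if uid == 0:
        return "/etc/psmon.conf"                   # superuser
    return home.rstrip("/") + "/etc/psmon.conf"    # ~/etc/psmon.conf

def scan_user(uid, current_user, user_opt=None, nouser=False):
    """Return the username whose processes are scanned, or None for all users."""
    if user_opt:
        return user_opt        # --user=str (assumed to win over --nouser)
    if uid == 0 or nouser:
        return None            # superuser, or --nouser: scan every user
    return current_user        # non-superuser default: own processes only

print(default_config_path(1000, "/home/nicolaw"))
print(scan_user(1000, "nicolaw"))
print(scan_user(1000, "nicolaw", nouser=True))
```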
In addition to Perl 5.005_03 or higher, the following Perl modules are
required:
Proc::ProcessTable
Config::General
Getopt::Long
POSIX
File::Basename
These two additional modules are not required, but will provide enhanced
functionality if present.
Net::SMTP
Unix::Syslog
The POSIX module is usually supplied with Perl as standard, as is
File::Basename. All these modules can be obtained from CPAN.
Visit http://search.cpan.org and http://www.cpan.org for further details.
For the lazy people reading this, you can try the following command to
install these modules:
for m in Config::General Proc::ProcessTable Net::SMTP \
Unix::Syslog Getopt::Long; do perl -MCPAN -e"install $m";done
Alternatively you can run the install.sh script which comes in the
distribution tarball. It will attempt to install the right modules,
install the script and configuration file, and generate UNIX man page
documentation.
By default psmon will look for its runtime configuration in /etc/psmon.conf,
although a different file can be specified on the command line. For
system-wide installations it is recommended that you install your psmon
configuration file in the default location.
The default configuration file location is /etc/psmon.conf. A different
configuration file can be declared from the command line. You will find
an example configuration file supplied in the etc/ directory of the
distribution tarball. It is recommended that you use this as a guide to
writing your own configuration file by hand. Alternatively you can use
the psmon-config script which will interactively create a configuration
for you.
Syntax of the configuration file is based upon that which is used by
Apache. Each process to be monitored is declared with a Process scope
directive like this example which monitors the OpenSSH daemon:
<Process sshd>
spawncmd /sbin/service sshd start
pidfile /var/run/sshd.pid
instances 50
pctcpu 90
</Process>
There is a special * process scope which applies to all running
processes. This special scope should be used with extreme care. It does
not support the use of the SpawnCMD, PIDFile, Instances or TTL
directives. A typical example of this scope might be as follows:
<Process *>
pctcpu 95
pctmem 80
</Process>
Global directives which are not specific to any one process should be placed
outside of any Process scopes.
Configuration directives are not case sensitive, but the values that they
define are.
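To make the structure of the file concrete, the following Python sketch parses a small subset of this syntax. psmon itself relies on Config::General for parsing; the `parse_psmon_conf` function is a hypothetical stand-in, and its handling of `#` comment lines is an assumption.

```python
import re

def parse_psmon_conf(text):
    """Parse a small subset of the Apache-style syntax shown above.

    Returns (global_directives, process_scopes). Directive names are
    lower-cased because they are not case sensitive; values are kept as-is.
    """
    global_directives, scopes = {}, {}
    current = None  # None while outside any Process scope
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):   # assumed comment syntax
            continue
        opening = re.match(r"<process\s+(\S+)>$", line, re.IGNORECASE)
        if opening:
            current = scopes.setdefault(opening.group(1), {})
        elif re.match(r"</process>$", line, re.IGNORECASE):
            current = None
        else:
            parts = line.split(None, 1)
            key = parts[0].lower()
            value = parts[1] if len(parts) > 1 else ""
            target = global_directives if current is None else current
            target[key] = value
    return global_directives, scopes

SAMPLE = """
AdminEmail root@localhost
<Process sshd>
    SpawnCmd /sbin/service sshd start
    PIDFile  /var/run/sshd.pid
    instances 50
</Process>
<Process *>
    pctcpu 95
</Process>
"""

global_directives, scopes = parse_psmon_conf(SAMPLE)
print(global_directives)
print(scopes["sshd"]["spawncmd"])
print(scopes["*"])
```

Note how `SpawnCmd` and `spawncmd` end up under the same key, while the value keeps its original case.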
- AdminEmail
-
Defines the email address to which notification emails should be sent.
May also be used in a process scope, which will take priority over a
global declaration. Defaults to root@localhost.
- DefaultEmailMethod
-
Defines which method should be used by default to try and send notification
emails. Legal values are 'SMTP' or 'sendmail'. Defaults to 'sendmail'.
- Dryrun
-
Forces psmon to act in the same way as if the --dryrun command line switch
had been specified. This is useful if you want to force a specific
configuration file to only report and never actually take any automated action.
- Facility
-
Defines which syslog facility to log to. Valid options are as follows:
LOG_KERN, LOG_USER, LOG_MAIL, LOG_DAEMON, LOG_AUTH, LOG_SYSLOG, LOG_LPR,
LOG_NEWS, LOG_UUCP, LOG_CRON, LOG_LOCAL0, LOG_LOCAL1, LOG_LOCAL2,
LOG_LOCAL3, LOG_LOCAL4, LOG_LOCAL5, LOG_LOCAL6 and LOG_LOCAL7. This
functionality requires the Unix::Syslog module. Defaults to LOG_DAEMON.
- Frequency
-
Defines the frequency of process table queries. Defaults to 60 seconds.
- KillLogLevel (previously KillPIDLogLevel)
-
The same as the loglevel directive, but only applies to process kill actions.
Takes priority over the loglevel directive. May also be used in a
Process scope, which will take priority over a global declaration.
Undefined by default.
- LastSafePID
-
When defined, psmon will never attempt to kill a process ID which is
numerically less than or equal to the value defined by lastsafepid. It
should be noted that psmon will never attempt to kill itself, or a process ID
less than or equal to 1. Defaults to 100.
- LogLevel
-
Defines the loglevel priority that notifications to syslog will be
marked as. Valid options are as follows: LOG_EMERG, LOG_ALERT, LOG_CRIT,
LOG_ERR, LOG_WARNING, LOG_NOTICE, LOG_INFO and LOG_DEBUG. The log level
used by a notification for any failed action will automatically be
raised to the next level in order to highlight the failure. May also be used
in a Process scope, which will take priority over a global declaration. This
functionality requires the Unix::Syslog module. Defaults to LOG_NOTICE.
- NeverKillPID
-
Accepts a space delimited list of PIDs which will never be killed.
Defaults to 1.
- NeverKillProcessName
-
Accepts a space delimited list of process names which will never be
killed. Defaults to 'devfsadmd kswapd kupdated mdrecoveryd pageout sched init fsflush'.
- NotifyEmailFrom
-
Defines the email address that notification emails should be addressed
from. Defaults to <username>@hostname.
- SendmailCmd
-
Defines the sendmail command to use to send notification emails if there
is a failure with the SMTP connection to the host defined by SMTPHost.
Defaults to '/lib/sendmail -t' or '/usr/sbin/sendmail -t'.
- SMTPHost
-
Defines the IP address or hostname of the SMTP server used to send
email notifications. This functionality requires the Net::SMTP module.
Defaults to localhost.
- SMTPTimeout
-
Defines the timeout in seconds to be used during SMTP connections. This
functionality requires the Net::SMTP module. Defaults to 20 seconds.
- SpawnLogLevel
-
The same as the loglevel directive, but only applies to process spawn actions.
Takes priority over the loglevel directive. May also be used in a
Process scope, which will take priority over a global declaration.
Undefined by default.
- ProtectSafePIDsQuietly
-
Accepts a boolean value of On or Off. Suppresses all notifications of
preserved process IDs when used in conjunction with the LastSafePID
directive. Defaults to Off.
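The protection directives above (LastSafePID, NeverKillPID and NeverKillProcessName) combine into a single "may this process be killed?" check. The Python sketch below uses the documented defaults; the `is_protected` helper is hypothetical and the order of the tests is an assumption.

```python
# Defaults taken from the directive descriptions above.
DEFAULT_NEVER_KILL_NAMES = ("devfsadmd kswapd kupdated mdrecoveryd "
                            "pageout sched init fsflush").split()

def is_protected(pid, name, own_pid, last_safe_pid=100,
                 never_kill_pids=(1,),
                 never_kill_names=DEFAULT_NEVER_KILL_NAMES):
    """Return True if the process must never be killed."""
    if pid <= 1 or pid == own_pid:
        return True                 # never PID <= 1, never the monitor itself
    if pid <= last_safe_pid:
        return True                 # LastSafePID
    if pid in never_kill_pids:
        return True                 # NeverKillPID
    return name in never_kill_names # NeverKillProcessName

print(is_protected(57, "cron", own_pid=9000))
print(is_protected(4021, "httpd", own_pid=9000))
print(is_protected(4021, "init", own_pid=9000))
```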
- AdminEmail
-
Defines the email address where notification emails should be sent to.
Takes priority within the process scope over the global AdminEmail directive,
but not over the AdminEmail command line option.
- Instances
-
Defines a maximum number of instances of a process which may run. The
process will be killed once there are more than this number of occurrences
running, and its process ID isn't contained in the defined pid file.
- KillCmd
-
Defines the full command line to be executed in order to gracefully
shut down or kill a rogue process. If the command returns a non-zero
(boolean true) exit status then it is assumed that the command failed
to execute successfully. If no KillCmd is specified or the command fails,
the process will be killed by sending a SIGKILL signal with the standard
kill() function. Undefined by default.
- NoEmail
-
Accepts a boolean value of True or False. Suppresses all notification
emails for this process scope. Defaults to False.
- NoEmailOnKill
-
Accepts a boolean value of True or False. Suppresses process killing
notification emails for this process scope. Defaults to False.
- NoEmailOnSpawn
-
Accepts a boolean value of True or False. Suppresses process spawning
notification emails for this process scope. Defaults to False.
- PctCpu
-
Defines a maximum allowable percentage of CPU time a process may use.
The process will be killed once its CPU usage exceeds this threshold
and its process ID isn't contained in the defined pidfile.
- PctMem
-
Defines a maximum allowable percentage of total system memory a process
may use. The process will be killed once its memory usage exceeds this
threshold and its process ID isn't contained in the defined pidfile.
- PIDFile
-
Defines the full path and filename of a file created by a process which
contains its main parent process ID. Psmon will not kill the PID number
which is contained within the PIDFile.
- SpawnCmd
-
Defines the full command line to be executed in order to respawn a dead
process.
- TTL
-
Defines a maximum time to live (in seconds) of a process. The process
will be killed once it has been running longer than this value, and
its process ID isn't contained in the defined pidfile.
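The KillCmd fallback described above (try the graceful command first, then fall back to SIGKILL) might look roughly like this in Python. The `slay` function, its return strings and the injectable `run` and `kill` parameters are hypothetical conveniences for the example; psmon's own slay_process() is written in Perl and is not reproduced here.

```python
import os
import signal
import subprocess

def slay(pid, killcmd=None, dryrun=False, run=subprocess.call, kill=os.kill):
    """Return a short string describing how the process was dealt with."""
    if dryrun:
        return "dryrun"             # report only, take no action
    if killcmd:
        status = run(killcmd, shell=True)
        if status == 0:             # zero exit status means success
            return "killcmd"
    # No KillCmd defined, or it failed: fall back to SIGKILL.
    kill(pid, signal.SIGKILL)
    return "sigkill"

# Record signals instead of really sending them, so the demo is harmless.
sent = []
def record(pid, sig):
    sent.append((pid, sig))

print(slay(4021, "exit 0", kill=record))   # graceful command succeeds
print(slay(4021, "exit 3", kill=record))   # command fails, SIGKILL is used
print(slay(4021, kill=record))             # no KillCmd at all
print(slay(4021, "exit 3", dryrun=True, kill=record))
```

Passing `kill=record` keeps the example from signalling a real process; in real use the default `os.kill` would be left in place.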
<Process syslogd>
spawncmd /sbin/service syslogd restart
pidfile /var/run/syslogd.pid
instances 1
pctcpu 70
pctmem 30
</Process>
Syslog is a good example of a process which can get a little full
of itself under certain circumstances, and excessively hog CPU and
memory. Here we will kill off any syslogd process that exceeds 70%
CPU or 30% memory utilization.
Older running copies of syslogd will be killed if they are running,
while leaving the most recently spawned copy which will be listed in
the PID file defined.
<Process httpd>
spawncmd /sbin/service httpd restart
pidfile /var/run/httpd.pid
loglevel LOG_CRIT
adminemail pager@noc.company.com
</Process>
Here we are monitoring Apache to ensure that it is restarted if
it dies. The pidfile directive in this example is actually
redundant because we have not defined any rule where we should
consider killing any httpd processes.
All notifications relating to this process will be logged with the
syslog priority of critical (LOG_CRIT), and all emailed to
pager@noc.company.com which could typically forward to a pager.
Any failed attempts to kill or restart a process will automatically
be logged as a syslog priority one level higher than that specified.
If a restart of Apache were to fail in this example, a wall
notification would be broadcast to all interactive terminals
connected to the machine, since the next log priority up from
LOG_CRIT is LOG_EMERG.
Note that the functionality to log information to syslog requires
the Unix::Syslog module. In the event that Unix::Syslog is not
installed, PSMon will write all status messages that would have
been destined for syslog, to STDERR instead.
<Process find>
noemail True
ttl 3600
</Process>
Kill old find processes which have been running for over an hour.
Do not send an email notification since it's not too important.
- HUP
-
Forces an immediate reload of the configuration file. You should
send the HUP signal when you are running psmon as a background
daemon and have altered the psmon.conf file.
- USR1
-
Forces an immediate scan of the process table.
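A daemon loop typically reacts to these two signals by setting flags in the handler and acting on them in the main loop. The Python sketch below shows that general pattern only; the `Monitor` class and the action names are hypothetical and say nothing about how psmon implements its handlers.

```python
import signal

class Monitor:
    """Records HUP and USR1 requests for the main loop to act on."""

    def __init__(self):
        self.reload_requested = False
        self.scan_requested = False

    def handle_signal(self, signum, frame):
        # Keep the handler tiny: only remember what was asked for.
        if signum == signal.SIGHUP:
            self.reload_requested = True
        elif signum == signal.SIGUSR1:
            self.scan_requested = True

    def install(self):
        signal.signal(signal.SIGHUP, self.handle_signal)
        signal.signal(signal.SIGUSR1, self.handle_signal)

    def pending_actions(self):
        """Return the requested actions in order and clear the flags."""
        actions = []
        if self.reload_requested:
            actions.append("reload_config")
            self.reload_requested = False
        if self.scan_requested:
            actions.append("scan_process_table")
            self.scan_requested = False
        return actions

monitor = Monitor()
monitor.handle_signal(signal.SIGHUP, None)   # simulate: kill -HUP <pid>
monitor.handle_signal(signal.SIGUSR1, None)  # simulate: kill -USR1 <pid>
print(monitor.pending_actions())
```

Reloading the configuration before scanning means a scan requested at the same moment already uses the new rules.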
- Value 0: Exited gracefully
-
The program exited gracefully.
- Value 2: Failure to lookup UID for username
-
The username specified by the --user command line option did not resolve to a valid
UID.
- Value 3: Configuration file is disabled
-
The configuration file is disabled. (It contains an active 'Disabled' directive).
- Value 4: Configuration file does not exist
-
The specified configuration file, (default or user specified) does not exist.
- Value 5: Unable to open PID file handle
-
Failed to open a read-only file handle for the runtime PID file.
- Value 6: Failed to fork
-
An error occurred while attempting to fork the child background daemon process.
- Value 7: Unable to open PID file handle
-
Failed to open a write file handle for the runtime PID file.
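For wrapper scripts that launch psmon and need to report why it stopped, the exit values listed above can be kept in a small lookup table. The Python sketch below is one way to do that; `describe_exit` is a hypothetical helper, and any value not listed in this manual is reported as undocumented.

```python
# Exit values as documented above. Value 1 is not documented.
EXIT_CODES = {
    0: "Exited gracefully",
    2: "Failure to lookup UID for username",
    3: "Configuration file is disabled",
    4: "Configuration file does not exist",
    5: "Unable to open PID file handle (read)",
    6: "Failed to fork",
    7: "Unable to open PID file handle (write)",
}

def describe_exit(status):
    """Map an exit status to its documented meaning."""
    return EXIT_CODES.get(status, "Undocumented exit status %d" % status)

print(describe_exit(4))
print(describe_exit(1))
```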
psmon is not especially fast. Much of its time is spent reading the process table.
If the process table is particularly large this can take a number of seconds.
Although this is rarely a major problem on today's speedy machines, I have run a few tests
so you can take a look at the times and decide if you can afford the wait.
Approximate figures from release 1.0.3:
CPU            OS             Open Files/Procs   1m Load   Real Time
PIII 1.1G      Mandrake 9.0   10148 / 267        0.01      0m0.430s
PIII 1.2G      Mandrake 9.0   16714 / 304        0.44      0m0.640s
Celeron 500    Red Hat 6.1    1780 / 81          1.27      0m0.880s
PII 450        Red Hat 6.0    300 / 23           0.01      0m1.050s
2x Xeon 1.8G   Mandrake 9.0   90530 / 750        0.38      0m1.130s
Celeron 500    Red Hat 6.1    1517 / 77          1.00      0m1.450s
PIII 866       Red Hat 8.0    3769 / 76          0.63      0m1.662s
PIII 750       Red Hat 6.2    754 / 35           3.50      0m2.170s
These production machines were running the latest patched stock distribution kernels.
I have listed the total number of open file descriptors, processes running and 1 minute
load average to give you a slightly better context of the performance.
Approximate figures from release 1.17:
CPU                     OS          1m Load   CPU Time
UltraSPARC-IIe 500MHz   SunOS 5.9   0.10      0m0.550s
Athlon XP 2400+ 2GHz    RHEL 3.0    1.00      0m0.150s
This information is not intended for the casual user of this software. It is
here as a very rough guide to benefit anybody who wishes to modify psmon for
their own specific requirements.
- check_processtable()
-
Reads the current process table, checks and then executes any appropriate
action to be taken. Does not accept any parameters.
- slay_process()
-
Attempts to kill a process with its killcmd, or failing that using the kill() function.
Accepts the process name, syslog log level, email notification to address and a reference
to the %slay hash.
- slurp_tmplog()
-
Slurps up the contents of a temporary log file and returns it as a chomped
array after unlinking the temporary log file. This uses a rather nasty way to
slurp in a file and will be changed in the future.
- print_init_style()
-
Prints a Red Hat sysvinit style status message. Accepts an array of messages
to display in sequence.
- spawn_process()
-
Attempts to spawn a process. Accepts the process name, syslog log level, mail
notification to address and spawn command.
- isnumeric()
-
An evil bastard fudge to ensure that we're only dealing with numerics when
necessary, from the config file and Proc::ProcessTable scan.
- daemonize()
-
Launches the process in to the background. Checks to see if there is already an
instance running.
- display_version()
-
Displays complete version, author and license information.
- TRACE()
-
Prints trace information to STDOUT if the DEBUG constant has been set to
boolean true. The DEBUG constant is set to boolean true in the event that
the environment variable PSMon_DEBUG is also set to boolean true.
- DUMP()
-
See TRACE().
- new()
-
Creates a new PSMon::Config object.
- pid_file()
-
Returns the name of the PID filename which should be used for this
particular invocation of the script.
- config()
-
Returns a configuration value when passed a key, or returns the
configuration complex data structure when not passed a key.
- command_line()
-
Returns a command line value when passed a key, or returns the
command line complex data structure when not passed a key.
- parse_command_line()
-
Parses the command line arguments and stores them for future use.
- read_config()
-
Reads the configuration file, performing basic validation and default
assumptions.
- _isnumeric()
-
An evil bastard fudge to ensure that we're only dealing with numerics when
necessary. This is a private subroutine and not a method.
- new()
-
Creates a new PSMon::Logging object.
- openlog()
-
Opens a connection to syslog using Unix::Syslog.
- closelog()
-
Closes a connection to syslog.
- loglevel()
-
Accepts a syslog loglevel keyword and returns the associated constant integer.
- logfacility()
-
Accepts a syslog facility keyword and returns the associated constant integer.
- alert()
-
Logs a message to syslog using Log() and sends a notification email using
sendmail().
- Log()
-
Logs messages to DAEMON facility in syslog. Accepts a log
level and message array. Will terminate the process if it is
asked to log a message of a log level 2 or less (LOG_EMERG,
LOG_ALERT, LOG_CRIT).
- sendmail()
-
Sends email notifications of syslog messages, called by alert().
Accepts sending email address, recipient email address, short
message subject and an optional detailed message body array.
- _sendmail_sendmail()
-
Called by sendmail(), sends an email using the sendmail command.
- _sendmail_smtp()
-
Called by sendmail(), sends an email using the Net::SMTP module.
The __DATA__ section of the PSMon code contains a stub version of the
Unix::Syslog module. It is automatically loaded in the event that the
real Unix::Syslog module is not present and/or cannot be loaded. This stub
module provides very basic functionality to output the messages generated
by the PSMon::Logging module to STDERR, instead of simply dropping them.
- _timestamp()
-
Returns a timestamp string which closely resembles timestamps
used by syslog.
- syslog()
-
Outputs a syslog formatted and timestamped message to STDERR.
- openlog()
-
Stub.
- closelog()
-
Stub.
- setlogmask()
-
Stub.
- priorityname()
-
Stub.
- facilityname()
-
Stub.
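The idea behind the stub (format a syslog-like line and write it to STDERR instead of dropping it) can be sketched in a few lines of Python. This is an illustration only: the function names, the `ident` default and the exact line layout are assumptions, not a transcription of the stub in the __DATA__ section.

```python
import sys
import time

MONTHS = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()

def timestamp(tm=None):
    """Return a syslog-like timestamp such as 'Dec 30 13:26:23'."""
    if tm is None:
        tm = time.localtime()
    # syslog pads single-digit days with a space, not a zero
    return "%s %2d %02d:%02d:%02d" % (
        MONTHS[tm.tm_mon - 1], tm.tm_mday, tm.tm_hour, tm.tm_min, tm.tm_sec)

def stub_syslog(message, ident="psmon", tm=None, stream=None):
    """Write a timestamped message to STDERR (or stream) and return it."""
    line = "%s %s: %s" % (timestamp(tm), ident, message)
    out = stream if stream is not None else sys.stderr
    print(line, file=out)
    return line

stub_syslog("Unix::Syslog is not available, logging to STDERR")
```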
Hopefully none. ;-) Send any bug reports to me at nicolaw@cpan.org
along with any patches and details of how to replicate the problem.
Please only send reports for bugs which can be replicated in the
latest version of the software. The latest version can always be
found at http://search.cpan.org/~nicolaw/
The following functionality will be added soon:
- Code cleanup
-
The code needs to be cleaned up and made more efficient. The bulk of the
code will be moved to a separate module, and psmon as you know it now will
become a much smaller and simpler wrapper script.
- Apply contributed patches
-
Users of psmon have sent me various patches for additional functionality.
These will be incorporated in to the next major release of psmon once the
code has been properly abstracted.
- killperprocessname directive
-
Will accept a boolean value. If true, only 1 process per process scope
will ever be killed, instead of all process IDs matching kill rules.
This should be used in conjunction with the new killcmd directive. For
example, you may define that a database daemon may never take up more
than 90% CPU time, and it runs many children processes. If it exceeds
90% CPU time, you want to issue ONE restart command in order to stop and
then start all the database processes in one go.
- time period limited rules
-
Functionality to limit validity of process scopes to only be checked
between defined time periods. For example, only check that httpd is running
between the hours of 8am and 5pm on Mondays and Tuesdays.
nsmon
Nicola Worthington <nicolaw@cpan.org>
http://www.psmon.com
Copyright 2002,2003,2004,2005 Nicola Worthington.
This software is licensed under The Apache Software License, Version 2.0.
http://www.apache.org/licenses/LICENSE-2.0