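Nagios + Cacti is honestly not as easy to use as Zabbix, but for service monitoring that only needs alerting and no graphs, Nagios is a good fit. Because of a recent IDC migration, I redeployed our old Nagios + Cacti environment from scratch.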
Nagios:
- Preparation:
apt-get install autoconf gcc libc6 build-essential bc gawk dc gettext \
libmcrypt-dev libssl-dev make unzip apache2 apache2-utils php5 libgd2-xpm-dev
/usr/sbin/useradd -m -s /bin/bash nagios # create the nagios user
/usr/sbin/groupadd nagcmd # create the nagcmd group (a group, not a user), used for running external commands such as nrpe
/usr/sbin/usermod -a -G nagcmd nagios
/usr/sbin/usermod -a -G nagcmd www-data
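Before moving on, it can be worth confirming that both accounts really ended up in nagcmd. The following is a minimal sketch; the helper name in_group is made up for illustration and relies only on the standard id command.

```shell
#!/bin/bash
# in_group USER GROUP: hypothetical helper, returns 0 if USER belongs to GROUP.
in_group() {
    local user="$1" group="$2"
    # id -nG prints all group names of the user on one line, space-separated
    id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$group"
}

# Example, to be run on the Nagios server:
# for u in nagios www-data; do
#     if in_group "$u" nagcmd; then echo "$u: ok"; else echo "$u: NOT in nagcmd"; fi
# done
```

If www-data is reported as missing, re-run the usermod line for it and restart Apache so the new group membership is picked up.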
- Installation:
tar zxvf nagios-4.3.1.tar.gz
cd nagios-4.3.1
./configure --prefix=/opt/nagios --with-command-group=nagcmd --with-httpd-conf=/etc/apache2/sites-enabled
make all
make install
make install-init
make install-config
make install-commandmode
update-rc.d nagios defaults # the steps above install the default configs; this registers Nagios to start at boot
- Nagios directory layout:
root@10.1.1.208:nagios# ls
bin etc libexec log sbin share var
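The main Nagios configuration files live under etc, and the plugins go under libexec.
- Configuring Nagios:
At our company Nagios mainly monitors server hardware status, such as whether the disks are healthy, and every check goes through NRPE to reduce the load on the local server. Nagios configuration is modular: you can split it into as many files as you need and register each one in the main nagios.cfg.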
# You can specify individual object config files as shown below:
cfg_file=/opt/nagios/etc/objects/commands.cfg
cfg_file=/opt/nagios/etc/objects/contacts.cfg
cfg_file=/opt/nagios/etc/objects/timeperiods.cfg
cfg_file=/opt/nagios/etc/objects/templates.cfg
#
cfg_file=/opt/nagios/etc/objects/service.cfg
cfg_file=/opt/nagios/etc/objects/group.cfg
# Definitions for monitoring the local (Linux) host
#cfg_file=/opt/nagios/etc/objects/localhost.cfg
cfg_file=/opt/nagios/etc/objects/host_debian.cfg
cfg_file=/opt/nagios/etc/objects/host_centos.cfg
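Since every object file has to be registered by hand, a typo in a cfg_file path is easy to make. Here is a small sketch that lists the active cfg_file entries and reports any that do not exist on disk; the function names are illustrative and not part of Nagios.

```shell
#!/bin/bash
# Print every path registered via an active (uncommented) cfg_file= line.
list_cfg_files() {
    local main_cfg="$1"
    sed -n 's/^[[:space:]]*cfg_file=//p' "$main_cfg"
}

# Print registered object files that are missing on disk; return 1 if any are.
missing_cfg_files() {
    local main_cfg="$1" missing
    missing=$(list_cfg_files "$main_cfg" | while IFS= read -r f; do
        [ -f "$f" ] || echo "missing: $f"
    done)
    if [ -z "$missing" ]; then
        return 0
    fi
    echo "$missing"
    return 1
}

# Example: missing_cfg_files /opt/nagios/etc/nagios.cfg
```

This only checks that the files exist; the syntax of their contents is still validated by nagios -v.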
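Then edit the corresponding files. Suppose I want to add a Linux server and monitor its disk hardware; the steps are as follows:
1. Edit commands.cfg and add the corresponding command: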
# check hardware Disk
define command{
command_name check_storage_disk_nrpe
command_line /opt/nagios/libexec/check_storage_disk_nrpe $HOSTADDRESS$ check_storage_disk
}
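Put the matching script under libexec. Roughly speaking, Nagios asks the remote machine (through NRPE) to run the check_storage_disk command, and check_storage_disk is a monitoring script that lives on that remote machine.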
#!/bin/bash
# Wrapper around check_nrpe: runs a remote NRPE command and maps its output
# to a Nagios plugin exit code (0 OK, 2 CRITICAL, 3 UNKNOWN).
PLUGINS=/opt/nagios/libexec
CHECK_NRPE=$PLUGINS/check_nrpe
if [ $# -lt 2 ]; then
    echo "Usage: $0 host command"
    exit 2
fi
host=$1
comm=$2
# -n disables SSL; 57000 is the port the NRPE daemons listen on here
res=$("$CHECK_NRPE" -H "$host" -n -p 57000 -c "$comm")
if [ $? -ne 0 ]; then
    if [ "${res}" = "CHECK_NRPE: Socket timeout after 10 seconds." ]; then
        # Note: a timeout exits 0, so Nagios shows it as OK;
        # use exit 3 instead if timeouts should surface as UNKNOWN
        echo "connect failed"
        exit 0
    else
        echo "Check Storage UNKNOWN"
        exit 3
    fi
fi
if [ "${res}" = "Storage Disk Normal" ]; then
    echo "Check Storage OK"
    exit 0
else
    echo "${res}"
    exit 2
fi
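The wrapper relies on the standard Nagios plugin convention, where the exit code carries the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the first line of output becomes the status text. When testing a plugin by hand before registering it, a small helper like the sketch below makes the resulting state visible; both function names are made up for illustration.

```shell
#!/bin/bash
# Map a plugin exit code to the state name Nagios would show.
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        # 3 is UNKNOWN; codes outside 0-3 are not part of the convention
        # and are labelled UNKNOWN here as well
        *) echo "UNKNOWN" ;;
    esac
}

# Run a plugin command line, print "[STATE] output", and pass the exit code on.
run_check() {
    local out rc=0
    out=$("$@" 2>&1) || rc=$?
    echo "[$(nagios_state_name "$rc")] $out"
    return "$rc"
}

# Example:
# run_check /opt/nagios/libexec/check_storage_disk_nrpe 192.168.1.1 check_storage_disk
```

Running it as the nagios user is closest to what the daemon does, since permission problems then show up the same way.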
The NRPE plugin can be downloaded from nagios.org.
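On the monitored machine, NRPE maps the command name passed with -c to a local script through a command[...] line in its nrpe.cfg. The sketch below just generates such a line; the script path is a placeholder rather than the real location, and the helper name is invented for this example.

```shell
#!/bin/bash
# Emit an nrpe.cfg command definition: command[<name>]=<command line>
nrpe_command_entry() {
    local name="$1" cmdline="$2"
    printf 'command[%s]=%s\n' "$name" "$cmdline"
}

# Example (the path is an assumption; adjust it to where the remote script lives):
# nrpe_command_entry check_storage_disk /usr/local/nagios/libexec/check_storage_disk.sh >> /etc/nagios/nrpe.cfg
#
# The remote daemon also has to listen on the port the wrapper uses (57000 here)
# and run without SSL, because the wrapper calls check_nrpe with -n.
```

Per the wrapper above, the remote script is expected to print exactly "Storage Disk Normal" when the disks are healthy.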
Then register the service in service.cfg:
define service{
use local-service
hostgroup_name debian_servers
service_description hardware_disk_check
check_command check_storage_disk_nrpe
}
Then create the host and host group definitions:
define hostgroup{
hostgroup_name debian_servers
alias servers
members test
}
define host{
use linux-server
host_name test
alias 01
address 192.168.1.1
}
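When many servers need to be added, typing each host block by hand gets tedious. As a rough sketch, a helper like the one below can print a host definition in the same shape as the one above; the function name and the sample values are illustrative only.

```shell
#!/bin/bash
# Print a Nagios host definition based on the linux-server template.
nagios_host_def() {
    local name="$1" host_alias="$2" address="$3"
    printf 'define host{\n'
    printf '    use        linux-server\n'
    printf '    host_name  %s\n' "$name"
    printf '    alias      %s\n' "$host_alias"
    printf '    address    %s\n' "$address"
    printf '}\n'
}

# Example:
# nagios_host_def test2 02 192.168.1.2 >> /opt/nagios/etc/objects/host_debian.cfg
```

A newly added host still has to be listed in the members line of the hostgroup (comma-separated) before the hostgroup-based service check applies to it.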
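Nagios login is authenticated through Apache htpasswd, which is fairly simple: just change the password of the corresponding CGI user. To change the Nagios login user, you need to update Apache's htpasswd file and also the user authorization settings in cgi.cfg.
Then check the Nagios configuration: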
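The cgi.cfg side means adding the new user to the authorized_for_* directives. The following is a sketch under assumptions: the helper name is invented, it prints the modified file to stdout rather than editing in place, and it expects a plain user name without characters that are special to sed (such as / or &).

```shell
#!/bin/bash
# Print CFG with USER appended to every active authorized_for_* directive.
# authorized_for_read_only is skipped so the user does not become read-only.
cgi_add_user() {
    local cfg="$1" user="$2"
    sed -E '/^authorized_for_read_only=/!s/^(authorized_for_[a-z_]+=.*)$/\1,'"$user"'/' "$cfg"
}

# Example (the htpasswd.users path is an assumption; use the AuthUserFile
# named in the Nagios Apache config):
# htpasswd /opt/nagios/etc/htpasswd.users ops
# cgi_add_user /opt/nagios/etc/cgi.cfg ops > /tmp/cgi.cfg.new
# diff /opt/nagios/etc/cgi.cfg /tmp/cgi.cfg.new   # review before replacing
```

Writing to a temporary file first leaves room to inspect the diff before swapping it in.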
/opt/nagios/bin/nagios -v /opt/nagios/etc/nagios.cfg
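Then start Nagios.
If the source build has not left a service startup script under /etc/init.d (make install-init should normally install one), you can use this one: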
#!/bin/sh
#
# chkconfig: 345 99 01
# description: Nagios network monitor
#
# File : nagios
#
# Author : Jorge Sanchez Aymar (jsanchez@lanchile.cl)
#
# Changelog :
#
# 1999-07-09 Karl DeBisschop <kdebisschop@infoplease.com>
# - setup for autoconf
# - add reload function
# 1999-08-06 Ethan Galstad <egalstad@nagios.org>
# - Added configuration info for use with RedHat's chkconfig tool
# per Fran Boon's suggestion
# 1999-08-13 Jim Popovitch <jimpop@rocketship.com>
# - added variable for nagios/var directory
# - cd into nagios/var directory before creating tmp files on startup
# 1999-08-16 Ethan Galstad <egalstad@nagios.org>
# - Added test for rc.d directory as suggested by Karl DeBisschop
# 2000-07-23 Karl DeBisschop <kdebisschop@users.sourceforge.net>
# - Clean out redhat macros and other dependencies
# 2003-01-11 Ethan Galstad <egalstad@nagios.org>
# - Updated su syntax (Gary Miller)
#
# Description: Starts and stops the Nagios monitor
# used to provide network services status.
#
status_nagios ()
{
if test -x $NagiosCGIDir/daemonchk.cgi; then
if $NagiosCGIDir/daemonchk.cgi -l $NagiosRunFile; then
return 0
else
return 1
fi
else
if ps -p $NagiosPID > /dev/null 2>&1; then
return 0
else
return 1
fi
fi
return 1
}
printstatus_nagios()
{
if status_nagios $1 $2; then
echo "nagios (pid $NagiosPID) is running..."
else
echo "nagios is not running"
fi
}
killproc_nagios ()
{
kill $2 $NagiosPID
}
pid_nagios ()
{
if test ! -f $NagiosRunFile; then
echo "No lock file found in $NagiosRunFile"
exit 1
fi
NagiosPID=`head -n 1 $NagiosRunFile`
}
# Source function library
# Solaris doesn't have an rc.d directory, so do a test first
if [ -f /etc/rc.d/init.d/functions ]; then
. /etc/rc.d/init.d/functions
elif [ -f /etc/init.d/functions ]; then
. /etc/init.d/functions
fi
prefix=/opt/nagios
exec_prefix=${prefix}
NagiosBin=${exec_prefix}/bin/nagios
NagiosCfgFile=${prefix}/etc/nagios.cfg
NagiosStatusFile=${prefix}/var/status.dat
NagiosRetentionFile=${prefix}/var/retention.dat
NagiosCommandFile=${prefix}/var/rw/nagios.cmd
NagiosVarDir=${prefix}/var
NagiosRunFile=${prefix}/var/nagios.lock
NagiosLockDir=/var/lock/subsys
NagiosLockFile=nagios
NagiosCGIDir=${exec_prefix}/sbin
NagiosUser=nagios
NagiosGroup=nagios
# Check that nagios exists.
if [ ! -f $NagiosBin ]; then
echo "Executable file $NagiosBin not found. Exiting."
exit 1
fi
# Check that nagios.cfg exists.
if [ ! -f $NagiosCfgFile ]; then
echo "Configuration file $NagiosCfgFile not found. Exiting."
exit 1
fi
# See how we were called.
case "$1" in
start)
echo -n "Starting nagios:"
$NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
if [ $? -eq 0 ]; then
su - $NagiosUser -c "touch $NagiosVarDir/nagios.log $NagiosRetentionFile"
rm -f $NagiosCommandFile
touch $NagiosRunFile
chown $NagiosUser:$NagiosGroup $NagiosRunFile
$NagiosBin -d $NagiosCfgFile
if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi
echo " done."
exit 0
else
echo "CONFIG ERROR! Start aborted. Check your Nagios configuration."
exit 1
fi
;;
stop)
echo -n "Stopping nagios: "
pid_nagios
killproc_nagios nagios
# now we have to wait for nagios to exit and remove its
# own NagiosRunFile, otherwise a following "start" could
# happen, and then the exiting nagios will remove the
# new NagiosRunFile, allowing multiple nagios daemons
# to (sooner or later) run - John Sellens
#echo -n 'Waiting for nagios to exit .'
for i in 1 2 3 4 5 6 7 8 9 10 ; do
if status_nagios > /dev/null; then
echo -n '.'
sleep 1
else
break
fi
done
if status_nagios > /dev/null; then
echo ''
echo 'Warning - nagios did not exit in a timely manner'
else
echo 'done.'
fi
rm -f $NagiosStatusFile $NagiosRunFile $NagiosLockDir/$NagiosLockFile $NagiosCommandFile
;;
status)
pid_nagios
printstatus_nagios nagios
;;
checkconfig)
printf "Running configuration check..."
$NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
if [ $? -eq 0 ]; then
echo " OK."
else
echo " CONFIG ERROR! Check your Nagios configuration."
exit 1
fi
;;
restart)
printf "Running configuration check..."
$NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
if [ $? -eq 0 ]; then
echo "done."
$0 stop
$0 start
else
echo " CONFIG ERROR! Restart aborted. Check your Nagios configuration."
exit 1
fi
;;
reload|force-reload)
printf "Running configuration check..."
$NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
if [ $? -eq 0 ]; then
echo "done."
if test ! -f $NagiosRunFile; then
$0 start
else
pid_nagios
if status_nagios > /dev/null; then
printf "Reloading nagios configuration..."
killproc_nagios nagios -HUP
echo "done"
else
$0 stop
$0 start
fi
fi
else
echo " CONFIG ERROR! Reload aborted. Check your Nagios configuration."
exit 1
fi
;;
*)
echo "Usage: nagios {start|stop|restart|reload|force-reload|status|checkconfig}"
exit 1
;;
esac
# End of this script
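Then log in to the web interface and check that everything works.
Cacti:
Cacti is used for monitoring graphs. Nagios can actually draw graphs through pnp4nagios, but the experience is not great. Cacti is quite good for customized monitoring charts, even though both rely on rrdtool underneath.
- Preparation: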
apt-get install rrdtool php5 mysql-server
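PHP 5 actually needs more packages than just these; more on that below.
After downloading Cacti, unpack it and enter the directory, then log in to MySQL and import the Cacti tables: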
mysql> create database cacti;
Query OK, 1 row affected (0.00 sec)
mysql> use cacti;
mysql> source cacti.sql;
mysql> GRANT ALL PRIVILEGES ON cacti.* TO 'cacti'@'127.0.0.1' IDENTIFIED BY 'cacti';
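The same database setup can be scripted instead of typed into the mysql prompt. As a sketch, the helper below (its name is made up) only prints the SQL, so it can be reviewed before being piped into mysql. Note that GRANT ... IDENTIFIED BY is MySQL 5.x syntax; MySQL 8.0 removed it and needs a separate CREATE USER statement.

```shell
#!/bin/bash
# Print the SQL that creates the Cacti database and grants access to its user.
cacti_bootstrap_sql() {
    local db="$1" user="$2" pass="$3" host="$4"
    printf 'CREATE DATABASE IF NOT EXISTS %s;\n' "$db"
    printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'%s' IDENTIFIED BY '%s';\n" \
        "$db" "$user" "$host" "$pass"
    printf 'FLUSH PRIVILEGES;\n'
}

# Example, run from the unpacked Cacti directory:
# cacti_bootstrap_sql cacti cacti cacti 127.0.0.1 | mysql -u root -p
# mysql -u root -p cacti < cacti.sql
```

For anything beyond a test box, pick a stronger password than the cacti/cacti pair used here and keep it in sync with include/config.php.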
Edit the configuration file:
vi include/config.php
$database_type = 'mysql';
$database_default = 'cacti';
$database_hostname = '127.0.0.1';
$database_username = 'cacti';
$database_password = 'cacti';
$database_port = '3306';
$database_ssl = false;
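Then open ip/cacti in a browser and the installation wizard appears:
The default user is admin with password admin.
The wizard lists any missing packages; install them:
Newer versions of Cacti have an issue with MySQL time zone permissions (the error reported by the installer), which needs to be fixed: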
mysql> GRANT SELECT ON mysql.time_zone_name TO cacti@'127.0.0.1';
mysql_tzinfo_to_sql /usr/share/zoneinfo/ | mysql -u root -p mysql
After that, click Next and the installation completes.
Then configure SNMP for monitoring and graphing.