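Installation
Note: the system is CentOS 6.8 running Zabbix 3.0 LTS. The server database is PostgreSQL; the proxy database is SQLite.
Architecture
The five machines relate to each other as follows
agent -> proxy -> server -> db <- web
This is not entirely accurate: the web frontend also talks to the server.
Create the five machines with Vagrant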
config.vm.define "zabbix_server" do |zabbix_server|
  zabbix_server.vm.box = "centos6"
  zabbix_server.vm.network "private_network", ip: "192.168.100.10"
  zabbix_server.vm.hostname = "server.zabbix"
end
config.vm.define "zabbix_db" do |zabbix_db|
  zabbix_db.vm.box = "centos6"
  zabbix_db.vm.network "private_network", ip: "192.168.100.11"
  zabbix_db.vm.hostname = "db.zabbix"
end
config.vm.define "zabbix_web" do |zabbix_web|
  zabbix_web.vm.box = "centos6"
  zabbix_web.vm.network "private_network", ip: "192.168.100.12"
  zabbix_web.vm.hostname = "web.zabbix"
end
config.vm.define "zabbix_proxy" do |zabbix_proxy|
  zabbix_proxy.vm.box = "centos6"
  zabbix_proxy.vm.network "private_network", ip: "192.168.100.13"
  zabbix_proxy.vm.hostname = "proxy.zabbix"
end
config.vm.define "zabbix_agent" do |zabbix_agent|
  zabbix_agent.vm.box = "centos6"
  zabbix_agent.vm.network "private_network", ip: "192.168.100.14"
  zabbix_agent.vm.hostname = "jsp.site.cos"
end
#box selects the image to use
#network sets the IP address
#hostname sets the hostname
#Run vagrant up zabbix_server zabbix_db zabbix_web zabbix_proxy zabbix_agent
#and the five machines are ready
Installing zabbix-db
yum install postgresql-server
[root@db ~]# service postgresql start
/var/lib/pgsql/data is missing. Use "service postgresql initdb" to initialize the cluster first.
[FAILED]
#Starting fails immediately; run the command the message suggests
[root@db ~]# service postgresql initdb
Initializing database: [ OK ]
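By default the data directory is created at /var/lib/pgsql/data
cd /var/lib/pgsql/data
PostgreSQL listens only on localhost by default; change it to listen on all addresses
vi postgresql.conf
#listen_addresses = 'localhost' -> listen_addresses = '*'
Edit the access control file: local connections need no password, the 192.168.100.0/24 subnet connects with a password, and everything else is denied
vi pg_hba.conf
# TYPE DATABASE USER CIDR-ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 192.168.100.0/24 md5
Restart the server after the change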
service postgresql restart
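The prefix length in a pg_hba.conf rule decides how many client addresses the rule covers. As a quick sanity check, here is a minimal sketch in Python 3 using the standard ipaddress module (the helper name `allowed` is made up for illustration; the client addresses are the lab machines defined earlier). A /24 covers every host on the private network, whereas a /32 matches exactly one address:

```python
import ipaddress

def allowed(client_ip, cidr):
    """Return True if client_ip falls inside the CIDR block of a pg_hba.conf rule."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(cidr)

if __name__ == "__main__":
    # Compare a whole-subnet rule with a single-host rule for two lab machines.
    for cidr in ("192.168.100.0/24", "192.168.100.0/32"):
        for client in ("192.168.100.10", "192.168.100.12"):
            print(cidr, client, allowed(client, cidr))
```

Only the /24 rule lets the server (192.168.100.10) and the web machine (192.168.100.12) through.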
Create the zabbix user and database
sudo -u postgres createuser zabbix
sudo -u postgres createdb -O zabbix zabbix
Test it
[root@db data]# psql -U zabbix
psql (8.4.20)
Type "help" for help.
zabbix=# \du
List of roles
Role name | Attributes | Member of
-----------+-------------+-----------
demo | Create DB | {}
postgres | Superuser | {}
: Create role
: Create DB
zabbix | Superuser | {}
: Create role
: Create DB
zabbix-# \q
Set the password for the zabbix user
[root@db data]# psql -U postgres
psql (8.4.20)
Type "help" for help.
postgres=# alter user zabbix with password 'zabbix';
ALTER ROLE
At this point the database can be accessed remotely
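Before installing the other components, it can help to confirm from another machine that the PostgreSQL port is actually reachable. The following is a minimal sketch in Python 3, not part of the original setup; `port_open` is a hypothetical helper, and it assumes PostgreSQL is on its default port 5432:

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

if __name__ == "__main__":
    # 192.168.100.11 is the db machine from the Vagrant setup above.
    print(port_open("192.168.100.11", 5432))
```

A False result does not necessarily mean PostgreSQL is misconfigured; a host firewall such as iptables can block the port as well.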
Installing zabbix-web
yum install http://repo.zabbix.com/zabbix/3.0/rhel/6/x86_64/zabbix-release-3.0-1.el6.noarch.rpm
Install the zabbix-web-pgsql package
yum install zabbix-web-pgsql
Install PHP
yum install httpd php php-gd php-bcmath php-pgsql php-xml php-mbstring
Add a virtual host for zabbix-web
[root@web ~]# cat /etc/httpd/conf.d/zabbix.conf
<VirtualHost *:80>
DocumentRoot "/usr/share/zabbix"
ServerName 192.168.100.12
<Directory "/usr/share/zabbix">
Options Indexes FollowSymLinks Includes ExecCGI
AllowOverride All
Order allow,deny
Allow from all
</Directory>
</VirtualHost>
Restart the service. Sure enough, the page does not load, and the error log shows
[Sat Jan 07 05:00:39 2017] [error] [client 192.168.100.1] PHP Parse error: syntax error, unexpected '[' in /usr/share/zabbix/index.php on line 29
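According to the official site, zabbix-web 3.0 requires PHP 5.4 or later, while CentOS 6 ships PHP 5.3 by default
Compiling PHP is not appealing, so install it from a repository instead
Remove the old PHP
yum remove php*
Add a repository
rpm -Uvh https://mirror.webtatic.com/yum/el6/latest.rpm
Install PHP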
yum install php56w php56w-bcmath php56w-xml php56w-mbstring php56w-pgsql php56w-gd
Edit the configuration file /etc/php.ini. Note that php.ini takes key = value lines; the php_value prefix is Apache httpd syntax and does not belong here
max_execution_time = 300
memory_limit = 128M
post_max_size = 16M
upload_max_filesize = 2M
max_input_time = 300
always_populate_raw_post_data = -1
; date.timezone = Europe/Riga
After restarting the service the page is reachable
Installing zabbix-server
Add the repository
yum install http://repo.zabbix.com/zabbix/3.0/rhel/6/x86_64/zabbix-release-3.0-1.el6.noarch.rpm
Install zabbix-server-pgsql
yum install zabbix-server-pgsql
Install the PostgreSQL client tools
yum install postgresql
Import the database schema
zcat /usr/share/doc/zabbix-server-pgsql-3.0.7/create.sql.gz | psql -h 192.168.100.11 -U zabbix zabbix
Edit the configuration file; DBHost must point at the database machine
vi /etc/zabbix/zabbix_server.conf
DBHost=192.168.100.11
DBName=zabbix
DBUser=zabbix
DBPassword=zabbix
Restart the service
service zabbix-server restart
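Zabbix configuration files are plain Key=Value text with comment lines that start with #. If the server fails to start, a small script can confirm that the database settings are all present. This is a sketch in Python 3 under that assumption; `parse_zabbix_conf` and `missing_keys` are hypothetical helpers, and the sample values are illustrative:

```python
def parse_zabbix_conf(text):
    """Parse Zabbix-style Key=Value text, skipping blank lines and # comment lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            # A repeated key simply overwrites the earlier value in this sketch.
            settings[key.strip()] = value.strip()
    return settings

def missing_keys(settings, required=("DBHost", "DBName", "DBUser", "DBPassword")):
    """Return the required keys that are absent or empty."""
    return [k for k in required if not settings.get(k)]

if __name__ == "__main__":
    sample = """
# database connection (illustrative values)
DBHost=192.168.100.11
DBName=zabbix
DBUser=zabbix
DBPassword=zabbix
"""
    conf = parse_zabbix_conf(sample)
    print(conf)
    print("missing:", missing_keys(conf))
```

To check the real file, read /etc/zabbix/zabbix_server.conf and pass its contents to `parse_zabbix_conf`.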
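Now go back to the zabbix-web page to finish the setup
Open the zabbix-web address in a browser
The final configuration looks like this; note that the Database schema value is public
Installing zabbix-proxy
Add the repository as before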
yum install http://repo.zabbix.com/zabbix/3.0/rhel/6/x86_64/zabbix-release-3.0-1.el6.noarch.rpm
Install the proxy
yum install zabbix-proxy-sqlite3
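Edit the configuration file. Zabbix only treats # as a comment at the start of a line, so the note goes on its own line
vi /etc/zabbix/zabbix_proxy.conf
# Server points at zabbix-server
Server=192.168.100.10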
Hostname=server.zabbix
DBName=/var/zabbix/zabbix_proxy
Create a directory for SQLite and set its ownership
[root@proxy ~]# mkdir /var/zabbix
[root@proxy ~]# chown zabbix:zabbix /var/zabbix
Restart the service
service zabbix-proxy restart
The log shows the following problem, because this proxy has not yet been added on zabbix-server
tail /var/log/zabbix/zabbix_proxy.log
3224:20170107:061458.383 cannot send heartbeat message to server at "192.168.100.10": proxy "server.zabbix" not found
3223:20170107:061458.472 cannot obtain configuration data from server at "192.168.100.10": proxy "server.zabbix" not found
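Add the proxy in the web UI
As long as the proxy name is entered correctly (it must match Hostname in zabbix_proxy.conf), this works, and the database file starts receiving data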
[root@proxy ~]# ll /var/zabbix/zabbix.db
-rw-r--r--. 1 root root 544768 Jan 7 08:37 /var/zabbix/zabbix.db
Installing zabbix-agent
yum install http://repo.zabbix.com/zabbix/3.0/rhel/6/x86_64/zabbix-release-3.0-1.el6.noarch.rpm
yum install zabbix-agent
vi /etc/zabbix/zabbix_agentd.conf
Server=192.168.100.13
ServerActive=192.168.100.13
Hostname=jsp.site.cos
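These need little explanation; note that Server points at the proxy
Configuration
Auto-discovery
Configuration - Actions - Auto registration - Create action
Create two actions here: one adds the host automatically, the other adds it to a group based on its hostname and links a template
Items and triggers
Items and triggers can be added either on a template or on an individual host.
Custom items
Write them directly into the agent configuration file
Without parameters
UserParameter=ping,echo 1
With parameters
UserParameter=ping[*],echo $1
Calling a script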
[root@http zabbix_agentd.d]# cat /etc/zabbix/shell/ping.sh
echo $1
UserParameter=ping[*],/etc/zabbix/shell/ping.sh $1
After adding a custom item, the agent must be restarted for it to take effect; zabbix_get can be used to test it
zabbix_get -s 192.168.100.15 -p 10050 -k "ping[5]"
Custom items can of course be added like any other item, and triggers can be built on them
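To make the ping[*] form concrete: the agent splits the requested key into a name and a parameter list, then substitutes $1 to $9 into the command. The sketch below imitates that substitution in Python 3 for simple, unquoted parameters. It illustrates the idea only and is not the agent's actual parser, which also handles quoting and nested brackets; `split_key` and `expand_command` are hypothetical helper names:

```python
import re

def split_key(key):
    """Split 'name[a,b]' into ('name', ['a', 'b']); a bare 'name' gives ('name', [])."""
    match = re.match(r"^([^\[\]]+)(?:\[(.*)\])?$", key)
    if not match:
        raise ValueError("unsupported key: %r" % key)
    name, params = match.group(1), match.group(2)
    return name, ([] if params is None else [p.strip() for p in params.split(",")])

def expand_command(command, params):
    """Replace $1..$9 with positional parameters (assumption: missing ones become '')."""
    def repl(m):
        index = int(m.group(1)) - 1
        return params[index] if index < len(params) else ""
    return re.sub(r"\$([1-9])", repl, command)

if __name__ == "__main__":
    name, params = split_key("ping[5]")
    print(name, params)
    print(expand_command("/etc/zabbix/shell/ping.sh $1", params))
```

With the key ping[5], the command line that ends up being run is /etc/zabbix/shell/ping.sh 5.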
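Discovery rules for items
The default Zabbix templates automatically discover filesystems and network interfaces; you can also write your own rules to discover running services and listening ports
The templates ship with a rule that discovers filesystems, so first look at how it works
vfs.fs.discovery is the built-in key for filesystem discovery; look at what it returns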
[root@web ~]# zabbix_get -s 192.168.100.15 -k vfs.fs.discovery
zabbix_get [3655]: Check access restrictions in Zabbix agent configuration
Another error; check the log
[root@http zabbix_agentd.d]# tail /var/log/zabbix/zabbix_agentd.log
5829:20170108:022236.084 agent #5 started [active checks #1]
5826:20170108:022236.085 agent #2 started [listener #1]
5825:20170108:022236.086 agent #1 started [collector]
5828:20170108:022406.557 failed to accept an incoming connection: connection from "192.168.100.12" rejected, allowed hosts: "192.168.100.13"
5827:20170108:022538.861 failed to accept an incoming connection: connection from "192.168.100.12" rejected, allowed hosts: "192.168.100.13"
The agent rejected the request: it only accepts connections from 192.168.100.13, which is the proxy's address. Run the same command on the proxy
[root@proxy ~]# zabbix_get -s 192.168.100.15 -k vfs.fs.discovery
{"data":[{"{#FSNAME}":"/","{#FSTYPE}":"rootfs"},{"{#FSNAME}":"/proc","{#FSTYPE}":"proc"},{"{#FSNAME}":"/sys","{#FSTYPE}":"sysfs"},{"{#FSNAME}":"/dev","{#FSTYPE}":"devtmpfs"},{"{#FSNAME}":"/dev/pts","{#FSTYPE}":"devpts"},{"{#FSNAME}":"/dev/shm","{#FSTYPE}":"tmpfs"},{"{#FSNAME}":"/","{#FSTYPE}":"ext4"},{"{#FSNAME}":"/selinux","{#FSTYPE}":"selinuxfs"},{"{#FSNAME}":"/dev","{#FSTYPE}":"devtmpfs"},{"{#FSNAME}":"/proc/bus/usb","{#FSTYPE}":"usbfs"},{"{#FSNAME}":"/boot","{#FSTYPE}":"ext4"},{"{#FSNAME}":"/proc/sys/fs/binfmt_misc","{#FSTYPE}":"binfmt_misc"},{"{#FSNAME}":"/vagrant","{#FSTYPE}":"vboxsf"}]}
Now there is a result. The format is JSON, and names like {#FSNAME} are low-level discovery macros in Zabbix
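The structure is easy to inspect with a few lines of code: a top-level "data" array whose elements map macro names to values. A minimal sketch in Python 3 follows (`lld_rows` and `macro_values` are hypothetical helpers, and the sample is a shortened version of the output above):

```python
import json

def lld_rows(raw):
    """Return the list of macro dictionaries from a Zabbix 3.0-style LLD JSON string."""
    return json.loads(raw)["data"]

def macro_values(raw, macro):
    """Collect the distinct values of one LLD macro, preserving first-seen order."""
    seen = []
    for row in lld_rows(raw):
        value = row.get(macro)
        if value is not None and value not in seen:
            seen.append(value)
    return seen

if __name__ == "__main__":
    sample = ('{"data":[{"{#FSNAME}":"/","{#FSTYPE}":"rootfs"},'
              '{"{#FSNAME}":"/boot","{#FSTYPE}":"ext4"},'
              '{"{#FSNAME}":"/","{#FSTYPE}":"ext4"}]}')
    print(macro_values(sample, "{#FSNAME}"))
    print(macro_values(sample, "{#FSTYPE}"))
```

Any custom discovery key has to print JSON of this same shape, with its own macro names.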
So let's write a script that discovers listening ports (in truth, copied from the web)
[root@http shell]# cat ports.py
#!/usr/bin/python
#coding=utf-8
import commands
########## Return the output of a shell command
def getComStr(comand):
    try:
        stat, proStr = commands.getstatusoutput(comand)
    except:
        print "command %s execute failed, exit" % comand
    # Convert the string into a list
    #proList = proStr.split("\n")
    return proStr
########## Get the service names and listening ports
def filterList():
    tmpStr = getComStr("netstat -tpln")
    tmpList = tmpStr.split("\n")
    del tmpList[0:2]
    newList = []
    for i in tmpList:
        val = i.split()
        del val[0:3]
        del val[1:3]
        # Extract the port number
        valTmp = val[0].split(":")
        val[0] = valTmp[1]
        # Extract the service name
        valTmp = val[1].split("/")
        val[1] = valTmp[-1]
        if val[1] != '-' and val not in newList:
            newList.append(val)
    return newList
def main():
    netInfo = filterList()
    # Format the result as JSON suitable for Zabbix LLD
    json_data = "{\n" + "\t" + '"data":[' + "\n"
    #print netInfo
    for net in netInfo:
        if net != netInfo[-1]:
            json_data = json_data + "\t\t" + "{" + "\n" + "\t\t\t" + '"{#PPORT}":"' + str(net[0]) + "\",\n" + "\t\t\t" + '"{#PNAME}":"' + str(net[1]) + "\"},\n"
        else:
            json_data = json_data + "\t\t" + "{" + "\n" + "\t\t\t" + '"{#PPORT}":"' + str(net[0]) + "\",\n" + "\t\t\t" + '"{#PNAME}":"' + str(net[1]) + "\"}]}"
    print json_data
if __name__ == "__main__":
    main()
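One thing to watch in this script: it takes the second colon-separated field of the local address as the port, so an IPv6 listener such as :::22 yields an empty {#PPORT}. Here is a minimal alternative sketch in Python 3. It is not the original script; it assumes the usual `netstat -tpln` column layout, takes the text after the last colon, and builds the JSON with the json module. The PIDs in the sample are made up:

```python
import json

def parse_netstat(output):
    """Extract unique (port, program) pairs from `netstat -tpln` text."""
    pairs = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with tcp/tcp6; this skips headers and any warning lines.
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        port = fields[3].rsplit(":", 1)[-1]    # works for 0.0.0.0:22 and :::22
        program = fields[6].split("/", 1)[-1]  # "1234/sshd" -> "sshd"
        if program != "-" and (port, program) not in pairs:
            pairs.append((port, program))
    return pairs

def to_lld(pairs):
    """Render the pairs as Zabbix LLD JSON with {#PPORT} and {#PNAME} macros."""
    return json.dumps({"data": [{"{#PPORT}": p, "{#PNAME}": n} for p, n in pairs]})

if __name__ == "__main__":
    sample = """Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address    Foreign Address  State   PID/Program name
tcp        0      0 0.0.0.0:22       0.0.0.0:*        LISTEN  1234/sshd
tcp        0      0 :::22            :::*             LISTEN  1234/sshd
tcp        0      0 0.0.0.0:10050    0.0.0.0:*        LISTEN  2345/zabbix_agentd
"""
    print(to_lld(parse_netstat(sample)))
```

Because the IPv4 and IPv6 rows for sshd reduce to the same pair, the duplicate is dropped and each port appears once.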
Run it
[root@http zabbix_agentd.d]# python /etc/zabbix/shell/ports.py
{
"data":[
{
"{#PPORT}":"57278",
"{#PNAME}":"rpc.statd"},
{
"{#PPORT}":"10050",
"{#PNAME}":"zabbix_agentd"},
{
"{#PPORT}":"111",
"{#PNAME}":"rpcbind"},
{
"{#PPORT}":"22",
"{#PNAME}":"sshd"},
{
"{#PPORT}":"25",
"{#PNAME}":"master"},
{
"{#PPORT}":"",
"{#PNAME}":"rpc.statd"},
{
"{#PPORT}":"",
"{#PNAME}":"zabbix_agentd"},
{
"{#PPORT}":"",
"{#PNAME}":"rpcbind"},
{
"{#PPORT}":"",
"{#PNAME}":"sshd"},
{
"{#PPORT}":"",
"{#PNAME}":"master"}]}
Then add this to the agent configuration file
UserParameter=ports.discovery,python /etc/zabbix/shell/ports.py
Test it
[root@proxy ~]# zabbix_get -s 192.168.100.15 -k ports.discovery
Traceback (most recent call last):
File "/etc/zabbix/shell/ports.py", line 44, in <module>
main()
File "/etc/zabbix/shell/ports.py", line 33, in main
netInfo = filterList()
File "/etc/zabbix/shell/ports.py", line 25, in filterList
val[0] = valTmp[1]
IndexError: list index out of range
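Another error: list index out of range. The script did not get the netstat -tpln output it expects, because the zabbix user lacks the privileges that netstat -p needs
So change the agent configuration to run the script through sudo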
UserParameter=ports.discovery,sudo python /etc/zabbix/shell/ports.py
and give the zabbix user passwordless sudo
Add a line at the end of /etc/sudoers
zabbix ALL=(ALL) NOPASSWD: /usr/bin/python
This lets the zabbix user run python through sudo without a password
Try again
[root@proxy ~]# zabbix_get -s 192.168.100.15 -k ports.discovery
{
"data":[
{
"{#PPORT}":"57278",
"{#PNAME}":"rpc.statd"},
{
"{#PPORT}":"10050",
"{#PNAME}":"zabbix_agentd"},
{
"{#PPORT}":"111",
"{#PNAME}":"rpcbind"},
{
"{#PPORT}":"22",
"{#PNAME}":"sshd"},
{
"{#PPORT}":"25",
"{#PNAME}":"master"},
{
"{#PPORT}":"",
"{#PNAME}":"rpc.statd"},
{
"{#PPORT}":"",
"{#PNAME}":"zabbix_agentd"},
{
"{#PPORT}":"",
"{#PNAME}":"rpcbind"},
{
"{#PPORT}":"",
"{#PNAME}":"sshd"},
{
"{#PPORT}":"",
"{#PNAME}":"master"}]}
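That is half of the work done; the rest is configured in the web UI
Create a template and add a discovery rule
Filter: a {#PPORT} value is discovered only if it matches the regular expression, i.e. a number of 1 to 4 digits
PS: the regular expression in the screenshot is wrong; it should be ^[0-9]{1,4}$
Add item prototypes to the discovery rule
Add trigger prototypes
After linking the template to a host, the result looks roughly like this
Why is 10050 there? The regular expression still seems to be off, but item auto-discovery basically works.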
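The surprise about 10050 can be reproduced outside Zabbix. Assume the filter was entered without the ^ and $ anchors (an assumption; the exact expression used is not shown). A pattern like [0-9]{1,4} then matches any value that merely contains one to four digits, including 10050 and 57278. Python's re.search has the same partial-match behaviour for this pattern, so it serves as a quick check. `matching` is a hypothetical helper, and the port list comes from the discovery output above:

```python
import re

# Port values taken from the ports.discovery output, including the empty one.
PORTS = ["57278", "10050", "111", "22", "25", ""]

def matching(pattern, values):
    """Return the values in which pattern is found anywhere (partial match)."""
    return [v for v in values if re.search(pattern, v)]

if __name__ == "__main__":
    print(matching(r"[0-9]{1,4}", PORTS))    # unanchored
    print(matching(r"^[0-9]{1,4}$", PORTS))  # anchored
```

With the anchored form, 10050 and 57278 are filtered out because they have five digits; keeping five-digit ports would need {1,5} instead.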
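Alert notifications
Alert notifications are set up much like auto-registration: under Configuration - Actions, choose Triggers as the event source and create a rule. The action can also run a script you have prepared in advance
Overall flow
After installation, create the host groups, build the monitoring templates, and set up alert notifications and auto-registration. Then install the agent, taking care whether it points at the IP of the proxy or of the server; if its hostname matches the auto-registration rule, the host is added automatically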