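1. Introduction
Redis (Remote Dictionary Server) Sentinel is an architecture that provides high availability for Redis. Sentinel nodes monitor the state of the Redis servers and automatically perform a failover when the master fails. The core goal is to keep the Redis service continuously available. Note that replication is asynchronous, so Sentinel does not guarantee strong consistency: writes acknowledged just before a failover can be lost.
- Sentinel nodes are introduced to monitor the state of the Redis servers
- A failover is performed when the master fails
Redis Sentinel involves the following roles and concepts:
- Master: the core node of the deployment. It handles client writes and is the source that the replicas copy from. Each monitored group has exactly one master at a time.
- Replica (Slave): a copy of the master that replicates the master's data and can serve client reads. There can be zero or more replicas.
- Sentinel: a Redis process running in a special mode, separate from the data nodes, that monitors the master and its replicas. Sentinels check node health periodically and trigger a failover when the master fails: they select a replica, promote it to master, and reconfigure the remaining replicas to replicate from the new master.
- Failover: when the master fails or becomes unreachable, the Sentinels go through an agreement and election process to choose a new master and make the new master's address available to clients. The aim is to switch to the new master as quickly as possible so that the service continues.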
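2. Setup
This article sets up three Redis server nodes (one master and two replicas, with the replicas automatically replicating data from the master) and three Sentinel nodes. The environment is as follows:
- OS: Debian
- Redis version: 7.0.5
- Base directory: /work/redis-sentinel-0905 (to keep the article short, later examples write this path as {base})
- redis-server location: {base}/redis-7.0.5/src/redis-server (this one binary starts both the Redis server nodes and the Sentinel nodes; each instance is started with its own configuration file)
2.1 Directory layout
Under {base}, create three directories: s1, s2, and s3. s1 holds the master; s2 and s3 hold the replicas.
Each directory contains one server configuration file and one Sentinel configuration file: redis-s1.conf and sentinel-s1.conf in s1, redis-s2.conf and sentinel-s2.conf in s2, and redis-s3.conf and sentinel-s3.conf in s3.
Note: if you run into problems while following this tutorial and want to start over, clean up the corresponding .pid files under /var/run/ (the author is no Linux expert and is not sure this is strictly required, but it seems like the safe thing to do). Sentinel also rewrites the sentinel-sN.conf files at runtime and, after a failover, the replicaof settings in the redis-sN.conf files, so restore the original configuration files as well.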
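The directory layout described in this section could be created with a short script. The following is only a sketch: the function name `create_layout` is made up for illustration, the files are created empty (their contents come from sections 2.2 to 2.4), and a temporary directory stands in for {base} so that running it does not touch a real installation.

```python
from pathlib import Path
import tempfile


def create_layout(base):
    """Create s1, s2, s3 under base, each with an empty server and Sentinel config file."""
    created = []
    for name in ("s1", "s2", "s3"):
        node_dir = Path(base) / name
        node_dir.mkdir(parents=True, exist_ok=True)
        for conf in (f"redis-{name}.conf", f"sentinel-{name}.conf"):
            path = node_dir / conf
            path.touch(exist_ok=True)  # placeholder only; fill in per sections 2.2 to 2.4
            created.append(path)
    return created


if __name__ == "__main__":
    # A temporary directory stands in for {base} (assumption made for safety).
    with tempfile.TemporaryDirectory() as tmp:
        for p in create_layout(tmp):
            print(p.relative_to(tmp))
```

To use it against the real environment, pass the actual base directory instead of the temporary one.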
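2.2 Master configuration (s1)
Create redis-s1.conf under s1 with the following content:
# Network interface address Redis binds to
bind 172.17.254.21
# Disable protected mode
protected-mode no
# Run in the background as a daemon
daemonize yes
# Password
requirepass 123456
databases 16
# maxclients 10000
# Port
port 6379
# PID file
pidfile /var/run/redis_6379.pid
# replicaof 172.17.254.21 6379
# Password used to authenticate to the master during replication (best set on both master and replicas, since the master can become a replica after a failover)
masterauth 123456
# The settings below are Redis defaults that this article leaves unchanged. The two replica servers use the same values, so they are not repeated in the replica sections.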
loglevel notice
logfile ""
always-show-logo no
set-proc-title yes
proc-title-template "{title} {listen-addr} {server-mode}"
tcp-backlog 511
timeout 0
tcp-keepalive 300
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
rdb-del-sync-files no
dir ./
replica-serve-stale-data yes
replica-read-only yes
repl-diskless-sync yes
repl-diskless-sync-delay 5
repl-diskless-sync-max-replicas 0
repl-diskless-load disabled
repl-disable-tcp-nodelay no
replica-priority 100
acllog-max-len 128
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
lazyfree-lazy-user-del no
lazyfree-lazy-user-flush no
oom-score-adj no
oom-score-adj-values 0 200 800
disable-thp yes
appendonly no
appendfilename "appendonly.aof"
appenddirname "appendonlydir"
# appendfsync always
appendfsync everysec
# appendfsync no
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
aof-timestamp-enabled no
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-listpack-entries 512
hash-max-listpack-value 64
list-max-listpack-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-listpack-entries 128
zset-max-listpack-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4096
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes
jemalloc-bg-thread yes
2.3 Replica configuration (s2, s3)
Create redis-s2.conf under s2 and redis-s3.conf under s3.
redis-s2.conf (for the default settings, see the defaults part of redis-s1.conf):
# Port
port 6380
# PID file
pidfile /var/run/redis_6380.pid
# Replication: this server continuously replicates data from the Redis server at 172.17.254.21:6379
replicaof 172.17.254.21 6379
# Password for accessing 172.17.254.21:6379
masterauth 123456
bind 172.17.254.21
protected-mode no
daemonize yes
requirepass 123456
databases 16
# maxclients 10000
# Default settings (see redis-s1.conf)
redis-s3.conf (for the default settings, see the defaults part of redis-s1.conf):
# Port
port 6381
# PID file
pidfile /var/run/redis_6381.pid
# Replication: this server continuously replicates data from the Redis server at 172.17.254.21:6379
replicaof 172.17.254.21 6379
# Password for accessing 172.17.254.21:6379
masterauth 123456
bind 172.17.254.21
protected-mode no
daemonize yes
requirepass 123456
databases 16
# maxclients 10000
# Default settings (see redis-s1.conf)
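The three server configuration files differ only in the port, the PID file, and whether a replicaof line is present. As an illustration, the node-specific part could be generated from one template. This is a sketch rather than part of the original setup: `render_server_conf` is a hypothetical helper, and the address and password defaults are simply the values used in this article.

```python
def render_server_conf(port, master_port=None, host="172.17.254.21", password="123456"):
    """Return the node-specific part of a redis-sN.conf as text.

    master_port=None renders the master; otherwise a replicaof line is added.
    The default settings shared by all nodes (section 2.2) are not included.
    """
    lines = [
        f"bind {host}",
        "protected-mode no",
        "daemonize yes",
        f"requirepass {password}",
        "databases 16",
        f"port {port}",
        f"pidfile /var/run/redis_{port}.pid",
        f"masterauth {password}",
    ]
    if master_port is not None:
        # Only replicas point at a master.
        lines.append(f"replicaof {host} {master_port}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_server_conf(6380, master_port=6379))
```

Calling `render_server_conf(6379)` yields the master variant, and `render_server_conf(6381, master_port=6379)` the second replica.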
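2.4 Sentinel configuration (s1, s2, s3)
Create sentinel-s1.conf, sentinel-s2.conf, and sentinel-s3.conf under s1, s2, and s3 respectively, with the following content:
sentinel-s1.conf:
protected-mode no
daemonize yes
# Sentinel port
port 26379
# PID file
pidfile /var/run/redis-sentinel-s1.pid
# Working directory
dir /work/redis-sentinel-0905/s1
# Master to monitor: name, address, and quorum (the number of Sentinels that must agree the master is down)
sentinel monitor mymaster 172.17.254.21 6379 2
# Password (123456) the Sentinel uses to access the master named mymaster
sentinel auth-pass mymaster 123456
# Sentinel log file
logfile "/work/redis-sentinel-0905/s1/sentinel.log"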
acllog-max-len 128
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
SENTINEL resolve-hostnames no
SENTINEL announce-hostnames no
SENTINEL master-reboot-down-after-period mymaster 0
sentinel-s2.conf:
protected-mode no
daemonize yes
port 26380
pidfile /var/run/redis-sentinel-s2.pid
dir /work/redis-sentinel-0905/s2
sentinel monitor mymaster 172.17.254.21 6379 2
sentinel auth-pass mymaster 123456
logfile "/work/redis-sentinel-0905/s2/sentinel.log"
acllog-max-len 128
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
SENTINEL resolve-hostnames no
SENTINEL announce-hostnames no
SENTINEL master-reboot-down-after-period mymaster 0
sentinel-s3.conf:
protected-mode no
daemonize yes
port 26381
pidfile /var/run/redis-sentinel-s3.pid
dir /work/redis-sentinel-0905/s3
sentinel monitor mymaster 172.17.254.21 6379 2
sentinel auth-pass mymaster 123456
logfile "/work/redis-sentinel-0905/s3/sentinel.log"
acllog-max-len 128
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
SENTINEL resolve-hostnames no
SENTINEL announce-hostnames no
SENTINEL master-reboot-down-after-period mymaster 0
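The last argument of `sentinel monitor mymaster 172.17.254.21 6379 2` is the quorum. The quorum only decides when the master is marked objectively down; the failover itself must additionally be authorized by a majority of all Sentinels. The sketch below is a simplified model of that rule, not Sentinel's actual implementation. The function name is hypothetical, and it assumes every reachable Sentinel already sees the master as down.

```python
def failover_possible(total_sentinels, reachable_sentinels, quorum):
    """Return (odown, authorized) in a simplified model of Sentinel's rules.

    odown: enough Sentinels agree to mark the master objectively down.
    authorized: enough Sentinels can vote to elect a leader that runs the failover.
    """
    majority = total_sentinels // 2 + 1
    odown = reachable_sentinels >= quorum
    # The leader needs a majority of votes, or `quorum` votes if the quorum is larger.
    authorized = odown and reachable_sentinels >= max(majority, quorum)
    return odown, authorized


if __name__ == "__main__":
    # The setup in this article: 3 Sentinels, quorum 2.
    for reachable in (3, 2, 1):
        print(reachable, failover_possible(3, reachable, 2))
```

With three Sentinels and a quorum of 2, the deployment tolerates the loss of one Sentinel; with only one Sentinel left, no failover can happen.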
2.5 Starting the Redis services
# Change to the directory containing redis-server
root@aa:/# cd {base}/redis-7.0.5/src
# Start the master
root@aa:{base}/redis-7.0.5/src# ./redis-server ../../s1/redis-s1.conf
# Start the two replicas
root@aa:{base}/redis-7.0.5/src# ./redis-server ../../s2/redis-s2.conf
root@aa:{base}/redis-7.0.5/src# ./redis-server ../../s3/redis-s3.conf
# List the Redis processes
root@aa:{base}/redis-7.0.5/src# ps -ef|grep redis
root 481 9 0 14:38 ? 00:00:03 ./redis-server 172.17.254.21:6379
root 439 9 0 14:28 ? 00:00:04 ./redis-server 172.17.254.21:6380
root 503 9 0 14:51 ? 00:00:01 ./redis-server 172.17.254.21:6381
root 301 13 0 11:53 pts/0 00:00:00 grep redis
# Start the three Sentinels
root@aa:{base}/redis-7.0.5/src# ./redis-server ../../s1/sentinel-s1.conf --sentinel
root@aa:{base}/redis-7.0.5/src# ./redis-server ../../s2/sentinel-s2.conf --sentinel
root@aa:{base}/redis-7.0.5/src# ./redis-server ../../s3/sentinel-s3.conf --sentinel
# List the Redis processes
root@aa:{base}/redis-7.0.5/src# ps -ef|grep redis
root 481 9 0 14:38 ? 00:00:03 ./redis-server 172.17.254.21:6379
root 439 9 0 14:28 ? 00:00:04 ./redis-server 172.17.254.21:6380
root 503 9 0 14:51 ? 00:00:01 ./redis-server 172.17.254.21:6381
root 451 9 0 14:29 ? 00:00:05 ./redis-server *:26379 [sentinel]
root 457 9 0 14:29 ? 00:00:05 ./redis-server *:26380 [sentinel]
root 463 9 0 14:29 ? 00:00:05 ./redis-server *:26381 [sentinel]
root 301 13 0 11:53 pts/0 00:00:00 grep redis
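2.6 Verifying Sentinel
Connect to 6379 and 6380 in turn with a client and run info replication. The output for 6379 shows that it is the master (role:master) with two connected replicas, 172.17.254.21:6380 and 172.17.254.21:6381.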
172.17.254.21:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.17.254.21,port=6380,state=online,offset=9099295,lag=1
slave1:ip=172.17.254.21,port=6381,state=online,offset=9099295,lag=0
master_failover_state:no-failover
master_replid:fade34f2efecc0342ba1e961d5a0b9d479fa6c45
master_replid2:0b8721c1bf692380f0e08f7851d0cbce86e1883c
master_repl_offset:9099295
second_repl_offset:8923953
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:8746585
repl_backlog_histlen:352711
The output for 6380 shows that it is a replica (role:slave):
172.17.254.21:6380> info replication
# Replication
role:slave
master_host:172.17.254.21
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_read_repl_offset:9123506
slave_repl_offset:9123506
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:fade34f2efecc0342ba1e961d5a0b9d479fa6c45
master_replid2:0b8721c1bf692380f0e08f7851d0cbce86e1883c
master_repl_offset:9123506
second_repl_offset:8923953
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:8064202
repl_backlog_histlen:1059305
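The info replication output shown above is plain "key:value" text, so the role check can be automated. The following sketch parses that text and extracts the replica addresses; `parse_info` and `replicas` are hypothetical helper names, and the sample is an excerpt of the 6379 output from this section.

```python
def parse_info(text):
    """Parse INFO output ("key:value" lines, "#" section headers) into a dict of strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def replicas(info):
    """Return (ip, port) pairs from the slaveN entries of a master's INFO output."""
    result = []
    for key, value in info.items():
        # Match slave0, slave1, ... but not keys such as slave_repl_offset.
        if key.startswith("slave") and key[5:].isdigit():
            fields = dict(item.split("=", 1) for item in value.split(","))
            result.append((fields["ip"], int(fields["port"])))
    return result


SAMPLE = """# Replication
role:master
connected_slaves:2
slave0:ip=172.17.254.21,port=6380,state=online,offset=9099295,lag=1
slave1:ip=172.17.254.21,port=6381,state=online,offset=9099295,lag=0
"""

if __name__ == "__main__":
    info = parse_info(SAMPLE)
    print(info["role"], replicas(info))
```

The same `parse_info` call works on a replica's output, where `info["role"]` is `slave` and `master_host` and `master_port` identify the master.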
Kill the 6379 master, then list the Redis processes again:
# Kill the process listening on 6379
root@aa:{base}/redis-7.0.5/src# kill -9 481
# Redis processes after 6379 goes down:
root@aa:{base}/redis-7.0.5/src# ps -ef|grep redis
root 439 9 0 14:28 ? 00:00:05 ./redis-server 172.17.254.21:6380
root 503 9 0 14:51 ? 00:00:01 ./redis-server 172.17.254.21:6381
root 451 9 0 14:29 ? 00:00:06 ./redis-server *:26379 [sentinel]
root 457 9 0 14:29 ? 00:00:06 ./redis-server *:26380 [sentinel]
root 463 9 0 14:29 ? 00:00:06 ./redis-server *:26381 [sentinel]
root 522 13 0 15:03 pts/0 00:00:00 grep redis
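Connect to 6380 again and run info replication. 6380 has now been promoted to master (role:master), with 172.17.254.21:6381 as its connected replica. The promoted node is not necessarily 6380; 6381 may be chosen instead. With down-after-milliseconds set to 30000, the failover begins only after the master has been unreachable for about 30 seconds, so allow some time before checking.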
172.17.254.21:6380> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=172.17.254.21,port=6381,state=online,offset=9170187,lag=1
master_failover_state:no-failover
master_replid:650e57f1869293fc5b23d58216eb8687d2e65aed
master_replid2:fade34f2efecc0342ba1e961d5a0b9d479fa6c45
master_repl_offset:9170328
second_repl_offset:9129448
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:8084642
repl_backlog_histlen:1085687
Bring the 6379 service back online:
root@aa:{base}/redis-7.0.5/src# ./redis-server ../../s1/redis-s1.conf
# All Redis processes
root@aa:{base}/redis-7.0.5/src# ps -ef|grep redis
root 531 9 0 15:12 ? 00:00:00 ./redis-server 172.17.254.21:6379
root 439 9 0 14:28 ? 00:00:06 ./redis-server 172.17.254.21:6380
root 503 9 0 14:51 ? 00:00:02 ./redis-server 172.17.254.21:6381
root 451 9 0 14:29 ? 00:00:08 ./redis-server *:26379 [sentinel]
root 457 9 0 14:29 ? 00:00:08 ./redis-server *:26380 [sentinel]
root 463 9 0 14:29 ? 00:00:08 ./redis-server *:26381 [sentinel]
root 539 13 0 15:12 pts/0 00:00:00 grep redis
Connect to 6380 again and run info replication. 6379 has rejoined as a replica, which completes the verification.
172.17.254.21:6380> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.17.254.21,port=6381,state=online,offset=9297137,lag=0
slave1:ip=172.17.254.21,port=6379,state=online,offset=9297137,lag=0
master_failover_state:no-failover
master_replid:650e57f1869293fc5b23d58216eb8687d2e65aed
master_replid2:fade34f2efecc0342ba1e961d5a0b9d479fa6c45
master_repl_offset:9297137
second_repl_offset:9129448
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:8227842
repl_backlog_histlen:1069296
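2.7 Verifying through the Sentinel client
You can also connect to a Sentinel with a client (for example, on port 26379) and run commands to check its view of the deployment. Some example commands:
# Get the master address by master name
sentinel get-master-addr-by-name mymaster
# List the replicas
sentinel slaves mymaster
# List the other Sentinels
sentinel sentinels mymaster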
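The first command above is how a client discovers the current master. To show what happens on the wire, here is a minimal sketch that sends `SENTINEL get-master-addr-by-name` over a raw socket using the Redis serialization protocol (RESP2). The helper names are hypothetical, and the sketch assumes the Sentinels require no password, as in the configuration in section 2.4. A real application would normally use a client library with Sentinel support instead.

```python
import io
import socket


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def read_reply(f):
    """Read one RESP2 reply from a binary file-like object."""
    line = f.readline()
    kind, rest = line[:1], line[1:].rstrip(b"\r\n")
    if kind == b"+":  # simple string
        return rest.decode()
    if kind == b"-":  # error
        raise RuntimeError(rest.decode())
    if kind == b":":  # integer
        return int(rest)
    if kind == b"$":  # bulk string, -1 means nil
        n = int(rest)
        if n == -1:
            return None
        return f.read(n + 2)[:n].decode()  # payload plus trailing CRLF
    if kind == b"*":  # array, -1 means nil
        n = int(rest)
        return None if n == -1 else [read_reply(f) for _ in range(n)]
    raise ValueError("unexpected reply: %r" % line)


def get_master_addr(host, port, name="mymaster", timeout=2.0):
    """Ask one Sentinel for the current master; returns (ip, port) or None."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_command("SENTINEL", "get-master-addr-by-name", name))
        with sock.makefile("rb") as f:
            reply = read_reply(f)
    return None if reply is None else (reply[0], int(reply[1]))


if __name__ == "__main__":
    # Offline demonstration with a canned reply; no Sentinel is contacted here.
    canned = io.BytesIO(b"*2\r\n$13\r\n172.17.254.21\r\n$4\r\n6380\r\n")
    print(read_reply(canned))
```

Against the setup in this article, `get_master_addr("172.17.254.21", 26379)` would be expected to return the address of whichever node is currently the master.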
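3. Notes on Spring applications
When a Spring application is connected through Redis Sentinel and the master goes down and a failover takes place, the application is unavailable for a few tens of seconds and logs some errors in the background (observed in the author's own environment). This is consistent with the sentinel down-after-milliseconds value of 30000 configured above, since the Sentinels wait that long before treating the master as down. Once the errors stop, the application works normally again.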
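One common way for an application to ride out this window is to retry failed operations and look up the master address again before each attempt. The sketch below shows the idea in Python for consistency with the other examples in this article; it is not how Spring or its Redis client handles failover, and `call_with_retry`, `operation`, and `resolve_master` are hypothetical names.

```python
import time


def call_with_retry(operation, resolve_master, attempts=10, delay=1.0, sleep=time.sleep):
    """Run operation(address), re-resolving the master address after each connection error.

    operation: callable taking the master address and performing the Redis call.
    resolve_master: callable returning the current master address, e.g. by asking a Sentinel.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return operation(resolve_master())
        except ConnectionError as exc:
            last_error = exc
            sleep(delay)  # give the Sentinels time to finish the failover
    raise last_error


if __name__ == "__main__":
    # Simulated failover: the first two lookups still return the dead master.
    addresses = iter([("172.17.254.21", 6379), ("172.17.254.21", 6379), ("172.17.254.21", 6380)])

    def fake_operation(addr):
        if addr[1] == 6379:
            raise ConnectionError("master is down")
        return addr

    print(call_with_retry(fake_operation, lambda: next(addresses), delay=0))
```

The total retry budget (attempts multiplied by delay) should exceed the down-after-milliseconds setting, otherwise the retries run out before the Sentinels have even started the failover.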