RocketMQ作为一款优秀的分布式消息中间件,分布式系统的一个很重要的特点就是要保证系统的高可用(HA),RocketMQ则是通过主从同步机制保证系统的高可用。
1、概述
主从同步同步的是啥作为消息中间件,无疑是消息相当于给数据做”备份“,主节点服务器Broker宕机后,消费者可以从从节点的服务消费消息,可以保证业务的正常运行。 主从同步的原理图
增加slave从节点的优点:
数据备份:保证了两/多台机器上的数据冗余,特别是在主从同步复制的情况下,一定程度上保证了Master出现不可恢复的故障以后,数据不丢失。 高可用性:即使Master掉线, Consumer会自动重连到对应的Slave机器,不会出现消费停滞的情况。 提高性能:主要表现为可分担Master读的压力,当从Master拉取消息,拉取消息的最大物理偏移与本地存储的最大物理偏移的差值超过一定值,会转向Slave(默认brokerId=1)进行读取,减轻了Master压力。 消费实时:master宕机后消费者可以从slave上消费保证消息的实时性,但是slave不能接收producer发送的消息,slave只能同步master数据(RocketMQ4.5版本之前),4.5版本开始增加多副本机制,根据RAFT算法,master宕机会自动选择其中一个副本节点作为master保证消息可以正常的生产消费。
主从数据同步有两种方式同步复制、异步复制
复制方式 | 优点 | 缺点 | 适应场景 |
---|---|---|---|
同步复制 | slave保证了与master一致的数据副本,如果master宕机,数据依然在slave中找到其数据和master的数据一致 | 由于需要slave确认效率上会有一定的损失 | 数据可靠性要求很高的场景 |
异步复制 | 无需等待slave确认消息是否存储成功效率上要高于同步复制 | 如果master宕机,由于数据同步有延迟导致slave和master存在一定程度的数据不一致问题 | 数据可靠性要求一般的场景 |
我们在前面章节中 RocketMQ存储文件介绍过消息存储相关的文件信息,从节点同步commitlog时同样需要同步相关的配置信息,主题列表信息、消费组信息、消费进度信息等元数据信息。下面从源码的角度具体分析下。
2、元数据复制
2.1、Broker启动时元数据同步
元数据就是基础信息,如主题信息、消费者信息、消费进度信息等。我们分析下broker启动时的业务逻辑处理,broker服务启动时会创建BrokerController对象并将其初始化initialize()分析其方法
//如果Broker是Slave
if (BrokerRole.SLAVE == this.messageStoreConfig.getBrokerRole()) {
if (this.messageStoreConfig.getHaMasterAddress() != null && this.messageStoreConfig.getHaMasterAddress().length() >= 6) {
this.messageStore.updateHaMasterAddress(this.messageStoreConfig.getHaMasterAddress());
this.updateMasterHAServerAddrPeriodically = false;
} else {
this.updateMasterHAServerAddrPeriodically = true;
}
//启动一个定时的单线程池
this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
@Override
public void run() {
try {
BrokerController.this.slaveSynchronize.syncAll();
} catch (Throwable e) {
log.error("ScheduledTask syncAll slave exception", e);
}
}
}, 1000 * 10, 1000 * 60, TimeUnit.MILLISECONDS);
} else {//如果是master
//定时打印master与slave的差距
this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
@Override
public void run() {
try {
BrokerController.this.printMasterAndSlaveDiff();
} catch (Throwable e) {
log.error("schedule printMasterAndSlaveDiff error.", e);
}
}
}, 1000 * 10, 1000 * 60, TimeUnit.MILLISECONDS);
}
当前broker为slave是执行slaveSynchronize.syncAll();,每隔60秒同步master中的元数据信息
public void syncAll() {
//同步主题配置信息
this.syncTopicConfig();
//同步消费者偏移量信息
this.syncConsumerOffset();
//同步延迟消费的偏移量信息
this.syncDelayOffset();
//同步订阅的消息组信息
this.syncSubscriptionGroupConfig();
}
我们通过图分析其执行原理,四种文件的同步方式是相同的,我们分析一个主题信息同步
2.2、syncTopicConfig主题信息同步原理
slave端启动时初始化
private void syncTopicConfig() {
String masterAddrBak = this.masterAddr;
if (masterAddrBak != null) {
try {
//调用getAllTopicConfig从master中拉取TopicConfig配置信息
TopicConfigSerializeWrapper topicWrapper =
this.brokerController.getBrokerOuterAPI().getAllTopicConfig(masterAddrBak);
//比较版本号是否一致,不一致则更新
if (!this.brokerController.getTopicConfigManager().getDataVersion()
.equals(topicWrapper.getDataVersion())) {
//更新TopicConfigManager中的版本号
this.brokerController.getTopicConfigManager().getDataVersion()
.assignNewOne(topicWrapper.getDataVersion());
//清空TopicConfigManager中TopicConfig信息
this.brokerController.getTopicConfigManager().getTopicConfigTable().clear();
//赋值新的信息
this.brokerController.getTopicConfigManager().getTopicConfigTable()
.putAll(topicWrapper.getTopicConfigTable());
//进行持久化
this.brokerController.getTopicConfigManager().persist();
log.info("Update slave topic config from master, {}", masterAddrBak);
}
} catch (Exception e) {
log.error("SyncTopicConfig Exception, {}", masterAddrBak, e);
}
}
}
查看其getAllTopicConfig方法调用master中的broker获取topic配置信息,解码返回的数据封装成TopicConfigSerializeWrapper,里面包含主题的配置信息(topicConfigTable)、拉取的当前数据的版本(dataVersion),slave判断拉取的数据版本相同时就不需要更新topicConfig信息。
public TopicConfigSerializeWrapper getAllTopicConfig(
final String addr) throws RemotingConnectException, RemotingSendRequestException,
RemotingTimeoutException, InterruptedException, MQBrokerException {
//创建request
RemotingCommand request = RemotingCommand.createRequestCommand(RequestCode.GET_ALL_TOPIC_CONFIG, null);
//调用底层通信模块remotingClient进行请求返回response
RemotingCommand response = this.remotingClient.invokeSync(MixAll.brokerVIPChannel(true, addr), request, 3000);
assert response != null;
switch (response.getCode()) {
case ResponseCode.SUCCESS: {
//若返回成功包装成TopicConfigSerializeWrapper进行返回
return TopicConfigSerializeWrapper.decode(response.getBody(), TopicConfigSerializeWrapper.class);
}
default:
break;
}
throw new MQBrokerException(response.getCode(), response.getRemark());
}
调用master端的请求数据
private RemotingCommand getAllTopicConfig(ChannelHandlerContext ctx, RemotingCommand request) {
final RemotingCommand response = RemotingCommand.createResponseCommand(GetAllTopicConfigResponseHeader.class);
//对获取的TopicConfig信息进行编码
String content = this.brokerController.getTopicConfigManager().encode();
if (content != null && content.length() > 0) {
try {
//二进制数据返回
response.setBody(content.getBytes(MixAll.DEFAULT_CHARSET));
} catch (UnsupportedEncodingException e) {
log.error("", e);
response.setCode(ResponseCode.SYSTEM_ERROR);
response.setRemark("UnsupportedEncodingException " + e);
return response;
}
} else {
log.error("No topic in this broker, client: {}", ctx.channel().remoteAddress());
response.setCode(ResponseCode.SYSTEM_ERROR);
response.setRemark("No topic in this broker");
return response;
}
response.setCode(ResponseCode.SUCCESS);
response.setRemark(null);
return response;
}
我们发现只同步了基础的配置信息,没有同步consumequeue信息,我们slave在同步commitlog时会根据其文件内的信息构建consumequeue队列信息。
3、CommitLog复制
commitlog文件复制流程图
从图中我们发现包含两部分1、slave的broker向master连接心跳包,报告master当前slave的数据偏移量,等待master发送最新的消息。2、master启动时开启一个监听等待slave发送心跳包,接收到slave发送的请求封装成HAConnection对象中两个属性,WriteSocketService处理slave发送的心跳包,ReadSocketService发送slave的数据请求。
3.1、SLAVE发送心跳包
slave发送心跳包主要是HAClient发送相关信息到master上我们首先分析下
public void run() {
log.info(this.getServiceName() + " service started");
while (!this.isStopped()) {
try {
//和Master建立连接
if (this.connectMaster()) {
//判断是否向Master反馈当前的消息拉取的偏移量,默认5秒发送一次
if (this.isTimeToReportOffset()) {
//重新发送slaveoffset
boolean result = this.reportSlaveMaxOffset(this.currentReportedOffset);
if (!result) {
this.closeMaster();
}
}
//I/O复用,检查是否有读事件
this.selector.select(1000);
//核心方法,处理Master返回的待处理的消息
boolean ok = this.processReadEvent();
if (!ok) {
this.closeMaster();
}
//处理完读事件后,若slaveoffset更新,需要再次发送新的slaveoffset
if (!reportSlaveMaxOffsetPlus()) {
continue;
}
long interval =
HAService.this.getDefaultMessageStore().getSystemClock().now()
- this.lastWriteTimestamp;
if (interval > HAService.this.getDefaultMessageStore().getMessageStoreConfig()
.getHaHousekeepingInterval()) {
log.warn("HAClient, housekeeping, found this connection[" + this.masterAddress
+ "] expired, " + interval);
this.closeMaster();
log.warn("HAClient, master not response some time, so close connection");
}
} else {
//等待5秒,再次连接Master
this.waitForRunning(1000 * 5);
}
} catch (Exception e) {
log.warn(this.getServiceName() + " service has exception. ", e);
this.waitForRunning(1000 * 5);
}
}
log.info(this.getServiceName() + " service end");
}
通过代码中的注解我们发现slave发送请求的最大消息偏移量给master,处理master发回的数据核心方法是processReadEvent()
private boolean processReadEvent() {
//连续读取数据大小是0次数
int readSizeZeroTimes = 0;
while (this.byteBufferRead.hasRemaining()) {
try {
int readSize = this.socketChannel.read(this.byteBufferRead);
if (readSize > 0) {
lastWriteTimestamp = HAService.this.defaultMessageStore.getSystemClock().now();
readSizeZeroTimes = 0;
//调用dispatchReadRequest分发读请求
boolean result = this.dispatchReadRequest();
if (!result) {
log.error("HAClient, dispatchReadRequest error");
return false;
}
} else if (readSize == 0) {
//读取三次都没有内容则跳出循环
if (++readSizeZeroTimes >= 3) {
break;
}
} else {
log.info("HAClient, processReadEvent read socket < 0");
return false;
}
} catch (IOException e) {
log.info("HAClient, processReadEvent read socket exception", e);
return false;
}
}
return true;
}
调用dispatchReadRequest分发读请求
private boolean dispatchReadRequest() {
final int msgHeaderSize = 8 + 4; // phyoffset + size
int readSocketPos = this.byteBufferRead.position();
while (true) {
int diff = this.byteBufferRead.position() - this.dispatchPostion;
if (diff >= msgHeaderSize) {
long masterPhyOffset = this.byteBufferRead.getLong(this.dispatchPostion);
int bodySize = this.byteBufferRead.getInt(this.dispatchPostion + 8);
//获取当前本地的commitlog偏移量
long slavePhyOffset = HAService.this.defaultMessageStore.getMaxPhyOffset();
if (slavePhyOffset != 0) {
if (slavePhyOffset != masterPhyOffset) {
log.error("master pushed offset not equal the max phy offset in slave, SLAVE: "
+ slavePhyOffset + " MASTER: " + masterPhyOffset);
return false;
}
}
if (diff >= (msgHeaderSize + bodySize)) {
byte[] bodyData = new byte[bodySize];
this.byteBufferRead.position(this.dispatchPostion + msgHeaderSize);
this.byteBufferRead.get(bodyData);
HAService.this.defaultMessageStore.appendToCommitLog(masterPhyOffset, bodyData);
//回到读的位置
this.byteBufferRead.position(readSocketPos);
this.dispatchPostion += msgHeaderSize + bodySize;
if (!reportSlaveMaxOffsetPlus()) {
return false;
}
continue;
}
}
//如果bytebufferread没有剩余空间,调用reallocateByteBuffer
if (!this.byteBufferRead.hasRemaining()) {
this.reallocateByteBuffer();
}
break;
}
return true;
}
[图片上传失败...(image-21baf2-1586594953123)]
消息头是8+4字节,是请求头信息,需要保证读取的消息是完整的,需要判断读取的buffer的长度满足是才能进行commitlog文件的数据追加。
public boolean appendToCommitLog(long startOffset, byte[] data) {
if (this.shutdown) {
log.warn("message store has shutdown, so appendToPhyQueue is forbidden");
return false;
}
boolean result = this.commitLog.appendData(startOffset, data);
if (result) {
//唤醒线程根据CommitLog文件构建ConsumeQueue、IndexFile文件
this.reputMessageService.wakeup();
} else {
log.error("appendToPhyQueue failed " + startOffset + " " + data.length);
}
return result;
}
追加消息时需要唤醒构建ConsumeQueue、IndexFile文件的线程ReputMessageService,这就是我们前面提到的为什么元数据复制的时候没有复制ConsumeQueue、IndexFile文件的原因。
3.2、MASTER接收请求进行处理
我们分为两部分查看监听salve的心跳包,master的broker接收到producer发送的消息时唤醒HA相关的线程来进行数据同步。
3.2.1、MASTER初始化监听slave心跳
Master构建消息存储的核心类DefaultMessageStore初始化时,启动了HAService.start()
public void start() throws Exception {
this.acceptSocketService.beginAccept();
this.acceptSocketService.start();
this.groupTransferService.start();
this.haClient.start();
}
其中Master端有两个核心类AcceptSocketService、GroupTransferService,
AcceptSocketService:在特定端口默认10912(可在Broker配置文件中配置)上监听从服务器的连接。
GroupTransferService:主从同步通知实现类
3.2.1.1、AcceptSocketService开启监听
public void beginAccept() throws Exception {
//创建ServerSocketChannel
this.serverSocketChannel = ServerSocketChannel.open();
this.selector = RemotingUtil.openSelector();
this.serverSocketChannel.socket().setReuseAddress(true);
//绑定监听端口
this.serverSocketChannel.socket().bind(this.socketAddressListen);
//非阻塞
this.serverSocketChannel.configureBlocking(false);
this.serverSocketChannel.register(this.selector, SelectionKey.OP_ACCEPT);
}
Java NIO中的 ServerSocketChannel 是一个可以监听新进来的TCP连接的通道, 就像标准IO中的ServerSocket一样。
ServerSocketChannel可以设置成非阻塞模式。在非阻塞模式下,accept() 方法会立刻返回,如果还没有新进来的连接,返回的将是null。 因此,需要检查返回的SocketChannel是否是null
AcceptSocketService是一个多线程类,this.acceptSocketService.start()我们查看其核心方法run();
public void run() {
log.info(this.getServiceName() + " service started");
while (!this.isStopped()) {
try {
this.selector.select(1000);
Set<SelectionKey> selected = this.selector.selectedKeys();
if (selected != null) {
for (SelectionKey k : selected) {
if ((k.readyOps() & SelectionKey.OP_ACCEPT) != 0) {
SocketChannel sc = ((ServerSocketChannel) k.channel()).accept();
if (sc != null) {
HAService.log.info("HAService receive new connection, "
+ sc.socket().getRemoteSocketAddress());
try {
HAConnection conn = new HAConnection(HAService.this, sc);
conn.start();
HAService.this.addConnection(conn);
} catch (Exception e) {
log.error("new HAConnection exception", e);
sc.close();
}
}
} else {
log.warn("Unexpected ops in select " + k.readyOps());
}
}
selected.clear();
}
} catch (Exception e) {
log.error(this.getServiceName() + " service has exception.", e);
}
}
log.info(this.getServiceName() + " service end");
}
选择器没1000毫秒(1秒)处理一次连接就绪事件,连接就绪后调用ServerSocketChannel的accept()方法创建SocketChannel,然后为每一个连接创建HAConnection对象。conn.start()
public void start() {
this.readSocketService.start();
this.writeSocketService.start();
}
启动了两个读写线程ReadSocketService处理slave发送的请求、WriteSocketService发送commitlog消息到slave中
3.2.1.2、ReadSocketService处理slave发送的请求
public void run() {
HAConnection.log.info(this.getServiceName() + " service started");
while (!this.isStopped()) {
try {
//等待读事件
this.selector.select(1000);
//处理读事件
boolean ok = this.processReadEvent();
if (!ok) {
HAConnection.log.error("processReadEvent error");
break;
}
//两次读事件的间隔时间超过了既定的值,则master和slave连接失效,跳出循环
long interval = HAConnection.this.haService.getDefaultMessageStore().getSystemClock().now() - this.lastReadTimestamp;
if (interval > HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig().getHaHousekeepingInterval()) {
log.warn("ha housekeeping, found this connection[" + HAConnection.this.clientAddr + "] expired, " + interval);
break;
}
} catch (Exception e) {
HAConnection.log.error(this.getServiceName() + " service has exception.", e);
break;
}
}
this.makeStop();
writeSocketService.makeStop();
haService.removeConnection(HAConnection.this);
HAConnection.this.haService.getConnectionCount().decrementAndGet();
SelectionKey sk = this.socketChannel.keyFor(this.selector);
if (sk != null) {
sk.cancel();
}
try {
this.selector.close();
this.socketChannel.close();
} catch (IOException e) {
HAConnection.log.error("", e);
}
HAConnection.log.info(this.getServiceName() + " service end");
}
我们查看其处理slave发送的心跳包的核心方法processReadEvent()
private boolean processReadEvent() {
int readSizeZeroTimes = 0;
//若byteBufferRead没有剩余
if (!this.byteBufferRead.hasRemaining()) {
this.byteBufferRead.flip();
this.processPostion = 0;
}
while (this.byteBufferRead.hasRemaining()) {
try {
//读取
int readSize = this.socketChannel.read(this.byteBufferRead);
if (readSize > 0) {
//更新readSizeZeroTimes=0和lastReadTimestamp
readSizeZeroTimes = 0;
this.lastReadTimestamp = HAConnection.this.haService.getDefaultMessageStore().getSystemClock().now();
//超过8个字节处理,因为slave的broker发送的就是8个字节的slave的offset的心跳
if ((this.byteBufferRead.position() - this.processPostion) >= 8) {
//获取离byteBufferRead.position()最近的8的整除数(获取最后一个完整的包)
int pos = this.byteBufferRead.position() - (this.byteBufferRead.position() % 8);
//读取能读取到的最后一个有效的8个字节的心跳包
long readOffset = this.byteBufferRead.getLong(pos - 8);
this.processPostion = pos;
//更新slave broker反馈的已经拉取完的offset偏移量
HAConnection.this.slaveAckOffset = readOffset;
//若是首次获取slave 反馈的偏移量
if (HAConnection.this.slaveRequestOffset < 0) {
//将slave broker请求的拉取消息的偏移量也更新为该值
HAConnection.this.slaveRequestOffset = readOffset;
log.info("slave[" + HAConnection.this.clientAddr + "] request offset " + readOffset);
}
//通知slaveAckOffset已经更新
HAConnection.this.haService.notifyTransferSome(HAConnection.this.slaveAckOffset);
}
} else if (readSize == 0) {
//readSize连续3次都为0跳出循环
if (++readSizeZeroTimes >= 3) {
break;
}
} else {
log.error("read socket[" + HAConnection.this.clientAddr + "] < 0");
return false;
}
} catch (IOException e) {
log.error("processReadEvent exception", e);
return false;
}
}
return true;
}
}
注释中详细介绍了其处理过程
3.2.1.3、WriteSocketService发送commitlog消息到slave
public void run() {
HAConnection.log.info(this.getServiceName() + " service started");
while (!this.isStopped()) {
try {
//等待写事件
this.selector.select(1000);
//如果slaveRequestOffset等于-1说明 master还未接收到slave broker的拉取请求,放弃本次处理
//slaveRequestOffset在收到slave broker请求时更新
if (-1 == HAConnection.this.slaveRequestOffset) {
Thread.sleep(10);
continue;
}
//如果nextTransferFromWhere为-1说明是第一次进行数据传输,需要计算要传输的物理偏移量
if (-1 == this.nextTransferFromWhere) {
//如果slaveRequestOffset为0则从当前最后一个commitlog文件传输,否则根据slave broker的拉取请求偏移量开始
if (0 == HAConnection.this.slaveRequestOffset) {
long masterOffset = HAConnection.this.haService.getDefaultMessageStore().getCommitLog().getMaxOffset();
masterOffset =
masterOffset
- (masterOffset % HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig()
.getMapedFileSizeCommitLog());
if (masterOffset < 0) {
masterOffset = 0;
}
this.nextTransferFromWhere = masterOffset;
} else {
this.nextTransferFromWhere = HAConnection.this.slaveRequestOffset;
}
log.info("master transfer data from " + this.nextTransferFromWhere + " to slave[" + HAConnection.this.clientAddr
+ "], and slave request " + HAConnection.this.slaveRequestOffset);
}
if (this.lastWriteOver) {
long interval =
HAConnection.this.haService.getDefaultMessageStore().getSystemClock().now() - this.lastWriteTimestamp;
if (interval > HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig()
.getHaSendHeartbeatInterval()) {
// Build Header
this.byteBufferHeader.position(0);
this.byteBufferHeader.limit(headerSize);
this.byteBufferHeader.putLong(this.nextTransferFromWhere);
this.byteBufferHeader.putInt(0);
this.byteBufferHeader.flip();
this.lastWriteOver = this.transferData();
if (!this.lastWriteOver)
continue;
}
} else {//上次传输为结束则继续传输
//传输数据
this.lastWriteOver = this.transferData();
if (!this.lastWriteOver)
continue;
}
//获取从nextTransferFromWhere开始的commitlog数据
SelectMappedBufferResult selectResult =
HAConnection.this.haService.getDefaultMessageStore().getCommitLogData(this.nextTransferFromWhere);
if (selectResult != null) {
int size = selectResult.getSize();
//如果超过了getHaTransferBatchSize(默认32k)最多传输32k
if (size > HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig().getHaTransferBatchSize()) {
size = HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig().getHaTransferBatchSize();
}
long thisOffset = this.nextTransferFromWhere;
this.nextTransferFromWhere += size;
selectResult.getByteBuffer().limit(size);
this.selectMappedBufferResult = selectResult;
// Build Header
this.byteBufferHeader.position(0);
this.byteBufferHeader.limit(headerSize);
this.byteBufferHeader.putLong(thisOffset);
this.byteBufferHeader.putInt(size);
this.byteBufferHeader.flip();
//传输数据
this.lastWriteOver = this.transferData();
} else {
//若没有获取到commitlog数据则等待应用层追加
HAConnection.this.haService.getWaitNotifyObject().allWaitForRunning(100);
}
} catch (Exception e) {
HAConnection.log.error(this.getServiceName() + " service has exception.", e);
break;
}
}
HAConnection.this.haService.getWaitNotifyObject().removeFromWaitingThreadTable();
if (this.selectMappedBufferResult != null) {
this.selectMappedBufferResult.release();
}
this.makeStop();
readSocketService.makeStop();
haService.removeConnection(HAConnection.this);
SelectionKey sk = this.socketChannel.keyFor(this.selector);
if (sk != null) {
sk.cancel();
}
try {
this.selector.close();
this.socketChannel.close();
} catch (IOException e) {
HAConnection.log.error("", e);
}
HAConnection.log.info(this.getServiceName() + " service end");
}
3.2.2、MASTER消息接收是唤醒HA相关的线程
消息发送时会通知HA
public void handleHA(AppendMessageResult result, PutMessageResult putMessageResult, MessageExt messageExt) {
//同步复制
if (BrokerRole.SYNC_MASTER == this.defaultMessageStore.getMessageStoreConfig().getBrokerRole()) {
HAService service = this.defaultMessageStore.getHaService();
if (messageExt.isWaitStoreMsgOK()) {
// Determine whether to wait
//觉得是否等待,如果slave和master之间的offset的差值超过一定值,则不再同步,返回SLAVE_NOT_AVAILABLE
if (service.isSlaveOK(result.getWroteOffset() + result.getWroteBytes())) {
//组装request
GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
service.putRequest(request);
//唤醒WriteSocketService,等待commitlog追加
service.getWaitNotifyObject().wakeupAll();
//线程在request上wait
boolean flushOK =
request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
//若flushOK是false则同步失败
if (!flushOK) {
log.error("do sync transfer other node, wait return, but failed, topic: " + messageExt.getTopic() + " tags: "
+ messageExt.getTags() + " client address: " + messageExt.getBornHostNameString());
putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_SLAVE_TIMEOUT);
}
}
// Slave problem
else {
// Tell the producer, slave not available
putMessageResult.setPutMessageStatus(PutMessageStatus.SLAVE_NOT_AVAILABLE);
}
}
}
}
GroupTransferService查看其业务处理
public void run() {
log.info(this.getServiceName() + " service started");
while (!this.isStopped()) {
try {
this.waitForRunning(10);
this.doWaitTransfer();
} catch (Exception e) {
log.warn(this.getServiceName() + " service has exception. ", e);
}
}
log.info(this.getServiceName() + " service end");
}
private void doWaitTransfer() {
synchronized (this.requestsRead) {
if (!this.requestsRead.isEmpty()) {
for (CommitLog.GroupCommitRequest req : this.requestsRead) {
//比较两个offset如果push2SlaveMaxOffset>=req.getNextOffset()则transferOK为true
boolean transferOK = HAService.this.push2SlaveMaxOffset.get() >= req.getNextOffset();
for (int i = 0; !transferOK && i < 5; i++) {
this.notifyTransferObject.waitForRunning(1000);
transferOK = HAService.this.push2SlaveMaxOffset.get() >= req.getNextOffset();
}
if (!transferOK) {
log.warn("transfer messsage to slave timeout, " + req.getNextOffset());
}
//唤醒写线程
req.wakeupCustomer(transferOK);
}
this.requestsRead.clear();
}
}
}
GroupTransferService负责主从同步复制结束后通知由于等待HA同步结果的而阻塞的消息发送者线程。判断主从同步是否完成的依据是Slave中已成功复制的最大消息偏移量是否大于等于消息生产者发送消息后消息服务端返回的下一条消息的起始偏移量。如果大于等于说明主从同步完成,否则等待1秒后继续检查,每一批任务中循环5次加上初始的一次一共6次。
4、读写分离
那我们就从消息的消费来开始分析,消息从主节点还会从节点消费,以及他们之间的机制,什么条件下从主节点拉取,满足什么条件从从节点消费,我们知道同一组的Master-Slave的broker服务器上的BrokerName相同,brokerId不同,主服务器的brokerId=0,从服务器的brokerId>0。我们知道消息的消费有两种方式主动拉取和被动推送,但是被动推送内部机制还是拉取,我们就分析下消息拉取的源码,分析其内部机制。
4.1、Producer端发送请求
DefaultMQPullConsumer.pullBlockIfNotFound(MessageQueue, String, long, int),发现其核心方法是DefaultMQPullConsumerImpl.pullSyncImpl()内部的PullAPIWrapper.pullKernelImpl(),这是客户端调用broker拉取消息的核心方法。
public PullResult pullKernelImpl(...) {
//获取消费消息的broker是master还是slave
FindBrokerResult findBrokerResult =
this.mQClientFactory.findBrokerAddressInSubscribe(mq.getBrokerName(),
this.recalculatePullFromWhichNode(mq), false);
if (null == findBrokerResult) {
this.mQClientFactory.updateTopicRouteInfoFromNameServer(mq.getTopic());
findBrokerResult =
this.mQClientFactory.findBrokerAddressInSubscribe(mq.getBrokerName(),
this.recalculatePullFromWhichNode(mq), false);
}
if (findBrokerResult != null) {
//省略代码...封装请求头
//调用broker获取消息
PullResult pullResult = this.mQClientFactory.getMQClientAPIImpl().pullMessage(
brokerAddr,
requestHeader,
timeoutMillis,
communicationMode,
pullCallback);
return pullResult;
}
}
查看获取Broker的方法
//获取broker端推荐的broker的id是从master还是slave中拉取消息
public long recalculatePullFromWhichNode(final MessageQueue mq) {
if (this.isConnectBrokerByUser()) {
return this.defaultBrokerId;
}
AtomicLong suggest = this.pullFromWhichNodeTable.get(mq);
if (suggest != null) {
return suggest.get();
}
return MixAll.MASTER_ID;
}
我们可能有疑问pullFromWhichNodeTable从哪获取的,一般这种建议应该在服务端给的,我们查看一下接收到的消息的处理逻辑DefaultMQPullConsumerImpl.pullSyncImpl()内部的PullAPIWrapper.pullKernelImpl()接收到broker消息后对其进行处理,MQClientAPIImpl.processPullResponse(),封装成PullResultExt对象;进行返回PullAPIWrapper.processPullResult()中有个方法updatePullFromWhichNode(),更新从哪个节点获取。
4.2、Broker接收客户端请求
PullMessageProcessor.processRequest()是接收客户端处理方法中核心拉取消息的方法是DefaultMessageStore.getMessage()中
long diff = maxOffsetPy - maxPhyOffsetPulling;
long memory = (long) (StoreUtil.TOTAL_PHYSICAL_MEMORY_SIZE
* (this.messageStoreConfig.getAccessMessageInMemoryMaxRatio() / 100.0));
getResult.setSuggestPullingFromSlave(diff > memory);
maxOffsetPy:当前主服务器消息存储的文件的最大偏移量
maxPhyOffsetPulling:此次拉取的消息的最大偏移量
diff:未被消费端消费消息的长度
TOTAL_PHYSICAL_MEMORY_SIZE:RocketMQ所在服务器总内存的大小,accessMessageInMemoryMaxRatio模式40%表示RocketMQ最大使用的内存比例,超过该内存,消息被置换出内存memory表示常驻内存大小,超过该内存消息被存储至磁盘。
diff > memory:说明当前未消费的消息长度已经超过了常驻内存的大小,表示主服务繁忙,此时建议从从服务其拉取
PullMessageProcessor.processRequest()中消息拉取的后返回给client时封装下次建议拉取的brokerId
if (getMessageResult.isSuggestPullingFromSlave()) {
responseHeader.setSuggestWhichBrokerId(subscriptionGroupConfig.getWhichBrokerWhenConsumeSlowly());
} else {
responseHeader.setSuggestWhichBrokerId(MixAll.MASTER_ID);
}
主服务器繁忙建议从从服务器获取消息设置SuggestWhichBrokerId属性,默认是1,如果一个Master有多个slave只会从一台上拉取消息。
RocketMQ读写分离是根据主服务的负载压力与主从同步情况,向客户端建议从主服务还是从服务器拉取消息。