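We maintain a ZooKeeper cluster internally. Recently a user reported that the cluster was unstable and was causing fluctuations on the business side. Using the IP the user provided, we located the exception and proposed a fix.
The exception in the server log: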
2018-03-20 23:34:01,887 [myid:99] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.121.82.229:33749
2018-03-20 23:34:01,887 [myid:99] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@793] - Connection request from old client /10.121.82.229:33749; will be dropped if server is in r-o mode
2018-03-20 23:34:01,887 [myid:99] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@839] - Client attempting to establish new session at /10.121.82.229:33749
2018-03-20 23:34:01,890 [myid:99] - INFO [CommitProcessor:99:ZooKeeperServer@595] - Established session 0x6362257b44e5068d with negotiated timeout 10000 for client /10.121.82.229:33749
2018-03-20 23:34:21,859 [myid:99] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x6362257b44e5068d, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2018-03-20 23:34:21,860 [myid:99] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.121.82.229:33749 which had sessionid 0x6362257b44e5068d
The key error message:
EndOfStreamException: Unable to read additional data from client sessionid 0x6362257b44e5068d, likely client has closed socket
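Root cause:
The timeout configured by the client when connecting to ZooKeeper is too short. As a result, the client (the Consumer) closes the connection before ZooKeeper has finished reading its data.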
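Solution:
When initializing the ZooKeeper connection, increase the receive timeout parameter. Its unit is milliseconds (ms).
In C++ (via the ZooKeeper C client), set the third parameter, recv_timeout, to a larger value. 10000 ms, for example, resolves the problem described here.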
zhandle_t *zookeeper_init(const char *host, watcher_fn fn, int recv_timeout, const clientid_t *clientid, void *context, int flags)
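The value passed as recv_timeout is only a request: the server answers with a negotiated timeout, as in the "negotiated timeout 10000" log line above. The sketch below estimates that negotiated value. It assumes the server keeps the default bounds of 2 * tickTime and 20 * tickTime for minSessionTimeout and maxSessionTimeout. The helper name negotiated_timeout_ms and the tickTime of 2000 ms used in the comments are illustrative and are not part of the ZooKeeper client API.

```cpp
#include <algorithm>

// Hypothetical helper, not part of the ZooKeeper C client.
// Estimates the session timeout the server will grant for a requested
// recv_timeout, ASSUMING the server uses the default bounds:
//   minSessionTimeout = 2  * tickTime
//   maxSessionTimeout = 20 * tickTime
int negotiated_timeout_ms(int requested_ms, int tick_time_ms) {
    const int min_ms = 2 * tick_time_ms;   // lower bound enforced by the server
    const int max_ms = 20 * tick_time_ms;  // upper bound enforced by the server
    // Clamp the requested value into [min_ms, max_ms].
    return std::max(min_ms, std::min(requested_ms, max_ms));
}

// With an assumed tickTime of 2000 ms:
//   negotiated_timeout_ms(10000, 2000) yields 10000 (within bounds)
//   negotiated_timeout_ms(1000, 2000)  yields 4000  (raised to the minimum)
//   negotiated_timeout_ms(60000, 2000) yields 40000 (capped at the maximum)
```

If the estimate is acceptable, the requested value would then be passed as the third argument, for example zookeeper_init(host, watcher, 10000, nullptr, nullptr, 0), where host and watcher stand for the application's own connection string and watcher callback. If the cluster overrides minSessionTimeout or maxSessionTimeout in its configuration, those values apply instead of the defaults assumed here.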