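Impala has three main components: statestore, catalog and impalad. Every impalad node can accept query requests from clients. For each query submitted through it, that impalad also acts as the coordinator, which consumes extra memory and CPU. To keep resource usage even across nodes, client connections to the impalad nodes in the cluster should be load balanced.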
This article uses the proxy solution recommended by Cloudera: HAProxy.
In production, HAProxy should be installed on a node that does not run impalad.
Steps
- Install haproxy
yum install haproxy -y
- Configuration file
vim /etc/haproxy/haproxy.cfg
Configuration contents:
#---------------------------------------------------------------------
# Example configuration for a possible web application. See the
# full configuration options online.
#
# http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
#
#---------------------------------------------------------------------
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon
    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
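defaults
    mode http                       # mode { tcp|http|health }: tcp = layer 4, http = layer 7, health = health checks only
    log global
    option httplog
    option dontlognull
    #option http-server-close
    #option forwardfor except 127.0.0.0/8
    #option abortonclose            # drop queued requests whose client has already disconnected (helps under overload)
    option redispatch               # if the assigned server fails, allow the request to go to another server
    retries 3                       # retry a failed connection up to 3 times before giving up
    timeout http-request 10s
    timeout queue 1m
    #timeout connect 10s
    #timeout client 1m
    #timeout server 1m
    timeout connect 1d              # connection timeout; the long timeouts are important so long-running queries can still return results
    timeout client 1d               # same as above
    timeout server 1d               # same as above
    timeout http-keep-alive 10s
    timeout check 10s               # health-check timeout
    maxconn 3000                    # maximum number of connections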
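listen status                       # stats (admin) page
    bind 0.0.0.0:1080               # IP and port of the stats page
    mode http                       # protocol used by the stats page
    option httplog
    maxconn 5000                    # maximum number of connections
    stats refresh 30s               # refresh automatically every 30 seconds
    stats uri /stats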
listen impalashell
    bind 0.0.0.0:25003              # IP and port HAProxy listens on as the proxy
    mode tcp                        # proxy at layer 4 (important)
    option tcplog
    balance roundrobin              # scheduling algorithm: 'leastconn' (fewest connections) or 'roundrobin' (take turns)
    server impalashell_1 linux121:21000 check
    server impalashell_2 linux122:21000 check
    server impalashell_3 linux123:21000 check
listen impalajdbc
    bind 0.0.0.0:25004              # IP and port HAProxy listens on as the proxy
    mode tcp                        # proxy at layer 4 (important)
    option tcplog
    balance roundrobin              # scheduling algorithm: 'leastconn' (fewest connections) or 'roundrobin' (take turns)
    server impalajdbc_1 linux121:21050 check
    server impalajdbc_2 linux122:21050 check
    server impalajdbc_3 linux123:21050 check
#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
frontend main *:5000
    acl url_static path_beg -i /static /images /javascript /stylesheets
    acl url_static path_end -i .jpg .gif .png .css .js
    use_backend static if url_static
    default_backend app
#---------------------------------------------------------------------
# static backend for serving up images, stylesheets and such
#---------------------------------------------------------------------
backend static
    balance roundrobin
    server static 127.0.0.1:4331 check
#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend app
    balance roundrobin
    server app1 127.0.0.1:5001 check
    server app2 127.0.0.1:5002 check
    server app3 127.0.0.1:5003 check
    server app4 127.0.0.1:5004 check
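The balance comments above name two scheduling algorithms, roundrobin and leastconn. The Python sketch below is a simplified, unweighted model of how each one chooses a server, using the impalashell server list from the configuration. It is not HAProxy's implementation: real leastconn, for example, round-robins among servers with equal load, while this sketch simply picks the first one listed. The class names are hypothetical.

```python
from itertools import cycle

# Backend list mirroring the impalashell servers in haproxy.cfg (illustrative only).
SERVERS = ["linux121:21000", "linux122:21000", "linux123:21000"]


class RoundRobin:
    """Unweighted model of `balance roundrobin`: hand out servers in a fixed rotation."""

    def __init__(self, servers):
        self._rotation = cycle(servers)

    def pick(self):
        return next(self._rotation)


class LeastConn:
    """Simplified model of `balance leastconn`: choose the server with the fewest
    active connections. Ties go to the server listed first (real HAProxy
    round-robins among tied servers instead)."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1  # the new session is now open on this server
        return server

    def release(self, server):
        self.active[server] -= 1  # the session on this server has closed


if __name__ == "__main__":
    rr = RoundRobin(SERVERS)
    print([rr.pick() for _ in range(4)])
    # ['linux121:21000', 'linux122:21000', 'linux123:21000', 'linux121:21000']

    lc = LeastConn(SERVERS)
    first, second = lc.pick(), lc.pick()  # linux121, then linux122
    lc.release(first)                     # linux121 drops back to 0 sessions
    print(lc.pick())                      # linux121:21000 (0 sessions, listed first)
```

HAProxy's documentation recommends leastconn where very long sessions are expected, such as SQL. Since impala-shell sessions often stay open for a long time, leastconn may spread coordinator load more evenly than roundrobin, but which one fits better depends on the workload.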
- Start
Start: service haproxy start
Stop: service haproxy stop
Restart: service haproxy restart
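Startup may fail with the error: Starting proxy status: cannot bind socket [0.0.0.0:1080]
This usually happens because SELinux does not allow HAProxy to bind to arbitrary ports. Fix it by enabling the haproxy_connect_any boolean:
setsebool -P haproxy_connect_any=1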
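Once HAProxy is running, the stats page defined in the listen status section can confirm that every impalad backend is marked UP. HAProxy also serves this page as CSV when ;csv is appended to the stats URI. The sketch below fetches and parses that CSV. It assumes hadoop3 is the HAProxy node (borrowed from the usage example), and fetch_stats, parse_stats_csv and down_servers are hypothetical helper names.

```python
import csv
import urllib.request

# Assumption: hadoop3 is the HAProxy node, and the stats page is the one defined by
# "listen status" (bind 0.0.0.0:1080, stats uri /stats). Appending ";csv" to the
# stats URI makes HAProxy return the same data as CSV.
STATS_URL = "http://hadoop3:1080/stats;csv"


def fetch_stats(url=STATS_URL, timeout=5):
    """Download the raw CSV stats text from the HAProxy stats page."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_stats_csv(text):
    """Return {(proxy, server): status} for individual servers only."""
    # The header line begins with "# "; strip it so DictReader sees clean field names.
    lines = text.lstrip("# ").splitlines()
    statuses = {}
    for row in csv.DictReader(lines):
        if row["svname"] in ("FRONTEND", "BACKEND"):
            continue  # skip the per-proxy aggregate rows
        statuses[(row["pxname"], row["svname"])] = row["status"]
    return statuses


def down_servers(statuses):
    """List servers whose status does not start with UP (e.g. DOWN, MAINT)."""
    return sorted(key for key, status in statuses.items() if not status.startswith("UP"))
```

Run on a host that can reach the stats page, down_servers(parse_stats_csv(fetch_stats())) should return an empty list when all impalad backends pass their health checks.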
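- Usage
Point impala-shell at port 25003 on the HAProxy node (hadoop3 here):
impala-shell -i hadoop3:25003
JDBC clients connect to port 25004 on the same node instead.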
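Before pointing impala-shell or JDBC clients at the proxy, a quick reachability check of both proxied ports can rule out firewall or bind problems. This is a minimal sketch assuming hadoop3 is the HAProxy node; port_open and check_proxy are hypothetical helper names. In tcp mode HAProxy accepts the client connection itself, so an open port only shows that the proxy is listening, not that the impalad backends behind it are healthy.

```python
import socket

# Assumption: hadoop3 is the HAProxy node; 25003 and 25004 are the bind ports of
# the impalashell and impalajdbc sections in haproxy.cfg.
PROXY_HOST = "hadoop3"
PROXY_PORTS = {"impala-shell": 25003, "jdbc": 25004}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name not resolvable
        return False


def check_proxy(host=PROXY_HOST, ports=PROXY_PORTS):
    """Map each proxied service name to whether its port accepts TCP connections."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

For example, check_proxy() returning True for both entries means HAProxy is listening on both ports; backend health still has to be confirmed on the stats page.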