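I. Flink 1.11.0 and earlier
The old approach was to build the flink-shaded-hadoop module yourself: set the Hadoop and Hive versions to the ones you run in production, build the flink-shaded-hadoop-2-uber_xxx jar, and put that jar in Flink's lib directory, where Flink loads it when a job starts.
If you want to go this route, these two posts are worth reading: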
https://blog.csdn.net/weixin_44628586/article/details/107106547
https://blog.csdn.net/guiyifei/article/details/109325980#comments_14400773
Official flink-shaded source repository: https://github.com/apache/flink-shaded
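As a rough sketch of the last step of the old approach (dropping the built jar into lib), something like the following could be used. The function name install_shaded_uber_jar and both directory arguments are made up for illustration. The build itself is not shown; the Flink 1.10 build docs describe running `mvn clean install -Dhadoop.version=<your version>` inside the flink-shaded checkout, but check the posts above for the exact flags that match your distribution.

```shell
# Copy a locally built flink-shaded-hadoop-2-uber jar into Flink's lib directory.
# Usage (paths are placeholders): install_shaded_uber_jar <dir with built jar> <FLINK_HOME>
install_shaded_uber_jar() {
  build_dir=$1
  flink_home=$2
  status=1   # stays 1 unless at least one jar is copied

  for jar in "$build_dir"/flink-shaded-hadoop-2-uber*.jar; do
    # An unmatched glob stays literal, so skip anything that is not a real file.
    [ -f "$jar" ] || continue
    cp "$jar" "$flink_home/lib/" && status=0
  done

  if [ "$status" -ne 0 ]; then
    echo "no flink-shaded-hadoop-2-uber jar found in $build_dir" >&2
  fi
  return "$status"
}
```

Flink only scans lib at startup, so the copy has to happen before the job or cluster is launched.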
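II. From Flink 1.11.0 onward
To make Flink "Hadoop free", the Flink project stopped bundling Hadoop. Flink now works with both Hadoop 2 and Hadoop 3, and you can point it at whichever Hadoop environment you need.
All it takes is export HADOOP_CLASSPATH=`hadoop classpath`; there is no need to build the flink-shaded package anymore.
Key point: the compiled Flink jars contain no Hadoop or Hive code. When a Flink job starts, both the JobManager (JM) and the TaskManager (TM) find the Hadoop classes through the HADOOP_CLASSPATH environment variable.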
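A minimal sketch of the export step, assuming a POSIX shell with the hadoop launcher on PATH. The function name set_hadoop_classpath is invented for this example, and `$(...)` is used as the modern equivalent of backticks.

```shell
# Export HADOOP_CLASSPATH from the output of `hadoop classpath`.
# Returns 1 and leaves the variable untouched when hadoop is not available.
set_hadoop_classpath() {
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop not found on PATH; HADOOP_CLASSPATH left unchanged" >&2
    return 1
  fi
  HADOOP_CLASSPATH="$(hadoop classpath)"
  export HADOOP_CLASSPATH
}

# Run this in the shell (or profile script) that later starts Flink.
set_hadoop_classpath || true
```

Putting the guard in front means a missing Hadoop install shows up as a clear message here, rather than as a ClassNotFoundException after the job has been submitted.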
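At first, newbie me thought `hadoop classpath` was just some arbitrary path someone had typed in. Thanks to 渣渣瑞 for the refresher: whatever sits inside backticks is a command (I knew that once and then forgot it). So hadoop classpath is a command, and running it prints the classpath that Hadoop depends on: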
[yujianbo@qzcs86 ~]$ hadoop classpath
/etc/hadoop/conf:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/../../hadoop/lib/*:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/../../hadoop/.//*:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/../../hadoop-hdfs/./:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/../../hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/../../hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/.//*:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/../../hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/libexec/../../hadoop-yarn/.//*
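To make the backtick point concrete, here is a small sketch that contrasts a literal string with command substitution, then splits a classpath into one entry per line so it is easier to read. The sample paths and the helper name list_classpath_entries are made up, and echo stands in for hadoop classpath so the snippet runs on a machine without Hadoop.

```shell
# Quoted text is just text; backticks and $(...) run a command and
# substitute its output. echo stands in for `hadoop classpath` here.
literal='hadoop classpath'                                  # nothing is executed
via_backticks=`echo '/etc/hadoop/conf:/opt/hadoop/lib/*'`   # old-style substitution
via_dollar=$(echo '/etc/hadoop/conf:/opt/hadoop/lib/*')     # same result, nests more easily

# Print each entry of a colon-separated classpath on its own line.
list_classpath_entries() {
  printf '%s\n' "$1" | tr ':' '\n'
}

list_classpath_entries "$via_dollar"
```

On a real cluster, `hadoop classpath | tr ':' '\n'` gives the same one-entry-per-line view of the long output above.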
For where this comes from, see:
Community mailing list thread: http://apache-flink.147419.n8.nabble.com/flink-shaded-hadoop-2-uber-td9345.html
The 1.12 docs cover both the YARN preparation steps and the supported Hadoop versions:
https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/resource-providers/yarn.html#preparation
https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/resource-providers/yarn.html#supported-hadoop-versions
The 1.11 docs on Hadoop integration:
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/hadoop.html#providing-hadoop-classes
One passage on that last page settles the question:
Flink will use the environment variable HADOOP_CLASSPATH to augment the classpath that is used when starting Flink components such as the Client, JobManager, or TaskManager. Most Hadoop distributions and cloud environments will not set this variable by default so if the Hadoop classpath should be picked up by Flink the environment variable must be exported on all machines that are running Flink components.
In short: Flink adds HADOOP_CLASSPATH to the classpath of the Client, JobManager, and TaskManager. Most Hadoop distributions and cloud environments do not set the variable by default, so it has to be exported on every machine that runs a Flink component.
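Since the variable has to be present on every machine, a quick per-host check can save some debugging. This is only a sketch: the function name check_hadoop_classpath is invented here, and how you run it across hosts (ssh loop, config management, and so on) is up to your setup.

```shell
# Report whether HADOOP_CLASSPATH is set and non-empty in the current environment.
check_hadoop_classpath() {
  if [ -z "${HADOOP_CLASSPATH:-}" ]; then
    echo "HADOOP_CLASSPATH is not set on $(uname -n)" >&2
    return 1
  fi
  echo "HADOOP_CLASSPATH is set"
}

# Run on each machine that hosts a Flink Client, JobManager, or TaskManager.
check_hadoop_classpath || true
```

The check only confirms that the variable exists in the shell that runs it; the process that actually launches Flink must inherit the same environment.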