Hadoop and Spark Permission Issues

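I recently ran into cluster permission problems on a project, and they gave me quite a headache, so I took some time to look into how permissions work in Hadoop and Spark.
First up is the Hadoop super-user. Here is what the official documentation says: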

> The super-user is the user with the same identity as the NameNode process itself. Loosely, if you started the NameNode, then you are the super-user. The super-user can do anything in that permissions checks never fail for the super-user. There is no persistent notion of who was the super-user; when the NameNode is started the process identity determines who is the super-user for now. The HDFS super-user does not have to be the super-user of the NameNode host, nor is it necessary that all clusters have the same super-user. Also, an experimenter running HDFS on a personal workstation, conveniently becomes that installation’s super-user without any configuration.
> In addition, the administrator may identify a distinguished group using a configuration parameter. If set, members of this group are also super-users.

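- In short, whoever starts the NameNode is the super-user. The administrator can also designate a special group through a configuration parameter; if it is set, members of that group are super-users too.
- Now that we know who the super-user is, on to ordinary users. When writing code we may run into a permission problem, so it helps to know the order in which Hadoop decides who the current user is: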

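1. In Kerberos mode, the user comes from Kerberos.
2. In non-Kerberos mode, System.getenv("HADOOP_USER_NAME") is checked first.
3. Next comes System.getProperty("HADOOP_USER_NAME").
4. Finally, the current operating system user is used.

So if we need to "impersonate" hdfs, all it takes is System.setProperty("HADOOP_USER_NAME", "hdfs"); in the code, and we're done!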
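The lookup order above (steps 2 to 4, the non-Kerberos case) can be sketched in plain Java. This is only a toy model, not Hadoop's actual code: the class HadoopUserResolution and the helper resolveUser are made-up names for illustration, and the real logic lives inside Hadoop's login handling.

```java
import java.util.Map;
import java.util.Properties;

public class HadoopUserResolution {
    static final String HADOOP_USER_NAME = "HADOOP_USER_NAME";

    // Hypothetical helper: mimics steps 2-4 for the non-Kerberos case.
    // env      -> stands in for System.getenv()
    // props    -> stands in for System.getProperties()
    // osUser   -> stands in for the current operating system user
    static String resolveUser(Map<String, String> env, Properties props, String osUser) {
        String fromEnv = env.get(HADOOP_USER_NAME);
        if (fromEnv != null && !fromEnv.isEmpty()) {
            return fromEnv;                 // step 2: environment variable wins
        }
        String fromProp = props.getProperty(HADOOP_USER_NAME);
        if (fromProp != null && !fromProp.isEmpty()) {
            return fromProp;                // step 3: JVM system property
        }
        return osUser;                      // step 4: fall back to the OS user
    }

    public static void main(String[] args) {
        // The "impersonation" trick from the text: set the property
        // before anything asks who the user is.
        System.setProperty(HADOOP_USER_NAME, "hdfs");
        String user = resolveUser(System.getenv(), System.getProperties(),
                                  System.getProperty("user.name"));
        System.out.println("effective user: " + user);
    }
}
```

Note that in this model an exported HADOOP_USER_NAME environment variable beats the system property. With a real Hadoop client, the property should be set before the first Hadoop call, since the client is likely to cache the user it resolved.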

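Here I submit jobs as the user lch, so System.getenv("SPARK_USER") is lch.
But I need the Spark job to run as the hdfs user. Can we "impersonate" hdfs here too?
The answer is: yes.
We can put export HADOOP_USER_NAME=hdfs in the submit script.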
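If the job is launched from Java rather than from a shell script, the same effect as the export can be had by putting the variable into the child process environment. A minimal sketch, assuming spark-submit is on the PATH; the class SubmitAsHdfs, the helper buildSubmit, the job class com.example.MyJob, and my-job.jar are all placeholders made up for illustration.

```java
import java.util.List;

public class SubmitAsHdfs {
    // Builds (but does not start) a child process whose environment
    // carries HADOOP_USER_NAME, like `export HADOOP_USER_NAME=...` in a script.
    static ProcessBuilder buildSubmit(String user, List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.environment().put("HADOOP_USER_NAME", user);
        pb.inheritIO(); // forward the child's stdout/stderr to this console
        return pb;
    }

    public static void main(String[] args) {
        // Placeholder command line; replace with the real job.
        ProcessBuilder pb = buildSubmit("hdfs",
                List.of("spark-submit", "--class", "com.example.MyJob", "my-job.jar"));
        System.out.println(pb.command() + " with HADOOP_USER_NAME="
                + pb.environment().get("HADOOP_USER_NAME"));
        // pb.start() would actually launch the job; it is left out of this sketch.
    }
}
```

Like the export, this only helps in non-Kerberos mode, where the environment variable is consulted at all.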

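Of course, switching users directly in the script works as well. And if you can simply submit the job as the hdfs user, even better (^▽^).
By the way, while working on this I saw plenty of "material" online claiming that setting System.setProperty("user.name", "hdfs") in the code switches the Spark user. I was happy for a moment, but after adding it, nothing changed. Nothing was switched at all. Unbelievable.
Nothing for it but to rely on myself and dig into the source code!
SecurityManager does indeed read System.getProperty("user.name", "").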

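But all it does with it is merge defaultUsers, adminAcls, and allowedUsers into viewAcls. So what is viewAcls for?
It is the view access control list (View Access Control Lists), consulted wherever read permission needs to be checked.
Likewise there is modifyAcls, the modify access control list (Modify Access Control Lists), consulted wherever modify permission needs to be checked.
None of this really has much to do with our SPARK_USER o(╥﹏╥)o.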
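To make the point concrete, here is a toy model of the merge described above. It is a simplification for illustration, not Spark's real SecurityManager: the class ViewAclSketch and the helpers buildViewAcls and canView are hypothetical names, and the real check has more cases.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class ViewAclSketch {
    // Simplified model: the view ACL is just the union of the three sets.
    static Set<String> buildViewAcls(Set<String> defaultUsers,
                                     Set<String> adminAcls,
                                     Set<String> allowedUsers) {
        Set<String> viewAcls = new LinkedHashSet<>(adminAcls);
        viewAcls.addAll(defaultUsers);
        viewAcls.addAll(allowedUsers);
        return viewAcls;
    }

    // Simplified read check: is the user in the view ACL?
    static boolean canView(String user, Set<String> viewAcls) {
        return viewAcls.contains(user);
    }

    public static void main(String[] args) {
        // Pretend user.name was overridden to "hdfs" while the job runs as lch.
        Set<String> defaultUsers = Set.of("hdfs", "lch");
        Set<String> viewAcls = buildViewAcls(defaultUsers, Set.of("admin"), Set.of("alice"));
        // "hdfs" is now allowed to view, but nothing here changes
        // which user the job actually runs as.
        System.out.println("hdfs can view: " + canView("hdfs", viewAcls));
        System.out.println("bob can view: " + canView("bob", viewAcls));
    }
}
```

In this model, overriding user.name only adds one more name to a list of people allowed to look; the user that submits and runs the job is decided elsewhere.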

Last edited: 2019.08.24 15:27:49
