A First Look at Apache Hadoop 3.0.0-alpha1

Apache Hadoop 3.0.0-alpha1 incorporates a number of significant enhancements over the previous major release line (hadoop-2.x).

This is an alpha release to facilitate testing and the collection of feedback from downstream application developers and users. There are no guarantees regarding API stability or quality.

Overview

Minimum required Java version increased from Java 7 to Java 8

The Hadoop JARs now target the Java 8 runtime and no longer run on Java 7, so be sure to upgrade.

Support for erasure coding in HDFS

Erasure coding is a method for durably storing data with significant space savings compared to replication. Standard encodings like Reed-Solomon (10,4) have a 1.4x space overhead, compared to the 3x overhead of standard HDFS replication.

Since erasure coding imposes additional overhead during reconstruction and performs mostly remote reads, it has traditionally been used for storing colder, less frequently accessed data. Users should consider the network and CPU overheads of erasure coding when deploying this feature.
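The 1.4x and 3x figures quoted above follow from simple arithmetic, sketched below. The function names are made up for this illustration and are not part of any Hadoop API.

```python
def erasure_coding_overhead(data_units: int, parity_units: int) -> float:
    """Raw storage consumed per unit of user data for a (data, parity) code."""
    return (data_units + parity_units) / data_units


def replication_overhead(replicas: int) -> float:
    """Raw storage consumed per unit of user data with plain replication."""
    return float(replicas)


# Reed-Solomon (10,4): every 10 data units are stored with 4 parity units.
rs_overhead = erasure_coding_overhead(10, 4)
triple_replication = replication_overhead(3)

print(f"RS(10,4) overhead:       {rs_overhead:.1f}x")
print(f"3x replication overhead: {triple_replication:.1f}x")
```

The storage saving comes at the cost noted in the text: rebuilding a lost unit means reading surviving units from other nodes and recomputing, which is where the extra network and CPU load comes from.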

For details, see the HDFS Erasure Coding page of the Apache Hadoop 3.0.0-alpha1 documentation.


Shell script rewrite

The Hadoop shell scripts have been rewritten to fix many long-standing bugs and include some new features. While an eye has been kept towards compatibility, some changes may break existing installations. Incompatible changes are documented in the release notes, with related discussion on HADOOP-9902. More details are available in the Unix Shell Guide documentation. Power users will also be pleased by the Unix Shell API documentation, which describes much of the new functionality, particularly related to extensibility.


MapReduce task-level native optimization

MapReduce has added support for a native implementation of the map output collector. For shuffle-intensive jobs, this can lead to a performance improvement of 30% or more.


Support for more than 2 NameNodes

The initial implementation of HDFS NameNode high-availability provided for a single active NameNode and a single standby NameNode. By replicating edits to a quorum of three JournalNodes, this architecture is able to tolerate the failure of any one node in the system.

However, some deployments require higher degrees of fault-tolerance. This is enabled by this new feature, which allows users to run multiple standby NameNodes. For instance, by configuring three NameNodes and five JournalNodes, the cluster is able to tolerate the failure of two nodes rather than just one.
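The "one node" and "two nodes" figures can be checked with majority-quorum arithmetic. The sketch below is a back-of-the-envelope illustration with hypothetical function names, not Hadoop code. It assumes edits must be acknowledged by a strict majority of JournalNodes and that at least one NameNode must survive to become active.

```python
def journalnode_failures_tolerated(journal_nodes: int) -> int:
    """JournalNodes that can fail while a strict majority remains reachable."""
    majority = journal_nodes // 2 + 1
    return journal_nodes - majority


def namenode_failures_tolerated(name_nodes: int) -> int:
    """NameNodes that can fail while one is still left to serve as active."""
    return name_nodes - 1


for nn, jn in [(2, 3), (3, 5)]:
    print(
        f"{nn} NameNodes / {jn} JournalNodes: "
        f"tolerates {namenode_failures_tolerated(nn)} NameNode failure(s), "
        f"{journalnode_failures_tolerated(jn)} JournalNode failure(s)"
    )
```

Under these assumptions the classic 2 + 3 layout survives one failure of either kind, while the 3 + 5 layout survives two.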


Default ports of multiple services have been changed

Previously, the default ports of multiple Hadoop services were in the Linux ephemeral port range (32768-61000). This meant that at startup, services would sometimes fail to bind to the port due to a conflict with another application.

These conflicting ports have been moved out of the ephemeral range, affecting the NameNode, Secondary NameNode, DataNode, and KMS. Our documentation has been updated appropriately, but see the release notes for HDFS-9427 and HADOOP-12811 for a list of port changes.
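To make the conflict concrete, the sketch below checks a few old and new default ports against the ephemeral range quoted above. The port pairs are my reading of the HDFS-9427 release note and are not an exhaustive list, so verify them against the release notes before relying on them. The helper name is hypothetical.

```python
# Linux ephemeral port range as quoted in the text (inclusive on both ends).
EPHEMERAL_PORTS = range(32768, 61000 + 1)

# (old default, new default). Assumed from the HDFS-9427 release note.
PORT_CHANGES = {
    "NameNode HTTP": (50070, 9870),
    "Secondary NameNode HTTP": (50090, 9868),
    "DataNode data transfer": (50010, 9866),
    "DataNode HTTP": (50075, 9864),
}


def in_ephemeral_range(port: int) -> bool:
    """True if the OS may hand this port out to an arbitrary client socket."""
    return port in EPHEMERAL_PORTS


for service, (old, new) in PORT_CHANGES.items():
    print(
        f"{service}: {old} (ephemeral: {in_ephemeral_range(old)}) -> "
        f"{new} (ephemeral: {in_ephemeral_range(new)})"
    )
```

Firewall rules, monitoring checks, and client URLs that hard-code the old ports need to be updated when upgrading.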


Support for Microsoft Azure Data Lake filesystem connector

Hadoop now supports integration with Microsoft Azure Data Lake as an alternative Hadoop-compatible filesystem.


Intra-DataNode balancer

A single DataNode manages multiple disks. During normal write operation, disks will be filled up evenly. However, adding or replacing disks can lead to significant skew within a DataNode. This situation is not handled by the existing HDFS balancer, which concerns itself with inter-DataNode, not intra-DataNode, skew.

This situation is handled by the new intra-DataNode balancing functionality, which is invoked via the hdfs diskbalancer CLI. See the disk balancer section in the HDFS Commands Guide for more information.
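To see what "skew within a DataNode" means in numbers, here is a small illustrative metric. It is a hypothetical sketch, not the disk balancer's actual algorithm. It compares each disk's fill ratio with the node-wide ideal and sums the absolute deviations. The disk sizes are made-up example values.

```python
from typing import List


def volume_deviations(used_gb: List[float], capacity_gb: List[float]) -> List[float]:
    """Per-disk deviation from the node-wide ideal fill ratio."""
    ideal = sum(used_gb) / sum(capacity_gb)
    return [ideal - used / cap for used, cap in zip(used_gb, capacity_gb)]


def node_skew(used_gb: List[float], capacity_gb: List[float]) -> float:
    """Sum of absolute deviations; 0 means every disk is equally full."""
    return sum(abs(d) for d in volume_deviations(used_gb, capacity_gb))


capacity = [1000.0, 1000.0, 1000.0]
balanced = [500.0, 500.0, 500.0]
after_adding_disk = [800.0, 800.0, 0.0]  # third disk was just installed

print("balanced skew:   ", node_skew(balanced, capacity))
print("new-disk skew:   ", round(node_skew(after_adding_disk, capacity), 3))
print("deviations:      ", [round(d, 3) for d in volume_deviations(after_adding_disk, capacity)])
```

A positive deviation marks a disk that should receive data and a negative one marks a disk that should give data up. The inter-DataNode balancer never sees this imbalance, because the node's total usage looks normal.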


Reworked daemon and task heap management

A series of changes have been made to heap management for Hadoop daemons as well as MapReduce tasks.

HADOOP-10950 introduces new methods for configuring daemon heap sizes. Notably, auto-tuning is now possible based on the memory size of the host, and the HADOOP_HEAPSIZE variable has been deprecated. See the full release notes of HADOOP-10950 for more detail.

MAPREDUCE-5785 simplifies the configuration of map and reduce task heap sizes, so the desired heap size no longer needs to be specified in both the task configuration and as a Java option. Existing configs that already specify both are not affected by this change. See the full release notes of MAPREDUCE-5785 for more details.
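The idea behind MAPREDUCE-5785 can be sketched as follows: whichever of the two values is missing is derived from the other through a scaling ratio. This is a hypothetical illustration, not Hadoop's code. The 0.8 ratio, the 1024 MB fallback, and the rounding directions are assumptions based on my reading of the release note, so check the note for the exact behavior.

```python
import math
from typing import Optional, Tuple

ASSUMED_HEAP_RATIO = 0.8         # assumed default heap-to-container ratio
ASSUMED_DEFAULT_CONTAINER = 1024  # assumed fallback container size in MB


def resolve_task_memory(
    container_mb: Optional[int] = None,
    heap_mb: Optional[int] = None,
    ratio: float = ASSUMED_HEAP_RATIO,
) -> Tuple[int, int]:
    """Return (container_mb, heap_mb), deriving whichever one is missing."""
    if container_mb is not None and heap_mb is not None:
        # Both given explicitly: existing configs are left untouched.
        return container_mb, heap_mb
    if heap_mb is not None:
        # Only the heap is given: size the container to leave headroom.
        return math.ceil(heap_mb / ratio), heap_mb
    if container_mb is None:
        container_mb = ASSUMED_DEFAULT_CONTAINER
    # Only the container is known: the heap gets a fraction of it.
    return container_mb, int(container_mb * ratio)


print(resolve_task_memory(container_mb=2048, heap_mb=1024))
print(resolve_task_memory(container_mb=1024))
print(resolve_task_memory(heap_mb=1000))
```

The gap between container size and heap size leaves room for non-heap JVM memory such as thread stacks and native buffers.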
