Checkpointing
Checkpointing is Flink's backbone for providing consistent fault tolerance. It keeps taking consistent snapshots of distributed data streams and executor states. It is inspired by the Chandy-Lamport algorithm, but has been modified for Flink's tailored requirements. The details about the Chandy-Lamport algorithm can be found at http://research.microsoft.com/en-us/um/people/lamport/pubs/chandy.pdf.
The exact implementation details of snapshotting are provided in the following research paper: Lightweight Asynchronous Snapshots for Distributed Dataflows
(http://arxiv.org/abs/1506.08603)
The fault-tolerance mechanism keeps creating lightweight snapshots of the data flows, so they can continue functioning without any significant overhead. Generally, the state of the data flow is kept in a configured place such as HDFS.
In the case of any failure, Flink stops the executors, resets them, and starts executing from the latest available checkpoint.
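With the DataStream API, checkpointing is switched on through the execution environment. A minimal sketch, assuming a 5-second checkpoint interval and an HDFS directory for the snapshots (both values are illustrative assumptions):
import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.enableCheckpointing(5000) // draw a consistent snapshot every 5 seconds
env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE)
env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints")) // where snapshot state is kept (path is an assumption)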
Stream barriers are core elements of Flink's snapshots. They are injected into data streams without affecting the flow. Barriers never overtake the records. They group sets of records into a snapshot. Each barrier carries a unique ID. The following diagram shows how the barriers are injected into the data stream for snapshots:
Each snapshot state is reported to the Flink Job Manager's checkpoint coordinator. While drawing snapshots, Flink handles the alignment of records in order to avoid re-processing the same records because of any failure. This alignment generally takes a few milliseconds. But for some intense applications, where even a few milliseconds of latency is not acceptable, we have an option to choose low latency over exactly-once processing. By default, Flink processes each record exactly once. If an application needs low latency and is fine with at-least-once delivery, we can switch off that trigger. This will skip the alignment and will improve the latency.
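A minimal sketch of that switch, assuming the same env as in the earlier snippet:
// Skip barrier alignment: lower latency, but records may be processed more than once after a failure.
env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE)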
Task manager
Task Managers are worker nodes that execute the tasks in one or more threads in a JVM. Parallelism of task execution is determined by the task slots available on each Task Manager. Each task slot represents a fixed subset of the Task Manager's resources (see https://ci.apache.org/projects/flink/flink-docs-release-1.7/internals/job_scheduling.html). For example, if a Task Manager has four slots, then it will allocate 25% of its memory to each slot. One or more threads can run in a task slot. Threads in the same slot share the same JVM. Tasks in the same JVM share TCP connections and heartbeat messages:
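The number of slots per Task Manager is a configuration setting in flink-conf.yaml; a minimal sketch (the values shown are illustrative assumptions):
taskmanager.numberOfTaskSlots: 4
parallelism.default: 4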
Job client
The Job client is not an internal part of Flink's program execution, but it is the starting point of the execution. The Job client is responsible for accepting the program from the user, creating a data flow, and then submitting the data flow to the Job Manager for further execution. Once the execution is completed, the Job client provides the results back to the user.
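A packaged program is typically handed to the Job client through the command-line interface; a minimal sketch using the WordCount example bundled with the Flink distribution (the file paths are assumptions):
./bin/flink run ./examples/batch/WordCount.jar --input file:///tmp/input.txt --output file:///tmp/wordcount-result.txt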
A data flow is a plan of execution rather than the data itself. Consider a very simple word count program:
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val text = env.readTextFile("input.txt") // Source
val counts = text.flatMap { _.toLowerCase.split("\\W+") filter { _.nonEmpty } }
  .map { (_, 1) }
  .groupBy(0)
  .sum(1) // Transformation
counts.writeAsCsv("output.txt", "\n", " ") // Sink
env.execute("word count") // trigger execution of the plan
When a client accepts the program from the user, it transforms it into a data flow. The data flow for the aforementioned program may look like this:
The preceding diagram shows how a program gets transformed into a data flow. Flink data flows are parallel and distributed by default. For parallel data processing, Flink partitions the operators and streams. Operator partitions are called sub-tasks. Streams can distribute the data in a one-to-one or a redistributed manner.
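For instance, the degree of parallelism, and with it the number of sub-tasks per operator, can be set on the execution environment; a minimal sketch (the value 4 is an illustrative assumption):
// Source, flatMap/map, groupBy-sum, and sink of the word count plan each run as 4 parallel sub-tasks.
env.setParallelism(4)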
The data flows directly from the source to the map operators, as there is no need to shuffle the data. But for a GroupBy operation, Flink may need to redistribute the data by keys in order to get the correct results:
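To see the data flow Flink generates for such a program, the execution environment can print it as a JSON plan; a minimal sketch:
// Print the generated data flow (execution plan) of the program as a JSON string.
println(env.getExecutionPlan)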
Features
In the earlier sections, we tried to understand the Flink architecture and its execution model. Because of its robust architecture, Flink comes with a wide range of features.
High performance
Flink is designed to achieve high performance and low latency. Unlike other streaming frameworks such as Spark, you don't need to do much manual configuration to get the best performance. Flink's pipelined data processing gives better performance compared to its counterparts.
Exactly-once stateful computation
As we discussed in the previous section, Flink's distributed checkpoint processing helps to guarantee processing each record exactly once. In the case of high-throughput applications, Flink provides us with a switch to allow at-least-once processing.
Flexible streaming windows
Flink supports data-driven windows. This means we can design a window based on time, counts, or sessions. A window can also be customized, which allows us to detect specific patterns in event streams.
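A minimal sketch of time- and count-based windows, assuming words is a DataStream[(String, Int)] of (word, 1) pairs:
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// `words` is an assumed DataStream[(String, Int)] of (word, 1) pairs.
// Time-based: sum the counts per word over tumbling 30-second windows.
val timeWindowed = words.keyBy(0).timeWindow(Time.seconds(30)).sum(1)

// Count-based: emit a sum for every 100 records seen per word.
val countWindowed = words.keyBy(0).countWindow(100).sum(1)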
Fault tolerance
Flink's distributed, lightweight snapshot mechanism helps in achieving a great degree of fault tolerance. It allows Flink to provide high-throughput performance with guaranteed delivery.
Memory management
Flink is supplied with its own memory management inside the JVM, which makes it independent of Java's default garbage collector. It manages memory efficiently by using hashing, indexing, caching, and sorting.
Optimizer
Flink's batch data processing API is optimized in order to avoid memory-consuming operations such as shuffle, sort, and so on. It also makes sure that caching is used in order to avoid heavy disk IO operations.
Stream and batch in one platform
Flink provides APIs for both batch and stream data processing. So once you set up the Flink environment, it can host stream and batch processing applications easily. In fact, Flink works on a streaming-first principle and considers batch processing to be a special case of streaming.
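A minimal sketch of the two entry points, one for batch (DataSet API) and one for streaming (DataStream API):
import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val batchEnv = ExecutionEnvironment.getExecutionEnvironment // batch (DataSet) programs
val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment // streaming (DataStream) programs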
Libraries
Flink has a rich set of libraries to do machine learning, graph processing, relational data processing, and so on. Because of its architecture, it is very easy to perform complex event processing and alerting. We are going to see more about these libraries in subsequent chapters.
Event time semantics
Flink supports event time semantics. This helps in processing streams where events arrive out of order. Sometimes events may arrive late. Flink's architecture allows us to define windows based on time, counts, and sessions, which helps in dealing with such scenarios.
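A minimal sketch of enabling event time for a streaming job, assuming the same env as in the earlier snippets:
import org.apache.flink.streaming.api.TimeCharacteristic

// Use the timestamps carried in the events themselves rather than the machines' processing time.
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)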