Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data S...

Abstract

Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD. Calcite’s architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter architecture designed for extensibility, and support for heterogeneous data models and stores (relational, semi-structured, streaming, and geospatial). This flexible, embeddable, and extensible architecture is what makes Calcite an attractive choice for adoption in big-data frameworks. It is an active project that continues to introduce support for new types of data sources, query languages, and approaches to query processing and optimization.

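In short, Apache Calcite is a foundational framework that provides query processing, query optimization, and query language support to many popular open-source data processing systems, such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD.

Main components of Apache Calcite

Module            Description
Query parser      Handles a variety of query languages
Query optimizer   Modular and extensible, with several hundred built-in optimization rules
Adapters          Extensible, with support for heterogeneous data models and stores (relational, semi-structured, streaming, geospatial)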

Keywords

Apache Calcite, relational semantics, data management, query algebra, modular query optimization, storage adapters

1. Introduction

Following the seminal System R, conventional relational database engines dominated the data processing landscape. Yet, as far back as 2005, Stonebraker and Çetintemel [49] predicted that we would see the rise of a collection of specialized engines such as column stores, stream processing engines, text search engines, and so forth. They argued that specialized engines can offer more cost-effective performance and that they would bring the end of the “one size fits all” paradigm. Their vision seems today more relevant than ever. Indeed, many specialized open-source data systems have since become popular, such as Storm [50] and Flink [16] (stream processing), Elasticsearch [15] (text search), Apache Spark [47], Druid [14], etc.
As organizations have invested in data processing systems tailored towards their specific needs, two overarching problems have arisen:
• The developers of such specialized systems have encountered related problems, such as query optimization [4, 25] or the need to support query languages such as SQL and related extensions (e.g., streaming queries [26]) as well as language-integrated queries inspired by LINQ [33]. Without a unifying framework, having multiple engineers independently develop similar optimization logic and language support wastes engineering effort.
• Programmers using these specialized systems often have to integrate several of them together. An organization might rely on Elasticsearch, Apache Spark, and Druid. We need to build systems capable of supporting optimized queries across heterogeneous data sources [55].
Apache Calcite was developed to solve these problems. It is a complete query processing system that provides much of the common functionality (query execution, optimization, and query languages) required by any database management system, except for data storage and management, which are left to specialized engines. Calcite was quickly adopted by Hive, Drill [13], Storm, and many other data processing engines, providing them with advanced query optimizations and query languages. For example, Hive [24] is a popular data warehouse project built on top of Apache Hadoop. As Hive moved from its batch processing roots towards an interactive SQL query answering platform, it became clear that the project needed a powerful optimizer at its core. Thus, Hive adopted Calcite as its optimizer and their integration has been growing since. Many other projects and products have followed suit, including Flink, MapD [12], etc.
Furthermore, Calcite enables cross-platform optimization by exposing a common interface to multiple systems. To be efficient, the optimizer needs to reason globally, e.g., make decisions across different systems about materialized view selection.

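The prediction was that a series of specialized engines, such as column stores, stream processing engines, and text search engines, would gradually emerge and end the “one size fits all” era.

As organizations built data processing systems to meet their specific needs, two problems emerged:
● Engineers building specialized systems ran into nearly identical problems, such as query optimization [4, 25] or support for query languages such as SQL and its extensions (streaming queries [26], language-integrated queries inspired by LINQ [33]). Without a unifying framework, many engineers each develop similar optimization logic and language support, which wastes effort and prevents reuse.
● Developers using these specialized systems often have to integrate several of them. A team might depend on Elasticsearch, Apache Spark, and Druid, so we need the ability to run optimized queries across heterogeneous data sources.

Apache Calcite was developed to solve these problems. It is a complete query processing system that provides much of the common core functionality, such as query execution, optimization, and query languages. It does not provide data storage or data management, which are left to the specialized systems.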

Building a common framework does not come without challenges. In particular, the framework needs to be extensible and flexible enough to accommodate the different types of systems requiring integration. We believe that the following features have contributed to Calcite’s wide adoption in the open source community and industry:

  • Open source friendliness. Many of the major data processing platforms of the last decade have been either open source or largely based on open source. Calcite is an open-source framework, backed by the Apache Software Foundation (ASF) [5], which provides the means to collaboratively develop the project. Furthermore, the software is written in Java, making it easier to interoperate with many of the latest data processing systems [12, 13, 16, 24, 28, 44], which are often themselves written in Java (or in the JVM-based Scala), especially those in the Hadoop ecosystem.
  • Multiple data models. Calcite provides support for query optimization and query languages using both streaming and conventional data processing paradigms. Calcite treats streams as time-ordered sets of records or events that are not persisted to the disk as they would be in conventional data processing systems.
  • Flexible query optimizer. Each component of the optimizer is pluggable and extensible, ranging from rules to cost models. In addition, Calcite includes support for multiple planning engines. Hence, the optimization can be broken down into phases handled by different optimization engines depending on which one is best suited for the stage.
  • Cross-system support. The Calcite framework can run and optimize queries across multiple query processing systems and database backends.
  • Reliability. Calcite is reliable, as its wide adoption over many years has led to exhaustive testing of the platform. Calcite also contains an extensive test suite validating all components of the system, including query optimizer rules and integration with backend data sources.
  • Support for SQL and its extensions. Many systems do not provide their own query language, but rather prefer to rely on existing ones such as SQL. For those, Calcite provides support for ANSI standard SQL, as well as various SQL dialects and extensions, e.g., for expressing queries on streaming or nested data. In addition, Calcite includes a driver conforming to the standard Java API (JDBC).

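Building a common framework comes with many challenges. The framework needs to be flexible and extensible enough to support the integration needs of different systems. The following features have helped Calcite gain wide adoption in the community and in production:

  • Open source friendliness
  • Multiple data models
  • Flexible query optimizer
  • Cross-system support
  • Reliability
  • Support for SQL and its extensions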

The remainder is organized as follows. Section 2 discusses related work. Section 3 introduces Calcite’s architecture and its main components. Section 4 describes the relational algebra at the core of Calcite. Section 5 presents Calcite’s adapters, an abstraction to define how to read external data sources. In turn, Section 6 describes Calcite’s optimizer and its main features, while Section 7 presents the extensions to handle different query processing paradigms. Section 8 provides an overview of the data processing systems already using Calcite. Section 9 discusses possible future extensions for the framework before we conclude in Section 10.

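The remainder is organized as follows:
Section 2: discusses related work
Section 3: introduces Calcite's architecture and its main components
Section 4: describes the relational algebra at the core of Calcite
Section 5: presents Calcite's adapters, an abstraction that defines how to read external data sources
Section 6: describes Calcite's optimizer and its main features
Section 7: presents the extensions that handle different query processing paradigms
Section 8: gives an overview of the engines already using Calcite
Section 9: discusses possible future extensions of the framework
Section 10: conclusion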


2. Related Work

Although Calcite is now used by many big-data analytics engines in the Hadoop ecosystem, many of the techniques behind it are not new. For example, the query optimization ideas come from:

  • Volcano: “The Volcano Optimizer Generator: Extensibility and Efficient Search”
  • Cascades: “The Cascades Framework for Query Optimization”

Materialized view rewriting draws on:

  • “Optimizing Queries with Materialized Views. In Proceedings of the Eleventh International Conference on Data Engineering”
  • “Optimizing Queries Using Materialized Views: A Practical, Scalable Solution”
  • “Implementing Data Cubes Efficiently”

Though Calcite is currently the most widely adopted optimizer for big-data analytics in the Hadoop ecosystem, many of the ideas that lie behind it are not novel. For instance, the query optimizer builds on ideas from the Volcano [20] and Cascades [19] frameworks, incorporating other widely used optimization techniques such as materialized view rewriting [10, 18, 22]. There are other systems that try to fill a similar role to Calcite.
   Orca [45] is a modular query optimizer used in data management products such as Greenplum and HAWQ. Orca decouples the optimizer from the query execution engine by implementing a framework for exchanging information between the two known as Data eXchange Language. Orca also provides tools for verifying the correctness and performance of generated query plans. In contrast to Orca, Calcite can be used as a standalone query execution engine that federates multiple storage and processing backends, including pluggable planners and optimizers.
   Spark SQL [3] extends Apache Spark to support SQL query execution, which can also execute queries over multiple data sources as in Calcite. However, although the Catalyst optimizer in Spark SQL also attempts to minimize query execution cost, it lacks the dynamic programming approach used by Calcite and risks falling into local minima.
   Algebricks [6] is a query compiler architecture that provides a data-model-agnostic algebraic layer and compiler framework for big data query processing. High-level languages are compiled to Algebricks logical algebra. Algebricks then generates an optimized job targeting the Hyracks parallel processing backend. While Calcite shares a modular approach with Algebricks, Calcite also includes support for cost-based optimizations. In the current version of Calcite, the query optimizer architecture uses dynamic programming-based planning based on Volcano [20] with extensions for multi-stage optimizations as in Orca [45]. Though in principle Algebricks could support multiple processing backends (e.g., Apache Tez, Spark), Calcite has provided well-tested support for diverse backends for many years.
   Garlic [7] is a heterogeneous data management system which represents data from multiple systems under a unified object model. However, Garlic does not support query optimization across different systems and relies on each system to optimize its own queries.
   FORWARD [17] is a federated query processor that implements a superset of SQL called SQL++ [38]. SQL++ has a semi-structured data model that integrates both JSON and relational data models, whereas Calcite supports semi-structured data models by representing them in the relational data model during query planning. FORWARD decomposes federated queries written in SQL++ into subqueries and executes them on the underlying databases according to the query plan. The merging of data happens inside the FORWARD engine.
   Another federated data storage and processing system is BigDAWG, which abstracts a wide spectrum of data models including relational, time-series, and streaming. The unit of abstraction in BigDAWG is called an island of information. Each island of information has a query language and a data model, and connects to one or more storage systems. Cross-storage-system querying is supported within the boundaries of a single island of information. Calcite instead provides a unifying relational abstraction which allows querying across backends with different data models.
   Myria is a general-purpose engine for big data analytics, with advanced support for the Python language [21]. It produces query plans for other backend engines such as Spark and PostgreSQL.

3. Architecture

Calcite contains many of the pieces that comprise a typical database management system. However, it omits some key components, e.g., storage of data, algorithms to process data, and a repository for storing metadata. These omissions are deliberate: they make Calcite an excellent choice for mediating between applications having one or more data storage locations and using multiple data processing engines. It is also a solid foundation for building bespoke data processing systems.
Figure 1 outlines the main components of Calcite’s architecture. Calcite’s optimizer uses a tree of relational operators as its internal representation. The optimization engine primarily consists of three components: rules, metadata providers, and planner engines. We discuss these components in more detail in Section 6. In the figure, the dashed lines represent possible external interactions with the framework. There are different ways to interact with Calcite.

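Figure 1: Apache Calcite architecture and interaction

Figure 1 outlines the main components of Calcite's architecture. The Calcite optimizer uses a tree of relational operators as its internal representation. The optimizer consists mainly of three parts: rules, metadata providers, and planner engines. The dashed lines represent interactions between external systems and the framework; there are many different ways to interact with Calcite.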

  First, Calcite contains a query parser and validator that can translate a SQL query to a tree of relational operators. As Calcite does not contain a storage layer, it provides a mechanism to define table schemas and views in external storage engines via adapters (described in Section 5), so it can be used on top of these engines.
  Second, although Calcite provides optimized SQL support to systems that need such database language support, it also provides optimization support to systems that already have their own language parsing and interpretation:

  • Some systems support SQL queries, but with limited or no query optimization. For example, both Hive and Spark initially offered support for the SQL language, but they did not include an optimizer. For such cases, once the query has been optimized, Calcite can translate the relational expression back to SQL. This feature allows Calcite to work as a stand-alone system on top of any data management system with a SQL interface but no optimizer.
  • The Calcite architecture is not only tailored towards optimizing SQL queries. It is common that data processing systems choose to use their own parser for their own query language. Calcite can help optimize these queries as well. Indeed, Calcite also allows operator trees to be easily constructed by directly instantiating relational operators. One can use the built-in relational expressions builder interface, for instance to express an Apache Pig [41] script. This interface exposes the main constructs necessary for building relational expressions. After the optimization phase is finished, the application can retrieve the optimized relational expression, which can then be mapped back to the system’s query processing unit.

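First, Calcite contains a query parser and validator that can translate a SQL query into a tree of relational operators. Because Calcite has no storage layer of its own, it describes external storage engines through adapters.
Second, Calcite also supports frameworks that have their own SQL parser and only need optimization afterwards.
For example, a statement from a framework that has its own parser can be converted with builder code into a Calcite RelNode relational expression.

Calcite can then optimize the RelNode and, afterwards, convert the relational expression back to SQL.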
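
To make the builder idea concrete, here is a minimal Python sketch of a stack-based expression builder. It is a toy model written for these notes, not Calcite's Java API: the class names `Rel` and `ToyBuilder`, the table name `employee_data`, and the string-valued aggregate calls are all hypothetical. The only idea it tries to show is that each builder call pops its inputs from a stack and pushes the new operator back.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rel:
    """A node in a toy relational operator tree (not Calcite's RelNode)."""
    op: str                          # operator name, e.g. "Scan" or "Aggregate"
    args: Tuple[str, ...] = ()       # operator arguments, kept as plain strings
    inputs: Tuple["Rel", ...] = ()   # child operators

    def explain(self, indent: int = 0) -> str:
        line = " " * indent + f"{self.op}({', '.join(self.args)})"
        return "\n".join([line] + [i.explain(indent + 2) for i in self.inputs])


class ToyBuilder:
    """Stack-based builder: each call pops its inputs and pushes the result."""

    def __init__(self) -> None:
        self._stack = []

    def scan(self, table: str) -> "ToyBuilder":
        self._stack.append(Rel("Scan", (table,)))
        return self

    def filter(self, condition: str) -> "ToyBuilder":
        child = self._stack.pop()
        self._stack.append(Rel("Filter", (condition,), (child,)))
        return self

    def aggregate(self, group_key: str, *calls: str) -> "ToyBuilder":
        child = self._stack.pop()
        self._stack.append(Rel("Aggregate", (group_key, *calls), (child,)))
        return self

    def build(self) -> Rel:
        return self._stack.pop()


# Hypothetical example: group a table by deptno and compute two aggregates.
plan = (ToyBuilder()
        .scan("employee_data")
        .aggregate("deptno", "COUNT(sal) AS c", "SUM(sal) AS s")
        .build())
print(plan.explain())
```

The resulting tree is what an optimizer would then rewrite; the front end that produced it (SQL, Pig, or any other language) no longer matters at that point.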

4. Query Algebra

  Operators. Relational algebra [11] lies at the core of Calcite. In addition to the operators that express the most common data manipulation operations, such as filter, project, join, etc., Calcite includes additional operators that meet different purposes, e.g., being able to concisely represent complex operations, or recognize optimization opportunities more efficiently. For instance, it has become common for OLAP, decision making, and streaming applications to use window definitions to express complex analytic functions such as the moving average of a quantity over a time period or number of rows. Thus, Calcite introduces a window operator that encapsulates the window definition, i.e., upper and lower bound, partitioning, etc., and the aggregate functions to execute on each window.
   Traits. Calcite does not use different entities to represent logical and physical operators. Instead, it describes the physical properties associated with an operator using traits. These traits help the optimizer evaluate the cost of different alternative plans. Changing a trait value does not change the logical expression being evaluated, i.e., the rows produced by the given operator will still be the same.
   During optimization, Calcite tries to enforce certain traits on relational expressions, e.g., the sort order of certain columns. Relational operators can implement a converter interface that indicates how to convert traits of an expression from one value to another.
   Calcite includes common traits that describe the physical properties of the data produced by a relational expression, such as ordering, grouping, and partitioning. Similar to the SCOPE optimizer [57], the Calcite optimizer can reason about these properties and exploit them to find plans that avoid unnecessary operations. For example, if the input to the sort operator is already correctly ordered— possibly because this is the same order used for rows in the backend system—then the sort operation can be removed.
   In addition to these properties, one of the main features of Calcite is the calling convention trait. Essentially, the trait represents the data processing system where the expression will be executed. Including the calling convention as a trait allows Calcite to meet its goal of transparently optimizing queries whose execution might span different engines, i.e., the convention will be treated as any other physical property.
   For example, consider joining a Products table held in MySQL to an Orders table held in Splunk (see Figure 2). Initially, the scan of Orders takes place in the splunk convention and the scan of Products is in the jdbc-mysql convention. The tables have to be scanned inside their respective engines. The join is in the logical convention, meaning that no implementation has been chosen. Moreover, the SQL query in Figure 2 contains a filter (where clause) which is pushed into splunk by an adapter-specific rule (see Section 5). One possible implementation is to use Apache Spark as an external engine: the join is converted to spark convention, and its inputs are converters from jdbc-mysql and splunk to spark convention. But there is a more efficient implementation: exploiting the fact that Splunk can perform lookups into MySQL via ODBC, a planner rule pushes the join through the splunk-to-spark converter, and the join is now in splunk convention, running inside the Splunk engine.

4.1 Operators

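Relational algebra [11] lies at the core of Calcite. Calcite includes not only the operators that express the most common data manipulation operations (such as filter, project, and join), but also additional operators that serve other purposes, such as concisely expressing complex operations or recognizing optimization opportunities more efficiently.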

4.2 Traits

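Calcite does not use different entities to represent logical and physical operators. Instead, it describes physical properties by attaching traits to operators. These traits help the optimizer evaluate the cost of alternative plans. Changing a trait value does not change the logical expression, i.e., the rows produced by the operator remain the same.
RelTrait and RelTraitDef
A RelTrait represents a property of a RelNode and specifies a physical execution property, such as whether sorting is required or how the data is distributed. Each trait is defined by a RelTraitDef, of which there are currently three kinds:

  • ConventionTraitDef: indicates which data processing engine handles the expression. The corresponding RelTrait type is Convention. In a logical plan its value defaults to org.apache.calcite.plan.Convention#NONE; the value for the physical plan is specified before optimization through the method org.apache.calcite.plan.RelOptPlanner#changeTraits. Calcite already defines EnumerableConvention, BindableConvention, JdbcConvention, and others. With EnumerableConvention, the generated physical plan is executed by Calcite's linq4j engine. In addition, each Convention comes with its own conversion rules for relational expressions.

  • RelCollationTraitDef: defines the sort order. The corresponding RelTrait is RelCollation. For example, the sort expression (also called an operator) org.apache.calcite.rel.core.Sort has a property collation of type RelCollation.

  • RelDistributionTraitDef: describes how data is distributed in physical storage. The corresponding RelTrait is RelDistribution.

In addition, every RelNode object has a RelTraitSet, an ordered set of RelTrait values in which all of the RelNode's traits are stored.
During optimization, Calcite tries to enforce certain traits on relational expressions, such as the sort order of certain columns. Relational operators can implement a converter interface that indicates how to convert a trait of an expression from one value to another.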

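Figure 2: A query optimization process

  The query in Figure 2 illustrates the convention trait. The scan of the Orders table takes place in the splunk convention and the scan of the Products table in the jdbc-mysql convention, each inside its own engine.
  One possible implementation uses Apache Spark as an external engine: the join is converted to the spark convention, and its inputs are converters from jdbc-mysql and splunk to the spark convention.
  There is also a more efficient implementation. Because Splunk can access MySQL through ODBC, a planner rule pushes the join through the splunk-to-spark converter, so the join ends up in the splunk convention and runs inside the Splunk engine.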
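
The way traits let an optimizer skip unnecessary work can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Calcite code: `Node`, its `convention` and `collation` fields, and `remove_redundant_sort` are hypothetical names, and a collation is simplified to a tuple of column names. A sort is dropped when its input already delivers rows in the required order.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Toy operator carrying two traits (hypothetical, not Calcite's RelNode)."""
    op: str
    convention: str = "logical"        # calling-convention trait
    collation: Tuple[str, ...] = ()    # ordering trait: columns the output is sorted by
    inputs: Tuple["Node", ...] = ()


def remove_redundant_sort(node: Node) -> Node:
    """Drop a Sort whose input already provides the required ordering."""
    if node.op == "Sort":
        child = node.inputs[0]
        required = node.collation
        # The child's ordering satisfies the sort if the required columns
        # are a prefix of the columns the child is already sorted by.
        if child.collation[:len(required)] == required:
            return child
    return node


# A backend scan that is already ordered by (id, name).
scan = Node("Scan", convention="jdbc-mysql", collation=("id", "name"))

needed = remove_redundant_sort(Node("Sort", collation=("name",), inputs=(scan,)))
optimized = remove_redundant_sort(Node("Sort", collation=("id",), inputs=(scan,)))
```

Here `optimized` is the bare scan, because the backend order already satisfies the sort on `id`, while `needed` keeps its Sort node. Note that the rows produced are the same either way; only the physical property is reasoned about.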

5. Adapters

An adapter is an architectural pattern that defines how Calcite incorporates diverse data sources for general access. Figure 3 depicts its components. Essentially, an adapter consists of a model, a schema, and a schema factory. The model is a specification of the physical properties of the data source being accessed. A schema is the definition of the data (format and layouts) found in the model. The data itself is physically accessed via tables. Calcite interfaces with the tables defined in the adapter to read the data as the query is being executed. The adapter may define a set of rules that are added to the planner. For instance, it typically includes rules to convert various types of logical relational expressions to the corresponding relational expressions of the adapter’s convention. The schema factory component acquires the metadata information from the model and generates a schema.


Figure 3: Calcite's data source adapter design

As discussed in Section 4, Calcite uses a physical trait known as the calling convention to identify relational operators which correspond to a specific database backend. These physical operators implement the access paths for the underlying tables in each adapter. When a query is parsed and converted to a relational algebra expression, an operator is created for each table representing a scan of the data on that table. It is the minimal interface that an adapter must implement. If an adapter implements the table scan operator, the Calcite optimizer is then able to use client-side operators such as sorting, filtering, and joins to execute arbitrary SQL queries against these tables.
 This table scan operator contains the necessary information the adapter requires to issue the scan to the adapter’s backend database. To extend the functionality provided by adapters, Calcite defines an enumerable calling convention. Relational operators with the enumerable calling convention simply operate over tuples via an iterator interface. This calling convention allows Calcite to implement operators which may not be available in each adapter’s backend. For example, the EnumerableJoin operator implements joins by collecting rows from its child nodes and joining on the desired attributes.
 For queries which only touch a small subset of the data in a table, it is inefficient for Calcite to enumerate all tuples. Fortunately, the same rule-based optimizer can be used to implement adapter-specific rules for optimization. For example, suppose a query involves filtering and sorting on a table. An adapter which can perform filtering on the backend can implement a rule which matches a LogicalFilter and converts it to the adapter’s calling convention. This rule converts the LogicalFilter into another Filter instance. This new Filter node has a lower associated cost that allows Calcite to optimize queries across adapters.
 The use of adapters is a powerful abstraction that enables not only optimization of queries for a specific backend, but also across multiple backends. Calcite is able to answer queries involving tables across multiple backends by pushing down all possible logic to each backend and then performing joins and aggregations on the resulting data. Implementing an adapter can be as simple as providing a table scan operator or it can involve the design of many advanced optimizations. Any expression represented in the relational algebra can be pushed down to adapters with optimizer rules.

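  Essentially, an adapter consists of three parts: a model, a schema, and a schema factory. The model is a specification of the physical properties of the data source being accessed; in Calcite it is a JSON configuration file. The schema is the definition of the data (format and layout) found in the model.
For this part, see the demo in the source code.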
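
As a rough illustration of how the three parts fit together, the following Python sketch wires a model, a schema factory, a schema, and scannable tables. Everything here is hypothetical and written for these notes (`ToyTable`, `ToySchema`, `ToySchemaFactory`, and the shape of the `model` dictionary); it only mirrors the roles described above and is not Calcite's interface. The client-side filter stands in for an operator that works over tuples through an iterator, which is the role the enumerable calling convention plays.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


class ToyTable:
    """Minimal contract in this sketch: a table only has to support a scan."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def scan(self) -> Iterator[Row]:
        return iter(self._rows)


class ToySchema:
    """Maps table names to tables."""

    def __init__(self, tables: Dict[str, ToyTable]) -> None:
        self.tables = tables


class ToySchemaFactory:
    """Reads the model and generates a schema from it."""

    def create(self, model: dict) -> ToySchema:
        return ToySchema({name: ToyTable(rows)
                          for name, rows in model["operand"]["tables"].items()})


def client_side_filter(rows: Iterator[Row],
                       predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Iterator-style operator applied outside the backend."""
    return (row for row in rows if predicate(row))


# Hypothetical model: in this toy the data lives directly in the model.
model = {
    "name": "SALES",
    "operand": {"tables": {"orders": [{"id": 1, "units": 5},
                                      {"id": 2, "units": 30}]}},
}
schema = ToySchemaFactory().create(model)
big_orders = list(client_side_filter(schema.tables["orders"].scan(),
                                     lambda r: r["units"] > 10))
```

Because the table exposes only `scan`, every other operation has to run client side over all rows, which is exactly the inefficiency that adapter-specific push-down rules are meant to remove.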

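  For queries that touch only a small subset of the data in a table, it is inefficient for Calcite to enumerate all tuples. The rule-based optimizer can be used to implement adapter-specific optimization rules. For example, suppose a query sorts and filters a table. An adapter that can filter in the backend database can implement a rule that matches a LogicalFilter and converts it to the adapter's calling convention. The rule converts the LogicalFilter into another filter instance, which has a lower associated cost and lets Calcite optimize across adapters (predicate pushdown).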

6. Query Processing and Optimization

The query optimizer is the main component in the framework. Calcite optimizes queries by repeatedly applying planner rules to a relational expression. A cost model guides the process, and the planner engine tries to generate an alternative expression that has the same semantics as the original but a lower cost.
Every component in the optimizer is extensible. Users can add relational operators, rules, cost models, and statistics.

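  The query optimizer is the core component of the framework. Calcite optimizes a query by repeatedly applying planner rules to a relational expression. A cost model guides the process, and the planner engine tries to generate an alternative relational expression with the same semantics as the original but a lower cost. Every component is extensible: users can add relational operators, rules, cost models, and statistics.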

6.1 Planner rules

  Calcite includes a set of planner rules to transform expression trees. In particular, a rule matches a given pattern in the tree and executes a transformation that preserves the semantics of that expression. Calcite includes several hundred optimization rules. However, it is rather common for data processing systems relying on Calcite for optimization to include their own rules to allow specific rewritings.
  For example, Calcite provides an adapter for Apache Cassandra [29], a wide column store which partitions data by a subset of columns in a table and then, within each partition, sorts rows based on another subset of columns. As discussed in Section 5, it is beneficial for adapters to push down as much query processing as possible to each backend for efficiency. A rule to push a Sort into Cassandra must check two conditions:
  (1) the table has been previously filtered to a single partition (since rows are only sorted within a partition) and
  (2) the sorting of partitions in Cassandra has some common prefix with the required sort.
  This requires that a LogicalFilter has been rewritten to a CassandraFilter to ensure the partition filter is pushed down to the database. The effect of the rule is simple (convert a LogicalSort into a CassandraSort) but the flexibility in rule matching enables backends to push down operators even in complex scenarios.
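
The two conditions can be written down as a small predicate. The Python sketch below is an illustration only, not the logic of Calcite's Cassandra adapter: the function name, its parameters, and the column names are hypothetical, and condition (2) is simplified to "the required sort is a prefix of the table's clustering order".

```python
from typing import Sequence, Set


def can_push_sort(partition_keys: Sequence[str],
                  clustering_order: Sequence[str],
                  equality_filtered: Set[str],
                  required_sort: Sequence[str]) -> bool:
    """Decide whether a sort may be pushed to the backend in this toy model."""
    # (1) Every partition key column is fixed by an equality filter, so the
    #     query reads a single partition.
    single_partition = all(key in equality_filtered for key in partition_keys)
    # (2) Simplifying assumption: the required sort must be a non-empty
    #     prefix of the order in which rows are stored inside a partition.
    n = len(required_sort)
    prefix_matches = (0 < n <= len(clustering_order)
                      and list(clustering_order[:n]) == list(required_sort))
    return single_partition and prefix_matches


# Hypothetical table: partitioned by user_id, clustered by (ts, seq).
ok = can_push_sort(["user_id"], ["ts", "seq"], {"user_id"}, ["ts"])
no_filter = can_push_sort(["user_id"], ["ts", "seq"], set(), ["ts"])
wrong_order = can_push_sort(["user_id"], ["ts", "seq"], {"user_id"}, ["seq"])
```

Only the first call returns True. The second fails condition (1) because no partition filter has been pushed down, which is why the rule depends on the filter having been rewritten first.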
  For an example of a rule with more complex effects, consider the following query:



  The query corresponds to the relational algebra expression presented in Figure 4a. Because the WHERE clause only applies to the sales table, we can move the filter before the join as in Figure 4b. This optimization can significantly reduce query execution time since we do not need to perform the join for rows which do not match the predicate. Furthermore, if the sales and products tables were contained in separate backends, moving the filter before the join also potentially enables an adapter to push the filter into the backend. Calcite implements this optimization via FilterIntoJoinRule, which matches a filter node whose input is a join node and checks if the filter can be performed by the join. This optimization illustrates the flexibility of the Calcite approach to optimization.
The query can be represented as shown in Figure 4a.


Figure 4: FilterIntoJoinRule application

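  Calcite includes a set of planner rules that transform relational expression trees. When a rule matches a pattern in the tree, it transforms the tree while preserving the original semantics. Calcite includes several hundred optimization rules. Database systems that build on the Calcite optimizer (for example, Dremio) can add their own rules to perform specific rewrites.
For example, Calcite provides an adapter for Apache Cassandra, a wide column store that partitions data by a subset of the columns in a table and, within each partition, sorts rows by another subset of columns. As discussed in Section 5, it is efficient for adapters to push as much query processing as possible down to the backend engine. Pushing a Sort into Cassandra requires two conditions:
(1) the table has already been filtered to a single partition, because rows are only sorted within a partition, and
(2) the sort order of partitions in Cassandra shares a common prefix with the required sort.
This requires that the LogicalFilter has been rewritten to a CassandraFilter, so that the partition filter is pushed down to the database. The effect of the rule is simple (convert a LogicalSort into a CassandraSort), but the flexible rule matching mechanism lets backends push operators down even in complex scenarios.

  Because the WHERE clause applies only to the sales table, we can move the filter before the join, as shown in Figure 4. This optimization can significantly reduce query time, since the join no longer has to process rows that do not satisfy the predicate. Furthermore, if the sales and products tables live in different backend databases, moving the filter also lets an adapter push it down to the backend. Calcite implements this optimization with FilterIntoJoinRule, which matches a filter node sitting on top of a join node and checks whether the filter can be performed by the join. This example illustrates the flexibility of Calcite's approach to optimization.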
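
The filter push-down discussed above can be sketched as a single tree rewrite. The Python below is a toy version written for these notes and is not the implementation of FilterIntoJoinRule: the `Rel` class and the predicate `sales.amount > 100` are hypothetical, the join condition is omitted, and the rewrite is only valid under the assumption of an inner join and a predicate that reads one table.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Rel:
    """Toy relational operator; not Calcite's RelNode."""
    op: str                          # "Scan", "Filter", or "Join"
    table: Optional[str] = None      # Scan: table name; Filter: table its predicate reads
    predicate: Optional[str] = None  # Filter only
    inputs: Tuple["Rel", ...] = ()


def scanned_tables(rel: Rel) -> Set[str]:
    """Collect the names of all tables scanned below a node."""
    if rel.op == "Scan":
        return {rel.table}
    tables: Set[str] = set()
    for child in rel.inputs:
        tables |= scanned_tables(child)
    return tables


def push_filter_into_join(rel: Rel) -> Rel:
    """Move a single-table filter below an inner join, onto the side it reads."""
    if rel.op != "Filter" or rel.inputs[0].op != "Join":
        return rel  # pattern does not match: leave the tree unchanged
    left, right = rel.inputs[0].inputs
    if rel.table in scanned_tables(left):
        pushed = Rel("Filter", rel.table, rel.predicate, (left,))
        return Rel("Join", inputs=(pushed, right))
    if rel.table in scanned_tables(right):
        pushed = Rel("Filter", rel.table, rel.predicate, (right,))
        return Rel("Join", inputs=(left, pushed))
    return rel


sales, products = Rel("Scan", "sales"), Rel("Scan", "products")
before = Rel("Filter", "sales", "sales.amount > 100",
             (Rel("Join", inputs=(sales, products)),))
after = push_filter_into_join(before)
```

After the rewrite the filter sits directly above the `sales` scan, so a backend-specific rule would have the chance to absorb it into the scan itself.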

6.2 Metadata providers

Metadata is an important part of Calcite’s optimizer, and it serves two main purposes: (i) guiding the planner towards the goal of reducing the cost of the overall query plan, and (ii) providing information to the rules while they are being applied.

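Metadata is an important part of Calcite's optimizer. It serves two main purposes:
(1) guiding the planner towards the goal of reducing cost
(2) providing the information that rules need while they are applied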

  Metadata providers are responsible for supplying that information to the optimizer. In particular, the default metadata provider implementation in Calcite contains functions that return the overall cost of executing a subexpression in the operator tree, the number of rows and the data size of the results of that expression, and the maximum degree of parallelism with which it can be executed. In turn, it can also provide information about the plan structure, e.g., filter conditions that are present below a certain tree node.
  Calcite provides interfaces that allow data processing systems to plug their metadata information into the framework. These systems may choose to write providers that override the existing functions, or provide their own new metadata functions that might be used during the optimization phase. However, for many of them, it is sufficient to provide statistics about their input data, e.g., number of rows and size of a table, whether values for a given column are unique, etc., and Calcite will do the rest of the work by using its default implementation.
As the metadata providers are pluggable, they are compiled and instantiated at runtime using Janino [27], a lightweight Java compiler. Their implementation includes a cache for metadata results, which yields significant performance improvements, e.g., when we need to compute multiple types of metadata such as cardinality, average row size, and selectivity for a given join, and all these computations rely on the cardinality of their inputs.

  Metadata providers are responsible for supplying the optimizer with the information it needs.
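
Why a metadata cache pays off can be shown with a small Python sketch. It is an illustration under assumptions, not Calcite's provider mechanism: the statistics in `TABLE_ROWS`, the function names, and the simple cardinality formula are all hypothetical. Two derived metadata values both depend on the row counts of the inputs, and memoization ensures each base statistic is computed once.

```python
from functools import lru_cache

# Counts how often the base statistic is actually computed.
calls = {"row_count": 0}

# Hypothetical table statistics supplied by a data source.
TABLE_ROWS = {"sales": 1_000_000, "products": 10_000}


@lru_cache(maxsize=None)
def row_count(table: str) -> int:
    """Base metadata: cached so repeated requests cost nothing."""
    calls["row_count"] += 1
    return TABLE_ROWS[table]


def join_cardinality(left: str, right: str, selectivity: float) -> float:
    """Derived metadata: estimated number of rows produced by a join."""
    return row_count(left) * row_count(right) * selectivity


def join_data_size(left: str, right: str, selectivity: float,
                   avg_row_bytes: int) -> float:
    """Another derived value that relies on the same input cardinalities."""
    return join_cardinality(left, right, selectivity) * avg_row_bytes


card = join_cardinality("sales", "products", 1e-4)
size = join_data_size("sales", "products", 1e-4, 64)
```

Although `row_count` is requested four times across the two derived computations, `calls["row_count"]` ends at 2, once per table.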

6.3 Planner engines

The main goal of a planner engine is to trigger the rules provided to the engine until it reaches a given objective. At the moment, Calcite provides two different engines. New engines are pluggable in the framework.
The main goal of a planner engine is to trigger the rules given to it until it reaches the optimization objective. Calcite currently provides two different engines.

6.3.1 Cost-based planner engine (CBO)

The first one, a cost-based planner engine, triggers the input rules with the goal of reducing the overall expression cost. The engine uses a dynamic programming algorithm, similar to Volcano [20], to create and track different alternative plans created by firing the rules given to the engine. Initially, each expression is registered with the planner, together with a digest based on the expression attributes and its inputs. When a rule is fired on an expression e1 and the rule produces a new expression e2, the planner will add e2 to the set of equivalence expressions Sa that e1 belongs to. In addition, the planner generates a digest for the new expression, which is compared with those previously registered in the planner. If a similar digest associated with an expression e3 that belongs to a set Sb is found, the planner has found a duplicate and hence will merge Sa and Sb into a new set of equivalences. The process continues until the planner reaches a configurable fix point. In particular, it can (i) exhaustively explore the search space until all rules have been applied on all expressions, or (ii) use a heuristic-based approach to stop the search when the plan cost has not improved by more than a given threshold δ in the last planner iterations. The cost function that allows the optimizer to decide which plan to choose is supplied through metadata providers. The default cost function implementation combines estimations for CPU, IO, and memory resources used by a given expression.

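This engine triggers rules with the goal of reducing the overall expression cost. It uses a dynamic programming algorithm similar to Volcano to create and track the alternative plans produced by firing the rules given to the engine. The default cost function combines the CPU, IO, and memory resources used by an expression.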
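
The bookkeeping of equivalence sets and digests described above can be sketched as follows. This Python toy is written for these notes and is not Calcite's planner: `ToyPlanner`, its methods, and the use of plain strings as digests are hypothetical simplifications. It shows only the merge step, where a newly produced expression turns out to have a digest that is already registered in another set.

```python
from typing import Dict, Optional, Set


class ToyPlanner:
    """Tracks which expression digests belong to the same equivalence set."""

    def __init__(self) -> None:
        self._set_of: Dict[str, int] = {}        # digest -> equivalence set id
        self._members: Dict[int, Set[str]] = {}  # set id -> digests in the set
        self._next_id = 0

    def register(self, digest: str, equivalent_to: Optional[str] = None) -> int:
        """Register a digest, optionally as equivalent to a known expression."""
        target = self._set_of[equivalent_to] if equivalent_to is not None else None
        existing = self._set_of.get(digest)
        if existing is None:
            if target is None:  # brand new expression: open a new set
                target = self._next_id
                self._next_id += 1
                self._members[target] = set()
            self._set_of[digest] = target
            self._members[target].add(digest)
            return target
        if target is not None and existing != target:
            # Duplicate digest found in another set: the two sets are equivalent.
            self._merge(target, existing)
        return self._set_of[digest]

    def _merge(self, keep: int, drop: int) -> None:
        for digest in self._members.pop(drop):
            self._set_of[digest] = keep
            self._members[keep].add(digest)

    def equivalent(self, a: str, b: str) -> bool:
        return self._set_of[a] == self._set_of[b]


planner = ToyPlanner()
planner.register("e1")                      # e1 starts set Sa
planner.register("e3")                      # e3 starts set Sb
planner.register("e2", equivalent_to="e1")  # a rule on e1 produced e2
separate_before = planner.equivalent("e2", "e3")
planner.register("e3", equivalent_to="e1")  # a rule on e1 produced e3's digest
merged_after = planner.equivalent("e2", "e3")
```

Before the last registration `e2` and `e3` sit in different sets; afterwards the sets have been merged, so any plan found for one is a candidate for the other.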

6.3.2 Heuristic planner engine (RBO)

The second engine is an exhaustive planner, which triggers rules exhaustively until it generates an expression that is no longer modified by any rules. This planner is useful to quickly execute rules without taking into account the cost of each expression.

It triggers rules exhaustively until it generates an expression that no rule can modify any further.

Users may choose to use one of the existing planner engines depending on their concrete needs, and switching from one to another, when their system requirements change, is straightforward. Alternatively, users may choose to generate multi-stage optimization logic, in which different sets of rules are applied in consecutive phases of the optimization process. Importantly, the existence of two planners allows Calcite users to reduce the overall optimization time by guiding the search for different query plans.

6.4 Materialized views

  One of the most powerful techniques to accelerate query processing in data warehouses is the precomputation of relevant summaries or materialized views. Multiple Calcite adapters and projects relying on Calcite have their own notion of materialized views. For instance, Cassandra allows the user to define materialized views based on existing tables which are automatically maintained by the system.
  These engines expose their materialized views to Calcite. The optimizer then has the opportunity to rewrite incoming queries to use these views instead of the original tables. In particular, Calcite provides an implementation of two different materialized view based rewriting algorithms.
  The first approach is based on view substitution [10, 18]. The aim is to substitute part of the relational algebra tree with an equivalent expression which makes use of a materialized view, and the algorithm proceeds as follows: (i) the scan operator over the materialized view and the materialized view definition plan are registered with the planner, and (ii) transformation rules that try to unify expressions in the plan are triggered. Views do not need to exactly match expressions in the query being replaced, as the rewriting algorithm in Calcite can produce partial rewritings that include additional operators to compute the desired expression, e.g., filters with residual predicate conditions.
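The idea of a partial rewriting with a residual predicate can be illustrated with a deliberately small Python model. It is a sketch under strong assumptions, not Calcite's unification rules: plans are reduced to a filtered scan, predicates are opaque strings compared by equality, and the view is assumed to keep every column the residual filter needs. The names `FilteredScan`, `MaterializedView` and `substitute` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilteredScan:
    table: str
    predicates: frozenset  # conjuncts of the WHERE clause, as opaque strings


@dataclass(frozen=True)
class MaterializedView:
    name: str
    definition: FilteredScan


def substitute(query, view):
    """Rewrite the query over the view when the view's filter is implied by it."""
    d = view.definition
    if d.table == query.table and d.predicates <= query.predicates:
        # Conjuncts not already applied by the view remain as a residual filter.
        residual = query.predicates - d.predicates
        return FilteredScan(view.name, residual)
    return query


view = MaterializedView(
    "orders_2018", FilteredScan("orders", frozenset({"year = 2018"})))
query = FilteredScan("orders", frozenset({"year = 2018", "units > 25"}))
rewritten = substitute(query, view)
print(rewritten)
```

The view does not match the query exactly, yet the rewrite succeeds: the scan moves to `orders_2018` and `units > 25` survives as the residual condition.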
  The second approach is based on lattices [22]. Once the data sources are declared to form a lattice, Calcite represents each of the materializations as a tile which in turn can be used by the optimizer to answer incoming queries. On the one hand, the rewriting algorithm is especially efficient in matching expressions over data sources organized in a star schema, which are common in OLAP applications. On the other hand, it is more restrictive than view substitution, as it imposes restrictions on the underlying schema.
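
One of the most effective techniques for accelerating query processing in data warehouses is to precompute relevant summaries, that is, materialized views. Several Calcite adapters and Calcite-based projects have their own notion of materialized views.
Calcite provides implementations of two materialized view rewriting algorithms:
(1) View substitution. The aim is to replace part of the relational expression tree with an equivalent expression over a materialized view.
(2) Lattices. Once the data sources are declared to form a lattice, Calcite represents each materialization as a tile, and the optimizer can use these tiles to answer queries. The rewriting algorithm is especially efficient at matching expressions over data sources organized in a star schema.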
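To make the role of tiles concrete, here is a small Python sketch of how a tile might be chosen for an aggregate query. It assumes, for illustration only, that each tile is identified by the set of dimensions it groups by, that the measures can be re-aggregated (for example SUM or COUNT), and that row counts are known. The `pick_tile` function and the sample numbers are hypothetical and do not reflect Calcite's actual selection logic.

```python
def pick_tile(tiles, query_dims):
    """Return the smallest tile whose dimensions cover the query's GROUP BY.

    tiles maps a frozenset of dimension names to an estimated row count.
    """
    candidates = [(rows, dims) for dims, rows in tiles.items()
                  if query_dims <= dims]
    if not candidates:
        return None  # no tile can answer the query; fall back to base tables
    return min(candidates, key=lambda c: c[0])[1]


tiles = {
    frozenset({"year", "product", "city"}): 1_000_000,
    frozenset({"year", "product"}): 5_000,
    frozenset({"year"}): 10,
}

by_product = pick_tile(tiles, frozenset({"product"}))
by_city_year = pick_tile(tiles, frozenset({"city", "year"}))
by_customer = pick_tile(tiles, frozenset({"customer"}))
print(by_product, by_city_year, by_customer)
```

A query grouped by `product` can be rolled up from the `{year, product}` tile, which is far smaller than the finest tile, while a query on a dimension outside the lattice finds no usable tile.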

7. Calcite extensions

As we have mentioned in the previous sections, Calcite is not only tailored towards SQL processing. In fact, Calcite provides extensions to SQL for expressing queries over other data abstractions, such as semi-structured, streaming and geospatial data. Its internal operators adapt to these queries. In addition to extensions to SQL, Calcite also includes a language-integrated query language. We describe these extensions throughout this section and provide some examples.

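As mentioned in the previous sections, Calcite is not tailored only to SQL processing. Calcite provides SQL extensions for querying other data abstractions, including semi-structured, streaming, and geospatial data, and its internal operators adapt to these queries. Besides the SQL extensions, Calcite also includes a language-integrated query language. The extensions are described below with examples.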

7.1 Semi-structured data

Calcite supports several complex column data types that enable a hybrid of relational and semi-structured data to be stored in tables. Specifically, columns can be of type ARRAY, MAP, or MULTISET. Furthermore, these complex types can be nested, so it is possible, for example, to have a MAP where the values are of type ARRAY. Data within the ARRAY and MAP columns (and nested data therein) can be extracted using the [] operator. The specific type of values stored in any of these complex types need not be predefined.
  For example, Calcite contains an adapter for MongoDB [36], a document store which stores documents consisting of data roughly equivalent to JSON documents. To expose MongoDB data to Calcite, a table is created for each document collection with a single column named _MAP: a map from document identifiers to their data. In many cases, documents can be expected to have a common structure. For instance, the documents in a collection representing zip codes may each contain fields with a city name, latitude and longitude. It can be useful to expose this data as a relational table. In Calcite, this is achieved by creating a view that extracts the desired values and casts them to the appropriate types.



With views over semi-structured data defined in this manner, it becomes easier to manipulate data from different semi-structured sources in tandem with relational data.

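Calcite supports several complex column data types, which allow a mix of relational and semi-structured data to be stored in tables. Columns can be of type ARRAY, MAP, or MULTISET, and these types can be nested; for example, a MAP may have values of type ARRAY. Data in ARRAY and MAP columns can be extracted with the [] operator.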
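The extract-and-cast step behind such a view can be mimicked outside Calcite with plain Python. The sketch below is an analogy only: the sample documents, the `loc` field holding a longitude and latitude pair, and the `zips_view` function are assumptions made for illustration, and indexing with `[]` stands in for the SQL [] operator.

```python
# Each row has a single _MAP column holding the whole document (hypothetical data).
zips_raw = [
    {"_MAP": {"city": "AMSTERDAM", "loc": ["4.89", "52.37"]}},
    {"_MAP": {"city": "UTRECHT", "loc": ["5.12", "52.09"]}},
]


def zips_view(rows):
    """Expose the documents as flat, typed rows, like a view over _MAP."""
    for row in rows:
        doc = row["_MAP"]
        yield {
            "city": str(doc["city"]),           # cast to a string type
            "longitude": float(doc["loc"][0]),  # nested access, cast to float
            "latitude": float(doc["loc"][1]),
        }


relational_rows = list(zips_view(zips_raw))
print(relational_rows[0])
```

Once the values are flattened and typed this way, the rows can be filtered or joined like any other relational data.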

7.2 Streaming data

Calcite provides first-class support for streaming queries [26] based on a set of streaming-specific extensions to standard SQL, namely STREAM extensions, windowing extensions, implicit references to streams via window expressions in joins, and others. These extensions were inspired by the Continuous Query Language [2] while also trying to integrate effectively with standard SQL. The primary extension, the STREAM directive, tells the system that the user is interested in incoming records, not existing ones.



  In the absence of the STREAM keyword when querying a stream, the query becomes a regular relational query, indicating that the system should process existing records which have already been received from the stream, not the incoming ones.
  Due to the inherently unbounded nature of streams, windowing is used to unblock blocking operators such as aggregates and joins. Calcite’s streaming extensions use SQL analytic functions to express sliding and cascading window aggregations.

Tumbling, hopping and session windows are enabled by the TUMBLE, HOP, and SESSION functions and related utility functions such as TUMBLE_END and HOP_END, which can be used in GROUP BY clauses and projections, respectively.

  Streaming queries involving window aggregates require the presence of monotonic or quasi-monotonic expressions in the GROUP BY clause, or in the ORDER BY clause in the case of sliding and cascading window queries. Streaming queries which involve more complex stream-to-stream joins can be expressed using an implicit (time) window expression in the JOIN clause.



In the case of an implicit window, Calcite’s query planner validates that the expression is monotonic.

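Calcite provides first-class support for streaming queries through extensions to standard SQL: the STREAM extensions, windowing extensions, and implicit references to streams via window expressions in joins. These extensions were inspired by the Continuous Query Language (CQL). The primary extension, the STREAM directive, tells the system that the user is interested in incoming records rather than existing ones.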
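Why a monotonic expression is needed for window aggregates can be seen in a small Python sketch of a tumbling-window sum. This is not how Calcite executes streaming SQL; the `tumble_sum` function, the event layout `(rowtime, product_id, units)` and the integer timestamps in seconds are assumptions for the example. The point is that a window can be closed and emitted as soon as a later `rowtime` arrives, which is only safe because `rowtime` never decreases.

```python
from collections import defaultdict


def _emit(start, size, totals):
    # One output row per product for the finished window.
    for product_id in sorted(totals):
        yield (start, start + size, product_id, totals[product_id])


def tumble_sum(events, size):
    """Sum units per product over fixed, non-overlapping windows of `size` seconds.

    Assumes events arrive in non-decreasing rowtime order.
    """
    totals = defaultdict(int)
    current = None
    for rowtime, product_id, units in events:
        window_start = rowtime - rowtime % size
        if current is not None and window_start != current:
            # rowtime is monotonic, so the previous window is complete.
            yield from _emit(current, size, totals)
            totals.clear()
        current = window_start
        totals[product_id] += units
    if current is not None:
        # The demo input is finite, so flush the last open window.
        yield from _emit(current, size, totals)


orders = [(0, "a", 2), (1800, "b", 1), (3599, "a", 3), (3600, "a", 5), (7300, "b", 4)]
hourly = list(tumble_sum(orders, 3600))
for row in hourly:
    print(row)
```

Without the ordering guarantee, the aggregate would have to wait for the end of an unbounded stream before emitting anything.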

7.3 Geospatial queries

Geospatial support is preliminary in Calcite, but is being implemented using Calcite’s relational algebra. The core of this implementation consists of adding a new GEOMETRY data type which encapsulates different geometric objects such as points, curves, and polygons. It is expected that Calcite will be fully compliant with the OpenGIS Simple Feature Access [39] specification, which defines a standard for SQL interfaces to access geospatial data. For example, a query can find the country which contains the city of Amsterdam.


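Geospatial queries have preliminary support in Calcite and are implemented with Calcite’s relational algebra. The core of the implementation is a new GEOMETRY data type that can represent different geometric objects such as points, curves, and polygons. Calcite is expected to become fully compliant with the OpenGIS Simple Feature Access specification.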
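The containment test at the heart of such a query can be sketched in Python with the standard ray-casting algorithm. This is a stand-in for a spatial predicate, not Calcite's GEOMETRY implementation: the `contains` function is hypothetical, the city is reduced to a single point, and the country boundaries are made-up coarse rectangles in (longitude, latitude), not real borders.

```python
def contains(polygon, point):
    """Ray casting: count how many polygon edges a ray to the right crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical, very rough bounding rectangles used only for the demo.
countries = {
    "Netherlands": [(3.3, 50.7), (7.2, 50.7), (7.2, 53.6), (3.3, 53.6)],
    "Belgium": [(2.5, 49.5), (6.4, 49.5), (6.4, 51.5), (2.5, 51.5)],
}
amsterdam = (4.9, 52.37)

matches = [name for name, boundary in countries.items()
           if contains(boundary, amsterdam)]
print(matches)
```

In a spatial SQL dialect the same filter would be a containment predicate in the WHERE clause; the sketch only shows what that predicate has to compute.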

7.4 Language-integrated query

Calcite can be used to query multiple data sources, beyond just relational databases. But it also aims to support more than just the SQL language. Though SQL remains the primary database language, many programmers favour language-integrated query languages like LINQ [33]. Unlike SQL embedded within Java or C++ code, language-integrated query languages allow the programmer to write all of her code using a single language. Calcite provides Language-Integrated Query for Java (or LINQ4J, in short), which closely follows the convention set forth by Microsoft’s LINQ for the .NET languages.

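Calcite can query many kinds of data sources, not only relational databases, and it aims to support more than the SQL language. Although SQL remains the primary database language, many programmers prefer language-integrated query languages such as LINQ (Language-Integrated Query). Unlike SQL embedded in Java or C++ code, a language-integrated query language lets developers write all of their code in a single language. Calcite provides Language-Integrated Query for Java (LINQ4J), which follows the conventions of Microsoft’s LINQ for the .NET languages.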

8. Industrial and academic adoption

Calcite enjoys wide adoption, especially among open-source projects used in industry. As Calcite provides certain integration flexibility, these projects have chosen to either (i) embed Calcite within their core, i.e., use it as a library, or (ii) implement an adapter to allow Calcite to federate query processing. In addition, we see a growing interest in the research community to use Calcite as the cornerstone of the development of data management projects. In the following, we describe how different systems are using Calcite.

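Calcite is widely used, especially in open-source projects used in industry. Because Calcite offers flexible integration, projects either embed it in their core, that is, use it as a library, or implement an adapter so that Calcite can federate query processing. The research community is also increasingly using Calcite as a building block for data management projects. The following sections describe how different systems use Calcite.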

8.1 Embedding Calcite

Table 1: Systems that embed Calcite

Table 1 provides a list of software that incorporates Calcite as a library, including (i) the query language interface that they expose to users, (ii) whether they use Calcite’s JDBC driver (called Avatica), (iii) whether they use the SQL parser and validator included in Calcite, (iv) whether they use Calcite’s query algebra to represent their operations over data, and (v) the engine that they rely on for execution, e.g., their own native engine, Calcite’s operators (referred to as enumerable), or any other project.
  Drill [13] is a flexible data processing engine based on the Dremel system [34] that internally uses a schema-free JSON document data model. Drill uses its own dialect of SQL that includes extensions to express queries on semi-structured data, similar to SQL++ [38].
  Hive [24] first became popular as a SQL interface on top of the MapReduce programming model [52]. It has since moved towards being an interactive SQL query answering engine, adopting Calcite as its rule and cost-based optimizer. Instead of relying on Calcite’s JDBC driver, SQL parser and validator, Hive uses its own implementation of these components. The query is then translated into Calcite operators, which after optimization are translated into Hive’s physical algebra. Hive operators can be executed by multiple engines, the most popular being Apache Tez [43, 51] and Apache Spark [47, 56].
  Apache Solr [46] is a popular full-text distributed search platform built on top of the Apache Lucene library [31]. Solr exposes multiple query interfaces to users, including REST-like HTTP/XML and JSON APIs. In addition, Solr integrates with Calcite to provide SQL compatibility.
  Apache Phoenix [40] and Apache Kylin [28] both work on top of Apache HBase [23], a distributed key-value store modeled after Bigtable [9]. In particular, Phoenix provides a SQL interface and orchestration layer to query HBase. Kylin focuses on OLAP-style SQL queries instead, building cubes that are declared as materialized views and stored in HBase, and hence allowing Calcite’s optimizer to rewrite the input queries to be answered using those cubes. In Kylin, query plans are executed using a combination of Calcite native operators and HBase.
  Recently Calcite has become popular among streaming systems too. Projects such as Apache Apex [1], Flink [16], Apache Samza [44], and Storm [50] have chosen to integrate with Calcite, using its components to provide a streaming SQL interface to their users. Finally, other commercial systems have adopted Calcite,

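Table 1 lists software that embeds Calcite as a library. For each system it covers:
(1) the query interface exposed to users
(2) whether it uses Calcite’s JDBC driver (Avatica)
(3) whether it uses Calcite’s SQL parser and validator
(4) whether it uses Calcite’s relational algebra to represent its operations
(5) the execution engine, e.g. its own native engine, Calcite’s operators (enumerable), or another project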

8.2 Calcite adapters

Instead of using Calcite as a library, other systems integrate with Calcite via adapters which read their data sources. Table 2 provides the list of available adapters in Calcite. One of the key components of the implementation of these adapters is the converter responsible for translating the algebra expression to be pushed to the system into the query language supported by that system. Table 2 also shows the languages that Calcite translates into for each of these adapters.
  The JDBC adapter supports the generation of multiple SQL dialects, including those supported by popular RDBMSes such as PostgreSQL and MySQL. In turn, the adapter for Cassandra [8] generates its own SQL-like language called CQL, whereas the adapter for Apache Pig [41] generates queries expressed in Pig Latin [37]. The adapter for Apache Spark [47] uses the Java RDD API. Finally, Druid [14], Elasticsearch [15] and Splunk [48] are queried through REST HTTP API requests. The queries generated by Calcite for these systems are expressed in JSON or XML.

Table 2: List of Calcite adapters

Instead of embedding Calcite as a library, some systems integrate with Calcite through adapters that read their data sources.
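The converter role described above, turning an algebra expression into the target system's query language, can be sketched with a toy translator in Python. It is a minimal illustration and not the JDBC adapter: the `Scan`, `Filter` and `Project` node classes and the `to_sql` function are hypothetical, conditions are passed through as raw strings, and no dialect differences are handled.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    input: "Node"
    condition: str


@dataclass(frozen=True)
class Project:
    input: "Node"
    columns: Tuple[str, ...]


Node = Union[Scan, Filter, Project]


def to_sql(node):
    """Translate a tiny algebra tree into a nested SQL string."""
    if isinstance(node, Scan):
        return f"SELECT * FROM {node.table}"
    if isinstance(node, Filter):
        return f"SELECT * FROM ({to_sql(node.input)}) AS t WHERE {node.condition}"
    if isinstance(node, Project):
        return f"SELECT {', '.join(node.columns)} FROM ({to_sql(node.input)}) AS t"
    raise TypeError(f"unsupported node: {node!r}")


plan = Project(Filter(Scan("orders"), "units > 25"), ("product_id", "units"))
sql = to_sql(plan)
print(sql)
```

An adapter for a non-SQL system would keep the same tree walk and swap the string templates for that system's language, for example JSON request bodies.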

8.3 Use in research

In a research setting, Calcite has been considered [54] as a polystore alternative for precision medicine and clinical analysis scenarios. In those scenarios, heterogeneous medical data has to be logically assembled and aligned to assess the best treatments based on the comprehensive medical history and the genomic profile of the patient. The data comes from relational sources representing patients’ electronic medical records, structured and semi-structured sources representing various reports (oncology, psychiatry, laboratory tests, radiology, etc.), imaging, signals, and sequence data, stored in scientific databases. In those circumstances, Calcite represents a good foundation with its uniform query interface and flexible adapter architecture, but the ongoing research efforts are aimed at (i) introducing new adapters for array and textual sources, and (ii) supporting efficient joins of heterogeneous data sources.

9. Future work

Future work on Calcite will focus on developing new features and expanding its adapter architecture:

  • Enhancements to the design of Calcite to further support its use as a standalone engine, which would require support for data definition languages (DDL), materialized views, indexes and constraints.
  • Ongoing improvements to the design and flexibility of the planner, including making it more modular and allowing Calcite users to supply planner programs (collections of rules organized into planning phases) for execution.
  • Incorporation of new parametric approaches [53] into the design of the optimizer.
  • Support for an extended set of SQL commands, functions, and utilities, including full compliance with OpenGIS.
  • New adapters for non-relational data sources such as array databases for scientific computing.
  • Improvements to performance profiling and instrumentation.
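
In summary, future work on Calcite will focus on new features and on expanding the adapter architecture:
  • Strengthening the design so that Calcite can be used as a standalone engine, with support for DDL, materialized views, indexes, and constraints
  • Continuing to make the planner more flexible and modular, allowing users to supply planner programs (sets of rules organized into planning phases)
  • Incorporating new parametric approaches into the design of the optimizer
  • Supporting more SQL commands, functions, and utilities, including full compliance with OpenGIS
  • Adding adapters for non-relational data sources, such as array databases for scientific computing
  • Improving performance profiling and instrumentation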

9.1 Performance testing and evaluation

Though Calcite contains a performance testing module, it does not evaluate query execution. It would be useful to assess the performance of systems built with Calcite. For example, we could compare the performance of Calcite with similar frameworks. Unfortunately, it might be difficult to craft fair comparisons. For example, like Calcite, Algebricks optimizes queries for Hive. Borkar et al. [6] compared Algebricks with the Hyracks scheduler against Hive version 0.12 (without Calcite). The work of Borkar et al. precedes significant engineering and architectural changes to Hive. Comparing Calcite against Algebricks in a fair manner in terms of timings does not seem feasible, as one would need to ensure that each uses the same execution engine. Hive applications rely mostly on either Apache Tez or Apache Spark as execution engines, whereas Algebricks is tied to its own framework (including Hyracks).
  Moreover, to assess the performance of Calcite-based systems, we need to consider two distinct use cases. Indeed, Calcite can be used either as part of a single system (as a tool to accelerate the construction of such a system) or for the more difficult task of combining several distinct systems (as a common layer). The former is tied to the characteristics of the data processing system, and because Calcite is so versatile and widely used, many distinct benchmarks are needed. The latter is limited by the availability of existing heterogeneous benchmarks. BigDAWG [55] has been used to integrate PostgreSQL with Vertica, and on a standard benchmark the integrated system outperforms a baseline where entire tables are copied from one system to another to answer specific queries. Based on real-world experience, we believe that more ambitious goals are possible for systems that integrate multiple engines: they should be superior to the sum of their parts.

10. Conclusion

Emerging data management practices and associated analytic uses of data continue to evolve towards an increasingly diverse and heterogeneous spectrum of scenarios. At the same time, relational data sources, accessed through SQL, remain an essential part of how enterprises work with data. In this somewhat dichotomous space, Calcite plays a unique role with its strong support for both traditional, conventional data processing and other data sources, including those with semi-structured, streaming and geospatial models. In addition, Calcite’s design philosophy, with its focus on flexibility, adaptivity, and extensibility, has been another factor in Calcite becoming the most widely adopted query optimizer, used in a large number of open-source frameworks. Calcite’s dynamic and flexible query optimizer and its adapter architecture allow it to be embedded selectively by a variety of data management frameworks such as Hive, Drill, MapD, and Flink. Calcite’s support for heterogeneous data processing, as well as for the extended set of relational functions, will continue to improve in both functionality and performance.

Source:
Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources
