Spark SQL 2.3 Source Code Walkthrough - Optimizer (4)

Once we have the Resolved Logical Plan, the query enters the optimization phase. The next steps in QueryExecution look like this:

// If the cache already holds the result of a matching query, use the cached data instead. The logic is simple, so we won't go into it here.
lazy val withCachedData: LogicalPlan = {
  assertAnalyzed()
  assertSupported()
  sparkSession.sharedState.cacheManager.useCachedData(analyzed)
}
// Optimize the logical plan
lazy val optimizedPlan: LogicalPlan = {
  sparkSession.sessionState.optimizer.execute(withCachedData)
}

Now let's look at the Optimizer:

/**
 * Abstract class all optimizers should inherit of, contains the standard batches (extending
 * Optimizers can override this.
 */
abstract class Optimizer(sessionCatalog: SessionCatalog)
  extends RuleExecutor[LogicalPlan] {

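Seeing that Optimizer also extends RuleExecutor is good news: it follows the same pattern as the Analyzer, traversing the tree and applying rules to each node. So we can go straight to the rules: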

def batches: Seq[Batch] = {
  val operatorOptimizationRuleSet =
    Seq(
      // Operator push down
      PushProjectionThroughUnion,
      ReorderJoin,
      EliminateOuterJoin,
      PushPredicateThroughJoin,
      PushDownPredicate,
      LimitPushDown,
      ColumnPruning,
      InferFiltersFromConstraints,
      // Operator combine
      CollapseRepartition,
      CollapseProject,
      CollapseWindow,
      CombineFilters,
      CombineLimits,
      CombineUnions,
      // Constant folding and strength reduction
      NullPropagation,
      ConstantPropagation,
      FoldablePropagation,
      OptimizeIn,
      ConstantFolding,
      ReorderAssociativeOperator,
      LikeSimplification,
      BooleanSimplification,
      SimplifyConditionals,
      RemoveDispensableExpressions,
      SimplifyBinaryComparison,
      PruneFilters,
      EliminateSorts,
      SimplifyCasts,
      SimplifyCaseConversionExpressions,
      RewriteCorrelatedScalarSubquery,
      EliminateSerialization,
      RemoveRedundantAliases,
      RemoveRedundantProject,
      SimplifyCreateStructOps,
      SimplifyCreateArrayOps,
      SimplifyCreateMapOps,
      CombineConcats) ++
      extendedOperatorOptimizationRules

  val operatorOptimizationBatch: Seq[Batch] = {
    val rulesWithoutInferFiltersFromConstraints =
      operatorOptimizationRuleSet.filterNot(_ == InferFiltersFromConstraints)
    Batch("Operator Optimization before Inferring Filters", fixedPoint,
      rulesWithoutInferFiltersFromConstraints: _*) ::
    Batch("Infer Filters", Once,
      InferFiltersFromConstraints) ::
    Batch("Operator Optimization after Inferring Filters", fixedPoint,
      rulesWithoutInferFiltersFromConstraints: _*) :: Nil
  }

  (Batch("Eliminate Distinct", Once, EliminateDistinct) ::
  // Technically some of the rules in Finish Analysis are not optimizer rules and belong more
  // in the analyzer, because they are needed for correctness (e.g. ComputeCurrentTime).
  // However, because we also use the analyzer to canonicalized queries (for view definition),
  // we do not eliminate subqueries or compute current time in the analyzer.
  Batch("Finish Analysis", Once,
    EliminateSubqueryAliases,
    EliminateView,
    ReplaceExpressions,
    ComputeCurrentTime,
    GetCurrentDatabase(sessionCatalog),
    RewriteDistinctAggregates,
    ReplaceDeduplicateWithAggregate) ::
  //////////////////////////////////////////////////////////////////////////////////////////
  // Optimizer rules start here
  //////////////////////////////////////////////////////////////////////////////////////////
  // - Do the first call of CombineUnions before starting the major Optimizer rules,
  //   since it can reduce the number of iteration and the other rules could add/move
  //   extra operators between two adjacent Union operators.
  // - Call CombineUnions again in Batch("Operator Optimizations"),
  //   since the other rules might make two separate Unions operators adjacent.
  Batch("Union", Once,
    CombineUnions) ::
  Batch("Pullup Correlated Expressions", Once,
    PullupCorrelatedPredicates) ::
  Batch("Subquery", Once,
    OptimizeSubqueries) ::
  Batch("Replace Operators", fixedPoint,
    ReplaceIntersectWithSemiJoin,
    ReplaceExceptWithFilter,
    ReplaceExceptWithAntiJoin,
    ReplaceDistinctWithAggregate) ::
  Batch("Aggregate", fixedPoint,
    RemoveLiteralFromGroupExpressions,
    RemoveRepetitionFromGroupExpressions) :: Nil ++
  operatorOptimizationBatch) :+
  Batch("Join Reorder", Once,
    CostBasedJoinReorder) :+
  Batch("Decimal Optimizations", fixedPoint,
    DecimalAggregates) :+
  Batch("Object Expressions Optimization", fixedPoint,
    EliminateMapObjects,
    CombineTypedFilters) :+
  Batch("LocalRelation", fixedPoint,
    ConvertToLocalRelation,
    PropagateEmptyRelation) :+
  // The following batch should be executed after batch "Join Reorder" and "LocalRelation".
  Batch("Check Cartesian Products", Once,
    CheckCartesianProducts) :+
  Batch("RewriteSubquery", Once,
    RewritePredicateSubquery,
    ColumnPruning,
    CollapseProject,
    RemoveRedundantProject)
}
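
To make the Batch, Once and fixedPoint terms above more concrete, here is a toy sketch of how a rule executor can drive batches. It is not Spark's RuleExecutor: the Plan type (a plain list of operator names), the Rule class and the combineOneFilterPair rule are all made up for illustration. The only idea it borrows is that a Once batch runs its rules a single time, while a fixed-point batch repeats until the plan stops changing or an iteration limit is hit.

```scala
// Toy model of the batch/strategy idea; none of these types are Spark's own.
object ToyRuleExecutor {
  // A "plan" here is just a list of operator names, outermost first.
  type Plan = List[String]

  // A rule rewrites a plan; returning the input unchanged means "nothing to do".
  final case class Rule(name: String, rewrite: Plan => Plan)

  sealed trait Strategy { def maxIterations: Int }
  case object Once extends Strategy { val maxIterations = 1 }
  final case class FixedPoint(maxIterations: Int) extends Strategy

  final case class Batch(name: String, strategy: Strategy, rules: Rule*)

  // Run the batches in order. Within a batch, apply every rule in sequence and
  // repeat until the plan stops changing or the iteration budget is used up.
  def execute(batches: Seq[Batch], plan: Plan): Plan =
    batches.foldLeft(plan) { (batchInput, batch) =>
      var current = batchInput
      var iteration = 0
      var changed = true
      while (changed && iteration < batch.strategy.maxIterations) {
        val next = batch.rules.foldLeft(current)((p, rule) => rule.rewrite(p))
        changed = next != current
        current = next
        iteration += 1
      }
      current
    }

  // Hypothetical rule: merge only ONE pair of adjacent "Filter" operators per pass,
  // so that the difference between Once and FixedPoint becomes visible.
  val combineOneFilterPair: Rule = Rule("CombineOneFilterPair", plan => {
    val i = plan.sliding(2).indexWhere(_ == List("Filter", "Filter"))
    if (i < 0) plan else plan.take(i) ++ plan.drop(i + 1)
  })

  def main(args: Array[String]): Unit = {
    val plan = List("Project", "Filter", "Filter", "Filter", "Scan")
    println(execute(Seq(Batch("once", Once, combineOneFilterPair)), plan))
    println(execute(Seq(Batch("fixed", FixedPoint(100), combineOneFilterPair)), plan))
  }
}
```

With three stacked Filters in the input, the Once batch still leaves two of them behind, while the FixedPoint batch keeps going until only one remains. This is one reason a rule that only makes partial progress per pass is usually placed in a fixed-point batch.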

There are a lot of optimization rules, and making sense of all of them takes some SQL tuning experience.

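Let's take predicate pushdown, one of the most common SQL optimizations, as an example. For an introduction to predicate pushdown, see: https://cloud.tencent.com/developer/article/1005925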

The SQL being executed is: "SELECT A1.B FROM A1 JOIN A2 ON A1.B = A2.B WHERE A1.B = 'Andy'"

Before optimization:

Project [B#6]
+- Filter (B#6 = Andy)
   +- Join Inner, (B#6 = B#8)
      :- SubqueryAlias a1
      :  +- Relation[B#6] json
      +- SubqueryAlias a2
         +- Relation[B#8] json

After optimization:

Project [B#6]
+- Join Inner, (B#6 = B#8)
   :- Filter (isnotnull(B#6) && (B#6 = Andy))
   :  +- Relation[B#6] json
   +- Filter (isnotnull(B#8) && (B#8 = Andy))
      +- Relation[B#8] json

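The Filter pushdown is plain to see. The rules at work are PushPredicateThroughJoin and InferFiltersFromConstraints: the former moves B#6 = Andy below the join onto the left side, and the latter derives the isnotnull checks and the right-hand B#8 = Andy from the join condition B#6 = B#8. The SubqueryAlias nodes are gone as well; EliminateSubqueryAliases in the Finish Analysis batch removes them.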
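
The right-hand Filter in the optimized plan never appeared in the SQL text, so it has to be derived. The following is a rough sketch of that kind of inference, under heavy simplification: the Constraint types and the infer function are invented for this example, and Spark's actual rule works on Catalyst expressions rather than on strings. It only models two ideas: an attribute equality lets a literal comparison carry over to the other attribute, and an equality can only hold when its attribute operands are non-null.

```scala
// Toy constraint inference; the types and names here are illustrative, not Spark's.
object ToyConstraintInference {
  sealed trait Constraint
  final case class AttrEq(a: String, b: String) extends Constraint      // a = b
  final case class LitEq(attr: String, lit: String) extends Constraint  // attr = 'lit'
  final case class NotNull(attr: String) extends Constraint             // isnotnull(attr)

  def infer(known: List[Constraint]): List[Constraint] = {
    val attrEqs = known.collect { case e: AttrEq => e }
    val litEqs = known.collect { case l: LitEq => l }

    // a = b together with a = 'v' implies b = 'v' (and symmetrically).
    val propagated: List[Constraint] = for {
      e <- attrEqs
      l <- litEqs
      if l.attr == e.a || l.attr == e.b
    } yield LitEq(if (l.attr == e.a) e.b else e.a, l.lit)

    val withLiterals = (known ++ propagated).distinct

    // An equality can only be true when its attribute operands are non-null.
    val notNulls: List[Constraint] = withLiterals.flatMap {
      case AttrEq(a, b) => List(NotNull(a), NotNull(b))
      case LitEq(a, _)  => List(NotNull(a))
      case _: NotNull   => Nil
    }
    (withLiterals ++ notNulls).distinct
  }

  def main(args: Array[String]): Unit = {
    // The two predicates from the example query.
    val known = List(AttrEq("B#6", "B#8"), LitEq("B#6", "Andy"))
    infer(known).foreach(println)
  }
}
```

Starting from B#6 = B#8 and B#6 = Andy, the sketch adds B#8 = Andy plus a not-null constraint for each of the two attributes, which matches the extra conditions visible in the optimized plan.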

Let's focus on the key code of PushPredicateThroughJoin:

def apply(plan: LogicalPlan): LogicalPlan = plan transform {
  // push the where condition down into join filter
  // match this structure: a Filter sitting directly on top of a Join
  case f @ Filter(filterCondition, Join(left, right, joinType, joinCondition)) =>
    val (leftFilterConditions, rightFilterConditions, commonFilterCondition) =
      split(splitConjunctivePredicates(filterCondition), left, right)
    joinType match {
      // the join is an inner join
      case _: InnerLike =>
        // conditions that only touch the left side are pushed down as a Filter on left
        // push down the single side `where` condition into respective sides
        val newLeft = leftFilterConditions.
          reduceLeftOption(And).map(Filter(_, left)).getOrElse(left)
        // conditions that only touch the right side are pushed down as a Filter on right
        val newRight = rightFilterConditions.
          reduceLeftOption(And).map(Filter(_, right)).getOrElse(right)
        val (newJoinConditions, others) =
          commonFilterCondition.partition(canEvaluateWithinJoin)
        val newJoinCond = (newJoinConditions ++ joinCondition).reduceLeftOption(And)
        // the final optimized result
        val join = Join(newLeft, newRight, joinType, newJoinCond)
        if (others.nonEmpty) {
          Filter(others.reduceLeft(And), join)
        } else {
          join
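        }
      // ... (cases for the other join types omitted)
    }
  // ... (the case for a Join with no Filter on top omitted)
}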
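
The inner-join branch above can be replayed on a toy plan tree. This is a simplified sketch, not Spark code: Pred, Relation, Filter, Join, wrap and pushThroughInnerJoin are stand-ins defined here, predicates carry their referenced columns as plain strings, and details such as non-deterministic predicates and canEvaluateWithinJoin are left out. What it keeps is the three-way split: conditions that only need the left side, conditions that only need the right side, and conditions that need both.

```scala
// Toy model of pushing a Filter through an inner join; not Spark's classes.
object ToyPushdown {
  // A predicate records its SQL text and the set of columns it references.
  final case class Pred(sql: String, refs: Set[String])

  sealed trait Plan { def output: Set[String] }
  final case class Relation(name: String, output: Set[String]) extends Plan
  final case class Filter(conds: List[Pred], child: Plan) extends Plan {
    def output: Set[String] = child.output
  }
  final case class Join(left: Plan, right: Plan, conds: List[Pred]) extends Plan {
    def output: Set[String] = left.output ++ right.output
  }

  // Split conjuncts into left-only, right-only, and those that need both sides.
  def split(conds: List[Pred], left: Plan, right: Plan): (List[Pred], List[Pred], List[Pred]) = {
    val (leftOnly, rest) = conds.partition(_.refs.subsetOf(left.output))
    val (rightOnly, both) = rest.partition(_.refs.subsetOf(right.output))
    (leftOnly, rightOnly, both)
  }

  // Only add a Filter node when there is something to filter on.
  def wrap(conds: List[Pred], child: Plan): Plan =
    if (conds.isEmpty) child else Filter(conds, child)

  def pushThroughInnerJoin(plan: Plan): Plan = plan match {
    case Filter(conds, Join(left, right, joinConds)) =>
      val (l, r, both) = split(conds, left, right)
      // Single-side conditions move below the join; the rest joins the join condition.
      Join(wrap(l, left), wrap(r, right), both ++ joinConds)
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val a1 = Relation("A1", Set("A1.B"))
    val a2 = Relation("A2", Set("A2.B"))
    val joinCond = Pred("A1.B = A2.B", Set("A1.B", "A2.B"))
    val where = Pred("A1.B = 'Andy'", Set("A1.B"))
    println(pushThroughInnerJoin(Filter(List(where), Join(a1, a2, List(joinCond)))))
  }
}
```

For the example query, the WHERE condition only references A1.B, so it ends up as a Filter on the A1 side and the join keeps its original condition. The toy does not produce a Filter on the A2 side, because that one comes from filter inference rather than from this rule.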

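That's it for the Optimizer. If you're interested, read through some of the other optimization rules; doing so will certainly deepen your understanding of SQL.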
