ART世界探险(19) - 优化编译器的编译流程

ART世界探险(19) - 优化编译器的编译流程

前面,我们对于快速编译器的知识有了一点了解,对于CompilerDriver,MIRGraph等都有了初步的印象。
下面,我们回头看一下优化编译器的编译过程。有了前面的基础,后面的学习过程会更顺利一些。

下面我们先看个地图,看看我们将遇到哪些新的对象:

OptimizingCompiler

OptimizingCompiler::Compile

我们先来看看优化编译的入口点,Compile函数:

CompiledMethod* OptimizingCompiler::Compile(const DexFile::CodeItem* code_item,
                                            uint32_t access_flags,
                                            InvokeType invoke_type,
                                            uint16_t class_def_idx,
                                            uint32_t method_idx,
                                            jobject jclass_loader,
                                            const DexFile& dex_file) const {
  CompilerDriver* compiler_driver = GetCompilerDriver();
  CompiledMethod* method = nullptr;
  if (compiler_driver->IsMethodVerifiedWithoutFailures(method_idx, class_def_idx, dex_file) &&
      !compiler_driver->GetVerifiedMethod(&dex_file, method_idx)->HasRuntimeThrow()) {

校验成功的话,下面就调用TryCompile方法来完成编译。

     method = TryCompile(code_item, access_flags, invoke_type, class_def_idx,
                         method_idx, jclass_loader, dex_file);
  } else {
    if (compiler_driver->GetCompilerOptions().VerifyAtRuntime()) {
      MaybeRecordStat(MethodCompilationStat::kNotCompiledVerifyAtRuntime);
    } else {
      MaybeRecordStat(MethodCompilationStat::kNotCompiledClassNotVerified);
    }
  }

如果编译成功了,则直接返回。如果不成功,则再调用其他方式编译。

  if (method != nullptr) {
    return method;
  }
  method = delegate_->Compile(code_item, access_flags, invoke_type, class_def_idx, method_idx,
                              jclass_loader, dex_file);

  if (method != nullptr) {
    MaybeRecordStat(MethodCompilationStat::kCompiledQuick);
  }
  return method;
}

OptimizingCompiler::TryCompile

CompiledMethod* OptimizingCompiler::TryCompile(const DexFile::CodeItem* code_item,
                                               uint32_t access_flags,
                                               InvokeType invoke_type,
                                               uint16_t class_def_idx,
                                               uint32_t method_idx,
                                               jobject class_loader,
                                               const DexFile& dex_file) const {
  UNUSED(invoke_type);
  std::string method_name = PrettyMethod(method_idx, dex_file);
  MaybeRecordStat(MethodCompilationStat::kAttemptCompilation);
  CompilerDriver* compiler_driver = GetCompilerDriver();

对于ARM32位的架构,我们默认使用Thumb2指令集。
关于ARM指令集,我们之前在《ART世界探险(3) - ARM 64位CPU的架构快餐教程》中有过介绍。

  InstructionSet instruction_set = compiler_driver->GetInstructionSet();
  // Always use the thumb2 assembler: some runtime functionality (like implicit stack
  // overflow checks) assume thumb2.
  if (instruction_set == kArm) {
    instruction_set = kThumb2;
  }

如果无法优化,或者要求不优化,返直接返回

  // `run_optimizations_` is set explicitly (either through a compiler filter
  // or the debuggable flag). If it is set, we can run baseline. Otherwise, we
  // fall back to Quick.
  bool should_use_baseline = !run_optimizations_;
  bool can_optimize = CanOptimize(*code_item);
  if (!can_optimize && !should_use_baseline) {
    // We know we will not compile this method. Bail out before doing any work.
    return nullptr;
  }

指令集都不支持,那也就不用编了

  // Do not attempt to compile on architectures we do not support.
  if (!IsInstructionSetSupported(instruction_set)) {
    MaybeRecordStat(MethodCompilationStat::kNotCompiledUnsupportedIsa);
    return nullptr;
  }

如果不值得编的话,就不编了。
怎么算值不值得编呢?指令太多了的,或者是虚拟寄存器使用比较多的,就不编了,我们来看IsPathologicalCase的代码:

bool Compiler::IsPathologicalCase(const DexFile::CodeItem& code_item,
                                  uint32_t method_idx,
                                  const DexFile& dex_file) {
  /*
   * Skip compilation for pathologically large methods - either by instruction count or num vregs.
   * Dalvik uses 16-bit uints for instruction and register counts.  We'll limit to a quarter
   * of that, which also guarantees we cannot overflow our 16-bit internal Quick SSA name space.
   */
  if (code_item.insns_size_in_code_units_ >= UINT16_MAX / 4) {
    LOG(INFO) << "Method exceeds compiler instruction limit: "
              << code_item.insns_size_in_code_units_
              << " in " << PrettyMethod(method_idx, dex_file);
    return true;
  }
  if (code_item.registers_size_ >= UINT16_MAX / 4) {
    LOG(INFO) << "Method exceeds compiler virtual register limit: "
              << code_item.registers_size_ << " in " << PrettyMethod(method_idx, dex_file);
    return true;
  }
  return false;
}

UINT16_MAX是两个字节的最大整数,为65535,除以4大约是16384。

  if (Compiler::IsPathologicalCase(*code_item, method_idx, dex_file)) {
    MaybeRecordStat(MethodCompilationStat::kNotCompiledPathological);
    return nullptr;
  }

下面处理空间选项,大于128单元的代码项不编译,以节省空间。

  // Implementation of the space filter: do not compile a code item whose size in
  // code units is bigger than 128.
  static constexpr size_t kSpaceFilterOptimizingThreshold = 128;
  const CompilerOptions& compiler_options = compiler_driver->GetCompilerOptions();
  if ((compiler_options.GetCompilerFilter() == CompilerOptions::kSpace)
      && (code_item->insns_size_in_code_units_ > kSpaceFilterOptimizingThreshold)) {
    MaybeRecordStat(MethodCompilationStat::kNotCompiledSpaceFilter);
    return nullptr;
  }

下面我们的老朋友DexComplicationUnit又出来了。


DexComplicationUnit
  DexCompilationUnit dex_compilation_unit(
    nullptr, class_loader, art::Runtime::Current()->GetClassLinker(), dex_file, code_item,
    class_def_idx, method_idx, access_flags,
    compiler_driver->GetVerifiedMethod(&dex_file, method_idx));

然后我们构造一个HGraph对象,作用相当于quick Compiler中的MIRGraph.

  ArenaAllocator arena(Runtime::Current()->GetArenaPool());
  HGraph* graph = new (&arena) HGraph(
      &arena, dex_file, method_idx, compiler_driver->GetInstructionSet(),
      compiler_driver->GetCompilerOptions().GetDebuggable());

  // For testing purposes, we put a special marker on method names that should be compiled
  // with this compiler. This makes sure we're not regressing.
  bool shouldCompile = method_name.find("$opt$") != std::string::npos;
  bool shouldOptimize = method_name.find("$opt$reg$") != std::string::npos && run_optimizations_;

  std::unique_ptr<CodeGenerator> codegen(
      CodeGenerator::Create(graph,
                            instruction_set,
                            *compiler_driver->GetInstructionSetFeatures(),
                            compiler_driver->GetCompilerOptions()));
  if (codegen.get() == nullptr) {
    CHECK(!shouldCompile) << "Could not find code generator for optimizing compiler";
    MaybeRecordStat(MethodCompilationStat::kNotCompiledNoCodegen);
    return nullptr;
  }
  codegen->GetAssembler()->cfi().SetEnabled(
      compiler_driver->GetCompilerOptions().GetGenerateDebugInfo());

  PassInfoPrinter pass_info_printer(graph,
                                    method_name.c_str(),
                                    *codegen.get(),
                                    visualizer_output_.get(),
                                    compiler_driver);

下面构造一个HGraphBuilder对象,然后调用HGraphBuilder的BuildGraph方法构造这个图。

  HGraphBuilder builder(graph,
                        &dex_compilation_unit,
                        &dex_compilation_unit,
                        &dex_file,
                        compiler_driver,
                        compilation_stats_.get());

  VLOG(compiler) << "Building " << method_name;

  {
    PassInfo pass_info(HGraphBuilder::kBuilderPassName, &pass_info_printer);
    if (!builder.BuildGraph(*code_item)) {
      DCHECK(!(IsCompilingWithCoreImage() && shouldCompile))
          << "Could not build graph in optimizing compiler";
      return nullptr;
    }
  }

下一步,尝试对HGraph构造SSA。

  bool can_allocate_registers = RegisterAllocator::CanAllocateRegistersFor(*graph, instruction_set);

  if (run_optimizations_ && can_optimize && can_allocate_registers) {
    VLOG(compiler) << "Optimizing " << method_name;

    {
      PassInfo pass_info(SsaBuilder::kSsaBuilderPassName, &pass_info_printer);
      if (!graph->TryBuildingSsa()) {
        // We could not transform the graph to SSA, bailout.
        LOG(INFO) << "Skipping compilation of " << method_name << ": it contains a non natural loop";
        MaybeRecordStat(MethodCompilationStat::kNotCompiledCannotBuildSSA);
        return nullptr;
      }
    }

    return CompileOptimized(graph,
                            codegen.get(),
                            compiler_driver,
                            dex_file,
                            dex_compilation_unit,
                            &pass_info_printer);
  } else if (shouldOptimize && can_allocate_registers) {
    LOG(FATAL) << "Could not allocate registers in optimizing compiler";
    UNREACHABLE();
  } else if (should_use_baseline) {

下面是不优化的情况:

    VLOG(compiler) << "Compile baseline " << method_name;

    if (!run_optimizations_) {
      MaybeRecordStat(MethodCompilationStat::kNotOptimizedDisabled);
    } else if (!can_optimize) {
      MaybeRecordStat(MethodCompilationStat::kNotOptimizedTryCatch);
    } else if (!can_allocate_registers) {
      MaybeRecordStat(MethodCompilationStat::kNotOptimizedRegisterAllocator);
    }

    return CompileBaseline(codegen.get(), compiler_driver, dex_compilation_unit);
  } else {
    return nullptr;
  }
}

OptimizingCompiler::CompileOptimized

下面我们再看看优化编译的部分:

CompiledMethod* OptimizingCompiler::CompileOptimized(HGraph* graph,
                                                     CodeGenerator* codegen,
                                                     CompilerDriver* compiler_driver,
                                                     const DexFile& dex_file,
                                                     const DexCompilationUnit& dex_compilation_unit,
                                                     PassInfoPrinter* pass_info_printer) const {
  StackHandleScopeCollection handles(Thread::Current());

下面开始调用RunOptimizations进行优化,一共15轮优化。
AllocateRegisters是第16轮优化,liveness优化。

  RunOptimizations(graph, compiler_driver, compilation_stats_.get(),
                   dex_file, dex_compilation_unit, pass_info_printer, &handles);

  AllocateRegisters(graph, codegen, pass_info_printer);

下面是生成目标代码:

  CodeVectorAllocator allocator;
  codegen->CompileOptimized(&allocator);

  DefaultSrcMap src_mapping_table;
  if (compiler_driver->GetCompilerOptions().GetGenerateDebugInfo()) {
    codegen->BuildSourceMap(&src_mapping_table);
  }

  std::vector<uint8_t> stack_map;
  codegen->BuildStackMaps(&stack_map);

  MaybeRecordStat(MethodCompilationStat::kCompiledOptimized);

最后分配空间,将编译的方法保存起来:

  return CompiledMethod::SwapAllocCompiledMethod(
      compiler_driver,
      codegen->GetInstructionSet(),
      ArrayRef<const uint8_t>(allocator.GetMemory()),
      // Follow Quick's behavior and set the frame size to zero if it is
      // considered "empty" (see the definition of
      // art::CodeGenerator::HasEmptyFrame).
      codegen->HasEmptyFrame() ? 0 : codegen->GetFrameSize(),
      codegen->GetCoreSpillMask(),
      codegen->GetFpuSpillMask(),
      &src_mapping_table,
      ArrayRef<const uint8_t>(),  // mapping_table.
      ArrayRef<const uint8_t>(stack_map),
      ArrayRef<const uint8_t>(),  // native_gc_map.
      ArrayRef<const uint8_t>(*codegen->GetAssembler()->cfi().data()),
      ArrayRef<const LinkerPatch>());
}

RunOptimizations

优化的过程:

  1. IntrinsicsRecognizer
  2. 第一次HConstantFolding
  3. 第二次InstructionSimplifier
  4. HDeadCodeElimination
  5. HInliner
  6. HBooleanSimplifier
  7. 第二次HConstantFolding
  8. SideEffectsAnalysis
  9. GVNOptimization
  10. LICM
  11. BoundsCheckElimination
  12. ReferenceTypePropagation
  13. 第二次InstructionSimplifier
  14. 第二次HDeadCodeElimination
  15. 第三次InstructionSimplifier
static void RunOptimizations(HGraph* graph,
                             CompilerDriver* driver,
                             OptimizingCompilerStats* stats,
                             const DexFile& dex_file,
                             const DexCompilationUnit& dex_compilation_unit,
                             PassInfoPrinter* pass_info_printer,
                             StackHandleScopeCollection* handles) {
  HDeadCodeElimination dce1(graph, stats,
                            HDeadCodeElimination::kInitialDeadCodeEliminationPassName);
  HDeadCodeElimination dce2(graph, stats,
                            HDeadCodeElimination::kFinalDeadCodeEliminationPassName);
  HConstantFolding fold1(graph);
  InstructionSimplifier simplify1(graph, stats);
  HBooleanSimplifier boolean_simplify(graph);

  HInliner inliner(graph, dex_compilation_unit, dex_compilation_unit, driver, stats);

  HConstantFolding fold2(graph, "constant_folding_after_inlining");
  SideEffectsAnalysis side_effects(graph);
  GVNOptimization gvn(graph, side_effects);
  LICM licm(graph, side_effects);
  BoundsCheckElimination bce(graph);
  ReferenceTypePropagation type_propagation(graph, dex_file, dex_compilation_unit, handles);
  InstructionSimplifier simplify2(graph, stats, "instruction_simplifier_after_types");
  InstructionSimplifier simplify3(graph, stats, "instruction_simplifier_before_codegen");

  IntrinsicsRecognizer intrinsics(graph, dex_compilation_unit.GetDexFile(), driver);

  HOptimization* optimizations[] = {
    &intrinsics,
    &fold1,
    &simplify1,
    &dce1,
    &inliner,
    // BooleanSimplifier depends on the InstructionSimplifier removing redundant
    // suspend checks to recognize empty blocks.
    &boolean_simplify,
    &fold2,
    &side_effects,
    &gvn,
    &licm,
    &bce,
    &type_propagation,
    &simplify2,
    &dce2,
    // The codegen has a few assumptions that only the instruction simplifier can
    // satisfy. For example, the code generator does not expect to see a
    // HTypeConversion from a type to the same type.
    &simplify3,
  };

  RunOptimizations(optimizations, arraysize(optimizations), pass_info_printer);
}

AllocateRegisters

进行liveness优化:

static void AllocateRegisters(HGraph* graph,
                              CodeGenerator* codegen,
                              PassInfoPrinter* pass_info_printer) {
  PrepareForRegisterAllocation(graph).Run();
  SsaLivenessAnalysis liveness(graph, codegen);
  {
    PassInfo pass_info(SsaLivenessAnalysis::kLivenessPassName, pass_info_printer);
    liveness.Analyze();
  }
  {
    PassInfo pass_info(RegisterAllocator::kRegisterAllocatorPassName, pass_info_printer);
    RegisterAllocator(graph->GetArena(), codegen, liveness).AllocateRegisters();
  }
}

CodeGenerator::CompiledOptimized

主要逻辑拆成Initialize()和CompileInternal()两部分

void CodeGenerator::CompileOptimized(CodeAllocator* allocator) {
  // The register allocator already called `InitializeCodeGeneration`,
  // where the frame size has been computed.
  DCHECK(block_order_ != nullptr);
  Initialize();
  CompileInternal(allocator, /* is_baseline */ false);
}

CodeGenerator::CompileInternal

void CodeGenerator::CompileInternal(CodeAllocator* allocator, bool is_baseline) {
  is_baseline_ = is_baseline;
  HGraphVisitor* instruction_visitor = GetInstructionVisitor();
  DCHECK_EQ(current_block_index_, 0u);
  GenerateFrameEntry();
  DCHECK_EQ(GetAssembler()->cfi().GetCurrentCFAOffset(), static_cast<int>(frame_size_));
  for (size_t e = block_order_->Size(); current_block_index_ < e; ++current_block_index_) {
    HBasicBlock* block = block_order_->Get(current_block_index_);
    // Don't generate code for an empty block. Its predecessors will branch to its successor
    // directly. Also, the label of that block will not be emitted, so this helps catch
    // errors where we reference that label.
    if (block->IsSingleGoto()) continue;

到Bind的时候,已经进入跟具体架构相关的生成代码的部分了。比如针对arm64架构,就是CodeGeneratorARM64的Bind:

void CodeGeneratorARM64::Bind(HBasicBlock* block) {
  __ Bind(GetLabelOf(block));
}

Bind之后,对于block之中的指令进行遍历。

    Bind(block);
    for (HInstructionIterator it(block->GetInstructions()); !it.Done(); it.Advance()) {
      HInstruction* current = it.Current();
      if (is_baseline) {
        InitLocationsBaseline(current);
      }
      DCHECK(CheckTypeConsistency(current));
      current->Accept(instruction_visitor);
    }
  }

下面处理慢路径下的操作:

  // Generate the slow paths.
  for (size_t i = 0, e = slow_paths_.Size(); i < e; ++i) {
    slow_paths_.Get(i)->EmitNativeCode(this);
  }

最后调用汇编器生成指令

  // Finalize instructions in assember;
  Finalize(allocator);
}

CodeGenerator::Finalize

void CodeGenerator::Finalize(CodeAllocator* allocator) {
  size_t code_size = GetAssembler()->CodeSize();
  uint8_t* buffer = allocator->Allocate(code_size);

  MemoryRegion code(buffer, code_size);
  GetAssembler()->FinalizeInstructions(code);
}

Arm64Assembler::FinalizeInstructions

下面是Arm64Assembler的FinalizeInstructions,其它芯片用的是通用的版本。

void Arm64Assembler::FinalizeInstructions(const MemoryRegion& region) {
  // Copy the instructions from the buffer.
  MemoryRegion from(vixl_masm_->GetStartAddress<void*>(), CodeSize());
  region.CopyFrom(0, from);
}

CompiledMethod::SwapAllocCompiledMethod

最后看下SwapAllocCompiledMethod,基本上就是分配空间了。

CompiledMethod* CompiledMethod::SwapAllocCompiledMethod(
    CompilerDriver* driver,
    InstructionSet instruction_set,
    const ArrayRef<const uint8_t>& quick_code,
    const size_t frame_size_in_bytes,
    const uint32_t core_spill_mask,
    const uint32_t fp_spill_mask,
    DefaultSrcMap* src_mapping_table,
    const ArrayRef<const uint8_t>& mapping_table,
    const ArrayRef<const uint8_t>& vmap_table,
    const ArrayRef<const uint8_t>& native_gc_map,
    const ArrayRef<const uint8_t>& cfi_info,
    const ArrayRef<const LinkerPatch>& patches) {
  SwapAllocator<CompiledMethod> alloc(driver->GetSwapSpaceAllocator());
  CompiledMethod* ret = alloc.allocate(1);
  alloc.construct(ret, driver, instruction_set, quick_code, frame_size_in_bytes, core_spill_mask,
                  fp_spill_mask, src_mapping_table, mapping_table, vmap_table, native_gc_map,
                  cfi_info, patches);
  return ret;
}
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 205,386评论 6 479
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 87,939评论 2 381
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 151,851评论 0 341
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 54,953评论 1 278
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 63,971评论 5 369
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 48,784评论 1 283
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 38,126评论 3 399
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 36,765评论 0 258
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 43,148评论 1 300
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 35,744评论 2 323
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 37,858评论 1 333
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 33,479评论 4 322
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 39,080评论 3 307
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 30,053评论 0 19
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 31,278评论 1 260
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 45,245评论 2 352
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 42,590评论 2 343

推荐阅读更多精彩内容