Conflict between a kernel minor-version upgrade and dex2oat

Symptoms

  • Reproduction steps
    • Phone (Android 8.0) with kernel 4.4
    • Procedure
      • Flash a userdebug build
      • On first boot, dex2oat crashes with low probability

Triage

  • Relevant logs
    • tombstone
      Revision: '0'
      ABI: 'arm'
      pid: 1122, tid: 1122, name: dex2oat  >>> /system/bin/dex2oat <<<
      signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
      Abort message: 'heap.cc:386] Check failed: non_moving_space_mem_map != nullptr Failed to mmap at expected address, mapped at 0xac200000 instead of 0x0ec00000 : Requested region 0x0ec00000-0x12c00000 overlaps with existing map 0x0fa73000-0x0fa92000 (/system/bin/dex2oat)'
          r0 00000000  r1 00000462  r2 00000006  r3 00000008
          r4 00000462  r5 00000462  r6 be8250f0  r7 0000010c
          r8 0000000d  r9 be825151  sl 0000000a  fp be825150
          ip 00000000  sp be8250e0  lr b0860417  pc b089097c  cpsr 200f0010
          d0  2e656d69746e7572  d1  2d72203030303166
          d2  65747379732f2065  d3  696c2f6e69622f72
          d4  3a752f5f5f736569  d5  725f7463656a626f
          d6  746c75616665643a  d7  30733a706f72705f
          d8  0000000000000000  d9  0000000000000000
          d10 0000000000000000  d11 0000000000000000
          d12 0000000000000000  d13 0000000000000000
          d14 0000000000000000  d15 0000000000000000
          d16 2e74736973726570  d17 2e6761742e676f6c
          d18 000c54b4b00ba000  d19 0000000000000000
          d20 0000000800000008  d21 0000000800000008
          d22 0000000400000004  d23 0000000400000004
          d24 0000238100000381  d25 0000638100004381
          d26 0000238000000380  d27 0000638000004380
          d28 0000000000000000  d29 0000000000000000
          d30 0000000000000000  d31 0000000000000000
          scr 30000011
      backtrace:
          #00 pc 0004a97c  /system/lib/libc.so (tgkill+12)
          #01 pc 0001a413  /system/lib/libc.so (abort+54)
          #02 pc 003385ff  /system/lib/libart.so (_ZN3art7Runtime5AbortEPKc+230)
          #03 pc 00338c8f  /system/lib/libart.so (_ZN3art7Runtime7AborterEPKc+10)
          #04 pc 003efacd  /system/lib/libart.so (_ZN7android4base10LogMessageD1Ev+456)
          #05 pc 0018f14b  /system/lib/libart.so (_ZN3art2gc4HeapC1EjjjjddjjRKNSt3__112basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEENS_14InstructionSetENS0_13CollectorTypeESC_NS0_5space20LargeObjectSpaceTypeEjjjbjjbbbbbbbbbbby+7258)
          #06 pc 0033a655  /system/lib/libart.so (_ZN3art7Runtime4InitEONS_18RuntimeArgumentMapE+6460)
          #07 pc 00338cc7  /system/lib/libart.so (_ZN3art7Runtime6CreateEONS_18RuntimeArgumentMapE+50)
          #08 pc 00014027  /system/bin/dex2oat (_ZN3art7Dex2Oat13CreateRuntimeEONS_18RuntimeArgumentMapE+34)
          #09 pc 0000b29f  /system/bin/dex2oat (_ZN3art7Dex2Oat5SetupEv+606)
          #10 pc 000080b3  /system/bin/dex2oat (main+514)
          #11 pc 00017d35  /system/lib/libc.so (__libc_init+48)
          #12 pc 00007dd0  /system/bin/dex2oat (_start+96)
      

Preliminary analysis

  • The immediate cause is that dex2oat aborts itself with the following message:

    Abort message: 'heap.cc:386] Check failed: non_moving_space_mem_map != nullptr Failed to mmap at expected address, mapped at 0xac200000 instead of 0x0ec00000 : Requested region 0x0ec00000-0x12c00000 overlaps with existing map 0x0fa73000-0x0fa92000 (/system/bin/dex2oat)'
    
  • Relevant code

    Heap::Heap() {
      ... ...
      if (separate_non_moving_space) {
        ScopedTrace trace2("Create separate non moving space");
        // If we are the zygote, the non moving space becomes the zygote space when we run
        // PreZygoteFork the first time. In this case, call the map "zygote space" since we can't
        // rename the mem map later.
        const char* space_name = is_zygote ? kZygoteSpaceName : kNonMovingSpaceName;
        // Reserve the non moving mem map before the other two since it needs to be at a specific
        // address.
        non_moving_space_mem_map.reset(
            MemMap::MapAnonymous(space_name, requested_alloc_space_begin,
                                 non_moving_space_capacity, PROT_READ | PROT_WRITE, true, false,
                                 &error_str));
        // heap.cc:386 in this build: the check named in the abort message.
        CHECK(non_moving_space_mem_map != nullptr) << error_str;
      ... ...
    }
    

    The check fails here because MemMap::MapAnonymous() returned nullptr.

  • Relevant code in MemMap::MapAnonymous()

    MemMap* MemMap::MapAnonymous(const char* name,
                               uint8_t* expected_ptr,
                               size_t byte_count,
                               int prot,
                               bool low_4gb,
                               bool reuse,
                               std::string* error_msg,
                               bool use_ashmem) {
    ... ...
    void* actual = MapInternal(expected_ptr,
                               page_aligned_byte_count,
                               prot,
                               flags,
                               fd.get(),
                               0,
                               low_4gb);
    saved_errno = errno;
    
    if (actual == MAP_FAILED) {
      if (error_msg != nullptr) {
        if (kIsDebugBuild || VLOG_IS_ON(oat)) {
          PrintFileToLog("/proc/self/maps", LogSeverity::WARNING);
        }
    
        *error_msg = StringPrintf("Failed anonymous mmap(%p, %zd, 0x%x, 0x%x, %d, 0): %s. "
                                      "See process maps in the log.",
                                  expected_ptr,
                                  page_aligned_byte_count,
                                  prot,
                                  flags,
                                  fd.get(),
                                  strerror(saved_errno));
      }
      return nullptr;
    }
    if (!CheckMapRequest(expected_ptr, actual, page_aligned_byte_count, error_msg)) {
      return nullptr;
    }
    ... ...
    }
    

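    The abort message starts with "Failed to mmap at expected address" rather than "Failed anonymous mmap", so the mmap itself succeeded and the nullptr comes from CheckMapRequest() returning false.

  • Relevant code in CheckMapRequest()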

    static bool CheckMapRequest(uint8_t* expected_ptr, void* actual_ptr, size_t byte_count,
                              std::string* error_msg) {
    // Handled first by caller for more specific error messages.
    CHECK(actual_ptr != MAP_FAILED);
    
    if (expected_ptr == nullptr) {
      return true;
    }
    
    uintptr_t actual = reinterpret_cast<uintptr_t>(actual_ptr);
    uintptr_t expected = reinterpret_cast<uintptr_t>(expected_ptr);
    uintptr_t limit = expected + byte_count;
    
    if (expected_ptr == actual_ptr) {
      return true;
    }
    
    // We asked for an address but didn't get what we wanted, all paths below here should fail.
    int result = munmap(actual_ptr, byte_count);
    if (result == -1) {
      PLOG(WARNING) << StringPrintf("munmap(%p, %zd) failed", actual_ptr, byte_count);
    }
    
    if (error_msg != nullptr) {
      // We call this here so that we can try and generate a full error
      // message with the overlapping mapping. There's no guarantee that
      // that there will be an overlap though, since
      // - The kernel is not *required* to honor expected_ptr unless MAP_FIXED is
      //   true, even if there is no overlap
      // - There might have been an overlap at the point of mmap, but the
      //   overlapping region has since been unmapped.
      std::string error_detail;
      CheckNonOverlapping(expected, limit, &error_detail);
      std::ostringstream os;
      os <<  StringPrintf("Failed to mmap at expected address, mapped at "
                          "0x%08" PRIxPTR " instead of 0x%08" PRIxPTR,
                          actual, expected);
      if (!error_detail.empty()) {
        os << " : " << error_detail;
      }
      *error_msg = os.str();
     }
    return false;
    }
    

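    The logic is simple: it compares expected_ptr with actual_ptr.

    • Equal (or no address was requested, i.e. expected_ptr == nullptr): return true
    • Not equal: unmap the region, store the error message, and return false
  • From the code, the preliminary summary is:
    • The process wants to map "dalvik-non moving space" at the fixed virtual address range 0x0ec00000-0x12c00000 (start address 300 * MB - 64 * MB, size 64 MB).
    • The mapping actually landed at 0xac200000, because the dex2oat executable already occupied 0x0fa73000-0x0fa92000, which lies inside the requested range.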
      0fa73000-0fa91fff r-x         0     1f000  /system/bin/dex2oat (BuildId: 98d4fb6e838289e7a2bc42b4caf1d455)
      0fa93000-0fa93fff r--     1f000      1000  /system/bin/dex2oat
      0fa94000-0fa94fff rw-         0      1000
      
      
    • Finally, the following error message is raised
      Abort message: 'heap.cc:386] Check failed: non_moving_space_mem_map != nullptr Failed to mmap at expected address, mapped at 0xac200000 instead of 0x0ec00000 : Requested region 0x0ec00000-0x12c00000 overlaps with existing map 0x0fa73000-0x0fa92000 (/system/bin/dex2oat)'
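
The address arithmetic in the summary above can be checked with a small sketch. This is not ART code: `Range`, `NonMovingSpaceRange` and `Overlaps` are hypothetical helpers introduced only for illustration, and the only inputs are the figures quoted above (300 MB, 64 MB, and the addresses from the abort message).

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kMB = 1024u * 1024u;

// Half-open address range [begin, end).
struct Range {
  uint32_t begin;
  uint32_t end;
};

// Range requested for the non moving space, per the figures above:
// it starts at 300 MB - 64 MB and is 64 MB long.
inline Range NonMovingSpaceRange() {
  const uint32_t begin = 300u * kMB - 64u * kMB;
  return Range{begin, begin + 64u * kMB};
}

// Two non-empty half-open ranges overlap when each begins before the other ends.
inline bool Overlaps(Range a, Range b) {
  assert(a.begin < a.end && b.begin < b.end);
  return a.begin < b.end && b.begin < a.end;
}
```

With these helpers, `NonMovingSpaceRange()` yields 0x0ec00000-0x12c00000, and `Overlaps` reports a conflict with the dex2oat mapping 0x0fa73000-0x0fa92000 from the abort message. A mapping high in the address space would be reported as non-overlapping.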
      

Comparison experiments

  • Testing an older build on the same device shows that dex2oat is mapped at a high address

    0ec00000-0ec01000 rw-p 00000000 00:05 12697      /dev/ashmem/dalvik-non moving space (deleted)
    0ec01000-12c00000 ---p 00001000 00:05 12697      /dev/ashmem/dalvik-non moving space (deleted)
    12c00000-1ac00000 rw-p 00000000 00:05 12700      /dev/ashmem/dalvik-main space (region space) (deleted)
    8cd23000-8cd42000 r-xp 00000000 103:0b 287       /system/bin/dex2oat
    8cd43000-8cd44000 r--p 0001f000 103:0b 287       /system/bin/dex2oat
    ... ...
    
  • A Nexus 6P (OPR6.170623.013, kernel 3.10) also maps the executable at a high address

    ... ...
    727d3000-727da000 r--p 00000000 103:0b 1108                              /system/framework/arm/boot-android.hidl.manager-V1.0-java.oat
    727da000-727db000 rw-p 00000000 00:04 450565                             /dev/ashmem/dalvik-Zero-initialized segment 3 of ELF file /data/dalvik-cache/arm/system@framework@boot-android.hidl.manager-V1.0-java.oat (deleted)
    727db000-727dc000 r--p 00007000 103:0b 1108                              /system/framework/arm/boot-android.hidl.manager-V1.0-java.oat
    727dc000-727dd000 rw-p 00008000 103:0b 1108                              /system/framework/arm/boot-android.hidl.manager-V1.0-java.oat
    727dd000-729a2000 rw-p 00000000 00:04 450566                             /dev/ashmem/dalvik-non moving space (deleted)
    729a2000-767dd000 ---p 001c5000 00:04 450566                             /dev/ashmem/dalvik-non moving space (deleted)
    ab1d5000-ab1f4000 r-xp 00000000 103:0b 342                               /system/bin/dex2oat
    ab1f5000-ab1f6000 r--p 0001f000 103:0b 342                               /system/bin/dex2oat
    ... ...
    
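  • A preliminary conclusion can now be drawn

    In recent builds the executable is mapped at a low address instead of a high one, and this directly causes the problem.

In-depth analysis

  • Bisecting the builds showed that the problem is tied to the recent kernel minor-version upgrade.
  • The relevant patch is as follows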
    commit d2471b5e84f32de4e09b58f5436a4ce3ee935e32
    Author: Kees Cook <keescook@chromium.org>
    Date:   Mon Jul 10 15:52:40 2017 -0700
    
      arm: move ELF_ET_DYN_BASE to 4MB
      
      commit 6a9af90a3bcde217a1c053e135f5f43e5d5fafbd upstream.
      
      Now that explicitly executed loaders are loaded in the mmap region, we
      have more freedom to decide where we position PIE binaries in the
      address space to avoid possible collisions with mmap or stack regions.
      
      4MB is chosen here mainly to have parity with x86, where this is the
      traditional minimum load location, likely to avoid historically
      requiring a 4MB page table entry when only a portion of the first 4MB
      would be used (since the NULL address is avoided).
      
      For ARM the position could be 0x8000, the standard ET_EXEC load address,
      but that is needlessly close to the NULL address, and anyone running PIE
      on 32-bit ARM will have an MMU, so the tight mapping is not needed.
      
      Link: http://lkml.kernel.org/r/1498154792-49952-2-git-send-email-keescook@chromium.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Qualys Security Advisory <qsa@qualys.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    
      diff --git a/arch/arm/include/asm/elf.h b/arch/arm/include/asm/elf.h
      index d2315ff..f13ae15 100644
      --- a/arch/arm/include/asm/elf.h
      +++ b/arch/arm/include/asm/elf.h
      @@ -112,12 +112,8 @@ int dump_task_regs(struct task_struct *t, elf_gregset_t *elfregs);
       #define CORE_DUMP_USE_REGSET
       #define ELF_EXEC_PAGESIZE      4096
    
      -/* This is the location that an ET_DYN program is loaded if exec'ed.  Typical
      -   use of this is to invoke "./ld.so someprog" to test out a new version of
      -   the loader.  We need to make sure that it is out of the way of the program
      -   that it will "exec", and that there is sufficient room for the brk.  */
      -
      -#define ELF_ET_DYN_BASE        (TASK_SIZE / 3 * 2)
      +/* This is the base location for PIE (ET_DYN with INTERP) loads. */
      +#define ELF_ET_DYN_BASE                0x400000UL
    
      /* When the program starts, a1 contains a pointer to a function to be 
      registered with atexit, as per the SVR4 ABI.  A value of 0 means we 
    
  • Kernel patch analysis
    • The key code in load_elf_binary() is:

      static int load_elf_binary(struct linux_binprm *bprm)
      {
       ... ...
      
                          if (elf_interpreter) {
                                  load_bias = ELF_ET_DYN_BASE;
                                  if (current->flags & PF_RANDOMIZE)
                                          load_bias += arch_mmap_rnd();
                                  elf_flags |= MAP_FIXED;
                          } else
                                  load_bias = 0;
      
                          /*
                           * Since load_bias is used for all subsequent loading
                           * calculations, we must lower it by the first vaddr
                           * so that the remaining calculations based on the
                           * ELF vaddrs will be correctly offset. The result
                           * is then page aligned.
                           */
                          load_bias = ELF_PAGESTART(load_bias - vaddr);
      
                          total_size = total_mapping_size(elf_phdata,
                                                          loc->elf_ex.e_phnum);
                          if (!total_size) {
                                  retval = -EINVAL;
                                  goto out_free_dentry;
                          }
                  }
      
                  error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
                                  elf_prot, elf_flags, total_size);
      ... ...
      
    • arch_mmap_rnd()

      unsigned long arch_mmap_rnd(void)
      {
        unsigned long rnd;
      
        rnd = get_random_long() & ((1UL << mmap_rnd_bits) - 1);
      
        return rnd << PAGE_SHIFT;
      }
      

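      This first calls get_random_long() to obtain a random number, say 0xXXXXXXXX.

      With mmap_rnd_bits = 16 and PAGE_SHIFT = 12, (0xXXXXXXXX & ((1 << 16) - 1)) << 12 = 0x0XXXX000, so the returned value never exceeds 0x0ffff000.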
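
The masking and shifting described above can be reproduced in user space. This is a minimal sketch, not kernel code: `MmapRndOffset` is a hypothetical name, and the values 16 for mmap_rnd_bits and 12 for PAGE_SHIFT are the ones assumed in the text above.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kPageShift = 12;    // 4 KB pages (assumed, as in the text)
constexpr unsigned kMmapRndBits = 16;  // mmap_rnd_bits value assumed in the text

// Mirrors the arithmetic of arch_mmap_rnd(): keep the low rnd_bits bits of the
// random number, then scale the result to a page-aligned byte offset.
inline uint32_t MmapRndOffset(uint32_t random, unsigned rnd_bits = kMmapRndBits) {
  assert(rnd_bits + kPageShift <= 32);  // the offset must fit in 32 bits
  const uint32_t mask = (1u << rnd_bits) - 1u;
  return (random & mask) << kPageShift;
}
```

Feeding in an all-ones random value gives the upper bound 0x0ffff000 quoted above, and every result is a multiple of the page size.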

    • Before the patch

      • ELF_ET_DYN_BASE = (TASK_SIZE / 3 * 2) = (0xC0000000 - 0x01000000) / 3 * 2 = 0x7f555554

      • load_bias = 0x7f555554

      • load_bias += random offset (at most 0x0ffff000)

      • load_bias is then lowered by vaddr and rounded down to a page boundary

      • So on 32-bit ARM, executables usually end up mapped at 0x8XXXXXXX, for example:

        8cd23000-8cd42000 r-xp 00000000 103:0b 287       /system/bin/dex2oat
        8cd43000-8cd44000 r--p 0001f000 103:0b 287       /system/bin/dex2oat
        
        8d268000-8d26d000 r-xp 00000000 103:0b 231       /system/bin/app_process32
        8d26d000-8d26e000 r--p 00004000 103:0b 231       /system/bin/app_process32
        
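    • After the patch

      • ELF_ET_DYN_BASE = 0x400000

      • load_bias = 0x400000

      • load_bias += random offset (at most 0x0ffff000)

      • load_bias is then lowered by vaddr and rounded down to a page boundary

      • So on 32-bit ARM, executables now end up mapped low, between 0x00400000 and 0x103ff000, for example: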

        
        0fa73000-0fa91fff r-x         0     1f000  /system/bin/dex2oat (BuildId: 98d4fb6e838289e7a2bc42b4caf1d455)
        0fa93000-0fa93fff r--     1f000      1000  /system/bin/dex2oat
        
        0564c000-05651000 r-xp 00000000 103:0b 230       /system/bin/app_process32
        05651000-05652000 r--p 00004000 103:0b 230       /system/bin/app_process32
        05652000-05653000 rw-p 00000000 00:00 0
        
      • As a result, the executable can easily collide with the range 0x0ec00000-0x12c00000 that ART mmaps by default.
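
The before/after comparison can be modelled with a short sketch. It is a simplified model, not kernel or ART code. It assumes a first vaddr of 0, a 16-bit random page offset as above, and an image span of 0x22000 bytes, read off the dex2oat maps shown earlier (0x0fa73000 to 0x0fa95000). `LoadAddress` and `CountCollisions` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kPageSize = 4096;
constexpr uint32_t kMaxRnd = 0x0ffff000u;  // ((1 << 16) - 1) << 12
constexpr uint32_t kOldBase = (0xC0000000u - 0x01000000u) / 3 * 2;  // TASK_SIZE / 3 * 2
constexpr uint32_t kNewBase = 0x400000u;     // ELF_ET_DYN_BASE after the patch
constexpr uint32_t kArtBegin = 0x0ec00000u;  // requested non moving space begin
constexpr uint32_t kArtEnd = 0x12c00000u;    // requested non moving space end

// Load address for a given base and random offset, rounded down to a page
// boundary in the manner of ELF_PAGESTART (first vaddr assumed to be 0).
inline uint32_t LoadAddress(uint32_t base, uint32_t rnd) {
  assert(rnd <= kMaxRnd && rnd % kPageSize == 0);
  return (base + rnd) & ~(kPageSize - 1);
}

// Counts how many of the 65536 possible random offsets place an image of
// image_size bytes so that it overlaps [kArtBegin, kArtEnd).
inline uint32_t CountCollisions(uint32_t base, uint32_t image_size) {
  uint32_t collisions = 0;
  for (uint32_t page = 0; page <= 0xffffu; ++page) {
    const uint32_t start = LoadAddress(base, page << 12);
    const uint64_t end = static_cast<uint64_t>(start) + image_size;
    if (start < kArtEnd && end > kArtBegin) ++collisions;
  }
  return collisions;
}
```

Under these assumptions the old base gives load addresses from 0x7f555000 to 0x8f554000, none of which can touch the ART range. With the new base, 6177 of the 65536 possible offsets put the image inside the range. That figure comes from the model only, not from a measurement on the device.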

Root Cause

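  • At its core, this is a latent conflict between the kernel and ART over which virtual address ranges each of them maps.

Solution

  • The issue has been reported to Vladimir Marko (vmarko@google.com) at Google

    His reply:

    I guess we may need to rethink the address range.

  • patch

    For now, the kernel commit above is temporarily reverted; this will be revisited once the issue has a definitive fix.

  • Verification

    On the build with the revert, executables are mapped at high addresses again.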
