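Requirements
When a production app hits a memory spike, a UI stall, a CPU spike, or a crash, we need the call stacks of all threads at that moment to help locate the problem.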
1. callStackSymbols
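It only returns the call stack of the current thread and cannot capture the stack of an arbitrary other thread, so it does not meet the requirement.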
[NSThread callStackSymbols];
0 LXDAppFluecyMonitor 0x0000000102a30699 -[ViewController tableView:didSelectRowAtIndexPath:] + 89,
1 UIKitCore 0x0000000116721902 -[UITableView _selectRowAtIndexPath:animated:scrollPosition:notifyDelegate:isCellMultiSelect:deselectPrevious:] + 1962,
2 UIKitCore 0x000000011672113d -[UITableView _selectRowAtIndexPath:animated:scrollPosition:notifyDelegate:] + 94,
3 UIKitCore 0x0000000116721bcb -[UITableView _userSelectRowAtPendingSelectionIndexPath:] + 341,
4 UIKitCore 0x0000000116a322d5 -[_UIAfterCACommitBlock run] + 54,
5 UIKitCore 0x0000000116a327cd -[_UIAfterCACommitQueue flush] + 190,
6 libdispatch.dylib 0x000000010c7d6816 _dispatch_call_block_and_release + 12,
7 libdispatch.dylib 0x000000010c7d7a5b _dispatch_client_callout + 8,
8 libdispatch.dylib 0x000000010c7e6325 _dispatch_main_queue_drain + 1169,
9 libdispatch.dylib 0x000000010c7e5e86 _dispatch_main_queue_callback_4CF + 31,
10 CoreFoundation 0x000000010b5d6261 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 9,
2. Mach Thread
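Approach
- Get the list of all threads through the kernel API
- For each thread, get its context (_STRUCT_MCONTEXT)
- Read the frame pointer from the context, then follow previous repeatedly to collect the whole call stack of that thread
- Read the function call address (the return address) from each stack frame
- Iterate over all loaded images with the _dyld_image APIs
- Find the image whose LC_SEGMENT load command (__TEXT) contains the function address
- Get the ASLR slide, then find the symbol table entry that corresponds to the function address
- Look up the function name in the string table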
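Collecting the call addresses on the stack
- All threads: call the kernel API task_threads to get the thread list of the given task (list below)
    thread_act_array_t list;
    mach_msg_type_number_t count;
    // On success, list holds count thread ports of the current task
    task_threads(mach_task_self(), &list, &count);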
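- A specific thread: call pthread_from_mach_thread_np to get the pthread_t of each Mach thread. The main (UI) thread is matched by its Mach thread id; any other thread is matched by comparing thread names
    // Fragment: list and count come from task_threads; nsthread, main_thread_id,
    // name and originName are set up earlier in the enclosing function.
    for (int idx = 0; idx < count; idx++) {
        pthread_t pt = pthread_from_mach_thread_np(list[idx]);
        // Main thread: match on the Mach thread id
        if ([nsthread isMainThread] && list[idx] == main_thread_id) {
            return list[idx];
        }
        // Other threads: match on the thread name
        if (pt) {
            name[0] = '\0';
            pthread_getname_np(pt, name, sizeof(name));
            if (!strcmp(name, [nsthread name].UTF8String)) {
                [nsthread setName: originName]; // set the name back to originName
                return list[idx];
            }
        }
    }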
- Thread state: call thread_get_state to fill the context (_STRUCT_MCONTEXT) of the given thread. Two of its parameters, the state flavor and the state count, depend on the CPU architecture. The _STRUCT_MCONTEXT structure holds the thread's stack pointer and its topmost frame pointer, and the whole call stack of the thread can be walked from there.
  thread_get_state takes the thread, the register state struct _STRUCT_MCONTEXT->__ss, and the architecture-specific constants (flavor, old_stateCnt), and fills in the _STRUCT_MCONTEXT
    bool lxd_fillThreadStateIntoMachineContext(thread_t thread, _STRUCT_MCONTEXT * machineContext) {
        mach_msg_type_number_t state_count = LXD_THREAD_STATE_COUNT;
        kern_return_t kr = thread_get_state(thread, LXD_THREAD_STATE, (thread_state_t)&machineContext->__ss, &state_count);
        return (kr == KERN_SUCCESS);
    }
- Filling the stack frame struct with vm_read_overwrite
  - The stack frame struct
    typedef struct StackFrameEntry {
        const struct StackFrameEntry *const previous; // address of the previous (caller's) stack frame
        const uintptr_t return_address;               // return address stored in this frame
    } StackFrameEntry;
  - Read the first stack frame through the frame pointer in the machineContext obtained in the previous step
    lxd_mach_copyMem((void *)machineContext->__ss.LXD_FRAME_POINTER, &frame, sizeof(frame))

    // src:      the frame pointer to read from
    // dst:      pointer to the StackFrameEntry instance to fill
    // numBytes: size of the StackFrameEntry struct
    kern_return_t lxd_mach_copyMem(const void * src, void * dst, const size_t numBytes) {
        vm_size_t bytesCopied = 0;
        // Copy the frame record stored at the frame pointer; an unreadable
        // address is reported through the return value instead of crashing
        return vm_read_overwrite(mach_task_self(), (vm_address_t)src, (vm_size_t)numBytes, (vm_address_t)dst, &bytesCopied);
    }
    Printing frame in the debugger:
    Printing description of frame: (LXDStackFrameEntry) frame = { previous = 0x0000000109f6cb68 return_address = 11598032417672659023 }
  - Follow frame.previous to reach the previous stack frame, and keep going to collect the addresses of all function calls on the current thread
    // Loop until MAX_FRAME_NUMBER frames have been collected
    for (; idx < MAX_FRAME_NUMBER; idx++) {
        // Store the return address of this frame
        backtraceBuffer[idx] = frame.return_address;
        if (backtraceBuffer[idx] == FAILED_UINT_PTR_ADDRESS ||
            frame.previous == NULL ||
            // Read the previous frame through the current frame's previous pointer
            lxd_mach_copyMem(frame.previous, &frame, sizeof(frame)) != KERN_SUCCESS) {
            break;
        }
    }
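The frame walk described above can be illustrated without any Mach calls. The following is a minimal sketch in plain C, assuming a hand-built chain of frame records; FrameRecord and walk_frames are hypothetical names, not part of the article's code. It dereferences the records directly, so unlike the vm_read_overwrite version it is only safe on memory that is known to be valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same two-word layout as the stack frame struct: link to the caller's
   record, followed by the return address. */
typedef struct FrameRecord {
    const struct FrameRecord *previous;  /* caller's record, NULL at the end of the chain */
    uintptr_t return_address;            /* address execution returns to in the caller */
} FrameRecord;

/* Hypothetical helper: follow the previous links starting at top and store
   up to capacity return addresses in out. Returns the number stored. */
size_t walk_frames(const FrameRecord *top, uintptr_t *out, size_t capacity) {
    assert(out != NULL || capacity == 0);
    size_t count = 0;
    for (const FrameRecord *f = top; f != NULL && count < capacity; f = f->previous) {
        if (f->return_address == 0) {
            break;  /* treat an empty return address as the end of the usable chain */
        }
        out[count++] = f->return_address;
    }
    return count;
}
```

Linking three records by hand (a points to b, b points to c, c has no previous record) and calling walk_frames(&a, buffer, 8) fills the buffer with the three return addresses, innermost call first. The capacity argument plays the role of MAX_FRAME_NUMBER and keeps a corrupted or cyclic chain from running forever.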
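Resolving function names for the stack addresses
For background on Mach-O, see this article: https://www.coderzhou.com/2019/06/05/fishhook/#Mach-O
Source code reference: https://github.com/bestswifter/BSBacktraceLogger
- Create a Dl_info array with the same length as backtraceBuffer above
    Dl_info symbolicated[backtraceLength];
- Walk backtraceBuffer entry by entry and store the symbol information for each address in symbolicated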
- Find the image that contains the stack frame's address
  - Iterate over the images, use _dyld_get_image_vmaddr_slide to get the ASLR slide, and compute the address the called function has in the Mach-O file
  - Iterate over the Mach-O load commands to find the LC_SEGMENT commands
  - Check whether the address of the called function in the Mach-O file lies inside that LC_SEGMENT
  - Return the image index
    uint32_t lxd_imageIndexContainingAddress(const uintptr_t address) {
        const uint32_t imageCount = _dyld_image_count();
        const struct mach_header * header = NULL;
        for (uint32_t iImg = 0; iImg < imageCount; iImg++) {
            header = _dyld_get_image_header(iImg);
            if (header != NULL) {
                // ASLR: subtract the slide returned by _dyld_get_image_vmaddr_slide
                uintptr_t addressWSlide = address - (uintptr_t)_dyld_get_image_vmaddr_slide(iImg);
                uintptr_t cmdPtr = lxd_firstCmdAfterHeader(header);
                if (cmdPtr == FAILED_UINT_PTR_ADDRESS) {
                    continue;
                }
                for (uint32_t iCmd = 0; iCmd < header->ncmds; iCmd++) {
                    const struct load_command * loadCmd = (struct load_command *)cmdPtr;
                    if (loadCmd->cmd == LC_SEGMENT) {
                        const struct segment_command * segCmd = (struct segment_command *)cmdPtr;
                        if (addressWSlide >= segCmd->vmaddr && addressWSlide < segCmd->vmaddr + segCmd->vmsize) {
                            return iImg;
                        }
                    } else if (loadCmd->cmd == LC_SEGMENT_64) {
                        const struct segment_command_64 * segCmd = (struct segment_command_64 *)cmdPtr;
                        if (addressWSlide >= segCmd->vmaddr && addressWSlide < segCmd->vmaddr + segCmd->vmsize) {
                            // Debug output: image name, header address and ASLR slide
                            char *image_name = (char *)_dyld_get_image_name(iImg);
                            const struct mach_header *mh = _dyld_get_image_header(iImg);
                            intptr_t vmaddr_slide = _dyld_get_image_vmaddr_slide(iImg);
                            printf("Image name %s at address 0x%llx and ASLR slide 0x%lx.\n", image_name, (mach_vm_address_t)mh, vmaddr_slide);
                            return iImg;
                        }
                    }
                    cmdPtr += loadCmd->cmdsize;
                }
            }
        }
        return UINT_MAX;
    }
    Checked in MachOView, the values match the data obtained above.
    Printing the end address of segCmd's virtual memory range shows whether the function's virtual address falls inside the current segment.
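The containment test at the heart of this step is plain arithmetic: remove the image's slide from the runtime address, then compare the result with each segment's [vmaddr, vmaddr + vmsize) range. A minimal sketch in portable C, assuming synthetic data in place of the dyld APIs; SegmentRange, ImageInfo, IMAGE_NOT_FOUND and image_index_containing are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the vmaddr/vmsize fields of a segment command. */
typedef struct {
    uint64_t vmaddr;
    uint64_t vmsize;
} SegmentRange;

/* Hypothetical stand-in for one loaded image: its ASLR slide and its segments. */
typedef struct {
    uint64_t slide;
    const SegmentRange *segments;
    size_t segment_count;
} ImageInfo;

#define IMAGE_NOT_FOUND ((size_t)-1)

/* Returns the index of the first image with a segment that contains the
   runtime address, or IMAGE_NOT_FOUND. */
size_t image_index_containing(const ImageInfo *images, size_t image_count, uint64_t address) {
    assert(images != NULL || image_count == 0);
    for (size_t i = 0; i < image_count; i++) {
        /* Remove the slide so the address is comparable with the vmaddr
           values recorded in the file. */
        uint64_t unslid = address - images[i].slide;
        for (size_t s = 0; s < images[i].segment_count; s++) {
            const SegmentRange *seg = &images[i].segments[s];
            /* Written as a difference so vmaddr + vmsize cannot overflow. */
            if (unslid >= seg->vmaddr && unslid - seg->vmaddr < seg->vmsize) {
                return i;
            }
        }
    }
    return IMAGE_NOT_FOUND;
}
```

With an image whose slide is 0x4f31000 and whose second segment starts at 0x100004000 with size 0x4000, the runtime address 0x104f350a0 becomes 0x1000040a0 after the slide is removed and therefore matches that image.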
- Find the segment base address of that image from its load commands. The code locates __LINKEDIT and returns vmaddr - fileoff; in this example the result happens to be the start of the code segment __TEXT
    uintptr_t lxd_segmentBaseOfImageIndex(const uint32_t idx) {
        const struct mach_header * header = _dyld_get_image_header(idx);
        uintptr_t cmdPtr = lxd_firstCmdAfterHeader(header);
        if (cmdPtr == FAILED_UINT_PTR_ADDRESS) {
            return FAILED_UINT_PTR_ADDRESS;
        }
        // iCmd replaces the original loop variable, which shadowed the idx parameter
        for (uint32_t iCmd = 0; iCmd < header->ncmds; iCmd++) {
            const struct load_command * loadCmd = (struct load_command *)cmdPtr;
            if (loadCmd->cmd == LC_SEGMENT) {
                const struct segment_command * segCmd = (struct segment_command *)cmdPtr;
                if (strcmp(segCmd->segname, SEG_LINKEDIT) == 0) {
                    return segCmd->vmaddr - segCmd->fileoff;
                }
            } else if (loadCmd->cmd == LC_SEGMENT_64) {
                const struct segment_command_64 * segCmd = (struct segment_command_64 *)cmdPtr;
                if (strcmp(segCmd->segname, SEG_LINKEDIT) == 0) {
                    return segCmd->vmaddr - segCmd->fileoff;
                }
            }
            cmdPtr += loadCmd->cmdsize;
        }
        return FAILED_UINT_PTR_ADDRESS;
    }
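- Iterate over the load commands to find LC_SYMTAB, which holds the offsets of the symbol table and the string table
    struct symtab_command {
        uint32_t cmd;      /* LC_SYMTAB */
        uint32_t cmdsize;  /* sizeof(struct symtab_command) */
        uint32_t symoff;   /* file offset of the symbol table */
        uint32_t nsyms;    /* number of symbol table entries */
        uint32_t stroff;   /* file offset of the string table */
        uint32_t strsize;  /* size of the string table in bytes */
    };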
- Walk the symbol table to find the entry that corresponds to the function address
    Layout of a single symbol table entry:
    struct nlist_64 {
        union {
            uint32_t n_strx;  /* index into the string table */
        } n_un;
        uint8_t  n_type;      /* type flag, see below */
        uint8_t  n_sect;      /* section number or NO_SECT */
        uint16_t n_desc;      /* see <mach-o/stab.h> */
        uint64_t n_value;     /* value of this symbol (or stab offset) */
    };
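- From the symbol table entry found in the previous step, take the symbol's offset into the string table and read the name string stored there
    Locating the symbol table entry for a function call address:
    // Fragment from inside the load command loop: segmentBase, addressWithSlide,
    // imageVMAddressSlide, bestMatch, bestDistance and info come from the
    // enclosing function. addressWithSlide is compared against n_value, which
    // has no slide applied.
    if (loadCmd->cmd == LC_SYMTAB) {
        // LC_SYMTAB holds the offsets of the symbol table and the string table
        const struct symtab_command * symtabCmd = (struct symtab_command *)cmdPtr;
        // Address of the symbol table in memory (segment base plus symoff)
        const LXD_NLIST * symbolTable = (LXD_NLIST *)(segmentBase + symtabCmd->symoff);
        // Address of the string table in memory (segment base plus stroff)
        const uintptr_t stringTable = segmentBase + symtabCmd->stroff;
        // nsyms is the number of symbol table entries
        for (uint32_t iSym = 0; iSym < symtabCmd->nsyms; iSym++) {
            if (symbolTable[iSym].n_value == FAILED_UINT_PTR_ADDRESS) {
                continue;
            }
            // Start address of this symbol
            uintptr_t symbolBase = symbolTable[iSym].n_value;
            // Distance from the start of the symbol to the address being resolved
            uintptr_t currentDistance = addressWithSlide - symbolBase;
            // Keep the closest symbol that starts at or before the address
            if ( (addressWithSlide >= symbolBase && currentDistance <= bestDistance) ) {
                bestMatch = symbolTable + iSym;
                bestDistance = currentDistance;
            }
        }
        if (bestMatch != NULL) {
            info->dli_saddr = (void *)(bestMatch->n_value + imageVMAddressSlide);
            // n_un.n_strx is the offset of the symbol name in the string table, i.e. the function name
            info->dli_sname = (char *)((intptr_t)stringTable + (intptr_t)bestMatch->n_un.n_strx);
            NSLog(@"%s",info->dli_sname);
            // Skip the leading underscore of the symbol name
            if (*info->dli_sname == '_') {
                info->dli_sname++;
            }
            if (info->dli_saddr == info->dli_fbase && bestMatch->n_type == 3) {
                info->dli_sname = NULL;
            }
            break;
        }
    }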
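The best-match search above amounts to finding the symbol with the largest start address that is still at or below the target address, then reading its name from the string table. A minimal sketch in portable C over a synthetic symbol table; SymbolEntry and nearest_symbol_name are hypothetical names, and the entry is reduced to the two nlist_64 fields the search needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced symbol entry: only the fields the search uses. */
typedef struct {
    uint32_t n_strx;   /* offset of the name inside the string table */
    uint64_t n_value;  /* start address of the symbol, without slide */
} SymbolEntry;

/* Returns the name of the symbol with the greatest n_value <= address,
   with a leading underscore skipped, or NULL if no symbol qualifies. */
const char *nearest_symbol_name(const SymbolEntry *symbols, size_t count,
                                const char *string_table, uint64_t address) {
    assert(symbols != NULL || count == 0);
    const SymbolEntry *best = NULL;
    uint64_t best_distance = UINT64_MAX;
    for (size_t i = 0; i < count; i++) {
        if (symbols[i].n_value == 0 || symbols[i].n_value > address) {
            continue;  /* no address, or the symbol starts after the target */
        }
        uint64_t distance = address - symbols[i].n_value;
        if (distance <= best_distance) {
            best = &symbols[i];
            best_distance = distance;
        }
    }
    if (best == NULL) {
        return NULL;
    }
    const char *name = string_table + best->n_strx;
    return (name[0] == '_') ? name + 1 : name;
}
```

Given symbols _main at 0x100004000, _helper at 0x100004800 and _last at 0x100005000, the address 0x100004810 resolves to helper, because _helper is the closest symbol that starts at or before it. The address passed in must already have the slide removed, matching how n_value is stored in the file.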
0x00000001000048a0
Viewed in MachOView
The ASLR slide is 0x0000000004f31000
The address of the function name in the string table is 0x0000000104f40940
Address with the slide removed: 0x0000000104f40940 - 0x0000000004f31000 = 0x000000010000F940
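The subtraction above can be checked with a few lines of C. This is a minimal sketch; unslid_address is a hypothetical helper name, and the two constants are the values quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a runtime address into the address recorded in the Mach-O file
   by removing the ASLR slide of the image it belongs to. */
uint64_t unslid_address(uint64_t runtime_address, uint64_t slide) {
    assert(runtime_address >= slide);  /* otherwise the address is not in this image */
    return runtime_address - slide;
}
```

Calling unslid_address(0x0000000104f40940, 0x0000000004f31000) returns 0x000000010000F940, which is the address to look up in MachOView.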
Viewed in MachOView
Printed output
References
https://www.jianshu.com/p/df5b08330afd
https://www.jianshu.com/p/8b78bbbcaf89
https://blog.csdn.net/jasonblog/article/details/49909209
https://elliotsomething.github.io/2017/06/28/thread%E5%AD%A6%E4%B9%A0/