iOS Launch Optimization: Exploring Whether Method Swizzling Can Move from +load to +initialize

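As is well known, when the Objective-C runtime initializes it registers three callbacks with dyld. One of them, load_images, is responsible for collecting and calling +load methods.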

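When too many +load methods run at launch, app startup time suffers. A single +load costs only a few milliseconds, but once there are many of them this optimization becomes well worth doing.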

Demo code for this article
After running the demo, you can test from its entry screen and inspect the results.


1. How to collect every +load method in a project

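You can simply search with a regular expression, which finds the +load methods present in the project's source files. Below is another approach that uses lldb.
Run the following commands in Xcode's lldb console to list them: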

# Set breakpoints on every +load with a regular expression
(lldb) br s -r "\+\[.+ load\]$"
Breakpoint 28: 42 locations.
# List the breakpoint's locations; here it is breakpoint 28
(lldb) br list 28
28: regex = '\+\[.+ load\]$', locations = 42, resolved = 42, hit count = 0
  28.1: where = RuntimeLearning`+[SubTestUnsafeSwizzle load] + 8 at TestUnsafeSwizzle.m:23, address = 0x0000000107a146a8, resolved, hit count = 0 
  28.2: where = RuntimeLearning`+[TestCategorySwizzle(EventTrack) load] + 8 at TestCategorySwizzle.m:51, address = 0x0000000107a149b8, resolved, hit count = 0 
 ... many entries omitted here
  28.12: where = Foundation`+[NSObject(NSObject) load], address = 0x00007fff2574d4d2, resolved, hit count = 0 
  28.13: where = CoreFoundation`+[NSObject(NSObject) load], address = 0x00007fff23c91d70, resolved, hit count = 0 
  ...

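Only part of the breakpoint list is shown here. Notice that some system +load methods are listed too. We can restrict the breakpoint to +load methods in our own code by naming the module.
For example, my project is RuntimeLearning, so I pass that name, and the breakpoint then covers only the +load methods in that project: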

(lldb) br s -s RuntimeLearning -r "\+\[.+ load\]$"
Breakpoint 29: 10 locations.
(lldb) br list 29
29: regex = '\+\[.+ load\]$', module = RuntimeLearning, locations = 10, resolved = 10, hit count = 0
  29.1: where = RuntimeLearning`+[SubTestUnsafeSwizzle load] + 8 at TestUnsafeSwizzle.m:23, address = 0x0000000107a146a8, resolved, hit count = 0 
  29.2: where = RuntimeLearning`+[TestCategorySwizzle(EventTrack) load] + 8 at TestCategorySwizzle.m:51, address = 0x0000000107a149b8, resolved, hit count = 0 
  ... some breakpoint entries omitted
  29.10: where = RuntimeLearning`+[UIFont(Test) load] + 8 at UIFont+Test.m:18, address = 0x0000000107a14f58, resolved, hit count = 0 

(lldb)

There are various ways to collect the +load methods in a project. So how do we go about optimizing them away?

2. What we usually do in +load

These methods exist for a reason: the work we put in +load relies on how +load is loaded and called.

2.1 Method swizzling (hooking) in +load

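When we hook a method, we want the hook installed before the method is ever called, and we want hooks placed in several categories of the same class to all take effect. +load fits these requirements very well: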

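+load runs during runtime initialization, before main.
The +load methods in all of a class's categories are executed, in compile order; a category's +load runs after the +load of its class.
+load executes down the inheritance chain, from superclass to subclass.
+load is called only where it is implemented: if a class does not implement it, the runtime does not fall back to a superclass's implementation, or to a category's, on that class's behalf.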

So is there an alternative?

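__attribute__((constructor)) functions also run before main (dyld runs an image's initializers after that image's +load methods; the static_init() step in _objc_init only runs libobjc's own C++ static initializers), so they do not reduce pre-main work and we rule them out.
+initialize is triggered when a class receives its first message, but it has two behaviors that we sometimes do not want: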

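If a subclass does not implement +initialize, the superclass's implementation is called instead.
When several categories implement it, they effectively override the +initialize of the original class and of the other categories (the one in the last-compiled category is called).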

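Because of these behaviors, if we hook in several categories of a class, only one of the hooks takes effect. Is there a way around this? We explore a technically feasible approach later in this article.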
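To make the two behaviors concrete, here is a minimal C model of message-style lookup. It is only a sketch: ClassModel, Method and lookup are hypothetical names, not objc4 data structures, and the model assumes that category methods sit in front of the class's own methods with the last-compiled category first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a class: a method list that is searched
 * front to back, plus a superclass pointer. Not the real objc4 structures. */
typedef struct Method {
    const char *selector;   /* method name, e.g. "initialize" */
    const char *owner;      /* who supplied it: the class or one of its categories */
} Method;

typedef struct ClassModel {
    const char *name;
    const struct ClassModel *superclass;
    const Method *methods;  /* assumed order: category methods before the class's own */
    size_t method_count;
} ClassModel;

/* Message-send style lookup: first match in the class, otherwise walk up the chain. */
const Method *lookup(const ClassModel *cls, const char *selector) {
    assert(selector != NULL);
    for (; cls != NULL; cls = cls->superclass) {
        for (size_t i = 0; i < cls->method_count; i++) {
            if (strcmp(cls->methods[i].selector, selector) == 0) {
                return &cls->methods[i];
            }
        }
    }
    return NULL;
}

/* Parent has its own initialize plus two categories; CategoryB was compiled last. */
static const Method parent_methods[] = {
    {"initialize", "Parent(CategoryB)"},
    {"initialize", "Parent(CategoryA)"},
    {"initialize", "Parent"},
};
const ClassModel Parent = {"Parent", NULL, parent_methods, 3};

/* Child implements no initialize of its own. */
const ClassModel Child = {"Child", &Parent, NULL, 0};
```

In this model, looking up "initialize" on Parent finds only the CategoryB entry, and looking it up on Child falls through to that same entry. The other two implementations are never reached by a lookup, which is why only one hook would be installed.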

2.2 Registering route configuration in +load

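The main case here is route navigation: we need to register identifying information in a routing table, and we often write that registration in +load so that the table is complete by the time the app has launched.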

There are quite a few ways to solve this:

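A configuration file, for example JSON or a plist
A manager class that maintains the registrations
A protocol that defines the required information; the containers implement the protocol, and at the point of use the runtime is used to collect the information from every class that implements it
__attribute__((section("name"))) to write the data into the executable and read it back when it is needed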
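As a rough illustration of the manager-style option, the sketch below keeps the routes in a static read-only table and resolves them on demand, so no registration code has to run before main. The Route struct, the kRoutes table, the route keys and the class names are all made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical route entry: a URL-like key mapped to a class name that would be
 * resolved (for example with NSClassFromString) only when the route is opened. */
typedef struct Route {
    const char *key;
    const char *class_name;
} Route;

/* Static table: it is part of the binary, so no +load is needed to build it. */
static const Route kRoutes[] = {
    {"app://home",    "HomeViewController"},
    {"app://profile", "ProfileViewController"},
    {"app://setting", "SettingViewController"},
};

/* Looks a route up on demand; returns NULL when the key is unknown. */
const char *route_class_name(const char *key) {
    assert(key != NULL);
    for (size_t i = 0; i < sizeof(kRoutes) / sizeof(kRoutes[0]); i++) {
        if (strcmp(kRoutes[i].key, key) == 0) {
            return kRoutes[i].class_name;
        }
    }
    return NULL;
}
```

The trade-off is that the table lives in one place instead of next to each class, so adding a screen means touching the table as well.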

3. How to use +initialize instead of +load for method hooking

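First let us look at how +load methods are loaded. As mentioned earlier, the runtime registers callbacks with dyld, and one of them, load_images, is what calls the +load methods.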

3.1 The runtime registers the `load_images` callback with dyld in _objc_init

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    
    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    runtime_init();
    exception_init();
    cache_init();
    _imp_implementationWithBlock_init();

    _dyld_objc_notify_register(&map_images, load_images, unmap_image);
}

3.2 What happens inside load_images

It calls prepare_load_methods to collect all +load methods of the image, then calls call_load_methods to invoke them.

void
load_images(const char *path __unused, const struct mach_header *mh)
{
    // Return without taking locks if there are no +load methods here.
    if (!hasLoadMethods((const headerType *)mh)) return;

    recursive_mutex_locker_t lock(loadMethodLock);

    // Discover load methods
    {
        mutex_locker_t lock2(runtimeLock);
        prepare_load_methods((const headerType *)mh);
    }

    // Call +load methods (without runtimeLock - re-entrant)
    call_load_methods();
}

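3.3 Collecting the IMPs of the +load methods

The class list is read from the Mach-O __objc_nlclslist section and the classes' +load methods are collected. schedule_class_load recurses up the inheritance chain and adds each class's +load to a struct array through add_class_to_loadable_list, superclass first.

The category list is read from the Mach-O __objc_nlcatlist section, and the +load methods are then added to a second struct array through add_category_to_loadable_list, in the order they appear in the Mach-O file.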

void prepare_load_methods(const headerType *mhdr)
{
    size_t count, i;

    runtimeLock.assertLocked();
    // Get the class list and collect the classes' +load methods
    classref_t const *classlist = 
        _getObjc2NonlazyClassList(mhdr, &count);
    for (i = 0; i < count; i++) {
        schedule_class_load(remapClass(classlist[i]));
    }

    category_t * const *categorylist = _getObjc2NonlazyCategoryList(mhdr, &count);
    for (i = 0; i < count; i++) {
        category_t *cat = categorylist[i];
        Class cls = remapClass(cat->cls);
        if (!cls) continue;  // category for ignored weak-linked class
        if (cls->isSwiftStable()) {
            _objc_fatal("Swift class extensions and categories on Swift "
                        "classes are not allowed to have +load methods");
        }
        realizeClassWithoutSwift(cls, nil);
        ASSERT(cls->ISA()->isRealized());
        add_category_to_loadable_list(cat);
    }
}


// Recursively schedule +load for cls and any un-+load-ed superclasses.
// cls must already be connected.
static void schedule_class_load(Class cls)
{
    if (!cls) return;
    ASSERT(cls->isRealized());  // _read_images should realize

    if (cls->data()->flags & RW_LOADED) return;

    // Ensure superclass-first ordering
    schedule_class_load(cls->superclass);

    add_class_to_loadable_list(cls);
    cls->setInfo(RW_LOADED); 
}

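3.3.1 Inside add_category_to_loadable_list

The loadable_category struct holds the category and the IMP of its +load. A struct array, loadable_categories, stores these loadable_category entries.

The class version, add_class_to_loadable_list, is implemented in much the same way; only the stored struct differs.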

struct loadable_class {
    Class cls;  // may be nil
    IMP method;
};

struct loadable_category {
    Category cat;  // may be nil
    IMP method;
};
// List of categories that need +load called (pending parent class +load)
static struct loadable_category *loadable_categories = nil;
static int loadable_categories_used = 0;
static int loadable_categories_allocated = 0;


void add_category_to_loadable_list(Category cat)
{
    IMP method;

    loadMethodLock.assertLocked();
    // Fetch the IMP of the +load method directly
    method = _category_getLoadMethod(cat);

    // Don't bother if cat has no +load method
    if (!method) return;

    if (PrintLoading) {
        _objc_inform("LOAD: category '%s(%s)' scheduled for +load", 
                     _category_getClassName(cat), _category_getName(cat));
    }
    // Grow the array if it is full
    if (loadable_categories_used == loadable_categories_allocated) {
        loadable_categories_allocated = loadable_categories_allocated*2 + 16;
        loadable_categories = (struct loadable_category *)
            realloc(loadable_categories,
                              loadable_categories_allocated *
                              sizeof(struct loadable_category));
    }
    // Store the category and its IMP in the array
    loadable_categories[loadable_categories_used].cat = cat;
    loadable_categories[loadable_categories_used].method = method;
    loadable_categories_used++;
}
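
The growth rule used above (new capacity = old capacity * 2 + 16) can be tried out in isolation. The following is a small standalone sketch, not runtime code: Entry, EntryList and the two functions are hypothetical names, and unlike the runtime listing it checks the result of realloc.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*Fn)(void);

/* Hypothetical stand-in for loadable_class / loadable_category. */
typedef struct Entry {
    const char *name;
    Fn method;
} Entry;

typedef struct EntryList {
    Entry *items;
    int used;
    int allocated;
} EntryList;

/* Appends an entry, growing the capacity to allocated*2 + 16 when the array is full.
 * Returns 0 on success, -1 if memory could not be obtained. */
int entry_list_add(EntryList *list, const char *name, Fn method) {
    assert(list != NULL);
    if (list->used == list->allocated) {
        int new_allocated = list->allocated * 2 + 16;
        Entry *grown = realloc(list->items, (size_t)new_allocated * sizeof(Entry));
        if (grown == NULL) return -1;   /* the old buffer is still valid */
        list->items = grown;
        list->allocated = new_allocated;
    }
    list->items[list->used].name = name;
    list->items[list->used].method = method;
    list->used++;
    return 0;
}

/* Releases the storage and resets the list to its empty state. */
void entry_list_free(EntryList *list) {
    assert(list != NULL);
    free(list->items);
    list->items = NULL;
    list->used = 0;
    list->allocated = 0;
}
```

Starting from an empty list, the first append allocates 16 slots and the 17th append grows the array to 48 slots, so appends stay cheap on average.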

3.4 call_load_methods invokes the +load methods

It first calls the classes' +load methods through call_class_loads, then the categories' +load methods through call_category_loads.

void call_load_methods(void)
{
    static bool loading = NO;
    bool more_categories;

    loadMethodLock.assertLocked();

    // Re-entrant calls do nothing; the outermost call will finish the job.
    if (loading) return;
    loading = YES;

    void *pool = objc_autoreleasePoolPush();

    do {
        // 1. Repeatedly call class +loads until there aren't any more
        while (loadable_classes_used > 0) {
            call_class_loads();
        }

        // 2. Call category +loads ONCE
        more_categories = call_category_loads();

        // 3. Run more +loads if there are classes OR more untried categories
    } while (loadable_classes_used > 0  ||  more_categories);

    objc_autoreleasePoolPop(pool);

    loading = NO;
}

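At this point we have a rough picture of how the runtime handles +load methods:

It collects the +load methods, classes before categories and superclasses before subclasses, and stores them in two separate struct arrays.

It calls the +load methods, classes first and categories afterwards.

Because the system collects the IMP of every +load and then calls each one directly, every +load is guaranteed to run, classes before categories, with classes ordered from superclass to subclass along the inheritance chain.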
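
The ordering rules can be reproduced with a small C model. This is a sketch under simplifying assumptions, not the runtime's code: ClassModel, LoadPlan and the three functions are hypothetical, and the scheduled field merely plays the role of the RW_LOADED flag.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTRIES 16

typedef struct ClassModel {
    const char *name;
    struct ClassModel *superclass;
    int has_load;    /* 1 if the class itself implements +load */
    int scheduled;   /* plays the role of the RW_LOADED flag */
} ClassModel;

typedef struct LoadPlan {
    const char *classes[MAX_ENTRIES];
    size_t class_count;
    const char *categories[MAX_ENTRIES];
    size_t category_count;
} LoadPlan;

/* Superclass first, each class at most once; classes without +load are skipped. */
void schedule_class(LoadPlan *plan, ClassModel *cls) {
    assert(plan != NULL);
    if (cls == NULL || cls->scheduled) return;
    schedule_class(plan, cls->superclass);
    if (cls->has_load && plan->class_count < MAX_ENTRIES) {
        plan->classes[plan->class_count++] = cls->name;
    }
    cls->scheduled = 1;
}

/* Categories are appended in the order in which they are encountered. */
void schedule_category(LoadPlan *plan, const char *category_name) {
    assert(plan != NULL);
    if (plan->category_count < MAX_ENTRIES) {
        plan->categories[plan->category_count++] = category_name;
    }
}

/* Writes the final call order into out: all classes, then all categories. */
size_t call_order(const LoadPlan *plan, const char **out, size_t capacity) {
    size_t n = 0;
    for (size_t i = 0; i < plan->class_count && n < capacity; i++) {
        out[n++] = plan->classes[i];
    }
    for (size_t i = 0; i < plan->category_count && n < capacity; i++) {
        out[n++] = plan->categories[i];
    }
    return n;
}
```

Scheduling a leaf class whose middle superclass has no +load yields the root class, then the leaf, then any categories. Nothing is dispatched by name, so no implementation can shadow another.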

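3.5 How +initialize is called

Set a breakpoint in an +initialize method and turn on disassembly in Xcode via Debug -- Debug Workflow -- Always Show Disassembly. You can then see that the following function, CALLING_SOME_+initialize_METHOD, is called: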

libobjc.A.dylib`CALLING_SOME_+initialize_METHOD:
     0x7fff513fc0f2 <+0>:  pushq  %rbp
     0x7fff513fc0f3 <+1>:  movq   %rsp, %rbp
     0x7fff513fc0f6 <+4>:  movq   0x38a0a3cb(%rip), %rsi    ; "initialize"
     0x7fff513fc0fd <+11>: callq  *0x3663a18d(%rip)         ; (void *)0x00007fff513f7780: objc_msgSend
 ->  0x7fff513fc103 <+17>: popq   %rbp
     0x7fff513fc104 <+18>: retq

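Here is the implementation of CALLING_SOME_+initialize_METHOD.
It is callInitialize, which internally goes through the objc_msgSend message mechanism. This also explains why a category's +initialize overrides the +initialize in the class itself.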

void callInitialize(Class cls)
    asm("_CALLING_SOME_+initialize_METHOD");
// Internally this goes through objc_msgSend message dispatch
void callInitialize(Class cls)
{
    ((void(*)(Class, SEL))objc_msgSend)(cls, @selector(initialize));
    asm("");
}

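callInitialize is called from initializeNonMetaClass. That implementation is too long to paste here; internally it recursively initializes the superclass first. A simplified version: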

void initializeNonMetaClass(Class cls)
{
    ASSERT(!cls->isMetaClass());

    Class supercls;
    bool reallyInitialize = NO;

    // Make sure super is done initializing BEFORE beginning to initialize cls.
    // See note about deadlock above.
    supercls = cls->superclass;
    if (supercls  &&  !supercls->isInitialized()) {
        initializeNonMetaClass(supercls);
    }
    // A lot of code is omitted here
    callInitialize(cls);
}

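3.6 Doing something about +initialize

Now that we understand how +load and +initialize are called, and we want to move the method-swizzling code into +initialize, we can imitate the +load flow: when +initialize is called, also collect the +initialize methods of the class and its categories, then call them manually. Would that do the job?
The flow is as follows:

Collect the IMPs of the +initialize methods according to the rules and save them in a struct array.

The rules: classes before categories, and classes gathered along the inheritance chain.

When +initialize is called, manually call the collected list of IMPs.

3.6.1 Collecting the list of +initialize IMPs
We likewise define a struct to store the Class and the IMP: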

struct initailize_class {
    Class cls;
    IMP method;
};

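Next we write the function that collects the IMPs. It takes the Class, the selector, and count, an out-parameter that receives the number of entries in the returned struct array.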

struct initailize_class *gatherClassMethodImps(Class gatherCls, SEL gatherSel, unsigned int *count) {
    if (count) {
        *count = 0;
    }
    if (gatherCls == Nil || gatherSel == NULL || count == NULL) {
        return NULL;
    }
    struct initailize_class *initailize_classes = NULL;
    unsigned int used = 0, allocated = 0;
    Class cls = gatherCls;
    // Walk up the inheritance chain. For each class, read the method list of its isa
    // (the metaclass) and record every implementation of gatherSel.
    // Stop at NSObject, or at Nil for hierarchies that are not rooted in NSObject.
    while (cls != Nil && cls != NSObject.class) {
        // Use a separate name so the out-parameter `count` is not shadowed
        unsigned int methodCount = 0;
        Class gatherClsIsa = object_getClass(cls);
        Method *methodList = class_copyMethodList(gatherClsIsa, &methodCount);
        for (unsigned int i = 0; i < methodCount; i++) {
            Method method = methodList[i];
            SEL sel = method_getName(method);
            if (sel == gatherSel) {
                IMP imp = method_getImplementation(method);
                if (used == allocated) {
                    allocated = allocated * 2 + 16;
                    initailize_classes = (struct initailize_class *)realloc(initailize_classes, sizeof(struct initailize_class) * allocated);
                }
                initailize_classes[used].cls = cls;
                initailize_classes[used].method = imp;
                used++;
            }
        }
        
        free(methodList);
        cls = [cls superclass]; // class_getSuperclass(cls);
    }
    // Reverse the array so that it runs from the top of the inheritance chain down
    struct initailize_class *reverse_initailize_classes = calloc(used, sizeof(struct initailize_class));
    for (unsigned int i = 0; i < used; i++) {
        reverse_initailize_classes[i] = initailize_classes[used - i - 1];
    }
    *count = used;
    free(initailize_classes);
    return reverse_initailize_classes;
}
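
Why does a single reversal produce both "superclass before subclass" and "class before its categories"? The C model below sketches the idea. It assumes that the copied method list of a class places the last-compiled category first and the class's own method last, which is an implementation detail rather than a documented guarantee. ClassModel, gather_top_down and the sample data are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ClassModel {
    const struct ClassModel *superclass;
    const char *const *initializers;  /* owner of each initialize, in list order */
    size_t count;
} ClassModel;

/* Walks from cls up to the root, copying every initialize owner it sees, then
 * reverses the result in place. Returns the number of entries written. */
size_t gather_top_down(const ClassModel *cls, const char **out, size_t capacity) {
    assert(out != NULL);
    size_t n = 0;
    for (; cls != NULL; cls = cls->superclass) {
        for (size_t i = 0; i < cls->count && n < capacity; i++) {
            out[n++] = cls->initializers[i];
        }
    }
    for (size_t i = 0; i < n / 2; i++) {
        const char *tmp = out[i];
        out[i] = out[n - 1 - i];
        out[n - 1 - i] = tmp;
    }
    return n;
}

/* Assumed list order: last-compiled category first, the class's own method last. */
static const char *const super_inits[] = {"Super"};
static const char *const mine_inits[] = {
    "Mine(CategoryC)", "Mine(CategoryB)", "Mine(CategoryA)", "Mine"
};
static const char *const sub_inits[] = {"Sub"};

const ClassModel SuperModel = {NULL, super_inits, 1};
const ClassModel MineModel = {&SuperModel, mine_inits, 4};
const ClassModel SubModel = {&MineModel, sub_inits, 1};
```

Gathering from SubModel collects Sub, CategoryC, CategoryB, CategoryA, Mine, Super, and the reversal turns that into Super, Mine, CategoryA, CategoryB, CategoryC, Sub, which is the order +load would have used.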

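3.6.2 Iterating over and calling the +initialize methods
originalImp is passed in because this function is triggered from inside some class's +initialize, so the call to the message receiver's own +initialize has to be filtered out.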

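// Assumed definition: the article uses initalize_imp without showing its typedef.
// It matches the shape of +initialize: (receiver, selector), no return value.
typedef void (*initalize_imp)(id, SEL);

void callClassMethods(Class cls, SEL callSel, IMP originalImp) {
    unsigned int count = 0;
    // Gather the implementations of callSel instead of hard-coding @selector(initialize)
    struct initailize_class *initailize_classes = gatherClassMethodImps(cls, callSel, &count);
    for (unsigned int i = 0; i < count; i ++) {
        struct initailize_class item = initailize_classes[i];
        if (item.method != originalImp) { // skip the caller's own implementation
            initalize_imp imp = (initalize_imp)item.method;
            imp(item.cls, callSel);
        }
    }
    if (initailize_classes) {
        free(initailize_classes);
        initailize_classes = nil;
    }
}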

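3.6.3 Writing test code

Define one method (the method to be hooked) and hook it in the class itself, in its categories, in a subclass, and in the superclass.

The +initialize methods are collected and called from the subclass.

The expected final result of the hooks: across classes the inheritance chain is respected, and within a class the class's own hook comes before those of its categories.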

/// Superclass
@interface SuperHookMethodInInitialize : NSObject
// The method to be hooked
- (void)methodToBeHooked;

@end
/// The class itself
@interface HookMethodInInitialize : SuperHookMethodInInitialize

@end
/// Subclass
@interface SubHookMethodInInitialize : HookMethodInInitialize

@end
/// Category A
@interface HookMethodInInitialize(CategoryA)

@end
/// Category B
@interface HookMethodInInitialize(CategoryB)

@end
/// Category C
@interface HookMethodInInitialize(CategoryC)

@end

@implementation SuperHookMethodInInitialize

+ (void)initialize {
    if (self == [SuperHookMethodInInitialize self]) {
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{
            [MethodSwizzleUtil swizzleInstanceMethodWithClass:self originalSel:@selector(methodToBeHooked) replacementSel:@selector(hookedMethodInSuper)];
        });
    }
}

- (void)methodToBeHooked {
    NSLog(@"%s", __FUNCTION__);
}

- (void)hookedMethodInSuper {
    [self hookedMethodInSuper];
    NSLog(@"%s", __FUNCTION__);
}

@end

@implementation HookMethodInInitialize

+ (void)initialize {
    if (self == [HookMethodInInitialize self]) {
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{
            [MethodSwizzleUtil swizzleInstanceMethodWithClass:self originalSel:@selector(methodToBeHooked) replacementSel:@selector(hookedMethodInMine)];
        });
    }
}

- (void)hookedMethodInMine {
    [self hookedMethodInMine];
    NSLog(@"%s", __FUNCTION__);
}

@end

@implementation SubHookMethodInInitialize

+ (void)initialize {
    if (self == [SubHookMethodInInitialize self]) {
        Method currentMethod = class_getClassMethod(self, _cmd);
        IMP currentMethodImp = method_getImplementation(currentMethod);
        callClassMethods(self, @selector(initialize), currentMethodImp);
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{
            [MethodSwizzleUtil swizzleInstanceMethodWithClass:self originalSel:@selector(methodToBeHooked) replacementSel:@selector(hookedMethodInSubclass)];
        });
    }
}

- (void)hookedMethodInSubclass {
    [self hookedMethodInSubclass];
    NSLog(@"%s", __FUNCTION__);
}

@end

@implementation HookMethodInInitialize(CategoryA)

+ (void)initialize {
    if (self == [HookMethodInInitialize self]) {
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{
            [MethodSwizzleUtil swizzleInstanceMethodWithClass:self originalSel:@selector(methodToBeHooked) replacementSel:@selector(hookedMethodInCategoryA)];
        });
    }
}

- (void)hookedMethodInCategoryA {
    [self hookedMethodInCategoryA];
    NSLog(@"%s", __FUNCTION__);
}

@end

@implementation HookMethodInInitialize(CategoryB)

+ (void)initialize {
    if (self == [HookMethodInInitialize self]) {
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{
            [MethodSwizzleUtil swizzleInstanceMethodWithClass:self originalSel:@selector(methodToBeHooked) replacementSel:@selector(hookedMethodInCategoryB)];
        });
    }
}

- (void)hookedMethodInCategoryB {
    [self hookedMethodInCategoryB];
    NSLog(@"%s", __FUNCTION__);
}

@end

@implementation HookMethodInInitialize(CategoryC)

+ (void)initialize {
    if (self == [HookMethodInInitialize self]) {
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{
            [MethodSwizzleUtil swizzleInstanceMethodWithClass:self originalSel:@selector(methodToBeHooked) replacementSel:@selector(hookedMethodInCategoryC)];
        });
    }
}

- (void)hookedMethodInCategoryC {
    [self hookedMethodInCategoryC];
    NSLog(@"%s", __FUNCTION__);
}

@end

Call the method on the subclass and see how the calls play out after hooking:

[[SubHookMethodInInitialize new] methodToBeHooked];

Run it and check the log:

2020-06-18 20:34:25.528307+0800 RuntimeLearning[7568:93450] -[SuperHookMethodInInitialize methodToBeHooked]
2020-06-18 20:34:25.528860+0800 RuntimeLearning[7568:93450] -[SuperHookMethodInInitialize hookedMethodInSuper]
2020-06-18 20:34:25.529013+0800 RuntimeLearning[7568:93450] -[HookMethodInInitialize(CategoryC) hookedMethodInCategoryC]
2020-06-18 20:34:25.529163+0800 RuntimeLearning[7568:93450] -[HookMethodInInitialize hookedMethodInMine]
2020-06-18 20:34:25.529273+0800 RuntimeLearning[7568:93450] -[HookMethodInInitialize(CategoryA) hookedMethodInCategoryA]
2020-06-18 20:34:25.551023+0800 RuntimeLearning[7568:93450] -[HookMethodInInitialize(CategoryB) hookedMethodInCategoryB]
2020-06-18 20:34:25.551186+0800 RuntimeLearning[7568:93450] -[SubHookMethodInInitialize hookedMethodInSubclass]

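The log shows that the hooks follow the class inheritance chain as expected. Within the class that has categories, however, the order is not "class first, then categories".

The reason is that when the subclass triggers +initialize, the runtime walks up the inheritance chain and first triggers +initialize on the classes above it, and for a class with categories it is a category's implementation that gets called. That is why, in the result above, CategoryC's hook was installed before the class's own.

Improving the code
If the category also collects the list of +initialize IMPs and triggers them, the class is guaranteed to be handled in "class first, then categories" order: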

@implementation HookMethodInInitialize(CategoryC)

+ (void)initialize {
    if (self == [HookMethodInInitialize self]) {
        Method currentMethod = class_getClassMethod(self, _cmd);
        IMP currentMethodImp = method_getImplementation(currentMethod);
        callClassMethods(self, @selector(initialize), currentMethodImp);
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{
            [MethodSwizzleUtil swizzleInstanceMethodWithClass:self originalSel:@selector(methodToBeHooked) replacementSel:@selector(hookedMethodInCategoryC)];
        });
    }
}

@end

Run it and check the log:

2020-06-18 20:49:55.871230+0800 RuntimeLearning[7693:104497] -[SuperHookMethodInInitialize methodToBeHooked]
2020-06-18 20:49:55.871384+0800 RuntimeLearning[7693:104497] -[SuperHookMethodInInitialize hookedMethodInSuper]
2020-06-18 20:49:55.871497+0800 RuntimeLearning[7693:104497] -[HookMethodInInitialize hookedMethodInMine]
2020-06-18 20:49:55.871594+0800 RuntimeLearning[7693:104497] -[HookMethodInInitialize(CategoryA) hookedMethodInCategoryA]
2020-06-18 20:49:55.871682+0800 RuntimeLearning[7693:104497] -[HookMethodInInitialize(CategoryB) hookedMethodInCategoryB]
2020-06-18 20:49:55.871960+0800 RuntimeLearning[7693:104497] -[HookMethodInInitialize(CategoryC) hookedMethodInCategoryC]
2020-06-18 20:49:55.872050+0800 RuntimeLearning[7693:104497] -[SubHookMethodInInitialize hookedMethodInSubclass]

The result now matches our expectation and is consistent with the behavior of +load.

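4. Summary

By reading the source we learned how +load is loaded and how +initialize is called. Following the way the system handles +load, we collect the list of +initialize IMPs and call that list at a suitable moment, which gives us behavior similar to +load.

The essential difference between +initialize and +load is that one goes through message dispatch while the other is called directly through its IMP function pointer; that is what causes the differences between them.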