iOS 底层探索：方法缓存（cache_t）的分析

iOS 底层探索：学习大纲 OC篇

前言

这篇主要内容是分析cache_t流程。之前分析了objc_class中的bits探寻的类的内存结构，在oc语言中，对象调用方法之后，这个方法是会被缓存起来的。下次再调用这个方法的时候，直接从缓存里面去找，而不用再去遍历从类到父类再到祖宗类的方法列表了。本文就是从源码分析这个方法缓存的功能流程；
分析环境：arm64 构架，iPhone 真机编译环境下。

准备工作

复习：iOS底层探索：类的结构分析
先学习几个重要的单词以便于理解源码：

`cache`	缓存
`bucket`	桶，一桶的量
`mask`	面具；subnet mask ,子网掩码
`occupied`	已占用的；使用中的
`capacity`	容量；能力
`shift`	转移, 移位
`reallocate`	重新分配
`increment`	增量

回忆SDWebImage缓存策略

SDWebImage实现流程
大脑中是否已经有了缓存实现思路了呢？大致思路：1.先去缓存找，2.找不到就创建，3.创建成功再去缓存，然后返回结束。

我们接下来看看apple的工程师是如何在底层做方法的缓存的。

一、源码中简单查看cache_t 大致结构和定义

结构体objc_class中

Class是指向objc_class的指针，objc_class内部存在一个cache_t cache；cache就是用来缓存最近调用过的方法的。

结构体cache_t中

_buckets：数组，是bucket_t结构体的数组，bucket_t是用来存放方法的SEL内存地址和IMP的;

_mask的大小是数组大小 - 1，用作掩码。（因为这里维护的数组大小都是2的整数次幂，所以_mask的二进制位000011, 000111, 001111）刚好可以用作hash取余数的掩码。刚好保证相与后不超过缓存大小。

_occupied是当前已缓存的方法数。即数组中已使用了多少位置。

_maskAndBucket是mask和bucket的结合体，提升了性能和效率；

结构体bucket_t中

SEL应该是char*类型的字符串，char*强转unsigned long，其实就是SEL的内存地址。代码如下

_imp就是方法实现IMP了。

二、lldb 调试方法缓存的时机

1 .分析 cache_t的缓存时机

-cache属性的获取，需要通过pclass的首地址平移16字节，即首地址+0x10获取cache的地址

注：通过打印分析方法执行后，cache_t中的_buckets中有值了，并且_occupied 为1 ，很明显说明，方法执行调用后进行的缓存。

2 .分析 cache_t中的_buckets读取sel和imp。
我们无法像往常一样通过* buckets打印_buckets，但是cache_t中提供了public方法*buckets()，同理sel和imp ，但是imp需要传参数类；看源码

struct cache_t {
    explicit_atomic<struct bucket_t *> _buckets;
    explicit_atomic<mask_t> _mask;
public: //对外公开可以调用的方法
    static bucket_t *emptyBuckets();
    struct bucket_t *buckets();
    mask_t mask();
    mask_t occupied();
   ......
}

struct bucket_t {
explicit_atomic<uintptr_t> _imp;
    explicit_atomic<SEL> _sel;
public:
    inline SEL sel() const { return _sel.load(memory_order::memory_order_relaxed); }

    inline IMP imp(Class cls) const {
        uintptr_t imp = _imp.load(memory_order::memory_order_relaxed);
        if (!imp) return nil;
}

这样就在cache中找到了 sayHello！

lldb调试打印如下：

从源码的分析中，我们知道sel-imp是在cache_t的_buckets属性中（目前处于macOS环境），而在cache_t结构体中提供了获取_buckets属性的方法buckets();
获取了_buckets属性，就可以获取sel-imp了，这两个的获取在bucket_t结构体中同样提供了相应的获取方法sel()以及 imp(pClass).

三、cache_t的缓存原理

类似SDWebImage，我们探究方法的缓存的时候，我们不仅要探索什么时候存，还要探索怎么存，存在哪，占多大内存，存取方式等。所以我们接下来就一步一步的去剖析。

1. 方法怎么存？
存是个动作，必然也是个函数，我们在结构体中cache_t去寻找有关联的函数解释如下：

struct cache_t {
#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_OUTLINED//macOS、模拟器 -- 主要是架构区分
  // explicit_atomic 显示原子性，目的是为了能够 保证 增删改查时 线程的安全性
    //等价于 struct bucket_t * _buckets;
    //_buckets 中放的是 sel imp
    //_buckets的读取 有提供相应名称的方法 buckets()
    explicit_atomic<struct bucket_t *> _buckets; //最小的buckets大小是 4（为了支持扩容算法需要）
    explicit_atomic<mask_t> _mask;  //散列表长度 - 1
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16 //64位真机
    explicit_atomic<uintptr_t> _maskAndBuckets;//写在一起的目的是为了优化
    mask_t _mask_unused;
public: //对外公开可以调用的方法
    static bucket_t *emptyBuckets(); // 清空buckets
    
    struct bucket_t *buckets(); //这个方法的实现很简单就是_buckets对外的一个获取函数
    mask_t mask();  //获取缓存容量_mask
    mask_t occupied(); //获取已经占用的缓存个数_occupied
    void incrementOccupied(); //增加缓存，_occupied自++
    void setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask); //这个函数是设置一个新的Buckets
    void initializeToEmpty();
    unsigned capacity();
    bool isConstantEmptyCache();
    bool canBeFreed();
   ......
}

一看到 void incrementOccupied(); 这个方法就开始激动了，打开源码：

void cache_t::incrementOccupied() 
{
  _occupied++;  //已占用的 递增
}

源码查看这个方法在哪调用的，只有一个地方调用！！！

// 哈希表中插入sel 和 imp
ALWAYS_INLINE
void cache_t::insert(Class cls, SEL sel, IMP imp, id receiver)
{
  ......
}

看到这个方法： void cache_t::insert(Class cls, SEL sel, IMP imp, id receiver)

insert: 插入的意思

Class cls ：当前class

SEL sel : 方法名

IMP imp : 方法的指针

id receiver ：接收者

一目了然，这个方法就是在展示cache是如何缓存的，这个方法内容比较多，我们一点点的去理解。我们再找找哪里调用了cache->insert，源码进行全局搜索只有一处：

void cache_fill(Class cls, SEL sel, IMP imp, id receiver)
{
    runtimeLock.assertLocked(); // runtime锁  assert断言
#if !DEBUG_TASK_THREADS 
    // Never cache before +initialize is done
    if (cls->isInitialized()) {
        cache_t *cache = getCache(cls);
#if CONFIG_USE_CACHE_LOCK
        mutex_locker_t lock(cacheUpdateLock);
#endif
        cache->insert(cls, sel, imp, receiver);
    }
#else
    _collecting_in_critical();
#endif
}

注：看到这么多Lock 这部分肯定是在做线程的操作，这段cache_fill中的一些操作最后都是在做insert操作，我们再找下cache_fill是你什么时候调用的，经过全局搜索，并没有搜到，说明这里又是经过编译器做了处理，所以我们今天就只讨论cache_fill —>insert里的操作。

2. cache_fill执行流程

3. cache_t::insert执行流程


ALWAYS_INLINE
void cache_t::insert(Class cls, SEL sel, IMP imp, id receiver)
{
#if CONFIG_USE_CACHE_LOCK
    cacheUpdateLock.assertLocked();
#else
    runtimeLock.assertLocked();
#endif
    ASSERT(sel != 0 && cls->isInitialized());
/*
 step1 创建临时变量 newOccupied ，oldCapacity 
*/

    mask_t newOccupied = occupied() + 1;
    unsigned oldCapacity = capacity(), capacity = oldCapacity;

/*
 step2 进行buckets的计算
*/
    if (slowpath(isConstantEmptyCache())) {
        // Cache is read-only. Replace it.
        if (!capacity) capacity = INIT_CACHE_SIZE;
        reallocate(oldCapacity, capacity, /* freeOld */false);
    }
    else if (fastpath(newOccupied + CACHE_END_MARKER <= capacity / 4 * 3)) { // 4  3 + 1 bucket cache_t
       //如果小于等于占用内存的3 /4 则什么都不用做。
    }
    else {
//扩容原因：第一次申请开辟的内存容量是 4 ，如果已经有3个bucket插入到cache里面，再次插入一个就会存满这个容量，为了保证读取的正确性，就对其进行扩容
// 扩容算法：有capacity时扩容为两倍，没有就初始化为INIT_CACHE_SIZE 也就是4
        capacity = capacity ? capacity * 2 : INIT_CACHE_SIZE;  
        if (capacity > MAX_CACHE_SIZE) {
            capacity = MAX_CACHE_SIZE;
        }
        reallocate(oldCapacity, capacity, true); 
 // 内存 库容完毕
    }
/*
 step3 计算好容量之后，进行插入sel imp class
*/

    bucket_t *b = buckets();
    mask_t m = capacity - 1;
    mask_t begin = cache_hash(sel, m); //求cache的hash ，通过计算得到sel存储的下标
    mask_t i = begin;
//遍历操作 
    do {
// 如果sel 不存在就将sel存进去
        if (fastpath(b[i].sel() == 0)) {
            incrementOccupied(); //Occupied ++ 
            b[i].set<Atomic, Encoded>(sel, imp, cls); //存入  sel imp cls 
            return;
        }
// 如果sel 存在就返回
        if (b[i].sel() == sel) {
            return;
        }
    } while (fastpath((i = cache_next(i, m)) != begin));

    cache_t::bad_cache(receiver, (SEL)sel, cls);
}

注：以上总共分为三步：

1 计算当前bucket占用量；
2 根据bucket在buckets中的占用比，开辟空间
3 根据cache_hash方法，计算sel-imp存储的哈希下标，存入sel, imp, cls

分析如图

1.待分析处理1:
reallocate()源码

ALWAYS_INLINE
void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity, bool freeOld)
{
    //取到老的缓存池
    bucket_t *oldBuckets = buckets();
    //建立新容量的缓存池
    bucket_t *newBuckets = allocateBuckets(newCapacity);

    ASSERT(newCapacity > 0);
    ASSERT((uintptr_t)(mask_t)(newCapacity-1) == newCapacity-1);
    //初始化缓存池，缓存池的可缓存数比缓存池的r总容量要小1
    setBucketsAndMask(newBuckets, newCapacity - 1);
   /*
    释放老的缓存池，因为读和写是非常耗时的操作，
    缓存的目的是为了节省时间，
    所以在创建新的缓存池时候没有将老缓存池的内存copy过来
    而且这种操作也会清理掉缓存中长时间没调用的方法
    */
    if (freeOld) {
        cache_collect_free(oldBuckets, oldCapacity);
    }
}

分析如下

reallocate操作

如果有旧的buckets，需要清理之前的缓存，即调用cache_collect_free方法

static void cache_collect_free(bucket_t *data, mask_t capacity)
{
#if CONFIG_USE_CACHE_LOCK
    cacheUpdateLock.assertLocked();
#else
    runtimeLock.assertLocked();
#endif

    if (PrintCaches) recordDeadCache(capacity);

    _garbage_make_room ();// 创建 垃圾回收空间
    garbage_byte_size += cache_t::bytesForCapacity(capacity);
    garbage_refs[garbage_count++] = data;//存bucket_t *data
    cache_collect(false); // 清理旧bucket_t 垃圾回收
}

_garbage_make_room 源码解析


static void _garbage_make_room(void)
{
    static int first = 1;

    // Create the collection table the first time it is needed
    //在第一次需要集合表时创建它
    if (first)
    {
        first = 0;
        garbage_refs = (bucket_t**)
            malloc(INIT_GARBAGE_COUNT * sizeof(void *));
        garbage_max = INIT_GARBAGE_COUNT;
    }

    // Double the table if it is full  如果表已满，则加倍
    else if (garbage_count == garbage_max)
    {
        garbage_refs = (bucket_t**)
            realloc(garbage_refs, garbage_max * 2 * sizeof(void *));
        garbage_max *= 2; //扩容
    }
}

为什么要创建新的新的buckets来替换原有的buckets并抹掉原有的buckets的方案，而不是在在原有buckets的基础上进行扩容？

减少对方法快速查找流程的影响：调用objc_msgSend时会触发方法快速查找，如果进行扩容需要做一些读写操作，对快速查找影响比较大。

对性能要求比较高：开辟新的buckets空间并抹掉原有buckets的消耗比在原有buckets上进行扩展更加高效

1. 待分析处理2:

   bucket_t *b = buckets();
    mask_t m = capacity - 1;
    mask_t begin = cache_hash(sel, m); //求cache的hash ，通过计算得到sel存储的下标
    mask_t i = begin;
//遍历操作 
    do {
// 如果sel 不存在就将sel存进去
        if (fastpath(b[i].sel() == 0)) {
            incrementOccupied(); //Occupied ++ 
            b[i].set<Atomic, Encoded>(sel, imp, cls); //存入  sel imp cls 
            return;
        }
// 如果sel 存在就返回
        if (b[i].sel() == sel) {
            return;
        }
    } while (fastpath((i = cache_next(i, m)) != begin));

    cache_t::bad_cache(receiver, (SEL)sel, cls);

bucket赋值插入操作

mask_t begin = cache_hash(sel, m);

static inline mask_t cache_hash(SEL sel, mask_t mask) 
{
//求cache的hash ，通过计算得到sel存储的下标
    return (mask_t)(uintptr_t)sel & mask;
}

key 就是 SEL
映射关系其实就是 sel & mask = index
mask = 散列表长度 - 1
所以 index 一定是 <= mask

至此：cache_t的缓存过程基本分析完了。总体而言就是围绕bucket_t进行分析处理，大致思路：1计算bucket大小；2计算bucket在buckets的占比进行容量重组；3存入赋值

4 . cache_t的缓存原理如图

Cache_t原理分析图来自 Cooci大神.png

四、重难点答疑

1 .什么时候存储到cache中？

objc_msgSend第一次发送消息会触发方法查找，找到方法后会调用cache_fill()方法把方法缓存到cache中,这个在后面分析方法的本质的时候会提到。

2 .哪几种情况下会调用insert？

a. init初始化对象的时候;
b. 属性赋值，调用了set方法；
c. 方法调用；

3.bucket数据为什么会有丢失的情况？

原因是在扩容时，是将原有的内存全部清除了，再重新申请了内存导致的

4 .方法缓存cache_t的方法?

散列表技术 key-value, 用散列表来缓存曾经调用过的方法，可以提高方法的查找速度

1. 散列表的数据储存位置：

_mask( 散列表的长度-1 ) = 这个数据缓存的位置的下标，也就是缓存方法的索引，这个下标经过位运算之后，一定会小于或者等于散列表的长度-1 ，就不会出现数组越界的情况了

五、总结

这篇主要探索了cache_t的结构和大概的缓存原理，其实cache_t的整个流程在源码的注释中已经给出，注释如下：

 * All functions that modify cache data or structures must acquire the 
 * cacheUpdateLock to prevent interference from concurrent modifications.
 * The function that frees cache garbage must acquire the cacheUpdateLock 
 * and use collecting_in_critical() to flush out cache readers.
 * The cacheUpdateLock is also used to protect the custom allocator used 
 * for large method cache blocks.
 *
 * Cache readers (PC-checked by collecting_in_critical())
 * objc_msgSend*
 * cache_getImp
 *
 * Cache writers (hold cacheUpdateLock while reading or writing; not PC-checked)
 * cache_fill         (acquires lock)
 * cache_expand       (only called from cache_fill)
 * cache_create       (only called from cache_expand)
 * bcopy               (only called from instrumented cache_expand)
 * flush_caches        (acquires lock)
 * cache_flush        (only called from cache_fill and flush_caches)
 * cache_collect_free (only called from cache_expand and cache_flush)

接下来我们就会讨论objc_msgSend。

iOS 底层探索：方法缓存（cache_t）的分析

前言

准备工作

一 、源码中简单查看cache_t 大致结构和定义

二 、lldb 调试 方法缓存的时机

三 、cache_t的缓存原理

四 、重难点答疑

五 、总结

一、源码中简单查看cache_t 大致结构和定义

二、lldb 调试方法缓存的时机

三、cache_t的缓存原理

四、重难点答疑

五、总结