Android lmkd, Part 01: Source Code Analysis

Reposted from: "Android源码阅读：lmkd" (Android Source Code Reading: lmkd)

This article is based on android-13.0.0_r1.

1. lmkd startup and initialization

lmkd is started by the init process and runs as a standalone process in the system.

# system/core/rootdir/init.rc
    # ...
    # Start lmkd before any other services run so that it can register them
    write /proc/sys/vm/watermark_boost_factor 0
    chown root system /sys/module/lowmemorykiller/parameters/adj
    chmod 0664 /sys/module/lowmemorykiller/parameters/adj
    chown root system /sys/module/lowmemorykiller/parameters/minfree
    chmod 0664 /sys/module/lowmemorykiller/parameters/minfree
    # lmkd is started here
    start lmkd
    # ...

The lmkd service itself is declared in lmkd.rc:

# system/memory/lmkd/lmkd.rc
service lmkd /system/bin/lmkd
    class core
    user lmkd
    group lmkd system readproc
    capabilities DAC_OVERRIDE KILL IPC_LOCK SYS_NICE SYS_RESOURCE
    critical
    socket lmkd seqpacket+passcred 0660 system system
    task_profiles ServiceCapacityLow

on property:lmkd.reinit=1
    exec_background /system/bin/lmkd --reinit

# reinitialize lmkd after device finished booting if experiments set any flags during boot
on property:sys.boot_completed=1 && property:lmkd.reinit=0
    setprop lmkd.reinit 1

# properties most likely to be used in experiments
# setting persist.device_config.* property either triggers immediate lmkd re-initialization
# if the device finished booting or sets lmkd.reinit=0 to re-initialize lmkd after boot completes
on property:persist.device_config.lmkd_native.debug=*
    setprop lmkd.reinit ${sys.boot_completed:-0}

on property:persist.device_config.lmkd_native.kill_heaviest_task=*
    setprop lmkd.reinit ${sys.boot_completed:-0}

on property:persist.device_config.lmkd_native.kill_timeout_ms=*
    setprop lmkd.reinit ${sys.boot_completed:-0}

on property:persist.device_config.lmkd_native.swap_free_low_percentage=*
    setprop lmkd.reinit ${sys.boot_completed:-0}

on property:persist.device_config.lmkd_native.psi_partial_stall_ms=*
    setprop lmkd.reinit ${sys.boot_completed:-0}

on property:persist.device_config.lmkd_native.psi_complete_stall_ms=*
    setprop lmkd.reinit ${sys.boot_completed:-0}

on property:persist.device_config.lmkd_native.thrashing_limit=*
    setprop lmkd.reinit ${sys.boot_completed:-0}

on property:persist.device_config.lmkd_native.thrashing_limit_decay=*
    setprop lmkd.reinit ${sys.boot_completed:-0}

on property:persist.device_config.lmkd_native.thrashing_limit_critical=*
    setprop lmkd.reinit ${sys.boot_completed:-0}

on property:persist.device_config.lmkd_native.swap_util_max=*
    setprop lmkd.reinit ${sys.boot_completed:-0}

on property:persist.device_config.lmkd_native.filecache_min_kb=*
    setprop lmkd.reinit ${sys.boot_completed:-0}

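At startup, the main function in lmkd.cpp runs directly. Its logic is straightforward: it loads the parameters, creates the logger, calls init() inside the if condition (init() returns 0 on success), and then loops in mainloop() waiting for events.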

// system/memory/lmkd/lmkd.cpp
int main(int argc, char **argv) {
    update_props(); // load the tunable parameters from system properties
    ctx = create_android_logger(KILLINFO_LOG_TAG); // create the logger
    if (!init()) {
        //...
        mainloop();
    }
    android_log_destroy(&ctx);
}

1.1 update_props(): loading the parameters

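update_props mainly uses GET_LMK_PROPERTY to read each tunable from system properties. For example, it reads the adj level that may be killed at each of the three pressure levels: low, medium and critical.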

static void update_props() {
    /* By default disable low level vmpressure events */
    level_oomadj[VMPRESS_LEVEL_LOW] =
        GET_LMK_PROPERTY(int32, "low", OOM_SCORE_ADJ_MAX + 1);
    level_oomadj[VMPRESS_LEVEL_MEDIUM] =
        GET_LMK_PROPERTY(int32, "medium", 800);
    level_oomadj[VMPRESS_LEVEL_CRITICAL] =
        GET_LMK_PROPERTY(int32, "critical", 0);
    // ...
}

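GET_LMK_PROPERTY is a macro. It first reads persist.device_config.lmkd_native.<name>; if that property is not set it falls back to ro.lmk.<name>, and finally to the default value def.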

#define GET_LMK_PROPERTY(type, name, def) \
    property_get_##type("persist.device_config.lmkd_native." name, \
        property_get_##type("ro.lmk." name, def))
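The nested lookup can be modeled outside Android roughly as follows. This is only a sketch: PropertyStore, mock_property_get_int32 and get_lmk_property_int32 are hypothetical stand-ins built on a std::map, not the real property API, and they only illustrate the order in which the two property names and the default are consulted.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical in-memory property store standing in for Android system properties.
using PropertyStore = std::map<std::string, int32_t>;

// Stand-in for property_get_int32(): returns the stored value, or def when the key is absent.
int32_t mock_property_get_int32(const PropertyStore& props, const std::string& key, int32_t def) {
    auto it = props.find(key);
    return it != props.end() ? it->second : def;
}

// Mirrors the nesting in GET_LMK_PROPERTY: the ro.lmk.* lookup (with the built-in default)
// supplies the default for the persist.device_config.lmkd_native.* lookup.
int32_t get_lmk_property_int32(const PropertyStore& props, const std::string& name, int32_t def) {
    int32_t ro_value = mock_property_get_int32(props, "ro.lmk." + name, def);
    return mock_property_get_int32(props, "persist.device_config.lmkd_native." + name, ro_value);
}
```

With an empty store the built-in default is returned; setting ro.lmk.medium overrides it, and setting persist.device_config.lmkd_native.medium overrides both.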

1.2 init(): initialization

1.2.1 Creating the epoll instance

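An important step in init is creating the epoll instance. The macro MAX_EPOLL_EVENTS evaluates to 10, which is the number of event sources (file descriptors) lmkd expects to monitor through epoll. Note that since Linux 2.6.8 the size argument of epoll_create is ignored (it only has to be greater than zero), so the value does not by itself cap the number of registered descriptors.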

/* max supported number of data connections (AMS, init, tests) */
#define MAX_DATA_CONN 3

/*
 * 1 ctrl listen socket, 3 ctrl data socket, 3 memory pressure levels,
 * 1 lmk events + 1 fd to wait for process death + 1 fd to receive kill failure notifications
 */
#define MAX_EPOLL_EVENTS (1 + MAX_DATA_CONN + VMPRESS_LEVEL_COUNT + 1 + 1 + 1)

static int epollfd;

static int init(void) {

    // ...

    epollfd = epoll_create(MAX_EPOLL_EVENTS);
    if (epollfd == -1) {
        ALOGE("epoll_create failed (errno=%d)", errno);
        return -1;
    }

    // ...
}
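To make the epoll pattern concrete, here is a minimal Linux-only sketch of registering a descriptor together with a handler record and dispatching through it. It is an assumption-laden illustration rather than lmkd's code: HandlerInfo, on_event and run_epoll_demo are hypothetical names, an eventfd stands in for lmkd's sockets and pressure fds, and epoll_create1 is used instead of epoll_create.

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstdint>

// Simplified handler record: a callback plus an integer payload (for example a pressure level).
struct HandlerInfo {
    int data;
    void (*handler)(int data, uint32_t events);
};

static int g_last_data = -1;  // records which payload the handler was called with

static void on_event(int data, uint32_t /*events*/) { g_last_data = data; }

// Registers one eventfd with epoll, signals it, and dispatches through data.ptr.
// Returns the payload seen by the handler, or -1 on any failure.
int run_epoll_demo(int payload) {
    int epfd = epoll_create1(0);
    if (epfd == -1) return -1;
    int efd = eventfd(0, EFD_NONBLOCK);
    if (efd == -1) { close(epfd); return -1; }

    HandlerInfo info = {payload, on_event};
    struct epoll_event ev = {};
    ev.events = EPOLLIN;
    ev.data.ptr = &info;  // the handler record travels with the registration
    int result = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) == 0) {
        uint64_t one = 1;
        // Writing to the eventfd makes it readable, which wakes epoll_wait.
        if (write(efd, &one, sizeof(one)) == static_cast<ssize_t>(sizeof(one))) {
            struct epoll_event out[1];
            if (epoll_wait(epfd, out, 1, 1000) == 1) {
                HandlerInfo* hi = static_cast<HandlerInfo*>(out[0].data.ptr);
                hi->handler(hi->data, out[0].events);
                result = g_last_data;
            }
        }
    }
    close(efd);
    close(epfd);
    return result;
}
```

Storing a pointer to the handler record in the event's data field lets a single wait loop serve many unrelated descriptors without a lookup table.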

For example, AMS acts as a socket client and talks to lmkd over /dev/socket/lmkd. It reports each process's adj to lmkd, and lmkd writes the value to /proc/[pid]/oom_score_adj.

1.2.2 Choosing how lmkd is triggered

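Next, init has to decide how lmkd is triggered. Early versions of the low memory killer were implemented as a kernel driver, so init uses access() to check whether the legacy node still exists (the driver was removed in kernel 4.12). If the in-kernel interface is not used, init calls init_monitors().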

    has_inkernel_module = !access(INKERNEL_MINFREE_PATH, W_OK);
    use_inkernel_interface = has_inkernel_module && !enable_userspace_lmk;

    if (use_inkernel_interface) {
        // no longer supported by most kernels
    } else {
        if (!init_monitors()) {
            return -1;
        }
    }

Note that four similarly named functions are involved in monitor initialization: init_monitors(), init_psi_monitors(), init_mp_psi() and init_psi_monitor(). Take care to tell them apart.

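In the code, init_monitors() decides between PSI triggering and vmpressure triggering. When the use_psi property (ro.lmk.use_psi) is true, which is also its default, it calls init_psi_monitors() to set up the PSI monitors. Only when PSI is disabled or that call fails does it use init_mp_common() to set up the vmpressure monitors. In other words, lmkd prefers PSI triggering.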

static bool init_monitors() {
    /* Prefer the PSI monitors when the kernel supports them */
    use_psi_monitors = GET_LMK_PROPERTY(bool, "use_psi", true) &&
        init_psi_monitors();

    /* PSI monitors are not in use (disabled or failed to initialize): fall back to vmpressure */
    if (!use_psi_monitors &&
        (!init_mp_common(VMPRESS_LEVEL_LOW) ||
        !init_mp_common(VMPRESS_LEVEL_MEDIUM) ||
        !init_mp_common(VMPRESS_LEVEL_CRITICAL))) {
        ALOGE("Kernel does not support memory pressure events or in-kernel low memory killer");
        return false;
    }
    if (use_psi_monitors) {
        ALOGI("Using psi monitors for memory pressure detection");
    } else {
        ALOGI("Using vmpressure for memory pressure detection");
    }
    return true;
}
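The selection logic can be condensed into a small decision function. This is a sketch under stated assumptions: MonitorKind and choose_monitor are hypothetical names, and the two callables stand in for init_psi_monitors() and for the three init_mp_common() calls taken together.

```cpp
#include <functional>

enum class MonitorKind { Psi, Vmpressure, None };

// Decision function modeled on init_monitors(). init_psi and init_vmpressure are
// stand-ins for the real initialization calls and report success as true.
MonitorKind choose_monitor(bool use_psi_prop,
                           const std::function<bool()>& init_psi,
                           const std::function<bool()>& init_vmpressure) {
    // && short-circuits: PSI initialization is only attempted when the property allows it.
    if (use_psi_prop && init_psi()) return MonitorKind::Psi;
    // PSI is disabled or failed, so vmpressure is the only remaining option.
    return init_vmpressure() ? MonitorKind::Vmpressure : MonitorKind::None;
}
```

Because of the short-circuit, setting use_psi to false skips PSI initialization entirely instead of trying it and discarding the result.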

(1) PSI triggering

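Next, look at init_psi_monitors(), which sets up the PSI monitors. If the use_new_strategy property is set explicitly, its value decides the strategy; otherwise the "new strategy" is the default on low-RAM devices or when use_minfree_levels is false. The two strategies differ in how processes are killed after PSI has fired: the old strategy kills according to the free-page situation (watermarks), while the new strategy kills according to the PSI pressure. In my opinion, distinguishing them as "new" and "old" is rather inelegant.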

Whichever strategy is chosen, what is set up here are PSI monitors: triggering is still done by PSI, and only the way processes are killed differs.

static bool init_psi_monitors() {

    bool use_new_strategy =
        GET_LMK_PROPERTY(bool, "use_new_strategy", low_ram_device || !use_minfree_levels);

    /* In default PSI mode override stall amounts using system properties */
    if (use_new_strategy) {
        /* Do not use low pressure level */
        psi_thresholds[VMPRESS_LEVEL_LOW].threshold_ms = 0;
        psi_thresholds[VMPRESS_LEVEL_MEDIUM].threshold_ms = psi_partial_stall_ms;
        psi_thresholds[VMPRESS_LEVEL_CRITICAL].threshold_ms = psi_complete_stall_ms;
    }

    if (!init_mp_psi(VMPRESS_LEVEL_LOW, use_new_strategy)) {
        return false;
    }
    if (!init_mp_psi(VMPRESS_LEVEL_MEDIUM, use_new_strategy)) {
        destroy_mp_psi(VMPRESS_LEVEL_LOW);
        return false;
    }
    if (!init_mp_psi(VMPRESS_LEVEL_CRITICAL, use_new_strategy)) {
        destroy_mp_psi(VMPRESS_LEVEL_MEDIUM);
        destroy_mp_psi(VMPRESS_LEVEL_LOW);
        return false;
    }
    return true;
}
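How the use_new_strategy flag resolves can be written out as a tiny truth function. This is a sketch only: PropValue and resolve_use_new_strategy are hypothetical names, and the three-state enum models "property not set" versus an explicit true or false.

```cpp
// Models a boolean property that may be absent.
enum class PropValue { Unset, Off, On };

// An explicit property value wins; otherwise the default is
// low_ram_device || !use_minfree_levels, as in init_psi_monitors().
bool resolve_use_new_strategy(PropValue explicit_prop, bool low_ram_device,
                              bool use_minfree_levels) {
    if (explicit_prop != PropValue::Unset) {
        return explicit_prop == PropValue::On;
    }
    return low_ram_device || !use_minfree_levels;
}
```

On a device that is not low-RAM and has use_minfree_levels enabled, the old strategy is selected unless the property explicitly asks for the new one.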

Once the strategy is decided, init_mp_psi is called to set up the PSI event for each level.

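init_mp_psi takes two parameters: the pressure level and the new/old strategy flag. Note that the type of the first parameter is named vmpressure_level. Despite the "vmpressure" in the name, triggering here is done by PSI and the pressure level is derived from PSI; it is not the same "vmpressure" as the vmpressure mechanism mentioned earlier. This is the second place where I find the code inelegant, because it invites confusion. vmpressure stands for virtual memory pressure; did the designers perhaps consider the stall time reported by PSI to be a kind of virtual memory pressure as well?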

static bool init_mp_psi(enum vmpressure_level level, bool use_new_strategy) {
    int fd;

    /* Do not register a handler if threshold_ms is not set */
    if (!psi_thresholds[level].threshold_ms) {
        return true;
    }

    fd = init_psi_monitor(psi_thresholds[level].stall_type,
        psi_thresholds[level].threshold_ms * US_PER_MS,
        PSI_WINDOW_SIZE_MS * US_PER_MS);

    if (fd < 0) {
        return false;
    }

    vmpressure_hinfo[level].handler = use_new_strategy ? mp_event_psi : mp_event_common;
    vmpressure_hinfo[level].data = level;
    if (register_psi_monitor(epollfd, fd, &vmpressure_hinfo[level]) < 0) {
        destroy_psi_monitor(fd);
        return false;
    }
    maxevents++;
    mpevfd[level] = fd;

    return true;
}

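Do not confuse init_psi_monitor here with init_psi_monitors above. init_psi_monitor is defined in system/memory/lmkd/libpsi/psi.cpp; given a stall type, a threshold and a window size, it returns a file descriptor for the PSI trigger, which can then be monitored with epoll.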
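For orientation, the kernel's PSI interface arms a trigger when a line of the form "<some|full> <stall threshold in us> <window in us>" is written to /proc/pressure/memory. The helper below only builds that string; format_psi_trigger is a hypothetical name, and the millisecond values used in the usage note are illustrative assumptions rather than values read from psi.cpp.

```cpp
#include <string>

// Builds the text that arms a PSI trigger:
// "<some|full> <stall threshold in microseconds> <window in microseconds>".
std::string format_psi_trigger(bool full_stall, int threshold_ms, int window_ms) {
    const int kUsPerMs = 1000;  // same role as US_PER_MS in the excerpt above
    return std::string(full_stall ? "full" : "some") + " " +
           std::to_string(threshold_ms * kUsPerMs) + " " +
           std::to_string(window_ms * kUsPerMs);
}
```

For instance, a partial-stall threshold of 70 ms over a 1000 ms window yields "some 70000 1000000".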

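The most important part is vmpressure_hinfo[level].handler. Depending on whether the new strategy is used, it determines whether mp_event_psi or mp_event_common is called when an event for this pressure level fires. With the new strategy, mp_event_psi is called when the pressure event arrives.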

After that, register_psi_monitor adds the fd to epoll so that the pressure events are monitored.

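At this point init_psi_monitors(), that is, PSI monitor initialization, is complete. With the new strategy, mp_event_psi is called whenever one of the pressure events fires.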

(2) vmpressure triggering

Since most Android devices now use PSI triggering, the vmpressure path is skipped for now.
init also performs other initialization besides init_monitors(); that is skipped as well.

2. The new strategy after a PSI trigger (mp_event_psi)

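mp_event_psi can be roughly divided into three parts. The first computes parameters and state, the second uses that state to determine the kill reason (kill_reason), and the third selects a process and performs one round of killing.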

2.1 Computing parameters and state

2.1.1 Static variables

The function has a number of static variables that keep state across successive calls.

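    static int64_t init_ws_refault; // workingset_refault baseline, reset after a kill
    static int64_t prev_workingset_refault; // workingset_refault seen in the previous cycle
    static int64_t base_file_lru;      // baseline size of the file-backed page cache
    static int64_t init_pgscan_kswapd; // baseline pgscan_kswapd (pages scanned by kswapd)
    static int64_t init_pgscan_direct; // baseline pgscan_direct (pages scanned by direct reclaim)
    static bool killing; // set to true when a process has been killed
    static int thrashing_limit = thrashing_limit_pct; // thrashing threshold, initially taken from the property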
    static struct zone_watermarks watermarks;
    static struct timespec wmark_update_tm;
    static struct wakeup_info wi;
    static struct timespec thrashing_reset_tm;
    static int64_t prev_thrash_growth = 0;
    static bool check_filecache = false;
    static int max_thrashing = 0;

2.1.2 Local variables

    union meminfo mi;  // parsed from /proc/meminfo
    union vmstat vs;   // parsed from /proc/vmstat
    struct psi_data psi_data;
    struct timespec curr_tm; // time recorded at the start of each cycle
    int64_t thrashing = 0;
    bool swap_is_low = false;
    enum vmpressure_level level = (enum vmpressure_level)data;
    enum kill_reasons kill_reason = NONE;
    bool cycle_after_kill = false; // set to true in this cycle if a process was killed in the previous one
    enum reclaim_state reclaim = NO_RECLAIM;
    enum zone_watermark wmark = WMARK_NONE;
    char kill_desc[LINE_MAX];
    bool cut_thrashing_limit = false;
    int min_score_adj = 0;
    int swap_util = 0;
    int64_t swap_low_threshold;
    long since_thrashing_reset_ms;
    int64_t workingset_refault_file;
    bool critical_stall = false;

2.1.3 Evaluating the state

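This part has a lot of code. The key steps are reading data through vmstat_parse and meminfo_parse and evaluating thrashing, watermarks, swap state and so on, so that the next step can determine the kill reason.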

    if (clock_gettime(CLOCK_MONOTONIC_COARSE, &curr_tm) != 0) {
        ALOGE("Failed to get current time");
        return;
    }

    record_wakeup_time(&curr_tm, events ? Event : Polling, &wi);

    bool kill_pending = is_kill_pending();
    if (kill_pending && (kill_timeout_ms == 0 ||
        get_time_diff_ms(&last_kill_tm, &curr_tm) < static_cast<long>(kill_timeout_ms))) {
        /* Skip while still killing a process */
        wi.skipped_wakeups++;
        goto no_kill;
    }
    /*
     * Process is dead or kill timeout is over, stop waiting. This has no effect if pidfds are
     * supported and death notification already caused waiting to stop.
     */
    stop_wait_for_proc_kill(!kill_pending);

    // parse /proc/vmstat
    if (vmstat_parse(&vs) < 0) {
        ALOGE("Failed to parse vmstat!");
        return;
    }

    /* Starting with kernel 5.9 the workingset_refault vmstat field was renamed workingset_refault_file */
    workingset_refault_file = vs.field.workingset_refault ? : vs.field.workingset_refault_file;

    // parse /proc/meminfo
    if (meminfo_parse(&mi) < 0) {
        ALOGE("Failed to parse meminfo!");
        return;
    }

    /* Reset states after process got killed */
    if (killing) {
        killing = false;
        cycle_after_kill = true;
        /* Reset file-backed pagecache size and refault amounts after a kill */
        base_file_lru = vs.field.nr_inactive_file + vs.field.nr_active_file; // reset the file page cache baseline
        init_ws_refault = workingset_refault_file;                           // reset the refault baseline
        thrashing_reset_tm = curr_tm; // restart the thrashing window at the current time
        prev_thrash_growth = 0;       // clear the carried-over thrashing growth
    }

    /* Check free swap levels (result: swap_is_low) */
    if (swap_free_low_percentage) { // taken from the property
        swap_low_threshold = mi.field.total_swap * swap_free_low_percentage / 100;
        swap_is_low = get_free_swap(&mi) < swap_low_threshold;  // free swap below that percentage of total swap counts as low
    } else {
        swap_low_threshold = 0;
    }

    /* Identify reclaim state (result: reclaim) */
    if (vs.field.pgscan_direct != init_pgscan_direct) { // pgscan_direct changed, so direct reclaim happened (DIRECT_RECLAIM)
        init_pgscan_direct = vs.field.pgscan_direct;
        init_pgscan_kswapd = vs.field.pgscan_kswapd;
        reclaim = DIRECT_RECLAIM;
    } else if (vs.field.pgscan_kswapd != init_pgscan_kswapd) { // pgscan_kswapd changed, so kswapd reclaim happened (KSWAPD_RECLAIM)
        init_pgscan_kswapd = vs.field.pgscan_kswapd;
        reclaim = KSWAPD_RECLAIM;
    } else if (workingset_refault_file == prev_workingset_refault) {
        /*
         * Device is not thrashing and not reclaiming, bail out early until we see these stats
         * changing
         */
        goto no_kill;
    }

    prev_workingset_refault = workingset_refault_file;

    /*
    * It's possible we fail to find an eligible process to kill (ex. no process is
    * above oom_adj_min). When this happens, we should retry to find a new process
    * for a kill whenever a new eligible process is available.
    *
    * This is especially important for a slow growing refault case.
    *
    * While retrying, we should keep monitoring new thrashing counter
    * as someone could release the memory to mitigate the thrashing.
    *
    * Thus, when thrashing reset window comes,
    * we decay the prev thrashing counter by window counts.
    *
    * If the counter is still greater than thrashing limit,
    * we preserve the current prev_thrash counter so we will retry kill again.
    *
    * Otherwise, we reset the prev_thrash counter so we will stop retrying.
    */

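    // time elapsed since the last thrashing reset; if a kill happened in the previous cycle
    // the reset time was just set to curr_tm, so the difference is 0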
    since_thrashing_reset_ms = get_time_diff_ms(&thrashing_reset_tm, &curr_tm);
    if (since_thrashing_reset_ms > THRASHING_RESET_INTERVAL_MS) {
        long windows_passed;
        /* Calculate prev_thrash_growth if we crossed THRASHING_RESET_INTERVAL_MS */
        prev_thrash_growth = (workingset_refault_file - init_ws_refault) * 100
                            / (base_file_lru + 1);          // new refaults / file page cache size * 100
        // number of thrashing reset windows that have elapsed
        windows_passed = (since_thrashing_reset_ms / THRASHING_RESET_INTERVAL_MS);

        /*
         * Decay prev_thrashing unless over-the-limit thrashing was registered in the window we
         * just crossed, which means there were no eligible processes to kill. We preserve the
         * counter in that case to ensure a kill if a new eligible process appears.
         */
        // The growth is halved once per elapsed window when more than one window has passed,
        // or when the growth stayed below the limit. Only when exactly one window has passed
        // and the growth reached the limit is the counter kept as-is, so that a kill is retried.
        if (windows_passed > 1 || prev_thrash_growth < thrashing_limit) {
            prev_thrash_growth >>= windows_passed;
        }

        /* Record file-backed pagecache size when crossing THRASHING_RESET_INTERVAL_MS */
        // in practice this resets the file page cache baseline, the refault baseline,
        // the thrashing reset time and the thrashing limit
        base_file_lru = vs.field.nr_inactive_file + vs.field.nr_active_file;
        init_ws_refault = workingset_refault_file;
        thrashing_reset_tm = curr_tm;        // restart the thrashing window
        thrashing_limit = thrashing_limit_pct;
    } else {
        /* Calculate what % of the file-backed pagecache refaulted so far */
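        // a kill happened in the previous cycle, or thrashing was reset only recently:
        // compute the percentage of the file-backed page cache that has refaulted so far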
        thrashing = (workingset_refault_file - init_ws_refault) * 100 / (base_file_lru + 1);
    }
    /* Add previous cycle's decayed thrashing amount */
    // add the decayed thrashing amount carried over from the previous window
    thrashing += prev_thrash_growth;
    if (max_thrashing < thrashing) {
        max_thrashing = thrashing;
    }

    /*
     * Refresh watermarks once per min in case user updated one of the margins.
     *
     * TODO: b/140521024 replace this periodic update with an API for AMS to notify LMKD
     * that zone watermarks were changed by the system software.
     */
    if (watermarks.high_wmark == 0 || get_time_diff_ms(&wmark_update_tm, &curr_tm) > 60000) {
        struct zoneinfo zi;

        // parse /proc/zoneinfo once
        if (zoneinfo_parse(&zi) < 0) {
            ALOGE("Failed to parse zoneinfo!");
            return;
        }

        // compute the zone watermarks; this function sums the watermarks of all zones into watermarks
        calc_zone_watermarks(&zi, &watermarks);
        wmark_update_tm = curr_tm;
    }

    /* Find out which watermark is breached if any */
    // determine which watermark level has been breached
    wmark = get_lowest_watermark(&mi, &watermarks);

    /* Parse PSI data from /proc/pressure/memory and check whether the critical stall level is reached */
    if (!psi_parse_mem(&psi_data)) {
        critical_stall = psi_data.mem_stats[PSI_FULL].avg10 > (float)stall_limit_critical;
    }
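The thrashing arithmetic in this listing is easier to follow with concrete numbers. The sketch below isolates the two formulas; thrashing_pct and decay_prev_growth are hypothetical helper names, and the guard against oversized shifts is an addition of this sketch that does not appear in the excerpt.

```cpp
#include <cstdint>

// Percentage of the file-backed page cache that refaulted since the baseline was taken.
// The +1 in the divisor avoids division by zero when the file LRU is empty.
int64_t thrashing_pct(int64_t refault_now, int64_t refault_base, int64_t base_file_lru) {
    return (refault_now - refault_base) * 100 / (base_file_lru + 1);
}

// Decay rule applied after one or more reset windows have elapsed: halve the carried-over
// growth once per window, unless exactly one window passed and the growth reached the limit.
int64_t decay_prev_growth(int64_t prev_growth, long windows_passed, int thrashing_limit) {
    if (windows_passed > 1 || prev_growth < thrashing_limit) {
        // Guard added in this sketch: shifting a 64-bit value by 64 or more is undefined.
        if (windows_passed >= 63) return 0;
        prev_growth >>= windows_passed;
    }
    return prev_growth;
}
```

For example, 500 new refaults against a baseline of 999 file pages give 500 * 100 / 1000 = 50 percent. A carried-over growth of 50 with a limit of 100 decays to 25 after one window, while a growth of 120 is preserved after one window and decays to 15 after three.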

2.2 Determining the kill reason and the minimum adj

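This part uses the state computed above to determine the reason for killing and to adjust the minimum killable adj level (min_score_adj). The source here is essentially one long if / else if chain with fairly detailed comments, and the assignments to kill_reason and kill_desc are easy to follow. Vendors such as Qualcomm also tend to modify this code heavily, so it is not annotated in detail here. If lmkd kills something unexpectedly, the kill reason printed in the lmkd log can be used to locate the corresponding branch in this part of the source.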

Code excerpt:

    if (cycle_after_kill && wmark < WMARK_LOW) {
        /*
         * Prevent kills not freeing enough memory which might lead to OOM kill.
         * This might happen when a process is consuming memory faster than reclaim can
         * free even after a kill. Mostly happens when running memory stress tests.
         */
        kill_reason = PRESSURE_AFTER_KILL;
        strncpy(kill_desc, "min watermark is breached even after kill", sizeof(kill_desc));
    } else if (level == VMPRESS_LEVEL_CRITICAL && events != 0) {
        // ......
    } else if (swap_is_low && thrashing > thrashing_limit_pct) {
        // ......
    } else if (/*...*/)

2.3 Killing a process

At this point the kill reason and the minimum adj allowed to be killed have been determined, and find_and_kill_process is called to do the killing.

    if (kill_reason != NONE) {
        struct kill_info ki = {
            .kill_reason = kill_reason,
            .kill_desc = kill_desc,
            .thrashing = (int)thrashing,
            .max_thrashing = max_thrashing,
        };
        // ...
        int pages_freed = find_and_kill_process(min_score_adj, &ki, &mi, &wi, &curr_tm, &psi_data);
        if (pages_freed > 0) {
            killing = true;
            // ...
        }
    }

find_and_kill_process selects a suitable process to kill among those whose adj is greater than or equal to min_score_adj.

static int find_and_kill_process(int min_score_adj, struct kill_info *ki, union meminfo *mi,
                                 struct wakeup_info *wi, struct timespec *tm,
                                 struct psi_data *pd) {
    int i;
    int killed_size = 0;
    bool lmk_state_change_start = false;
    bool choose_heaviest_task = kill_heaviest_task;

    // the loop starts from 1000 (OOM_SCORE_ADJ_MAX)
    for (i = OOM_SCORE_ADJ_MAX; i >= min_score_adj; i--) {
        struct proc *procp;

        if (!choose_heaviest_task && i <= PERCEPTIBLE_APP_ADJ) {
            /*
             * If we have to choose a perceptible process, choose the heaviest one to
             * hopefully minimize the number of victims.
             * (perceptible here means adj <= 200)
             */
            choose_heaviest_task = true;
        }

        while (true) {
            procp = choose_heaviest_task ?
                proc_get_heaviest(i) : proc_adj_tail(i); // among processes with adj == i, pick the heaviest one or the one at the tail

            if (!procp)
                break;

            killed_size = kill_one_process(procp, min_score_adj, ki, mi, wi, tm, pd);
            if (killed_size >= 0) {
                break;
            }
        }

        // once a kill has happened, do not go on to processes with lower adj
        if (killed_size) {
            break;
        }
    }

    return killed_size;
}

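find_and_kill_process shows that every time a kill is triggered, lmkd starts from adj 1000 and works downward one level at a time looking for a suitable process, and stops once a kill has happened. The selection policy is controlled by the kill_heaviest_task system property, which decides whether proc_get_heaviest or proc_adj_tail is used. Note, however, that once the scan reaches adj 200 or below, lmkd has no choice but to kill a user-perceptible process, and it is then forced to pick the heaviest one.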
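The scan order can be modeled with ordinary containers. This is a simplified sketch, not lmkd's data structures: Proc, pick_victim and the map of vectors are hypothetical, the back of each vector plays the role of the list tail, and the function only reports which pid would be chosen instead of killing anything.

```cpp
#include <map>
#include <vector>

struct Proc {
    int pid;
    int rss_pages;
};

const int kOomScoreAdjMax = 1000;    // same value as OOM_SCORE_ADJ_MAX
const int kPerceptibleAppAdj = 200;  // same value as PERCEPTIBLE_APP_ADJ

// Walks adj levels from 1000 down to min_score_adj and returns the pid that would be
// chosen at the first non-empty level, or -1 if no eligible process exists.
int pick_victim(const std::map<int, std::vector<Proc>>& procs_by_adj,
                int min_score_adj, bool kill_heaviest_task) {
    bool choose_heaviest = kill_heaviest_task;
    for (int adj = kOomScoreAdjMax; adj >= min_score_adj; --adj) {
        // In the perceptible range the heaviest process is always preferred.
        if (!choose_heaviest && adj <= kPerceptibleAppAdj) choose_heaviest = true;

        auto it = procs_by_adj.find(adj);
        if (it == procs_by_adj.end() || it->second.empty()) continue;

        const std::vector<Proc>& bucket = it->second;
        if (!choose_heaviest) return bucket.back().pid;  // tail of the level

        const Proc* best = &bucket.front();
        for (const Proc& p : bucket) {
            if (p.rss_pages > best->rss_pages) best = &p;
        }
        return best->pid;
    }
    return -1;
}
```

With processes at adj 900 and adj 200, the adj 900 level is always served first; the adj 200 level is only reached when nothing above it is eligible, and there the largest process is chosen regardless of kill_heaviest_task.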

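The "heaviest" process is determined in proc_get_heaviest: for each process at that adj level it reads /proc/[pid]/statm to get the process's RSS, and it picks the process with the largest RSS.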
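Reading the RSS from statm comes down to taking the second field of the line, which is the resident set size in pages. The sketch below is an assumption about how such a read can be written, not a copy of lmkd's helper; parse_statm_rss_pages and read_rss_pages are hypothetical names.

```cpp
#include <cstdio>

// Extracts the resident set size (second field, in pages) from a /proc/[pid]/statm line.
// Returns -1 if the line cannot be parsed.
long parse_statm_rss_pages(const char* line) {
    long total = 0;
    long rss = 0;
    if (std::sscanf(line, "%ld %ld", &total, &rss) != 2) return -1;
    return rss;
}

// Reads /proc/<pid>/statm and returns the RSS in pages, or -1 on failure
// (for example when the process has already exited).
long read_rss_pages(int pid) {
    char path[64];
    std::snprintf(path, sizeof(path), "/proc/%d/statm", pid);
    std::FILE* f = std::fopen(path, "r");
    if (!f) return -1;
    char line[256];
    long rss = std::fgets(line, sizeof(line), f) ? parse_statm_rss_pages(line) : -1;
    std::fclose(f);
    return rss;
}
```

The value is in pages, so it has to be multiplied by the page size to obtain bytes.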
