Multithreading: Exploring the Source Code of ThreadPoolExecutor

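In day-to-day mobile development, handling tasks concurrently is unavoidable. As the mobile internet has grown, users have come to expect a smooth, high-quality experience, to the point that Google forbids network requests on Android's UI thread.
Fortunately, Android's rich ecosystem offers asynchronous tools such as AsyncTask, Executors, and RxJava, and developers have become very fluent with them.
The good times did not last. In recent years Alibaba published its own Java development guidelines, and much of the industry followed suit. The guidelines not only discourage creating a raw Thread, they also restrict the four thread pool factory methods that Executors provides. The figure below shows the rule: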

Thread pool executors provided by the system

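Why does Alibaba forbid building thread pools with the system-provided factory methods? The answer lies in how a thread pool executes tasks.
Let's first lay out that execution logic with a diagram. The thread pools created by the Executors factory methods are ultimately built on ThreadPoolExecutor and run tasks through its execute method, so we analyze the execution logic of ThreadPoolExecutor directly.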

Thread pool execution logic diagram
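The claim that the factory methods delegate to ThreadPoolExecutor can be checked from the outside. The sketch below assumes that newFixedThreadPool and newCachedThreadPool return a plain ThreadPoolExecutor that can be cast; that holds on the JDKs I am aware of but is an implementation detail, not a documented contract. The class name ExecutorsInspection and its helpers are made up for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

class ExecutorsInspection {
    /** Reads the ThreadPoolExecutor settings behind a factory-made pool, then shuts it down. */
    static String describe(ExecutorService service) {
        // Assumption: the factory returned a bare ThreadPoolExecutor (implementation detail).
        ThreadPoolExecutor pool = (ThreadPoolExecutor) service;
        try {
            return "core=" + pool.getCorePoolSize()
                    + " max=" + pool.getMaximumPoolSize()
                    + " queue=" + pool.getQueue().getClass().getSimpleName()
                    + " remaining=" + pool.getQueue().remainingCapacity();
        } finally {
            pool.shutdown();
        }
    }

    static String describeFixed() {
        return describe(Executors.newFixedThreadPool(2));
    }

    static String describeCached() {
        return describe(Executors.newCachedThreadPool());
    }

    public static void main(String[] args) {
        System.out.println(describeFixed());
        System.out.println(describeCached());
    }
}
```

The fixed pool reports an effectively unbounded queue and the cached pool an effectively unbounded thread count, which is the background for the restriction discussed in this article.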

I. Execution Strategy

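As the diagram shows, when execute is called the pool has three options: reject the task, run it immediately, or buffer it in the queue. Which one it picks depends on conditions that we can read directly from the source.
The official Javadoc for execute is quite detailed: the given task is executed at some time in the future, either in a new thread or in an existing pooled thread. If the task cannot be submitted for execution, because the executor has been shut down or because its capacity has been reached, it is handled by the current RejectedExecutionHandler. The task in question is the Runnable parameter passed to execute: command.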

    /**
     * Executes the given task sometime in the future.  The task
     * may execute in a new thread or in an existing pooled thread.
     *
     * If the task cannot be submitted for execution, either because this
     * executor has been shutdown or because its capacity has been reached,
     * the task is handled by the current {@code RejectedExecutionHandler}.
     *
     * @param command the task to execute
     * @throws RejectedExecutionException at discretion of
     *         {@code RejectedExecutionHandler}, if the task
     *         cannot be accepted for execution
     * @throws NullPointerException if {@code command} is null
     */
    public void execute(Runnable command) {
        if (command == null)
            throw new NullPointerException();
        /*
         * Proceed in 3 steps:
         *
         * 1. If fewer than corePoolSize threads are running, try to
         * start a new thread with the given command as its first
         * task.  The call to addWorker atomically checks runState and
         * workerCount, and so prevents false alarms that would add
         * threads when it shouldn't, by returning false.
         *
         * 2. If a task can be successfully queued, then we still need
         * to double-check whether we should have added a thread
         * (because existing ones died since last checking) or that
         * the pool shut down since entry into this method. So we
         * recheck state and if necessary roll back the enqueuing if
         * stopped, or start a new thread if there are none.
         *
         * 3. If we cannot queue task, then we try to add a new
         * thread.  If it fails, we know we are shut down or saturated
         * and so reject the task.
         */
        int c = ctl.get();
        if (workerCountOf(c) < corePoolSize) { // Code 1
            if (addWorker(command, true))
                return;
            c = ctl.get();
        }
        if (isRunning(c) && workQueue.offer(command)) { // Code 2
            int recheck = ctl.get();
            if (! isRunning(recheck) && remove(command)) // Code 2.1
                reject(command);
            else if (workerCountOf(recheck) == 0) // Code 2.2
                addWorker(null, false);
        }
        else if (!addWorker(command, false)) // Code 3
            reject(command);
    }

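To make the logic easier to follow, the comment inside the method breaks it into three steps:
step1: If fewer than corePoolSize (the core thread count) threads are running, the pool tries to start a new thread with command as its first task (in practice it constructs a Worker and starts the worker's thread). This is done by addWorker, which atomically checks runState and workerCount; when a thread should not be added, it returns false instead of adding one. For example, once the pool is in the STOP state, no new worker may be started.
This step corresponds to the condition marked 1 in the listing (workerCountOf(c) < corePoolSize).
step2: If the task was queued successfully, the pool state has to be checked a second time, for two reasons: 2.1 the pool may have left the RUNNING state since execute was entered; 2.2 a thread may need to be added after all, because the existing threads may have died since the last check. Depending on the recheck, the pool either rolls back the enqueue (removes the task again) if it is no longer running, or starts a new worker if there are none, so that the queued task still gets picked up.
This step corresponds to the branch marked 2. At 2.1 the "rollback" consists of removing the task from the queue and running the rejection policy; at 2.2, addWorker(null, false) really does start a new worker, one with no first task.
step3: If the task cannot be queued (the pool is not running or the queue is full), the pool tries to start a new worker through addWorker. If that fails, the pool is either shut down or saturated, so the task is rejected.
The overall execution strategy is summarized in the flowchart below: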

Thread pool execution strategy flowchart
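The three steps can be observed with a deliberately tiny pool. This is a minimal sketch, assuming a core size of 1, a maximum size of 2, and a queue capacity of 1; the class name ExecutePolicyDemo is made up for illustration. Every task blocks on a latch, so each call to execute has to take the next branch.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ExecutePolicyDemo {
    /** Returns "poolSize/queuedTasks/rejected" as observed after four execute() calls. */
    static String run() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 1, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1));
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await(); // keep the worker busy until the demo is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        boolean rejected = false;
        try {
            pool.execute(blocker); // step 1: below corePoolSize, starts a core worker
            pool.execute(blocker); // step 2: core worker busy, task goes into the queue
            pool.execute(blocker); // step 3: queue full, starts a non-core worker
            try {
                pool.execute(blocker); // queue full and pool at maximum: rejected
            } catch (RejectedExecutionException e) {
                rejected = true; // the default AbortPolicy throws
            }
            return pool.getPoolSize() + "/" + pool.getQueue().size() + "/" + rejected;
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Note that the second task waits in the queue even though the pool could still grow to two threads: the queue is tried before any non-core thread is created.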

II. Task Handling Logic

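The steps above keep referring to addWorker; its boolean return value decides whether the rejection policy runs. Here is what the source comment says:
In summary: the method checks whether a new worker (a thread that executes tasks) can be added with respect to the current pool state and the given bound, either the core or the maximum thread count. If so, the worker count is adjusted accordingly and, if possible, a new worker is created and started, running firstTask as its first task. The method returns false if the pool is stopped or eligible to shut down, and also if the thread factory fails to create a thread when asked. If thread creation fails, either because the factory returns null or because of an exception (typically an OutOfMemoryError in Thread.start()), the method rolls back cleanly.
The firstTask parameter is the command passed to execute. A worker is created with a first task when fewer than corePoolSize threads exist, or when the queue is full and the thread count is below maximumPoolSize; in both cases the task bypasses the queue.
Initially idle threads are usually created through prestartCoreThread, or to replace other dying workers.
firstTask can also be null. For example, if execute finds that the worker count is 0 after queuing command, it calls addWorker(null, false) to create a worker with no first task. That worker goes straight to the queue, so a queued task is never left without a thread to run it.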

    /**
     * Checks if a new worker can be added with respect to current
     * pool state and the given bound (either core or maximum). If so,
     * the worker count is adjusted accordingly, and, if possible, a
     * new worker is created and started, running firstTask as its
     * first task. This method returns false if the pool is stopped or
     * eligible to shut down. It also returns false if the thread
     * factory fails to create a thread when asked.  If the thread
     * creation fails, either due to the thread factory returning
     * null, or due to an exception (typically OutOfMemoryError in
     * Thread.start()), we roll back cleanly.
     *
     * @param firstTask the task the new thread should run first (or
     * null if none). Workers are created with an initial first task
     * (in method execute()) to bypass queuing when there are fewer
     * than corePoolSize threads (in which case we always start one),
     * or when the queue is full (in which case we must bypass queue).
     * Initially idle threads are usually created via
     * prestartCoreThread or to replace other dying workers.
     *
     * @param core if true use corePoolSize as bound, else
     * maximumPoolSize. (A boolean indicator is used here rather than a
     * value to ensure reads of fresh values after checking other pool
     * state).
     * @return true if successful
     */
    private boolean addWorker(Runnable firstTask, boolean core) {
        retry:  // "break retry" leaves both loops at once
        for (;;) {
            
            int c = ctl.get();
            int rs = runStateOf(c);

            // Check if queue empty only if necessary.
            if (rs >= SHUTDOWN &&   
                ! (rs == SHUTDOWN &&
                   firstTask == null &&
                   ! workQueue.isEmpty()))
                return false;

            for (;;) {
                int wc = workerCountOf(c);
                if (wc >= CAPACITY ||
                    wc >= (core ? corePoolSize : maximumPoolSize))
                    return false;
                if (compareAndIncrementWorkerCount(c))
                    break retry; // exit both the inner and the outer loop
                c = ctl.get();  // Re-read ctl
                if (runStateOf(c) != rs)
                    continue retry; // abandon this pass and start the next outer iteration
                // else CAS failed due to workerCount change; retry inner loop
            }
        }

        //  Code 1
        boolean workerStarted = false;
        boolean workerAdded = false;
        Worker w = null;
        try {  //  Code 2
            w = new Worker(firstTask);
            final Thread t = w.thread;
            if (t != null) {
                //  Code 3
                final ReentrantLock mainLock = this.mainLock;
                mainLock.lock();
                try {
                    // Recheck while holding lock.
                    // Back out on ThreadFactory failure or if
                    // shut down before lock acquired.
                    int rs = runStateOf(ctl.get());

                    //  Code 4
                    if (rs < SHUTDOWN ||
                        (rs == SHUTDOWN && firstTask == null)) {
                        if (t.isAlive()) // precheck that t is startable
                            throw new IllegalThreadStateException();
                        workers.add(w);
                        int s = workers.size();
                        if (s > largestPoolSize)
                            largestPoolSize = s;
                        workerAdded = true;
                    }
                } finally {
                    mainLock.unlock();
                }
                if (workerAdded) { //  Code 5
                    t.start(); 
                    workerStarted = true;
                }
            }
        } finally {
            if (! workerStarted)
                addWorkerFailed(w);
        }
        return workerStarted;
    }
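The Javadoc above mentions that initially idle workers are usually created through prestartCoreThread, which is the addWorker(null, true) path. A minimal sketch of how that looks from the outside, assuming a pool with 2 core threads; the class name PrestartDemo is made up for illustration:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PrestartDemo {
    /** Returns "sizeBefore/startedOne/sizeAfterOne/startedMore/sizeAfterAll". */
    static String run() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 1, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        try {
            int before = pool.getPoolSize();                 // no workers until asked
            boolean startedOne = pool.prestartCoreThread();  // adds one idle core worker
            int afterOne = pool.getPoolSize();
            int startedMore = pool.prestartAllCoreThreads(); // fills up to corePoolSize
            int afterAll = pool.getPoolSize();
            return before + "/" + startedOne + "/" + afterOne + "/" + startedMore + "/" + afterAll;
        } finally {
            pool.shutdown(); // interrupts the idle workers so they can exit
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The workers created this way have no first task, so they go directly to waiting on the queue.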

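Knowing what addWorker is for, we can walk through its implementation. The method has roughly two steps:
step1: Two nested for loops check the pool environment. The run state is RUNNING (negative), SHUTDOWN (0), or one of the later states (positive). A new worker may be created only in RUNNING, or in SHUTDOWN when firstTask is null and the blocking queue is not empty (a worker is still needed to drain the tasks already queued).
The inner for loop checks the bounds: a worker may be added only if the worker count is below corePoolSize (core is true) or below maximumPoolSize (core is false, for example when the queue is full). compareAndIncrementWorkerCount then increments the worker count with a CAS; if the CAS fails, the loop re-reads ctl and retries.
step2: Construct the Worker and start it.
At the point marked 1, two local variables, workerStarted and workerAdded, record whether the worker was added to the pool and whether its thread was started;
At 2, the Worker is constructed and the pending task firstTask is passed in. Its constructor is: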

        /**
         * Creates with given first task and thread from ThreadFactory.
         * @param firstTask the first task (null if none)
         */
        Worker(Runnable firstTask) {
            setState(-1); // inhibit interrupts until runWorker
            this.firstTask = firstTask;
            this.thread = getThreadFactory().newThread(this);
        }

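So whether or not firstTask is null, the thread factory is asked for a new thread (the result can be null if the factory fails). Here is the definition of that thread field: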

        /** Thread this worker is running in.  Null if factory fails. */
        final Thread thread;

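The worker runs in this thread. In other words, tasks are executed on this thread.
Note: the argument to newThread() is this, which tells us that Worker is itself a Runnable.
Next, at 3, the method takes a local reference to mainLock, a ReentrantLock, and holds it around the following try block, because an important operation happens there:
workers.add(w);
workers is a HashSet, defined as follows: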

    /**
     * Set containing all worker threads in pool. Accessed only when
     * holding mainLock.
     */
    // Android-added: @ReachabilitySensitive
    @ReachabilitySensitive
    private final HashSet<Worker> workers = new HashSet<>();

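It holds all worker threads in the pool. Because several threads may call execute at the same time, access has to be protected: the set may only be accessed while holding mainLock. Once the worker has been added and workerAdded is set to true, the worker's thread can be started.
At 5, t.start() starts the worker thread. t is the thread that the ThreadFactory created when the Worker was constructed at 2.
Note: workerStarted is tracked separately because starting the thread can fail; when it does, addWorkerFailed performs the cleanup.
The overall logic is fairly simple, as shown below: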

addWorker execution logic
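The state checks in step 1 depend on how ctl packs two values into one int: the run state in the top 3 bits and the worker count in the remaining 29. The sketch below reproduces those helpers as I recall them from the JDK 8 source; treat the constants as an illustration rather than a quotation. The class name CtlSketch and the tryAddWorker helper are made up, and tryAddWorker is a simplification that ignores the SHUTDOWN special case.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CtlSketch {
    static final int COUNT_BITS = Integer.SIZE - 3;     // 29 bits for the worker count
    static final int CAPACITY = (1 << COUNT_BITS) - 1;  // mask for the low 29 bits

    // Run states live in the high 3 bits, so they compare in lifecycle order.
    static final int RUNNING = -1 << COUNT_BITS;        // negative
    static final int SHUTDOWN = 0 << COUNT_BITS;        // zero
    static final int STOP = 1 << COUNT_BITS;            // positive from here on
    static final int TIDYING = 2 << COUNT_BITS;
    static final int TERMINATED = 3 << COUNT_BITS;

    static int runStateOf(int c) { return c & ~CAPACITY; }
    static int workerCountOf(int c) { return c & CAPACITY; }
    static int ctlOf(int rs, int wc) { return rs | wc; }

    /** Simplified inner loop of addWorker: bump the count with CAS while RUNNING and below the bound. */
    static boolean tryAddWorker(AtomicInteger ctl, int bound) {
        for (;;) {
            int c = ctl.get();
            if (runStateOf(c) != RUNNING || workerCountOf(c) >= bound)
                return false;
            if (ctl.compareAndSet(c, c + 1)) // the count sits in the low bits, so +1 is enough
                return true;
            // CAS lost a race with another thread: re-read ctl and retry
        }
    }

    static String demo() {
        AtomicInteger ctl = new AtomicInteger(ctlOf(RUNNING, 0));
        boolean first = tryAddWorker(ctl, 2);
        boolean second = tryAddWorker(ctl, 2);
        boolean third = tryAddWorker(ctl, 2); // bound reached
        return first + "/" + second + "/" + third + "/" + workerCountOf(ctl.get())
                + "/" + (runStateOf(ctl.get()) == RUNNING);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Packing both values into one atomic int is what lets addWorker check the state and reserve a worker slot in a single CAS.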

III. Task Execution Logic

Remember the Thread that was just started? It was given the Worker instance when it was created:

    this.thread = getThreadFactory().newThread(this);

Look at Worker's class declaration:

    /**
     * Class Worker mainly maintains interrupt control state for
     * threads running tasks, along with other minor bookkeeping.
     * This class opportunistically extends AbstractQueuedSynchronizer
     * to simplify acquiring and releasing a lock surrounding each
     * task execution.  This protects against interrupts that are
     * intended to wake up a worker thread waiting for a task from
     * instead interrupting a task being run.  We implement a simple
     * non-reentrant mutual exclusion lock rather than use
     * ReentrantLock because we do not want worker tasks to be able to
     * reacquire the lock when they invoke pool control methods like
     * setCorePoolSize.  Additionally, to suppress interrupts until
     * the thread actually starts running tasks, we initialize lock
     * state to a negative value, and clear it upon start (in
     * runWorker).
     */
    private final class Worker
        extends AbstractQueuedSynchronizer
        implements Runnable

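So Worker implements Runnable, and thread.start() ends up invoking Worker's run method. Note that the class mainly maintains interrupt control state for threads running tasks. It also opportunistically extends AbstractQueuedSynchronizer, to simplify acquiring and releasing the lock around each task execution.
Here is Worker's run method: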

        /** Delegates main run loop to outer runWorker. */
        public void run() {
            runWorker(this);
        }

It delegates the run loop to the outer runWorker method, which is also documented in some detail:

    /**
     * Main worker run loop.  Repeatedly gets tasks from queue and
     * executes them, while coping with a number of issues:
     *
     * 1. We may start out with an initial task, in which case we
     * don't need to get the first one. Otherwise, as long as pool is
     * running, we get tasks from getTask. If it returns null then the
     * worker exits due to changed pool state or configuration
     * parameters.  Other exits result from exception throws in
     * external code, in which case completedAbruptly holds, which
     * usually leads processWorkerExit to replace this thread.
     *
     * 2. Before running any task, the lock is acquired to prevent
     * other pool interrupts while the task is executing, and then we
     * ensure that unless pool is stopping, this thread does not have
     * its interrupt set.
     *
     * 3. Each task run is preceded by a call to beforeExecute, which
     * might throw an exception, in which case we cause thread to die
     * (breaking loop with completedAbruptly true) without processing
     * the task.
     *
     * 4. Assuming beforeExecute completes normally, we run the task,
     * gathering any of its thrown exceptions to send to afterExecute.
     * We separately handle RuntimeException, Error (both of which the
     * specs guarantee that we trap) and arbitrary Throwables.
     * Because we cannot rethrow Throwables within Runnable.run, we
     * wrap them within Errors on the way out (to the thread's
     * UncaughtExceptionHandler).  Any thrown exception also
     * conservatively causes thread to die.
     *
     * 5. After task.run completes, we call afterExecute, which may
     * also throw an exception, which will also cause thread to
     * die. According to JLS Sec 14.20, this exception is the one that
     * will be in effect even if task.run throws.
     *
     * The net effect of the exception mechanics is that afterExecute
     * and the thread's UncaughtExceptionHandler have as accurate
     * information as we can provide about any problems encountered by
     * user code.
     *
     * @param w the worker
     */
    final void runWorker(Worker w) {
        Thread wt = Thread.currentThread();
        Runnable task = w.firstTask;
        w.firstTask = null;
        w.unlock(); // allow interrupts
        boolean completedAbruptly = true;
        try {
            while (task != null || (task = getTask()) != null) {
                w.lock();
                // If pool is stopping, ensure thread is interrupted;
                // if not, ensure thread is not interrupted.  This
                // requires a recheck in second case to deal with
                // shutdownNow race while clearing interrupt
                if ((runStateAtLeast(ctl.get(), STOP) ||
                     (Thread.interrupted() &&
                      runStateAtLeast(ctl.get(), STOP))) &&
                    !wt.isInterrupted())
                    wt.interrupt();
                try {
                    beforeExecute(wt, task);
                    Throwable thrown = null;
                    try {
                        task.run(); 
                    } catch (RuntimeException x) {
                        thrown = x; throw x;
                    } catch (Error x) {
                        thrown = x; throw x;
                    } catch (Throwable x) {
                        thrown = x; throw new Error(x);
                    } finally {
                        afterExecute(task, thrown);
                    }
                } finally {
                    task = null;
                    w.completedTasks++;
                    w.unlock();
                }
            }
            completedAbruptly = false;
        } finally {
            processWorkerExit(w, completedAbruptly);
        }
    }

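For all the detail in its comment, the method boils down to: getting a task, running it with task.run, interrupt control, pre-execution handling, and post-execution handling.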

1. Getting a task

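1.1 firstTask is not null
This is the case where the worker was created because the thread count was below corePoolSize, or because the queue was full and the thread count was below maximumPoolSize. The worker runs firstTask directly.
1.2 firstTask is null
Here the worker has to fetch a waiting task from the blocking queue, which is what getTask does.
The source: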

    /**
     * Performs blocking or timed wait for a task, depending on
     * current configuration settings, or returns null if this worker
     * must exit because of any of:
     * 1. There are more than maximumPoolSize workers (due to
     *    a call to setMaximumPoolSize).
     * 2. The pool is stopped.
     * 3. The pool is shutdown and the queue is empty.
     * 4. This worker timed out waiting for a task, and timed-out
     *    workers are subject to termination (that is,
     *    {@code allowCoreThreadTimeOut || workerCount > corePoolSize})
     *    both before and after the timed wait, and if the queue is
     *    non-empty, this worker is not the last thread in the pool.
     *
     * @return task, or null if the worker must exit, in which case
     *         workerCount is decremented
     */
    private Runnable getTask() {
        boolean timedOut = false; // Did the last poll() time out?

        for (;;) {
            int c = ctl.get();
            int rs = runStateOf(c);

            // Check if queue empty only if necessary.
            if (rs >= SHUTDOWN && (rs >= STOP || workQueue.isEmpty())) {
                decrementWorkerCount();
                return null;
            }

            int wc = workerCountOf(c);

            // Are workers subject to culling?
            boolean timed = allowCoreThreadTimeOut || wc > corePoolSize;

            if ((wc > maximumPoolSize || (timed && timedOut))
                && (wc > 1 || workQueue.isEmpty())) {
                if (compareAndDecrementWorkerCount(c))
                    return null;
                continue;
            }

            try {
                Runnable r = timed ?
                    workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) :
                    workQueue.take();
                if (r != null)
                    return r;
                timedOut = true;
            } catch (InterruptedException retry) {
                timedOut = false;
            }
        }
    }

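In an infinite loop it checks the pool state and the queue; if nothing requires the worker to exit, it takes a waiting task from the queue. The wait is a timed poll(keepAliveTime) when the worker is subject to timeout (allowCoreThreadTimeOut is set or the worker count exceeds corePoolSize) and a blocking take() otherwise. Returning null tells runWorker to let the worker exit.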
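The timed branch of getTask is what makes idle threads disappear. The following sketch assumes a 50 ms keep-alive and allowCoreThreadTimeOut(true), so that even core workers use poll and exit once it times out; the class name KeepAliveDemo is made up, and the 5 second deadline is only a generous upper bound for the demo.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class KeepAliveDemo {
    /** Returns "initialPoolSize/shrankToZero". */
    static String run() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 50, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true); // core workers now use the timed poll as well
        try {
            pool.prestartAllCoreThreads();
            int initial = pool.getPoolSize();
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            // Idle workers time out in getTask(), which returns null, and then exit.
            while (pool.getPoolSize() > 0 && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            return initial + "/" + (pool.getPoolSize() == 0);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Without allowCoreThreadTimeOut(true), the two core workers would block in take() indefinitely and the pool size would stay at 2.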
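1.3 Interrupt control
The worker thread's interrupt status has to agree with the pool state: if the pool is stopping (STOP or later), the thread must be interrupted; otherwise its interrupt flag must be cleared before the task runs.
1.4 Pre-execution handling
Before running the task, runWorker calls beforeExecute(wt, task) with the worker thread and the task about to run. Subclasses can override this hook to, for example, reinitialize ThreadLocals or write a log entry.
1.5 Post-execution handling
After the task has run, runWorker always calls afterExecute(task, thrown), passing in the task and any exception it threw. Alternatively, going through submit lets the caller observe the result of each individual task.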
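The two hooks are meant to be overridden in a subclass. A minimal sketch, assuming a single-threaded pool that just counts hook invocations; the class name HookedPool and the quiet thread factory (which only silences the stack trace of the deliberately failing task) are made up for illustration.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class HookedPool extends ThreadPoolExecutor {
    final AtomicInteger started = new AtomicInteger();
    final AtomicInteger finished = new AtomicInteger();
    final AtomicInteger failed = new AtomicInteger();

    HookedPool() {
        super(1, 1, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(), quietFactory());
    }

    /** Threads whose uncaught exceptions are swallowed, to keep the demo output clean. */
    private static ThreadFactory quietFactory() {
        return r -> {
            Thread t = new Thread(r);
            t.setUncaughtExceptionHandler((thread, error) -> { });
            return t;
        };
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r) {
        super.beforeExecute(t, r);
        started.incrementAndGet(); // runs on the worker thread, just before task.run()
    }

    @Override
    protected void afterExecute(Runnable r, Throwable thrown) {
        super.afterExecute(r, thrown);
        finished.incrementAndGet(); // runs even when the task threw
        if (thrown != null)
            failed.incrementAndGet();
    }

    /** Returns "started/finished/failed" after one normal and one failing task. */
    static String run() {
        HookedPool pool = new HookedPool();
        pool.execute(() -> { });
        pool.execute(() -> { throw new IllegalStateException("boom"); });
        pool.shutdown(); // queued tasks still run after shutdown()
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS))
                throw new IllegalStateException("pool did not terminate");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
        return pool.started.get() + "/" + pool.finished.get() + "/" + pool.failed.get();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The failing task kills its worker thread, as the runWorker comment describes, but afterExecute still sees the exception first.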
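Note that the interrupt check, beforeExecute, task.run and afterExecute all happen while the worker holds its own lock (w.lock()); getTask is called before the lock is taken, which is what allows an idle worker to be interrupted. Remember Worker's class declaration? Worker extends AbstractQueuedSynchronizer and uses AQS's exclusive mode to guard task execution. It does not use the reentrant ReentrantLock because a reentrant lock would let a worker task reacquire the lock when it calls pool control methods such as setCorePoolSize. Those methods use tryLock to find idle workers, so a running task would look idle and could interrupt itself.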
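A non-reentrant exclusive lock on top of AQS takes only a few lines. The sketch below follows the same shape as Worker's lock methods but is a standalone illustration, not the JDK code; the class name NonReentrantMutex is made up.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

class NonReentrantMutex extends AbstractQueuedSynchronizer {
    // state 0 = unlocked, state 1 = locked; there is no hold count

    @Override
    protected boolean tryAcquire(int unused) {
        // Succeeds only when the lock is free, even for the thread that already owns it.
        if (compareAndSetState(0, 1)) {
            setExclusiveOwnerThread(Thread.currentThread());
            return true;
        }
        return false;
    }

    @Override
    protected boolean tryRelease(int unused) {
        setExclusiveOwnerThread(null);
        setState(0);
        return true;
    }

    @Override
    protected boolean isHeldExclusively() {
        return getState() != 0;
    }

    void lock() { acquire(1); }                  // blocks until the lock is free
    boolean tryLock() { return tryAcquire(1); }  // never blocks
    void unlock() { release(1); }

    /** Returns "firstTry/secondTrySameThread/tryAfterUnlock". */
    static String demo() {
        NonReentrantMutex mutex = new NonReentrantMutex();
        boolean first = mutex.tryLock();
        boolean second = mutex.tryLock(); // same thread, still refused
        mutex.unlock();
        boolean third = mutex.tryLock();
        mutex.unlock();
        return first + "/" + second + "/" + third;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With a ReentrantLock the second tryLock from the owning thread would succeed, which is exactly the behaviour Worker wants to avoid.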

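Question 1. What is a reentrant lock? ThreadPoolExecutor uses the reentrant ReentrantLock when adding to workers; would synchronized work as well?
A reentrant lock can be acquired again by the thread that already holds it. ReentrantLock is built on AQS (through an internal synchronizer class, not by extending AQS itself). It provides the same basic mutual exclusion as synchronized, which is also reentrant, and adds interruptible acquisition, timed acquisition, and a non-blocking tryLock that returns instead of waiting when the lock is taken. synchronized would give the same mutual exclusion for workers but is less flexible, because it is tied to the monitor model; whether it is also slower under heavy contention depends on the JVM version and is better measured than assumed.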
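Two of these properties, reentrancy and timed acquisition, can be shown in a few lines. A minimal sketch, assuming a 50 ms timeout for the competing thread; the class name ReentrantLockDemo is made up for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class ReentrantLockDemo {
    /** Returns "holdCount/otherThreadAcquired/lockedAtEnd". */
    static String run() {
        ReentrantLock lock = new ReentrantLock();
        boolean[] otherAcquired = new boolean[1];
        int holds;
        lock.lock();
        try {
            lock.lock(); // the owner may lock again; the hold count goes to 2
            try {
                holds = lock.getHoldCount();
                Thread other = new Thread(() -> {
                    try {
                        // Timed acquisition: gives up after 50 ms instead of blocking forever.
                        otherAcquired[0] = lock.tryLock(50, TimeUnit.MILLISECONDS);
                        if (otherAcquired[0])
                            lock.unlock();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                other.start();
                other.join(); // the lock is still held here, so the other thread must time out
            } finally {
                lock.unlock();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        } finally {
            lock.unlock(); // each lock() needs its own unlock()
        }
        return holds + "/" + otherAcquired[0] + "/" + lock.isLocked();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```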
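Question 2. What is AQS? Describe your understanding of it.
AQS, AbstractQueuedSynchronizer, is a framework that provides atomic state management plus blocking and waking of threads. Worker, ReentrantLock and CountDownLatch are all built on it. It keeps the synchronization state in a volatile int field, state, and each synchronizer decides what the value means. ReentrantLock uses it as a hold count, which is how reentrancy works: the owning thread may lock the same resource repeatedly, adding 1 each time and subtracting 1 on each release. Worker uses only 0 (unlocked) and 1 (locked), apart from the initial -1 that inhibits interrupts until runWorker starts, so it is not reentrant.
To manage state atomically, AQS updates state with CAS. Internally it queues waiting threads in a doubly linked list that drives acquire and release, and it supports both exclusive and shared modes.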
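Question 3. How do you observe the result of a task run by the pool?
ThreadPoolExecutor extends AbstractExecutorService, which provides the submit method. submit wraps the task in a FutureTask and returns it as a Future; calling get() on it yields the result, or throws an ExecutionException wrapping the task's exception. afterExecute can be overridden as well, but for tasks passed to submit the exception is captured inside the FutureTask, so the thrown argument is null and the hook has to inspect the Future itself.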
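A minimal sketch of observing results through submit, assuming one task that returns a value and one that throws; the class name SubmitDemo is made up for illustration.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SubmitDemo {
    /** Returns "value/failureMessage" taken from the two futures. */
    static String run() {
        ExecutorService pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        try {
            Future<Integer> ok = pool.submit(() -> 6 * 7); // a Callable
            Future<?> bad = pool.submit((Runnable) () -> {
                throw new IllegalStateException("boom");
            });
            int value = ok.get(); // blocks until the task has finished
            String failure;
            try {
                bad.get();
                failure = "none";
            } catch (ExecutionException e) {
                // The task's own exception is the cause; the worker thread survives.
                failure = e.getCause().getMessage();
            }
            return value + "/" + failure;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("demo failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Unlike a task passed to execute, the failing task here does not kill its worker: the exception is stored in the future and only surfaces when get() is called.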
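Question 4. Why does Alibaba forbid building thread pools with the system-provided factory methods?
Take the single-thread pool (newSingleThreadExecutor) as an example. Its blocking queue is a LinkedBlockingQueue, which by default accepts up to Integer.MAX_VALUE waiting tasks. If tasks arrive faster than they are processed, the queue keeps growing and can end in an OOM. newCachedThreadPool has the mirror-image problem: it allows up to Integer.MAX_VALUE threads. Because the parameters of the four standard pools are fixed, they offer little control; constructing a ThreadPoolExecutor with explicit bounds is the more flexible choice.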
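One possible shape for such a custom pool is sketched below. Everything in it is an assumption chosen for illustration, not a recommendation from the guidelines: the factory name BoundedPoolFactory, the 30 second keep-alive, daemon threads, and CallerRunsPolicy as the rejection handler. The point is only that every bound is explicit.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedPoolFactory {
    /** Builds a pool whose thread count and queue length are both bounded. */
    static ThreadPoolExecutor create(String name, int core, int max, int queueCapacity) {
        AtomicInteger sequence = new AtomicInteger();
        ThreadFactory factory = r -> {
            // Named threads make thread dumps easier to read.
            Thread t = new Thread(r, name + "-" + sequence.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
        return new ThreadPoolExecutor(
                core, max, 30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),      // bounded queue
                factory,
                new ThreadPoolExecutor.CallerRunsPolicy());   // overflow runs on the submitting thread
    }

    /** Returns true if a task submitted to a saturated pool ran on the calling thread. */
    static boolean overflowRunsOnCaller() {
        ThreadPoolExecutor pool = create("demo", 1, 1, 1);
        CountDownLatch release = new CountDownLatch(1);
        String[] ranOn = new String[1];
        try {
            pool.execute(() -> {                 // occupies the only worker
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            pool.execute(() -> { });             // fills the queue
            pool.execute(() -> ranOn[0] = Thread.currentThread().getName()); // rejected
            return Thread.currentThread().getName().equals(ranOn[0]);
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(overflowRunsOnCaller());
    }
}
```

CallerRunsPolicy slows the producer down instead of dropping work or throwing; on Android it should not be used from the UI thread, where AbortPolicy or a custom handler is the safer fit.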