〇.序
起因,在Fabric遇到了很多BUG..出于想对于BUG的来龙去脉做进一步的了解.做了一番调查..本文也是Android Crash的防护与追踪的后续研究.
-
这是Fabric给出的一些link
传送门 :
How to handle :java.util.concurrent.TimeoutException: android.os.BinderProxy.finalize() timed out after 10 seconds errors?
WebView crashes on opening of popup
TimeoutException - timed out after 10 seconds
然而并没有解决我的疑惑.所以我决定从源码的角度来了解一下.这种异常发生的真实原因.
一.浅读Daemons
1.Daemons的由来:
代码位置:java.lang.Daemons.
Daemons开始于Zygote进程创建子进程.
三件事情:
- VM_HOOKS.preFork(),做准备工作。
- nativeForkAndSpecialize,创建子进程
- VM_HOOKS.postForkCommon();启动Zygote的4个Daemon线程,java堆整理,引用队列,以及析构线程。
//第一步:
//停止四个线程:Daemon线程,java堆整理,引用队列,析构线程
//也就是创建子进程的时候,不能有这几个线程搅和。
public void preFork() {
Daemons.stop();
waitUntilAllThreadsStopped();
token = nativePreFork();
}
....
/**
* Called by the zygote in both the parent and child processes after
* every fork. In the child process, this method is called after
* {@code postForkChild}.
*/
//第三步,启动Daemons
public void postForkCommon() {
Daemons.start();
}
2.代码开篇:
/**
* Calls Object.finalize() on objects in the finalizer reference queue. The VM
* will abort if any finalize() call takes more than the maximum finalize time
* to complete.
*
* @hide
*/
public final class Daemons {
private static final int NANOS_PER_MILLI = 1000 * 1000;
private static final int NANOS_PER_SECOND = NANOS_PER_MILLI * 1000;
private static final long MAX_FINALIZE_NANOS = 10L * NANOS_PER_SECOND;
public static void start() {
ReferenceQueueDaemon.INSTANCE.start();
FinalizerDaemon.INSTANCE.start();
FinalizerWatchdogDaemon.INSTANCE.start();
HeapTaskDaemon.INSTANCE.start();
}
public static void stop() {
HeapTaskDaemon.INSTANCE.stop();
ReferenceQueueDaemon.INSTANCE.stop();
FinalizerDaemon.INSTANCE.stop();
FinalizerWatchdogDaemon.INSTANCE.stop();
}
...
}
Daemons类的开头写下了关键的注释,当调用finalizer reference queue(终结引用队列)中的对象的Object.finalize()方法的时候.假如有任何调用finalize() 方法时.超出了最大终结时间(一般为10秒).VM就会中止.这个最大终结时间就是MAX_FINALIZE_NANOS
2. 1四个守护线程
- ReferenceQueueDaemon:引用队列守护线程。我们知道,在创建引用对象的时候,可以关联一个队列。当被引用对象引用的对象被GC回收的时候,被引用对象就会被加入到其创建时关联的队列去。这个加入队列的操作就是由ReferenceQueueDaemon守护线程来完成的。这样应用程序就可以知道哪些被引用的对象已经被回收了。
- FinalizerDaemon:析构守护线程。对于重写了成员函数finalize的对象,它们被GC决定回收时,并没有马上被回收,而是被放入到一个队列中,等待FinalizerDaemon守护线程去调用它们的成员函数finalize,然后再被回收。
- FinalizerWatchdogDaemon:析构监护守护线程。用来监控FinalizerDaemon线程的执行。一旦检测那些重写了finalize的对象在执行成员函数finalize时超出一定时间,那么就会退出VM。
- HeapTaskDaemon : 堆裁剪守护线程。用来执行裁剪堆的操作,也就是用来将那些空闲的堆内存归还给系统。
2.2 重要属性解析
MAX_FINALIZE_NANOS:此类开始就定义了一个MAX_FINALIZE_NANOS.可能有类似经历的同学,看了之后就能明白.这个是finalize方法执行的超时时间.(经常会遇到XXX.finalize() timed out after 10 seconds)
Daemons提供了start(),stop()方法,分别开启和停止四个守护进程.
3.四大守护线程的抽象类
/**
* A background task that provides runtime support to the application.
* Daemons can be stopped and started, but only so that the zygote can be a
* single-threaded process when it forks.
*/
private static abstract class Daemon implements Runnable {
private Thread thread;
private String name;
protected Daemon(String name) {
this.name = name;
}
public synchronized void start() {
if (thread != null) {
throw new IllegalStateException("already running");
}
thread = new Thread(ThreadGroup.systemThreadGroup, this, name);
thread.setDaemon(true);
thread.start();
}
public abstract void run();
...
}
Daemon线程是四大守护线程的抽象类.核心是在start()方法的时候,将自己放到ThreadGroup.systemThreadGroup中.并设置为守护线程.这部分不过分解读.
二. 本文重点:FinalizerWatchdogDaemon
本文重点是因为遇到finalize() timed out after 10 seconds.这样的异常,想要对此调研一番.
1.初识FinalizerWatchdogDaemon
1.1 类注释解读
FinalizerWatchdogDaemon的类注释写的很清楚.当一个实例的finalizer函数,花费了超过MAX_FINALIZATION_MILLIS时间(默认为10秒).将被认定finalizer卡死,此时,FinalizerWatchdogDaemon将会让VM退出.
/**
* The watchdog exits the VM if the finalizer ever gets stuck. We consider
* the finalizer to be stuck if it spends more than MAX_FINALIZATION_MILLIS
* on one instance.
*/
private static class FinalizerWatchdogDaemon extends Daemon {
private static final FinalizerWatchdogDaemon INSTANCE = new FinalizerWatchdogDaemon();
FinalizerWatchdogDaemon() {
super("FinalizerWatchdogDaemon");
}
@Override public void run() {
...
}
}
1.2 错误日志元凶
- 找你一万年.当读到这的时候,大家一定发现了.这就是我们看到的那个TimeoutException.拼装.finalize() timed out after 10 seconds.的整个作案现场.
- h.uncaughtException(Thread.currentThread(), syntheticException);上面的注释也特别关键,虽然这个TimeoutException是我抛出的.但是并不是我这个线程超时,我只是静静的看着它.
当WatchDog线程检测finalizing!=null的时候.会提示TimeOutException.
@Override public void run() {
while (isRunning()) {
if (!sleepUntilNeeded()) {
// We have been interrupted, need to see if this daemon has been stopped.
continue;
}
final Object finalizing = waitForFinalization();
if (finalizing != null && !VMRuntime.getRuntime().isDebuggerActive()) {
finalizerTimedOut(finalizing);
break;
}
}
}
TimeOutException的生成.
private static void finalizerTimedOut(Object object) {
// The current object has exceeded the finalization deadline; abort!
String message = object.getClass().getName() + ".finalize() timed out after "
+ (MAX_FINALIZE_NANOS / NANOS_PER_SECOND) + " seconds";
Exception syntheticException = new TimeoutException(message);
// We use the stack from where finalize() was running to show where it was stuck.
syntheticException.setStackTrace(FinalizerDaemon.INSTANCE.getStackTrace());
Thread.UncaughtExceptionHandler h = Thread.getDefaultUncaughtExceptionHandler();
// Send SIGQUIT to get native stack traces.
try {
Os.kill(Os.getpid(), OsConstants.SIGQUIT);
// Sleep a few seconds to let the stack traces print.
Thread.sleep(5000);
} catch (Exception e) {
System.logE("failed to send SIGQUIT", e);
} catch (OutOfMemoryError ignored) {
// May occur while trying to allocate the exception.
}
if (h == null) {
// If we have no handler, log and exit.
System.logE(message, syntheticException);
System.exit(2);
}
// Otherwise call the handler to do crash reporting.
// We don't just throw because we're not the thread that
// timed out; we're the thread that detected it.
h.uncaughtException(Thread.currentThread(), syntheticException);
}
观察FinalizerDaemons的回收.
/**
* Return an object that took too long to finalize or return null.
* Wait MAX_FINALIZE_NANOS. If the FinalizerDaemon took essentially the whole time
* processing a single reference, return that reference. Otherwise return null.
*/
private Object waitForFinalization() {
long startCount = FinalizerDaemon.INSTANCE.progressCounter.get();
// Avoid remembering object being finalized, so as not to keep it alive.
if (!sleepFor(MAX_FINALIZE_NANOS)) {
// Don't report possibly spurious timeout if we are interrupted.
return null;
}
if (getNeedToWork() && FinalizerDaemon.INSTANCE.progressCounter.get() == startCount) {
// We assume that only remove() and doFinalize() may take time comparable to
// MAX_FINALIZE_NANOS.
// We observed neither the effect of the gotoSleep() nor the increment preceding a
// later wakeUp. Any remove() call by the FinalizerDaemon during our sleep
// interval must have been followed by a wakeUp call before we checked needToWork.
// But then we would have seen the counter increment. Thus there cannot have
// been such a remove() call.
// The FinalizerDaemon must not have progressed (from either the beginning or the
// last progressCounter increment) to either the next increment or gotoSleep()
// call. Thus we must have taken essentially the whole MAX_FINALIZE_NANOS in a
// single doFinalize() call. Thus it's OK to time out. finalizingObject was set
// just before the counter increment, which preceded the doFinalize call. Thus we
// are guaranteed to get the correct finalizing value below, unless doFinalize()
// just finished as we were timing out, in which case we may get null or a later
// one. In this last case, we are very likely to discard it below.
Object finalizing = FinalizerDaemon.INSTANCE.finalizingObject;
sleepFor(NANOS_PER_SECOND / 2);
// Recheck to make it even less likely we report the wrong finalizing object in
// the case which a very slow finalization just finished as we were timing out.
if (getNeedToWork()
&& FinalizerDaemon.INSTANCE.progressCounter.get() == startCount) {
return finalizing;
}
}
return null;
}
顺便学习下 System.exit(2);的数字的意思.
System.exit(2);
//0表示正常退出,0以上表示非正常退出.
private static final int RUNNING = 0;
private static final int HOOKS = 1;
private static final int FINALIZERS = 2;
传送门 : StackOverflow传送门,System.exit()扩展阅读
1.3 不得不提的FinalizerDaemon
WatchDog顾名思义,观察的就是这个FinalizerDaemon.一旦FinalizerDaemon在处理一个对象的时候超过10s中,那么就会结束进程,导致崩溃。附上FinalizerDaemon的源码.源码中的注释很详尽.可以自行阅读.
private static class FinalizerDaemon extends Daemon {
private static final FinalizerDaemon INSTANCE = new FinalizerDaemon();
private final ReferenceQueue<Object> queue = FinalizerReference.queue;
private final AtomicInteger progressCounter = new AtomicInteger(0);
// Object (not reference!) being finalized. Accesses may race!
private Object finalizingObject = null;
FinalizerDaemon() {
super("FinalizerDaemon");
}
@Override public void run() {
// This loop may be performance critical, since we need to keep up with mutator
// generation of finalizable objects.
// We minimize the amount of work we do per finalizable object. For example, we avoid
// reading the current time here, since that involves a kernel call per object. We
// limit fast path communication with FinalizerWatchDogDaemon to what's unavoidable: A
// non-volatile store to communicate the current finalizable object, e.g. for
// reporting, and a release store (lazySet) to a counter.
// We do stop the FinalizerWatchDogDaemon if we have nothing to do for a
// potentially extended period. This prevents the device from waking up regularly
// during idle times.
// Local copy of progressCounter; saves a fence per increment on ARM and MIPS.
int localProgressCounter = progressCounter.get();
while (isRunning()) {
try {
// Use non-blocking poll to avoid FinalizerWatchdogDaemon communication
// when busy.
FinalizerReference<?> finalizingReference = (FinalizerReference<?>)queue.poll();
if (finalizingReference != null) {
finalizingObject = finalizingReference.get();
progressCounter.lazySet(++localProgressCounter);
} else {
finalizingObject = null;
progressCounter.lazySet(++localProgressCounter);
// Slow path; block.
FinalizerWatchdogDaemon.INSTANCE.goToSleep();
finalizingReference = (FinalizerReference<?>)queue.remove();
finalizingObject = finalizingReference.get();
progressCounter.set(++localProgressCounter);
FinalizerWatchdogDaemon.INSTANCE.wakeUp();
}
doFinalize(finalizingReference);
} catch (InterruptedException ignored) {
} catch (OutOfMemoryError ignored) {
}
}
}
@FindBugsSuppressWarnings("FI_EXPLICIT_INVOCATION")
private void doFinalize(FinalizerReference<?> reference) {
FinalizerReference.remove(reference);
Object object = reference.get();
reference.clear();
try {
object.finalize();
} catch (Throwable ex) {
// The RI silently swallows these, but Android has always logged.
System.logE("Uncaught exception thrown by finalizer", ex);
} finally {
// Done finalizing, stop holding the object as live.
finalizingObject = null;
}
}
}
三.问题分析:
从GC来看.JAVA在内存回收的时候.并不是马上将对象销毁,而是放入到一个队列中,由FinalizerDaemon(析构守护线程)调用他们的finalize方法,再被回收,所以当FinalizerDaemon回收过程中.超过10秒,FinalizerWatchdogDaemon线程就会抛出TimeoutException XXX.finalize() timed out after 10 seconds.结尾附赠了两篇文章,有提到析构相关的性能开销和可以去追踪问题的办法.供大家扩展阅读.
四.聊聊FinalizerReference
内存占用大户(FinalizerReference)
[待续]换个认知角度再来看看这个问题.
结尾:
两篇不错的文章.
传送门 :
提升Android下内存的使用意识和排查能力
再谈Finalizer对象--大型App中内存与性能的隐性杀手