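1. Preface
Let us open with two questions:
- When our app crashes or hits an ANR, what does the system framework do?
- Can we receive the app crashes that the system detects, and record and upload them?
To answer these questions we have to start from process startup, because that is where the monitoring begins.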
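2. Source code analysis
2.1 Starting the app process
Anyone who has dug into Android's lower layers knows that every Android app process is forked from zygote. The program that launches zygote and other Java programs is app_process, whose code lives in frameworks/base/cmds/app_process/app_main.cpp.
Key code: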
int main(int argc, char* const argv[])
{
....
if (strcmp(arg, "--zygote") == 0) {
zygote = true;
niceName = ZYGOTE_NICE_NAME;
} else if (strcmp(arg, "--start-system-server") == 0) {
startSystemServer = true;
} else if (strcmp(arg, "--application") == 0) {
application = true;
} else if (strncmp(arg, "--nice-name=", 12) == 0) {
niceName.setTo(arg + 12);
}
....
if (zygote) {
runtime.start("com.android.internal.os.ZygoteInit", args, zygote);
} else if (className) {
runtime.start("com.android.internal.os.RuntimeInit", args, zygote);
} else {
fprintf(stderr, "Error: no class name or --zygote supplied.\n");
app_usage();
LOG_ALWAYS_FATAL("app_process: no class name or --zygote supplied.");
}
}
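As you can see, app_process distinguishes three kinds of programs:
- Zygote: com.android.internal.os.ZygoteInit
- System Server: not launched on its own, but started by Zygote
- Any other Java program given by class name, such as the commonly used am. /system/bin/am is actually a shell script whose real implementation is: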
exec app_process @"
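Depending on the arguments passed in, there are two startup modes, "com.android.internal.os.RuntimeInit" and "com.android.internal.os.ZygoteInit", which correspond to the RuntimeInit and ZygoteInit classes (shown in green and pink, respectively, in the original figure). The two differ mainly on the Java side: ZygoteInit clearly does much more than RuntimeInit, for example "preload" and "gc". On the native side they do the same things, startVM() and startReg(). What we are after lies in the Java-side implementation of these two classes.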
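To make the branch structure easier to follow, here is a minimal Java sketch of the same decision. It is only an illustration: the real logic is the C++ in app_main.cpp, and the class name AppProcessDispatch, the method selectEntryClass, and the rule "the first non-option argument is the class name" are simplifications assumed for this example.

```java
import java.util.List;

// Illustrative only: mirrors the if/else chain of app_main.cpp in plain Java.
final class AppProcessDispatch {
    static final String ZYGOTE_INIT = "com.android.internal.os.ZygoteInit";
    static final String RUNTIME_INIT = "com.android.internal.os.RuntimeInit";

    /** Returns the Java entry class for the given arguments, or null if none applies. */
    static String selectEntryClass(List<String> args) {
        boolean zygote = false;
        String className = null;
        for (String arg : args) {
            if (arg.equals("--zygote")) {
                zygote = true;
            } else if (!arg.startsWith("--") && className == null) {
                // Assumption for this sketch: first non-option argument is the class name.
                className = arg;
            }
        }
        if (zygote) {
            return ZYGOTE_INIT;
        }
        if (className != null) {
            return RUNTIME_INIT;
        }
        return null; // app_main.cpp prints a usage error in this case
    }

    public static void main(String[] argv) {
        System.out.println(selectEntryClass(List.of("--zygote", "--start-system-server")));
        System.out.println(selectEntryClass(List.of("com.android.commands.am.Am", "start")));
    }
}
```

The --zygote flag wins over any class name, which matches the order of the branches in the excerpt above.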
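2.2 ZygoteInit
Once the VM is ready it can run Java code, and this is where the system enters the Java world for the first time. Remember the argument passed to runtime.start() in app_main.cpp? That is the Java class to run. Android supports two classes as entry points: 'com.android.internal.os.ZygoteInit' and 'com.android.internal.os.RuntimeInit'.
In addition, the RuntimeInit class defines a static zygoteInit() method. It is called when Zygote creates a new app process, and it does the same things as RuntimeInit's main() function:
redirectLogStreams(): redirects System.out and System.err to Android's log system (defined in android.util.Log).
commonInit(): initializes some system properties; most importantly, it installs a handler for uncaught exceptions, which runs whenever an exception goes uncaught.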
RuntimeInit:
protected static final void commonInit() {
if (DEBUG) Slog.d(TAG, "Entered RuntimeInit!");
/*
* set handlers; these apply to all threads in the VM. Apps can replace
* the default handler, but not the pre handler.
*/
LoggingHandler loggingHandler = new LoggingHandler();
Thread.setUncaughtExceptionPreHandler(loggingHandler);
Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler(loggingHandler));
...
}
Anyone who has debugged Android code will have seen the "*** FATAL EXCEPTION IN SYSTEM PROCESS" message; it is printed by LoggingHandler here:
private static class LoggingHandler implements Thread.UncaughtExceptionHandler {
public volatile boolean mTriggered = false;
@Override
public void uncaughtException(Thread t, Throwable e) {
...
// Don't re-enter if KillApplicationHandler has already run
if (mCrashing) return;
// mApplicationObject is null for non-zygote java programs (e.g. "am")
// There are also apps running with the system UID. We don't want the
// first clause in either of these two cases, only for system_server.
if (mApplicationObject == null && (Process.SYSTEM_UID == Process.myUid())) {
Clog_e(TAG, "*** FATAL EXCEPTION IN SYSTEM PROCESS: " + t.getName(), e);
} else {
StringBuilder message = new StringBuilder();
// The "FATAL EXCEPTION" string is still used on Android even though
// apps can set a custom UncaughtExceptionHandler that renders uncaught
// exceptions non-fatal.
message.append("FATAL EXCEPTION: ").append(t.getName()).append("\n");
final String processName = ActivityThread.currentProcessName();
if (processName != null) {
message.append("Process: ").append(processName).append(", ");
}
message.append("PID: ").append(Process.myPid());
Clog_e(TAG, message.toString(), e);
}
}
}
KillApplicationHandler:
private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {
private final LoggingHandler mLoggingHandler;
public KillApplicationHandler(LoggingHandler loggingHandler) {
this.mLoggingHandler = Objects.requireNonNull(loggingHandler);
}
@Override
public void uncaughtException(Thread t, Throwable e) {
try {
...
// Don't re-enter -- avoid infinite loops if crash-reporting crashes.
if (mCrashing) return;
mCrashing = true;
...
// Key code
// Bring up crash dialog, wait for it to be dismissed
ActivityManager.getService().handleApplicationCrash(
mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
} catch (Throwable t2) {
...
} finally {
// Try everything to make sure this process goes away.
Process.killProcess(Process.myPid());
System.exit(10);
}
}
}
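ActivityManager.getService().handleApplicationCrash --> ActivityManagerService.handleApplicationCrash: from here the handling moves to AMS.
The name handleApplicationCrash also tells us that this call handles application crashes. But there are also native_crash, ANR, and wtf, so let us walk through the handling flow for each type.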
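Before moving on, the handler pattern used by RuntimeInit (log once, guard against re-entry, always terminate in a finally block) can be tried out in plain Java. This is a sketch under assumptions: it uses only the standard Thread.UncaughtExceptionHandler API, the class CrashHandlerSketch and the method chain are names invented for the example, and the "terminate" step is a callback instead of Process.killProcess.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Plain-Java analogue of LoggingHandler + KillApplicationHandler (illustrative only).
final class CrashHandlerSketch {

    /** Builds a handler that logs the failure, then runs terminate exactly once. */
    static Thread.UncaughtExceptionHandler chain(List<String> log, Runnable terminate) {
        // Plays the role of mCrashing: avoids loops if crash reporting itself crashes.
        AtomicBoolean crashing = new AtomicBoolean(false);
        return (thread, error) -> {
            if (!crashing.compareAndSet(false, true)) {
                return; // already handling a crash, do not re-enter
            }
            try {
                log.add("FATAL EXCEPTION: " + thread.getName() + ": " + error.getMessage());
            } finally {
                // Like the finally block in KillApplicationHandler: always runs.
                terminate.run();
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> log = new ArrayList<>();
        Thread.setDefaultUncaughtExceptionHandler(
                chain(log, () -> System.out.println("process would be killed here")));
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("boom");
        }, "worker-1");
        worker.start();
        worker.join();
        System.out.println(log);
    }
}
```

The re-entry guard lives inside the handler instance here, so a second failure reported to the same handler is ignored, just as the mCrashing check does in the framework code.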
2.3 Application crash
ActivityManagerService:
public void handleApplicationCrash(IBinder app,
ApplicationErrorReport.ParcelableCrashInfo crashInfo) {
ProcessRecord r = findAppProcess(app, "Crash");
final String processName = app == null ? "system_server"
: (r == null ? "unknown" : r.processName);
handleApplicationCrashInner("crash", r, processName, crashInfo);
}
As shown here, when the app passed in is null, processName is set to system_server.
void handleApplicationCrashInner(String eventType, ProcessRecord r, String processName,
ApplicationErrorReport.CrashInfo crashInfo) {
...
// Key call
addErrorToDropBox(eventType, r, processName, null, null, null, null, null, crashInfo);
if (r == null || r.pid == MY_PID) {
if ( "crash".equals(eventType) ) {
dumpErrorInfo(processName, MY_PID, 2, 2);
SystemClock.sleep(3000);
}
} else {
}
mAppErrors.crashApplication(r, crashInfo);
}
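The code above calls addErrorToDropBox to package and output the app crash. We will leave the details of addErrorToDropBox for later and first look at the other kinds of failures; as you will see, they all end up calling addErrorToDropBox.
2.4 native_crash
A native_crash, as the name suggests, is a crash in the native layer. It is monitored by a dedicated NativeCrashListener thread.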
final class NativeCrashListener extends Thread {
...
/*
* Daemon thread that accept()s incoming domain socket connections from debuggerd
* and processes the crash dump that is passed through.
*/
NativeCrashListener(ActivityManagerService am) {
mAm = am;
}
@Override
public void run() {
final byte[] ackSignal = new byte[1];
...
// The file system entity for this socket is created with 0777 perms, owned
// by system:system. selinux restricts things so that only crash_dump can
// access it.
{
File socketFile = new File(DEBUGGERD_SOCKET_PATH);
if (socketFile.exists()) {
socketFile.delete();
}
}
try {
FileDescriptor serverFd = Os.socket(AF_UNIX, SOCK_STREAM, 0);
final UnixSocketAddress sockAddr = UnixSocketAddress.createFileSystem(
DEBUGGERD_SOCKET_PATH);
Os.bind(serverFd, sockAddr);
Os.listen(serverFd, 1);
Os.chmod(DEBUGGERD_SOCKET_PATH, 0777);
// 1. Loop forever accepting connections; when a peer connects, hand peerFd to consumeNativeCrashData
while (true) {
FileDescriptor peerFd = null;
try {
if (MORE_DEBUG) Slog.v(TAG, "Waiting for debuggerd connection");
peerFd = Os.accept(serverFd, null /* peerAddress */);
if (MORE_DEBUG) Slog.v(TAG, "Got debuggerd socket " + peerFd);
if (peerFd != null) {
// the reporting thread may take responsibility for
// acking the debugger; make sure we play along.
// 2. Enter the native crash data handling flow
consumeNativeCrashData(peerFd);
}
} catch (Exception e) {
Slog.w(TAG, "Error handling connection", e);
} finally {
...
}
}
} catch (Exception e) {
Slog.e(TAG, "Unable to init native debug socket!", e);
}
}
// Read a crash report from the connection
void consumeNativeCrashData(FileDescriptor fd) {
try {
...
// 3. Start NativeCrashReporter as a new thread that reports the error
final String reportString = new String(os.toByteArray(), "UTF-8");
(new NativeCrashReporter(pr, signal, reportString)).start();
} catch (Exception e) {
...
}
}
}
The thread that reports the native_crash --> NativeCrashReporter:
class NativeCrashReporter extends Thread {
ProcessRecord mApp;
int mSignal;
String mCrashReport;
NativeCrashReporter(ProcessRecord app, int signal, String report) {
super("NativeCrashReport");
mApp = app;
mSignal = signal;
mCrashReport = report;
}
@Override
public void run() {
try {
// 1. Package the crash information
CrashInfo ci = new CrashInfo();
ci.exceptionClassName = "Native crash";
ci.exceptionMessage = Os.strsignal(mSignal);
ci.throwFileName = "unknown";
ci.throwClassName = "unknown";
ci.throwMethodName = "unknown";
ci.stackTrace = mCrashReport;
if (DEBUG) Slog.v(TAG, "Calling handleApplicationCrash()");
// 2. Hand over to AMS; same as an ordinary crash, only the event type differs
mAm.handleApplicationCrashInner("native_crash", mApp, mApp.processName, ci);
if (DEBUG) Slog.v(TAG, "<-- handleApplicationCrash() returned");
} catch (Exception e) {
Slog.e(TAG, "Unable to report native crash", e);
}
}
}
That is the end of the native crash path. From here on the flow is the same as for an application crash and ends in addErrorToDropBox, which we analyze later.
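The body of consumeNativeCrashData is elided above, so the following is only a sketch of what decoding such a report could look like. The wire layout used here (a 4-byte pid and a 4-byte signal number, both big-endian, followed by UTF-8 text terminated by a NUL byte) is an assumption made for illustration, and NativeCrashPacket, parse, and encode are hypothetical names, not framework APIs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical decoder for a crash report received over a socket (illustrative only).
final class NativeCrashPacket {
    final int pid;
    final int signal;
    final String report;

    NativeCrashPacket(int pid, int signal, String report) {
        this.pid = pid;
        this.signal = signal;
        this.report = report;
    }

    /** Assumed layout: int pid, int signal (big-endian), then text up to the first NUL. */
    static NativeCrashPacket parse(byte[] data) {
        if (data.length < 8) {
            throw new IllegalArgumentException("header too short: " + data.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
        int pid = buf.getInt();
        int signal = buf.getInt();
        int end = 8;
        while (end < data.length && data[end] != 0) {
            end++;
        }
        String report = new String(data, 8, end - 8, StandardCharsets.UTF_8);
        return new NativeCrashPacket(pid, signal, report);
    }

    /** Builds a packet in the same assumed layout, useful for trying out parse(). */
    static byte[] encode(int pid, int signal, String report) {
        byte[] text = report.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + text.length + 1)
                .putInt(pid)
                .putInt(signal)
                .put(text)
                .put((byte) 0)
                .array();
    }

    public static void main(String[] args) {
        NativeCrashPacket p = parse(encode(1234, 11, "backtrace: #00 pc 0000 libfoo.so"));
        System.out.println(p.pid + " " + p.signal + " " + p.report);
    }
}
```

Once decoded, the three fields correspond to what NativeCrashReporter receives in its constructor: the process, the signal, and the report string.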
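2.5 ANR
Every Android developer knows the frustration of ANRs. They arise in several scenarios, such as Activity, Service, and Broadcast. To keep this article short, we will not discuss the causes and detailed flow of each kind and will jump straight to the point where an ANR has already been triggered.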
AppErrors.appNotResponding:
final void appNotResponding(ProcessRecord app, ActivityRecord activity,
ActivityRecord parent, boolean aboveSystem, final String annotation) {
ArrayList<Integer> firstPids = new ArrayList<Integer>(5);
SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20);
if (mService.mController != null) {
try {
// 1. Decide whether to continue with the rest of the flow or kill the process right away
// 0 == continue, -1 = kill process immediately
int res = mService.mController.appEarlyNotResponding(
app.processName, app.pid, annotation);
if (res < 0 && app.pid != MY_PID) {
app.kill("anr", true);
}
} catch (RemoteException e) {
mService.mController = null;
Watchdog.getInstance().setActivityController(null);
}
}
// 2. Record the time of the ANR
long anrTime = SystemClock.uptimeMillis();
// 3. Update CPU usage stats
if (ActivityManagerService.MONITOR_CPU_USAGE) {
mService.updateCpuStatsNow();
}
// A setting controls whether background ANRs show a dialog or are handled silently; silent is the default
// Unless configured otherwise, swallow ANRs in background processes & kill the process.
boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(),
Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;
boolean isSilentANR;
synchronized (mService) {
...
// In case we come through here for the same app before completing
// this one, mark as anring now so we will bail out.
app.notResponding = true;
// 3. Write the ANR to the event log
EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
app.processName, app.info.flags, annotation);
// Dump thread traces as quickly as we can, starting with "interesting" processes.
firstPids.add(app.pid);
// Don't dump other PIDs if it's a background ANR
isSilentANR = !showBackground && !isInterestingForBackgroundTraces(app);
if (!isSilentANR) {
int parentPid = app.pid;
if (parent != null && parent.app != null && parent.app.pid > 0) {
parentPid = parent.app.pid;
}
if (parentPid != app.pid) firstPids.add(parentPid);
if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);
for (int i = mService.mLruProcesses.size() - 1; i >= 0; i--) {
ProcessRecord r = mService.mLruProcesses.get(i);
if (r != null && r.thread != null) {
int pid = r.pid;
if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {
if (r.persistent) {
firstPids.add(pid);
if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);
} else if (r.treatLikeActivity) {
firstPids.add(pid);
if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);
} else {
lastPids.put(pid, Boolean.TRUE);
if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);
}
}
}
}
}
}
// 4. Build the main ANR message for the main log
StringBuilder info = new StringBuilder();
info.setLength(0);
info.append("ANR in ").append(app.processName);
if (activity != null && activity.shortComponentName != null) {
info.append(" (").append(activity.shortComponentName).append(")");
}
info.append("\n");
info.append("PID: ").append(app.pid).append("\n");
if (annotation != null) {
info.append("Reason: ").append(annotation).append("\n");
}
if (parent != null && parent != activity) {
info.append("Parent: ").append(parent.shortComponentName).append("\n");
}
ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
ArrayList<Integer> nativePids = null;
// don't dump native PIDs for background ANRs unless it is the process of interest
String[] nativeProc = null;
if (isSilentANR) {
for (int i = 0; i < NATIVE_STACKS_OF_INTEREST.length; i++) {
if (NATIVE_STACKS_OF_INTEREST[i].equals(app.processName)) {
nativeProc = new String[] { app.processName };
break;
}
}
int[] pid = nativeProc == null ? null : Process.getPidsForCommands(nativeProc);
if(pid != null){
nativePids = new ArrayList<Integer>(pid.length);
for (int i : pid) {
nativePids.add(i);
}
}
} else {
nativePids = Watchdog.getInstance().getInterestingNativePids();
}
// 5. Dump the stack traces file
// For background ANRs, don't pass the ProcessCpuTracker to
// avoid spending 1/2 second collecting stats to rank lastPids.
File tracesFile = ActivityManagerService.dumpStackTraces(
true, firstPids,
(isSilentANR) ? null : processCpuTracker,
(isSilentANR) ? null : lastPids,
nativePids);
String cpuInfo = null;
if (ActivityManagerService.MONITOR_CPU_USAGE) {
// 6. Update CPU usage stats again
mService.updateCpuStatsNow();
synchronized (mService.mProcessCpuTracker) {
// 7. Print the CPU usage state at the time of the ANR
cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
}
info.append(processCpuTracker.printCurrentLoad());
info.append(cpuInfo);
}
info.append(processCpuTracker.printCurrentState(anrTime));
// 8. Without a traces file, all we can do is have the thread stacks printed to the log
if (tracesFile == null) {
// There is no trace file, so dump (only) the alleged culprit's threads to the log
Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
}
...
// 9. Key call: back to the familiar addErrorToDropBox, which packages and reports the error
mService.addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
cpuInfo, tracesFile, null);
if (mService.mController != null) {
try {
// 10. Depending on what appNotResponding returns, keep waiting or kill the process
// 0 == show dialog, 1 = keep waiting, -1 = kill process immediately
int res = mService.mController.appNotResponding(
app.processName, app.pid, info.toString());
if (res != 0) {
if (res < 0 && app.pid != MY_PID) {
app.kill("anr", true);
} else {
synchronized (mService) {
mService.mServices.scheduleServiceTimeoutLocked(app);
}
}
return;
}
} catch (RemoteException e) {
mService.mController = null;
Watchdog.getInstance().setActivityController(null);
}
}
...
}
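The way appNotResponding splits processes into firstPids and lastPids is easy to lose in the nesting, so here is a small standalone sketch of the non-silent branch. The types are stand-ins assumed for this example: Proc replaces ProcessRecord and carries only the fields the loop reads, and AnrPidSelection and select are invented names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative re-statement of the PID ordering for a non-silent ANR.
final class AnrPidSelection {

    /** Minimal stand-in for ProcessRecord. */
    static final class Proc {
        final int pid;
        final boolean persistent;
        final boolean treatLikeActivity;

        Proc(int pid, boolean persistent, boolean treatLikeActivity) {
            this.pid = pid;
            this.persistent = persistent;
            this.treatLikeActivity = treatLikeActivity;
        }
    }

    static void select(int anrPid, int parentPid, int systemPid, List<Proc> lru,
                       List<Integer> firstPids, List<Integer> lastPids) {
        // The ANR process always comes first, then its parent, then system_server.
        firstPids.add(anrPid);
        if (parentPid != anrPid) {
            firstPids.add(parentPid);
        }
        if (systemPid != anrPid && systemPid != parentPid) {
            firstPids.add(systemPid);
        }
        // Walk the LRU list from most to least recently used.
        for (int i = lru.size() - 1; i >= 0; i--) {
            Proc p = lru.get(i);
            if (p.pid <= 0 || p.pid == anrPid || p.pid == parentPid || p.pid == systemPid) {
                continue;
            }
            if (p.persistent || p.treatLikeActivity) {
                firstPids.add(p.pid); // persistent processes and likely IMEs are "interesting"
            } else {
                lastPids.add(p.pid);  // everything else is dumped only if the budget allows
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> first = new ArrayList<>();
        List<Integer> last = new ArrayList<>();
        List<Proc> lru = List.of(new Proc(10, false, false), new Proc(20, true, false),
                new Proc(30, false, true));
        select(100, 100, 1, lru, first, last);
        System.out.println("first=" + first + " last=" + last);
    }
}
```

In the framework code lastPids is a SparseArray keyed by pid; a list is used here only to keep the example free of Android classes.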
Let us see how the traces file gets dumped:
public static File dumpStackTraces(boolean clearTraces, ArrayList<Integer> firstPids,
ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids,
ArrayList<Integer> nativePids) {
ArrayList<Integer> extraPids = null;
// 1. Measure CPU usage right away to get a realistic sampling of the top users.
if (processCpuTracker != null) {
processCpuTracker.init();
try {
Thread.sleep(200);
} catch (InterruptedException ignored) {
}
processCpuTracker.update();
// 2. Take stack crawls of just the top apps using CPU.
final int N = processCpuTracker.countWorkingStats();
extraPids = new ArrayList<>();
for (int i = 0; i < N && extraPids.size() < 5; i++) {
ProcessCpuTracker.Stats stats = processCpuTracker.getWorkingStats(i);
if (lastPids.indexOfKey(stats.pid) >= 0) {
if (DEBUG_ANR) Slog.d(TAG, "Collecting stacks for extra pid " + stats.pid);
extraPids.add(stats.pid);
} else if (DEBUG_ANR) {
Slog.d(TAG, "Skipping next CPU consuming process, not a java proc: "
+ stats.pid);
}
}
}
// 3. Read the directory where trace files are stored
File tracesFile;
final String tracesDirProp = SystemProperties.get("dalvik.vm.stack-trace-dir", "");
if (tracesDirProp.isEmpty()) {
...
String globalTracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
...
} else {
...
}
// 4. Pass in the target path and run the actual dump logic
dumpStackTraces(tracesFile.getAbsolutePath(), firstPids, nativePids, extraPids,
useTombstonedForJavaTraces);
return tracesFile;
}
dumpStackTraces:
private static void dumpStackTraces(String tracesFile, ArrayList<Integer> firstPids,
ArrayList<Integer> nativePids, ArrayList<Integer> extraPids,
boolean useTombstonedForJavaTraces) {
...
final DumpStackFileObserver observer;
if (useTombstonedForJavaTraces) {
observer = null;
} else {
// Use a FileObserver to detect when traces finish writing.
// The order of traces is considered important to maintain for legibility.
observer = new DumpStackFileObserver(tracesFile);
}
// We must complete all stack dumps within 20 seconds.
long remainingTime = 20 * 1000;
try {
if (observer != null) {
observer.startWatching();
}
// First collect all of the stacks of the most important pids.
if (firstPids != null) {
int num = firstPids.size();
for (int i = 0; i < num; i++) {
if (DEBUG_ANR) Slog.d(TAG, "Collecting stacks for pid "
+ firstPids.get(i));
final long timeTaken;
if (useTombstonedForJavaTraces) {
timeTaken = dumpJavaTracesTombstoned(firstPids.get(i), tracesFile, remainingTime);
} else {
timeTaken = observer.dumpWithTimeout(firstPids.get(i), remainingTime);
}
remainingTime -= timeTaken;
if (remainingTime <= 0) {
Slog.e(TAG, "Aborting stack trace dump (current firstPid=" + firstPids.get(i) +
"); deadline exceeded.");
return;
}
if (DEBUG_ANR) {
Slog.d(TAG, "Done with pid " + firstPids.get(i) + " in " + timeTaken + "ms");
}
}
}
// Next collect the stacks of the native pids.
if (nativePids != null) {
for (int pid : nativePids) {
if (DEBUG_ANR) Slog.d(TAG, "Collecting stacks for native pid " + pid);
final long nativeDumpTimeoutMs = Math.min(NATIVE_DUMP_TIMEOUT_MS, remainingTime);
final long start = SystemClock.elapsedRealtime();
Debug.dumpNativeBacktraceToFileTimeout(
pid, tracesFile, (int) (nativeDumpTimeoutMs / 1000));
final long timeTaken = SystemClock.elapsedRealtime() - start;
remainingTime -= timeTaken;
if (remainingTime <= 0) {
Slog.e(TAG, "Aborting stack trace dump (current native pid=" + pid +
"); deadline exceeded.");
return;
}
if (DEBUG_ANR) {
Slog.d(TAG, "Done with native pid " + pid + " in " + timeTaken + "ms");
}
}
}
// Lastly, dump stacks for all extra PIDs from the CPU tracker.
if (extraPids != null) {
for (int pid : extraPids) {
if (DEBUG_ANR) Slog.d(TAG, "Collecting stacks for extra pid " + pid);
final long timeTaken;
if (useTombstonedForJavaTraces) {
timeTaken = dumpJavaTracesTombstoned(pid, tracesFile, remainingTime);
} else {
timeTaken = observer.dumpWithTimeout(pid, remainingTime);
}
remainingTime -= timeTaken;
if (remainingTime <= 0) {
Slog.e(TAG, "Aborting stack trace dump (current extra pid=" + pid +
"); deadline exceeded.");
return;
}
if (DEBUG_ANR) {
Slog.d(TAG, "Done with extra pid " + pid + " in " + timeTaken + "ms");
}
}
}
} finally {
if (observer != null) {
observer.stopWatching();
}
}
}
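The deadline bookkeeping that dumpStackTraces repeats for each group of pids boils down to one pattern: give each dump the remaining budget, subtract the time it took, and abort as soon as the budget is used up. The sketch below isolates that pattern. DumpBudget, Dumper, and dumpAll are names assumed for this example, and the dumper is a plain callback instead of a real stack dump.

```java
import java.util.List;

// Illustrative version of the shared time budget used while dumping stacks.
final class DumpBudget {

    /** Stand-in for a single dump; returns the time it took in milliseconds. */
    interface Dumper {
        long dump(int pid, long timeoutMs);
    }

    /**
     * Dumps pids in order against one shared budget.
     * Returns how many pids were dumped with budget still left afterwards.
     */
    static int dumpAll(List<Integer> pids, long budgetMs, Dumper dumper) {
        long remaining = budgetMs;
        int done = 0;
        for (int pid : pids) {
            long taken = dumper.dump(pid, remaining);
            remaining -= taken;
            if (remaining <= 0) {
                // Deadline exceeded: stop here, as the framework code does with "return".
                return done;
            }
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        // Pretend every dump takes 8 seconds against the 20 second budget.
        int done = dumpAll(List.of(1, 2, 3), 20 * 1000, (pid, timeout) -> 8000);
        System.out.println("dumped within budget: " + done);
    }
}
```

Because the budget is shared, a single slow process early in firstPids reduces what is left for the native and extra pids, which is why the most important pids are dumped first.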
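After reading this, the picture should be clear. The ANR flow collects the ANR reason, CPU stats, and thread stacks, writes them to main.log and event.log, calls addErrorToDropBox, and finally kills the process (or, depending on the controller result, shows the dialog or keeps waiting).
2.6 wtf
RuntimeInit also contains a wtf method. It is mainly called by Log.wtf and is used to report a condition that should never happen. After all, wtf stands for "what a terrible failure"; when the code starts swearing, the problem must be serious.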
public static void wtf(String tag, Throwable t, boolean system) {
try {
if (ActivityManager.getService().handleApplicationWtf(
mApplicationObject, tag, system,
new ApplicationErrorReport.ParcelableCrashInfo(t))) {
// The Activity Manager has already written us off -- now exit.
Process.killProcess(Process.myPid());
System.exit(10);
}
} catch (Throwable t2) {
if (t2 instanceof DeadObjectException) {
// System process is dead; ignore
} else {
Slog.e(TAG, "Error reporting WTF", t2);
Slog.e(TAG, "Original WTF:", t);
}
}
}
Handling again moves to AMS:
public boolean handleApplicationWtf(final IBinder app, final String tag, boolean system,
final ApplicationErrorReport.ParcelableCrashInfo crashInfo) {
...
if (system) {
// If this is coming from the system, we could very well have low-level
// system locks held, so we want to do this all asynchronously. And we
// never want this to become fatal, so there is that too.
mHandler.post(new Runnable() {
@Override public void run() {
handleApplicationWtfInner(callingUid, callingPid, app, tag, crashInfo);
}
});
return false;
}
// handleApplicationWtfInner is called whether system is true or false; only the system case runs it asynchronously
final ProcessRecord r = handleApplicationWtfInner(callingUid, callingPid, app, tag,
crashInfo);
final boolean isFatal = Build.IS_ENG || Settings.Global
.getInt(mContext.getContentResolver(), Settings.Global.WTF_IS_FATAL, 0) != 0;
final boolean isSystem = (r == null) || r.persistent;
if (isFatal && !isSystem) {
// Continue into the ordinary application crash flow
mAppErrors.crashApplication(r, crashInfo);
return true;
} else {
return false;
}
}
handleApplicationWtfInner:
ProcessRecord handleApplicationWtfInner(int callingUid, int callingPid, IBinder app, String tag,
final ApplicationErrorReport.CrashInfo crashInfo) {
final ProcessRecord r = findAppProcess(app, "WTF");
final String processName = app == null ? "system_server"
: (r == null ? "unknown" : r.processName);
EventLog.writeEvent(EventLogTags.AM_WTF, UserHandle.getUserId(callingUid), callingPid,
processName, r == null ? -1 : r.info.flags, tag, crashInfo.exceptionMessage);
StatsLog.write(StatsLog.WTF_OCCURRED, callingUid, tag, processName,
callingPid);
// Key call: once again we end up in addErrorToDropBox
addErrorToDropBox("wtf", r, processName, null, null, tag, null, null, crashInfo);
return r;
}
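The outcome of handleApplicationWtf depends on four inputs, which is easier to see as a pure function. This is a sketch with assumed names (WtfPolicy, Outcome, decide); the booleans stand in for the system flag, Build.IS_ENG, the WTF_IS_FATAL setting, and the ProcessRecord lookup from the code above.

```java
// Illustrative decision table for handleApplicationWtf.
final class WtfPolicy {

    enum Outcome { LOGGED_ASYNC, LOGGED, CRASH_APP }

    static Outcome decide(boolean system, boolean isEngBuild, boolean wtfIsFatalSetting,
                          boolean processKnown, boolean persistent) {
        if (system) {
            // System callers are logged on a handler thread and never become fatal.
            return Outcome.LOGGED_ASYNC;
        }
        boolean isFatal = isEngBuild || wtfIsFatalSetting;
        // Mirrors: isSystem = (r == null) || r.persistent
        boolean isSystem = !processKnown || persistent;
        return (isFatal && !isSystem) ? Outcome.CRASH_APP : Outcome.LOGGED;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, true, false, true, false));  // eng build, normal app
        System.out.println(decide(false, false, false, true, false)); // user build, normal app
        System.out.println(decide(true, true, true, true, false));    // system caller
    }
}
```

In every branch the wtf is still recorded through addErrorToDropBox; the outcome only decides whether the calling process is also crashed.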
2.7 All roads lead to addErrorToDropBox
Why do all roads lead to addErrorToDropBox? Because a crash, native_crash, ANR, or wtf all end up here and are handed to it for processing. Let us take a closer look.
public void addErrorToDropBox(String eventType,
ProcessRecord process, String processName, ActivityRecord activity,
ActivityRecord parent, String subject,
final String report, final File dataFile,
final ApplicationErrorReport.CrashInfo crashInfo) {
// NOTE -- this must never acquire the ActivityManagerService lock,
// otherwise the watchdog may be prevented from resetting the system.
// Bail early if not published yet
if (ServiceManager.getService(Context.DROPBOX_SERVICE) == null) return;
final DropBoxManager dbox = mContext.getSystemService(DropBoxManager.class);
// Only these error types are reported to the listener
final boolean shouldReport = ("anr".equals(eventType)
|| "crash".equals(eventType)
|| "native_crash".equals(eventType)
|| "watchdog".equals(eventType));
// Exit early if the dropbox isn't configured to accept this report type.
final String dropboxTag = processClass(process) + "_" + eventType;
// 1. Return early if DropBoxManager is unavailable, or if the tag is disabled and the event is not a reportable type
if (dbox == null || (!dbox.isTagEnabled(dropboxTag) && !shouldReport))
return;
...
final StringBuilder sb = new StringBuilder(1024);
// 2. Append the header lines
appendDropBoxProcessHeaders(process, processName, sb);
// 3. Append information about the crashed process and activity
try {
if (process != null) {
// Record whether the process is in the foreground
sb.append("Foreground: ")
.append(process.isInterestingToUserLocked() ? "Yes" : "No")
.append("\n");
}
// The activity that triggered the failure; may be null
if (activity != null) {
sb.append("Activity: ").append(activity.shortComponentName).append("\n");
}
if (parent != null && parent.app != null && parent.app.pid != process.pid) {
sb.append("Parent-Process: ").append(parent.app.processName).append("\n");
}
if (parent != null && parent != activity) {
sb.append("Parent-Activity: ").append(parent.shortComponentName).append("\n");
}
// Write the short subject line
if (subject != null) {
sb.append("Subject: ").append(subject).append("\n");
}
sb.append("Build: ").append(Build.FINGERPRINT).append("\n");
// Whether a debugger is attached
if (Debug.isDebuggerConnected()) {
sb.append("Debugger: Connected\n");
}
} catch (NullPointerException e) {
e.printStackTrace();
} finally {
sb.append("\n");
}
final String fProcessName = processName;
final String fEventType = eventType;
final String packageName = getErrorReportPackageName(process, crashInfo, eventType);
Slog.i(TAG,"addErrorToDropbox, real report package is "+packageName);
// Do the rest in a worker thread to avoid blocking the caller on I/O
// (After this point, we shouldn't access AMS internal data structures.)
Thread worker = new Thread("Error dump: " + dropboxTag) {
@Override
public void run() {
// 4. Append the process state to the dropbox entry
BufferedReader bufferedReader = null;
String line;
try {
bufferedReader = new BufferedReader(new FileReader("/proc/" + pid + "/status"));
for (int i = 0; i < 5; i++) {
if ((line = bufferedReader.readLine()) != null && line.contains("State")) {
sb.append(line + "\n");
break;
}
}
} catch (IOException e) {
e.printStackTrace();
} finally {
if (bufferedReader != null) {
try {
bufferedReader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
if (report != null) {
sb.append(report);
}
String setting = Settings.Global.ERROR_LOGCAT_PREFIX + dropboxTag;
int lines = Settings.Global.getInt(mContext.getContentResolver(), setting, 0);
int maxDataFileSize = DROPBOX_MAX_SIZE - sb.length()
- lines * RESERVED_BYTES_PER_LOGCAT_LINE;
// 5. Append the contents of dataFile. Usually only an ANR passes a traces file through this parameter; the other types pass null.
if (dataFile != null && maxDataFileSize > 0) {
try {
sb.append(FileUtils.readTextFile(dataFile, maxDataFileSize,
"\n\n[[TRUNCATED]]"));
} catch (IOException e) {
Slog.e(TAG, "Error reading " + dataFile, e);
}
}
// 6. For crash types a crashInfo is passed in; append its stack trace
if (crashInfo != null && crashInfo.stackTrace != null) {
sb.append(crashInfo.stackTrace);
}
if (lines > 0) {
sb.append("\n");
// 7. Merge several logcat streams and take the most recent lines
InputStreamReader input = null;
try {
java.lang.Process logcat = new ProcessBuilder(
"/system/bin/timeout", "-k", "15s", "10s",
"/system/bin/logcat", "-v", "threadtime", "-b", "events", "-b", "system",
"-b", "main", "-b", "crash", "-t", String.valueOf(lines))
.redirectErrorStream(true).start();
try { logcat.getOutputStream().close(); } catch (IOException e) {}
try { logcat.getErrorStream().close(); } catch (IOException e) {}
input = new InputStreamReader(logcat.getInputStream());
int num;
char[] buf = new char[8192];
while ((num = input.read(buf)) > 0) sb.append(buf, 0, num);
} catch (IOException e) {
Slog.e(TAG, "Error running logcat", e);
} finally {
if (input != null) try { input.close(); } catch (IOException e) {}
}
}
...
if (shouldReport) {
synchronized (mErrorListenerLock) {
try {
if (mIApplicationErrorListener == null) {
return;
}
// 8. Key point: an application error interface can be added here so that the app layer receives crash information
mIApplicationErrorListener.onError(fEventType,
packageName, fProcessName, subject, dropboxTag
+ "-" + uuid, crashInfo);
} catch (DeadObjectException e) {
Slog.i(TAG, "ApplicationErrorListener.onError() E :" + e, e);
mIApplicationErrorListener = null;
} catch (Exception e) {
Slog.i(TAG, "ApplicationErrorListener.onError() E :" + e, e);
}
}
}
}
};
...
}
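Two small pieces of arithmetic in addErrorToDropBox are worth isolating: the early-return condition, where && binds tighter than ||, and the size budget left for the data file. The sketch below restates both as pure functions. DropBoxGate and its methods are names assumed for this example, and the two constant values are placeholders chosen for illustration, not values taken from the excerpt.

```java
import java.util.Set;

// Illustrative restatement of the gating and size-budget logic.
final class DropBoxGate {
    // Placeholder values for this sketch; the excerpt does not show the real constants.
    static final int DROPBOX_MAX_SIZE = 192 * 1024;
    static final int RESERVED_BYTES_PER_LOGCAT_LINE = 100;

    private static final Set<String> REPORTABLE =
            Set.of("anr", "crash", "native_crash", "watchdog");

    static boolean isReportable(String eventType) {
        return REPORTABLE.contains(eventType);
    }

    /** Mirrors: dbox == null || (!dbox.isTagEnabled(tag) && !shouldReport). */
    static boolean shouldSkip(boolean dropBoxAvailable, boolean tagEnabled, String eventType) {
        return !dropBoxAvailable || (!tagEnabled && !isReportable(eventType));
    }

    /** Bytes left for the data file after the header and the reserved logcat space. */
    static int maxDataFileSize(int headerLength, int logcatLines) {
        return DROPBOX_MAX_SIZE - headerLength - logcatLines * RESERVED_BYTES_PER_LOGCAT_LINE;
    }

    public static void main(String[] args) {
        System.out.println(shouldSkip(true, false, "wtf")); // tag disabled, not reportable
        System.out.println(shouldSkip(true, false, "anr")); // tag disabled, but reportable
        System.out.println(maxDataFileSize(1024, 10));
    }
}
```

One consequence of the condition is that a reportable event such as an ANR is still processed when its dropbox tag is disabled, so the listener callback at step 8 can fire regardless of the tag setting.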
appendDropBoxProcessHeaders is called to add the header lines:
private void appendDropBoxProcessHeaders(ProcessRecord process, String processName,
StringBuilder sb) {
// Watchdog thread ends up invoking this function (with
// a null ProcessRecord) to add the stack file to dropbox.
// Do not acquire a lock on this (am) in such cases, as it
// could cause a potential deadlock, if and when watchdog
// is invoked due to unavailability of lock on am and it
// would prevent watchdog from killing system_server.
if (process == null) {
sb.append("Process: ").append(processName).append("\n");
return;
}
// Note: ProcessRecord 'process' is guarded by the service
// instance. (notably process.pkgList, which could otherwise change
// concurrently during execution of this method)
synchronized (this) {
sb.append("Process: ").append(processName).append("\n");
sb.append("PID: ").append(process.pid).append("\n");
int flags = process.info.flags;
IPackageManager pm = AppGlobals.getPackageManager();
// Append the process flags
sb.append("Flags: 0x").append(Integer.toHexString(flags)).append("\n");
for (int ip=0; ip<process.pkgList.size(); ip++) {
String pkg = process.pkgList.keyAt(ip);
sb.append("Package: ").append(pkg);
try {
PackageInfo pi = pm.getPackageInfo(pkg, 0, UserHandle.getCallingUserId());
if (pi != null) {
sb.append(" v").append(pi.getLongVersionCode());
if (pi.versionName != null) {
sb.append(" (").append(pi.versionName).append(")");
}
}
} catch (RemoteException e) {
Slog.e(TAG, "Error getting package info: " + pkg, e);
}
sb.append("\n");
}
// For an instant app, add this line to the log
if (process.info.isInstantApp()) {
sb.append("Instant-App: true\n");
}
}
}
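To see what the header block looks like in a dropbox entry, the following sketch builds the same lines from plain values. It is a simplification under assumptions: DropBoxHeaderSketch and header are invented names, a null pid stands for "no ProcessRecord", and the package map replaces the PackageManager lookup (version names and the Instant-App line are left out).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative formatter for the "Process / PID / Flags / Package" header lines.
final class DropBoxHeaderSketch {

    /** packages maps package name to version code; pid == null means no ProcessRecord. */
    static String header(String processName, Integer pid, int flags, Map<String, Long> packages) {
        StringBuilder sb = new StringBuilder();
        sb.append("Process: ").append(processName).append("\n");
        if (pid == null) {
            // Watchdog path: only the process name is known.
            return sb.toString();
        }
        sb.append("PID: ").append(pid).append("\n");
        sb.append("Flags: 0x").append(Integer.toHexString(flags)).append("\n");
        for (Map.Entry<String, Long> e : packages.entrySet()) {
            sb.append("Package: ").append(e.getKey())
              .append(" v").append(e.getValue()).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Long> pkgs = new LinkedHashMap<>();
        pkgs.put("com.example.app", 7L);
        System.out.print(header("com.example.app", 42, 0x3e, pkgs));
    }
}
```

The flags value is printed in hexadecimal without padding, so it has to be decoded against ApplicationInfo flag bits when reading an entry.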
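2.8 Receiving failure reports from the framework layer
Now we can answer the second question from the preface: how do we receive the app or system crash events that the framework detects?
We can add an IApplicationErrorListener interface to the framework; by registering an implementation of it, we receive the app crash information captured by the system framework.
IApplicationErrorListener: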
public interface IApplicationErrorListener extends android.os.IInterface {
/** Local-side IPC implementation stub class. */
public static abstract class Stub extends android.os.Binder implements android.app.IApplicationErrorListener {
private static final java.lang.String DESCRIPTOR = "android.app.IApplicationErrorListener";
/** Construct the stub at attach it to the interface. */
public Stub() {
this.attachInterface(this, DESCRIPTOR);
}
/**
* Cast an IBinder object into an android.app.IApplicationErrorListener
* interface, generating a proxy if needed.
*/
public static android.app.IApplicationErrorListener asInterface(android.os.IBinder obj) {
if ((obj == null)) {
return null;
}
android.os.IInterface iin = obj.queryLocalInterface(DESCRIPTOR);
if (((iin != null) && (iin instanceof android.app.IApplicationErrorListener))) {
return ((android.app.IApplicationErrorListener) iin);
}
return new android.app.IApplicationErrorListener.Stub.Proxy(obj);
}
@Override
public android.os.IBinder asBinder() {
return this;
}
@Override
public boolean onTransact(int code, android.os.Parcel data, android.os.Parcel reply, int flags)
throws android.os.RemoteException {
switch (code) {
case INTERFACE_TRANSACTION: {
reply.writeString(DESCRIPTOR);
return true;
}
case TRANSACTION_onError: {
data.enforceInterface(DESCRIPTOR);
String errorType = data.readString();
String packageName = data.readString();
String processName = data.readString();
String subject = data.readString();
String dump = data.readString();
ApplicationErrorReport.CrashInfo crashInfo = new ApplicationErrorReport.CrashInfo(data);
this.onError(errorType, packageName, processName, subject, dump, crashInfo);
reply.writeNoException();
return true;
}
}
return super.onTransact(code, data, reply, flags);
}
private static class Proxy implements android.app.IApplicationErrorListener {
private android.os.IBinder mRemote;
Proxy(android.os.IBinder remote) {
mRemote = remote;
}
@Override
public android.os.IBinder asBinder() {
return mRemote;
}
public java.lang.String getInterfaceDescriptor() {
return DESCRIPTOR;
}
@Override
public void onError(String errorType, String packageName, String processName, String subject,
String dump, CrashInfo crashInfo) throws android.os.RemoteException {
android.os.Parcel _data = android.os.Parcel.obtain();
android.os.Parcel _reply = android.os.Parcel.obtain();
try {
_data.writeInterfaceToken(DESCRIPTOR);
_data.writeString(errorType);
_data.writeString(packageName);
_data.writeString(processName);
_data.writeString(subject);
_data.writeString(dump);
if (null == crashInfo) {
crashInfo = new CrashInfo();
}
crashInfo.writeToParcel(_data, 0);
mRemote.transact(Stub.TRANSACTION_onError, _data, _reply, 0);
_reply.readException();
} finally {
_reply.recycle();
_data.recycle();
}
}
}
static final int TRANSACTION_onError = (android.os.IBinder.FIRST_CALL_TRANSACTION + 0);
}
public void onError(String errorType, String packageName, String processName, String subject,
String dump, ApplicationErrorReport.CrashInfo crashInfo) throws android.os.RemoteException;
}
The key method of this interface is:
public void onError(String errorType, String packageName, String processName, String subject,
String dump, ApplicationErrorReport.CrashInfo crashInfo) throws android.os.RemoteException;
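All final error information is delivered through this method. As long as our app has system privileges, it can register this callback with the system.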
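As a rough idea of what the receiving side could look like, here is a plain-Java sketch. Everything in it is a stand-in assumed for illustration: ErrorListener mimics the shape of onError without Binder, crashInfo is reduced to a stack trace string, RecordingListener only keeps records in memory, and report plays the role of the framework-side call shown in step 8. How a real listener is registered with AMS is not shown in this article, so it is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Binder-free model of the error listener callback.
final class ErrorListenerSketch {

    /** Stand-in for IApplicationErrorListener; crashInfo reduced to a stack trace string. */
    interface ErrorListener {
        void onError(String errorType, String packageName, String processName,
                     String subject, String dump, String stackTrace);
    }

    /** Example receiver that records each report so it can be uploaded later. */
    static final class RecordingListener implements ErrorListener {
        final List<String> records = new ArrayList<>();

        @Override
        public void onError(String errorType, String packageName, String processName,
                            String subject, String dump, String stackTrace) {
            records.add(errorType + "|" + packageName + "|" + processName + "|" + dump);
        }
    }

    /** Stand-in for the framework side: notifies the listener only if one is registered. */
    static boolean report(ErrorListener listener, String eventType, String packageName,
                          String processName, String dropboxTag, String stackTrace) {
        if (listener == null) {
            return false; // nobody registered, nothing to deliver
        }
        listener.onError(eventType, packageName, processName, null, dropboxTag, stackTrace);
        return true;
    }

    public static void main(String[] args) {
        RecordingListener listener = new RecordingListener();
        report(listener, "crash", "com.example.app", "com.example.app",
                "data_app_crash", "java.lang.RuntimeException: boom");
        System.out.println(listener.records);
    }
}
```

A real receiver would do as little as possible inside onError and hand the record to a background queue, since the callback is invoked from the framework's error-dump worker thread.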