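1. Content Preview
2. Overview
The first two articles in this process series have already been published. This article (based on the Android O source code) covers the first half of the SystemServer story: how the SystemServer process is created. The second half, which walks through the startup phases after SystemServer has been created and the core services it runs, is left for the next article.
Android Process Series, Part 1: Process Basics
Android Process Series, Part 2: How the Zygote Process Is Created
A brief recap of the key points from the previous article:
- Zygote is essentially a client/server design: the Zygote process is the server and handles the process-creation requests that clients send over a socket;
- We summarized the socket communication framework: the init process adds the socket's fd, the Zygote process gets that fd and creates a LocalServerSocket from it;
- We summarized why Zygote serves as the parent of all application processes;
- We summarized how Zygote preloads resources, and why Zygote must not load those resources on a child thread.
This article focuses on the creation of the SystemServer process. SystemServer is Zygote's eldest child: it is the first process that Zygote forks. Zygote and SystemServer together hold up the Java world on Android, and if either of them dies, the Java world collapses. Most of the crash-and-reboot problems we see in practice also originate in SystemServer. SystemServer runs dozens of core services. To keep applications from damaging the system, application processes are not allowed to access system resources directly and have to go through SystemServer as a proxy. All of this shows how important the SystemServer process is.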
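3. SystemServer Creation Flow
3.1 The main method of ZygoteInit
The figure above is the sequence diagram of SystemServer creation. We again start from the main method of ZygoteInit and bring back the "template" code below.
frameworks/base/core/java/com/android/internal/os/ZygoteInit.java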
public static void main(String argv[]) {
    // 1. Create the ZygoteServer
    ZygoteServer zygoteServer = new ZygoteServer();
    try {
        // 2. Create the server-side socket
        zygoteServer.registerServerSocket(socketName);
        // 3. Preload the classes and resources that processes share
        preload(bootTimingsTraceLog);
        if (startSystemServer) {
            // 4. Start the SystemServer process; this is the zygote's first "cell division"
            startSystemServer(abiList, socketName, zygoteServer);
        }
        // 5. Enter an endless loop that listens for messages from clients
        zygoteServer.runSelectLoop(abiList);
        // 6. Close the Zygote server socket
        zygoteServer.closeServerSocket();
    } catch (Zygote.MethodAndArgsCaller caller) {
        // 7. Catch the exception and call the run method of MethodAndArgsCaller
        caller.run();
    } catch (Throwable ex) {
        Log.e(TAG, "System zygote died with exception", ex);
        zygoteServer.closeServerSocket();
        throw ex;
    }
}
The main method of ZygoteInit has 7 key points. Points 1, 2 and 3 were covered in the previous article, so the analysis here starts from point 4.
/**
 * Prepare the arguments and fork for the system server process.
 */
private static boolean startSystemServer(String abiList, String socketName, ZygoteServer zygoteServer)
        throws Zygote.MethodAndArgsCaller, RuntimeException {
    .........
    /* Hardcoded command line to start the system server */
    String args[] = {
        "--setuid=1000",
        "--setgid=1000",
        "--setgroups=1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1018,1021,1023,1032,3001,3002,3003,3006,3007,3009,3010",
        "--capabilities=" + capabilities + "," + capabilities,
        "--nice-name=system_server",
        "--runtime-args",
        "com.android.server.SystemServer",
    };
    ZygoteConnection.Arguments parsedArgs = null;

    int pid;

    try {
        parsedArgs = new ZygoteConnection.Arguments(args);
        ZygoteConnection.applyDebuggerSystemProperty(parsedArgs);
        ZygoteConnection.applyInvokeWithSystemProperty(parsedArgs);

        // Create the system process; this ends up calling fork(), see section 3.2
        pid = Zygote.forkSystemServer(
                parsedArgs.uid, parsedArgs.gid,
                parsedArgs.gids,
                parsedArgs.debugFlags,
                null,
                parsedArgs.permittedCapabilities,
                parsedArgs.effectiveCapabilities);
    } catch (IllegalArgumentException ex) {
        throw new RuntimeException(ex);
    }

    // fork() returns twice; pid == 0 means we are now running in the newly created child
    if (pid == 0) {
        // If the device has a second zygote (e.g. to support 32-bit apps), wait until it can be reached
        if (hasSecondZygote(abiList)) {
            waitForSecondaryZygote(socketName);
        }
        // Close the server socket inherited from the Zygote process
        zygoteServer.closeServerSocket();
        // Finish the remaining setup of the SystemServer process, see section 3.4
        handleSystemServerProcess(parsedArgs);
    }

    return true;
}
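- 1. The args array is converted into a ZygoteConnection.Arguments object, which essentially just assigns the member variables of ZygoteConnection.Arguments. What do these arguments mean?
String args[] = {
    "--setuid=1000",
    "--setgid=1000",
    "--setgroups=1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1018,1021,1023,1032,3001,3002,3003,3006,3007,3009,3010",
    "--capabilities=" + capabilities + "," + capabilities,
    "--nice-name=system_server",
    "--runtime-args",
    "com.android.server.SystemServer",
};
The uid and gid of the SystemServer process are both set to 1000. setgroups lists the groups the process belongs to, capabilities sets the capabilities (privileges) of the process, nice-name is the process name, and the class to run is com.android.server.SystemServer.
- 2. forkSystemServer forks the system process. Underneath it still calls the C-level fork() function (which relies on copy-on-write). fork() returns 0 in the child, so pid == 0 means the code is now running inside the newly forked system process.
- 3. When Zygote forks a new process, the child starts out with a copy-on-write copy of Zygote's memory and inherits its open file descriptors. The server socket created in the Zygote process is of no use to the new process, so the new process calls zygoteServer.closeServerSocket() to close it.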
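To make point 1 more concrete, here is a minimal sketch of how flags such as --setuid=1000 could be turned into fields. ForkArgsSketch, its fields, and flagValue are hypothetical names for illustration; this is not the actual ZygoteConnection.Arguments code, which handles many more flags.

```java
import java.util.Arrays;

// Hypothetical stand-in for ZygoteConnection.Arguments; all names are illustrative only.
final class ForkArgsSketch {
    int uid = -1;
    int gid = -1;
    int[] gids = new int[0];
    String niceName;
    String startClass;

    // Returns the text after the first '=' of a "--key=value" flag.
    private static String flagValue(String arg) {
        return arg.substring(arg.indexOf('=') + 1);
    }

    static ForkArgsSketch parse(String[] args) {
        ForkArgsSketch parsed = new ForkArgsSketch();
        int i = 0;
        for (; i < args.length; i++) {
            String arg = args[i];
            if (arg.startsWith("--setuid=")) {
                parsed.uid = Integer.parseInt(flagValue(arg));
            } else if (arg.startsWith("--setgid=")) {
                parsed.gid = Integer.parseInt(flagValue(arg));
            } else if (arg.startsWith("--setgroups=")) {
                parsed.gids = Arrays.stream(flagValue(arg).split(","))
                        .mapToInt(Integer::parseInt)
                        .toArray();
            } else if (arg.startsWith("--nice-name=")) {
                parsed.niceName = flagValue(arg);
            } else if (!arg.startsWith("--")) {
                break; // the first non-flag token names the class to start
            }
            // Any other "--" flag (--capabilities, --runtime-args, ...) is skipped in this sketch.
        }
        if (i < args.length) {
            parsed.startClass = args[i];
        }
        return parsed;
    }

    public static void main(String[] argv) {
        ForkArgsSketch p = parse(new String[] {
            "--setuid=1000", "--setgid=1000", "--setgroups=1001,1002",
            "--nice-name=system_server", "--runtime-args",
            "com.android.server.SystemServer",
        });
        System.out.println(p.uid + " " + p.gid + " " + Arrays.toString(p.gids)
                + " " + p.niceName + " " + p.startClass);
    }
}
```

The sketch stops at the first token that does not start with "--" and treats it as the start class, which matches the order of the hardcoded array above.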
3.2 The forkSystemServer method of Zygote
/frameworks/base/core/java/com/android/internal/os/Zygote.java
public static int forkSystemServer(int uid, int gid, int[] gids, int debugFlags,
        int[][] rlimits, long permittedCapabilities, long effectiveCapabilities) {
    VM_HOOKS.preFork();
    // Resets nice priority for zygote process.
    resetNicePriority();
    int pid = nativeForkSystemServer(
            uid, gid, gids, debugFlags, rlimits, permittedCapabilities, effectiveCapabilities);
    // Enable tracing as soon as we enter the system_server.
    if (pid == 0) {
        Trace.setTracingEnabled(true);
    }
    VM_HOOKS.postForkCommon();
    return pid;
}
nativeForkSystemServer is a JNI method. It is registered from AndroidRuntime.cpp, which calls register_com_android_internal_os_Zygote() in com_android_internal_os_Zygote.cpp to set up the mapping for the native methods.
/frameworks/base/core/jni/com_android_internal_os_Zygote.cpp
static jint com_android_internal_os_Zygote_nativeForkSystemServer(
        JNIEnv* env, jclass, uid_t uid, gid_t gid, jintArray gids,
        jint debug_flags, jobjectArray rlimits, jlong permittedCapabilities,
        jlong effectiveCapabilities) {
  pid_t pid = ForkAndSpecializeCommon(env, uid, gid, gids,
                                      debug_flags, rlimits,
                                      permittedCapabilities, effectiveCapabilities,
                                      MOUNT_EXTERNAL_DEFAULT, NULL, NULL, true, NULL,
                                      NULL, NULL, NULL);
  if (pid > 0) {
      // The zygote process checks whether the child process has died or not.
      ALOGI("System server process %d has been created", pid);
      gSystemServerPid = pid;
      // There is a slight window that the system server process has crashed
      // but it went unnoticed because we haven't published its pid yet. So
      // we recheck here just to make sure that all is well.
      int status;
      if (waitpid(pid, &status, WNOHANG) == pid) {
          ALOGE("System server process %d has died. Restarting Zygote!", pid);
          RuntimeAbort(env, __LINE__, "System server process has died. Restarting Zygote!");
      }
  }
  return pid;
}
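The waitpid function deserves a short explanation here.
If the child being waited for has already stopped or terminated when waitpid() is called, waitpid() returns immediately. If the child has not stopped or terminated yet, the parent process that called waitpid() blocks and is suspended.
The status argument receives the state information of the child. With it the parent can find out why the child exited: normally, or because of some error. The state information is written only if status is not a null pointer.
The third argument of waitpid() takes options, two of which are relevant here. The first is WNOHANG: if the child specified by pid has not terminated, waitpid() returns 0 immediately instead of blocking; if it has terminated, waitpid() returns the pid of that child. The second is WUNTRACED: waitpid() also returns right away if the child has entered the stopped state.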
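The WNOHANG return values described above can be modeled in a few lines. This is only a toy simulation in Java, not a real system call: WaitpidNoHangModel, spawn, and terminate are hypothetical names, and the -1 result for an unknown pid follows the usual convention that waitpid fails when the pid is not a child of the caller.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the waitpid(pid, &status, WNOHANG) return contract; not a real system call.
final class WaitpidNoHangModel {
    // Maps each child pid to its exit status; a null value means "still running".
    private final Map<Integer, Integer> children = new HashMap<>();

    void spawn(int pid) {
        children.put(pid, null);
    }

    void terminate(int pid, int status) {
        if (children.containsKey(pid)) {
            children.put(pid, status);
        }
    }

    // Returns pid if the child has terminated (and reaps it), 0 if it is
    // still running, and -1 if pid is not one of our children.
    int waitpidNoHang(int pid) {
        if (!children.containsKey(pid)) {
            return -1;
        }
        if (children.get(pid) == null) {
            return 0;
        }
        children.remove(pid);
        return pid;
    }

    public static void main(String[] args) {
        WaitpidNoHangModel model = new WaitpidNoHangModel();
        model.spawn(2999);
        System.out.println(model.waitpidNoHang(2999)); // child still running
        model.terminate(2999, 9);
        System.out.println(model.waitpidNoHang(2999)); // child has died
    }
}
```

In this model, the check in nativeForkSystemServer corresponds to calling waitpidNoHang once right after the pid is published and treating a result equal to pid as "system server already died".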
So when waitpid(pid, &status, WNOHANG) == pid holds, the SystemServer process has died and the Zygote process has to be restarted. Next, look at the ForkAndSpecializeCommon function.
// Utility routine to fork zygote and specialize the child process.
static pid_t ForkAndSpecializeCommon(JNIEnv* env, uid_t uid, gid_t gid, jintArray javaGids,
                                     jint debug_flags, jobjectArray javaRlimits,
                                     jlong permittedCapabilities, jlong effectiveCapabilities,
                                     jint mount_external,
                                     jstring java_se_info, jstring java_se_name,
                                     bool is_system_server, jintArray fdsToClose,
                                     jintArray fdsToIgnore,
                                     jstring instructionSet, jstring dataDir) {
  // Install the SIGCHLD handler that is notified when a child exits, see section 3.3
  SetSigChldHandler();
  ......
  // fork the child process
  pid_t pid = fork();

  if (pid == 0) {
    // The child process.
    ......
    if (!is_system_server) {
        int rc = createProcessGroup(uid, getpid());
        if (rc != 0) {
            if (rc == -EROFS) {
                ALOGW("createProcessGroup failed, kernel missing CONFIG_CGROUP_CPUACCT?");
            } else {
                ALOGE("createProcessGroup(%d, %d) failed: %s", uid, pid, strerror(-rc));
            }
        }
    }

    SetGids(env, javaGids);  // set the groups

    SetRLimits(env, javaRlimits);  // set the resource limits

    int rc = setresgid(gid, gid, gid);
    if (rc == -1) {
      ALOGE("setresgid(%d) failed: %s", gid, strerror(errno));
      RuntimeAbort(env, __LINE__, "setresgid failed");
    }

    rc = setresuid(uid, uid, uid);  // set the uid
    .......

    SetCapabilities(env, permittedCapabilities, effectiveCapabilities, permittedCapabilities);

    SetSchedulerPolicy(env);  // set the scheduling policy

    .......
    // set the SELinux context
    rc = selinux_android_setcontext(uid, is_system_server, se_info_c_str, se_name_c_str);
    .......
  } else if (pid > 0) {
    .......
    }
  }
  return pid;
}
}  // anonymous namespace
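Note that SetSigChldHandler is called before fork(). SetSigChldHandler installs the signal handler SigChldHandler, so when a SIGCHLD signal arrives, control enters the signal handler shown in section 3.3.
3.3 SystemServer and Zygote live and die together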
// Configures the SIGCHLD handler for the zygote process. This is configured
// very late, because earlier in the runtime we may fork() and exec()
// other processes, and we want to waitpid() for those rather than
// have them be harvested immediately.
//
// This ends up being called repeatedly before each fork(), but there's
// no real harm in that.
static void SetSigChldHandler() {
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = SigChldHandler;

  int err = sigaction(SIGCHLD, &sa, NULL);
  if (err < 0) {
    ALOGW("Error setting SIGCHLD handler: %s", strerror(errno));
  }
}
// This signal handler is for zygote mode, since the zygote must reap its children
static void SigChldHandler(int /*signal_number*/) {
  pid_t pid;
  int status;

  // It's necessary to save and restore the errno during this function.
  // Since errno is stored per thread, changing it here modifies the errno
  // on the thread on which this signal handler executes. If a signal occurs
  // between a call and an errno check, it's possible to get the errno set
  // here.
  // See b/23572286 for extra information.
  int saved_errno = errno;

  while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
     // Log process-death status that we care about. In general it is
     // not safe to call LOG(...) from a signal handler because of
     // possible reentrancy. However, we know a priori that the
     // current implementation of LOG() is safe to call from a SIGCHLD
     // handler in the zygote process. If the LOG() implementation
     // changes its locking strategy or its use of syscalls within the
     // lazy-init critical section, its use here may become unsafe.
    if (WIFEXITED(status)) {
      if (WEXITSTATUS(status)) {
        ALOGI("Process %d exited cleanly (%d)", pid, WEXITSTATUS(status));
      }
    } else if (WIFSIGNALED(status)) {
      if (WTERMSIG(status) != SIGKILL) {
        ALOGI("Process %d exited due to signal (%d)", pid, WTERMSIG(status));
      }
      if (WCOREDUMP(status)) {
        ALOGI("Process %d dumped core.", pid);
      }
    }

    // If the just-crashed process is the system_server, bring down zygote
    // so that it is restarted by init and system server will be restarted
    // from there.
    if (pid == gSystemServerPid) {
      ALOGE("Exit zygote because system server (%d) has terminated", pid);
      kill(getpid(), SIGKILL);
    }
  }

  // Note that we shouldn't consider ECHILD an error because
  // the secondary zygote might have no children left to wait for.
  if (pid < 0 && errno != ECHILD) {
    ALOGW("Zygote SIGCHLD error in waitpid: %s", strerror(errno));
  }

  errno = saved_errno;
}
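system_server is zygote's eldest child, the first process that zygote forks. These two processes hold up the Java world, and the death of either one brings it down. So if the child process SystemServer dies, Zygote kills itself, which causes init to restart Zygote and, from there, SystemServer. In other words, Zygote and SystemServer live and die together.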
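The W* macros used in SigChldHandler simply pick bit fields out of the integer status. The sketch below reproduces that decoding in Java under the assumption of the conventional Linux status layout (low 7 bits: terminating signal, bit 7: core-dump flag, next 8 bits: exit code). The layout and all names here are illustrative and are not taken from the article's sources.

```java
// Sketch of how the W* macros decode a wait status, assuming the conventional
// Linux bit layout; class and method names are hypothetical.
final class WaitStatusSketch {
    static boolean wifExited(int status) {
        return (status & 0x7f) == 0;
    }

    static int wExitStatus(int status) {
        return (status >> 8) & 0xff;
    }

    static boolean wifSignaled(int status) {
        int sig = status & 0x7f;
        return sig != 0 && sig != 0x7f; // 0x7f in the low bits marks a stopped child
    }

    static int wTermSig(int status) {
        return status & 0x7f;
    }

    static boolean wCoreDump(int status) {
        return (status & 0x80) != 0;
    }

    // Mirrors the decision at the end of the handler's loop body.
    static boolean mustBringDownZygote(int reapedPid, int systemServerPid) {
        return reapedPid == systemServerPid;
    }

    public static void main(String[] args) {
        int exitedWith10 = 10 << 8; // a child that called exit(10)
        int killedBy9 = 9;          // a child terminated by signal 9 (SIGKILL)
        System.out.println(wifExited(exitedWith10) + " " + wExitStatus(exitedWith10));
        System.out.println(wifSignaled(killedBy9) + " " + wTermSig(killedBy9));
        System.out.println(mustBringDownZygote(2999, 2999));
    }
}
```

The two sample statuses correspond to what the logs later in this article show: System.exit(10) from the crash handler, and "Sending signal. PID: 2999 SIG: 9".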
3.4 handleSystemServerProcess: finishing the setup of the forked process
/frameworks/base/core/java/com/android/internal/os/ZygoteInit.java
/**
 * Finish remaining work for the newly forked system server process.
 */
private static void handleSystemServerProcess(
        ZygoteConnection.Arguments parsedArgs)
        throws Zygote.MethodAndArgsCaller {

    // set umask to 0077 so new files and directories will default to owner-only permissions.
    Os.umask(S_IRWXG | S_IRWXO);
    // Set the name of the new process
    if (parsedArgs.niceName != null) {
        Process.setArgV0(parsedArgs.niceName);
    }
    // Read systemServerClasspath
    final String systemServerClasspath = Os.getenv("SYSTEMSERVERCLASSPATH");
    if (systemServerClasspath != null) {
        // Optimize the dex files on systemServerClasspath, see "Further reading"
        performSystemServerDexOpt(systemServerClasspath);
        // Capturing profiles is only supported for debug or eng builds since selinux normally
        // prevents it.
        boolean profileSystemServer = SystemProperties.getBoolean(
                "dalvik.vm.profilesystemserver", false);
        if (profileSystemServer && (Build.IS_USERDEBUG || Build.IS_ENG)) {
            try {
                File profileDir = Environment.getDataProfilesDePackageDirectory(
                        Process.SYSTEM_UID, "system_server");
                File profile = new File(profileDir, "primary.prof");
                profile.getParentFile().mkdirs();
                profile.createNewFile();
                String[] codePaths = systemServerClasspath.split(":");
                VMRuntime.registerAppInfo(profile.getPath(), codePaths);
            } catch (Exception e) {
                Log.wtf(TAG, "Failed to set up system server profile", e);
            }
        }
    }
    // invokeWith is null here, so execution takes the else branch
    if (parsedArgs.invokeWith != null) {
        String[] args = parsedArgs.remainingArgs;
        // If we have a non-null system server class path, we'll have to duplicate the
        // existing arguments and append the classpath to it. ART will handle the classpath
        // correctly when we exec a new process.
        if (systemServerClasspath != null) {
            String[] amendedArgs = new String[args.length + 2];
            amendedArgs[0] = "-cp";
            amendedArgs[1] = systemServerClasspath;
            System.arraycopy(args, 0, amendedArgs, 2, args.length);
            args = amendedArgs;
        }

        WrapperInit.execApplication(parsedArgs.invokeWith,
                parsedArgs.niceName, parsedArgs.targetSdkVersion,
                VMRuntime.getCurrentInstructionSet(), null, args);
    } else {
        ClassLoader cl = null;
        if (systemServerClasspath != null) {
            cl = createPathClassLoader(systemServerClasspath, parsedArgs.targetSdkVersion);

            Thread.currentThread().setContextClassLoader(cl);
        }

        /*
         * Pass the remaining arguments to SystemServer. See section 3.5
         */
        ZygoteInit.zygoteInit(parsedArgs.targetSdkVersion, parsedArgs.remainingArgs, cl);
    }

    /* should never reach here */
}
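The first statement of handleSystemServerProcess sets the umask to 0077. As a small worked example of what that means, the sketch below applies a umask to a requested file mode. The octal values for S_IRWXG and S_IRWXO are the standard POSIX ones; UmaskSketch and effectiveMode are hypothetical names.

```java
// Arithmetic behind Os.umask(S_IRWXG | S_IRWXO); names are illustrative.
final class UmaskSketch {
    static final int S_IRWXG = 0070; // read/write/execute for the group
    static final int S_IRWXO = 0007; // read/write/execute for others

    // Permission bits that survive when a file is created with requestedMode.
    static int effectiveMode(int requestedMode, int umask) {
        return requestedMode & ~umask;
    }

    public static void main(String[] args) {
        int umask = S_IRWXG | S_IRWXO;
        System.out.println(Integer.toOctalString(umask));
        System.out.println(Integer.toOctalString(effectiveMode(0666, umask)));
        System.out.println(Integer.toOctalString(effectiveMode(0777, umask)));
    }
}
```

Every group and other bit is masked out, which is why files and directories created by system_server default to owner-only permissions.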
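Further reading:
On Android, the code of an app is packaged into a Dex file. Like a Jar, a Dex file is an archive that stores all of the compiled Java bytecode. Because Android uses the Dalvik virtual machine, the class files produced by the Java compiler have to be converted into a form that Dalvik can execute. To stress the point: a Dex file, like a Jar, is an archive, and what it holds is still the bytecode that corresponds to the Java code.
When Android launches an app, one of the steps is to optimize the Dex file. A dedicated tool called DexOpt handles this. DexOpt runs the first time a Dex file is loaded and produces an ODEX file (Optimized Dex). Executing the ODEX file is considerably more efficient than executing the Dex file directly.
Early Android versions had a problem here: method ids were indexed with a 16-bit (short) value, so the number of method ids could not exceed 65536. For a large enough project that limit is clearly too low. Newer Android versions have fixed the problem, but apps still have to stay compatible with old systems.
Android provides a dedicated tool, dexopt, for verifying and optimizing dex files. Its source code is in the dalvik/dexopt directory of the Android source tree. The content of the classpath is as follows:
systemServerClasspath = /system/framework/services.jar:/system/framework/ethernet-service.jar:/system/framework/wifi-service.jar
These three jars are then extracted from the path, and for each one the code decides whether dexopt is needed. If it is, the installer is called to perform the optimization.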
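Extracting the jars from the classpath string is a plain split on ':'. A minimal sketch follows, using the classpath value quoted above; ClasspathSplitSketch and entries are hypothetical names, and the per-jar dexopt check is deliberately left out because its details are not shown in this article.

```java
import java.util.ArrayList;
import java.util.List;

// Splits a ':'-separated classpath into its entries; names are illustrative.
final class ClasspathSplitSketch {
    static List<String> entries(String classpath) {
        List<String> result = new ArrayList<>();
        if (classpath == null) {
            return result;
        }
        for (String entry : classpath.split(":")) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String classpath = "/system/framework/services.jar"
                + ":/system/framework/ethernet-service.jar"
                + ":/system/framework/wifi-service.jar";
        for (String jar : entries(classpath)) {
            System.out.println(jar); // each jar would be checked for dexopt in turn
        }
    }
}
```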
3.5 The zygoteInit method
/frameworks/base/core/java/com/android/internal/os/ZygoteInit.java
/**
 * The main function called when started through the zygote process. This
 * could be unified with main(), if the native code in nativeFinishInit()
 * were rationalized with Zygote startup.<p>
 *
 * Current recognized args:
 * <ul>
 *   <li> <code> [--] <start class name>  <args>
 * </ul>
 *
 * @param targetSdkVersion target SDK version
 * @param argv arg strings
 */
public static final void zygoteInit(int targetSdkVersion, String[] argv,
        ClassLoader classLoader) throws Zygote.MethodAndArgsCaller {
    if (RuntimeInit.DEBUG) {
        Slog.d(RuntimeInit.TAG, "RuntimeInit: Starting application from zygote");
    }

    Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "ZygoteInit");
    // See section 3.5.1
    RuntimeInit.redirectLogStreams();
    // See section 3.5.2
    RuntimeInit.commonInit();
    // See section 3.5.3
    ZygoteInit.nativeZygoteInit();
    // See section 3.5.4
    RuntimeInit.applicationInit(targetSdkVersion, argv, classLoader);
}
3.5.1 The redirectLogStreams method of RuntimeInit
/frameworks/base/core/java/com/android/internal/os/RuntimeInit.java
/**
 * Redirect System.out and System.err to the Android log.
 */
public static void redirectLogStreams() {
    System.out.close();
    System.setOut(new AndroidPrintStream(Log.INFO, "System.out"));
    System.err.close();
    System.setErr(new AndroidPrintStream(Log.WARN, "System.err"));
}
This initializes the Android log output streams: System.out and System.err are closed, and both are redirected to the Android log.
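The same redirection idea can be tried on a desktop JVM. The sketch below is an assumption-laden stand-in, not AndroidPrintStream: TaggedLogStream collects complete lines into an in-memory list with a priority and tag prefix, handles only single-byte characters, and, unlike the Android code, does not close the original streams.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Desktop-JVM sketch of redirecting System.out/System.err into a log sink.
// LOG stands in for the Android log; all names here are hypothetical.
final class LogRedirectSketch {
    static final List<String> LOG = Collections.synchronizedList(new ArrayList<>());

    // Buffers bytes until a newline, then records "<priority>/<tag>: <line>".
    static final class TaggedLogStream extends OutputStream {
        private final String priority;
        private final String tag;
        private final StringBuilder line = new StringBuilder();

        TaggedLogStream(String priority, String tag) {
            this.priority = priority;
            this.tag = tag;
        }

        @Override
        public synchronized void write(int b) {
            if (b == '\n') {
                LOG.add(priority + "/" + tag + ": " + line);
                line.setLength(0);
            } else if (b != '\r') {
                line.append((char) b); // single-byte characters only in this sketch
            }
        }
    }

    static void redirectLogStreams() {
        System.setOut(new PrintStream(new TaggedLogStream("I", "System.out"), true));
        System.setErr(new PrintStream(new TaggedLogStream("W", "System.err"), true));
    }

    public static void main(String[] args) {
        PrintStream console = System.out;
        redirectLogStreams();
        System.out.println("hello");
        System.err.println("oops");
        System.setOut(console);
        System.out.println(LOG);
    }
}
```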
3.5.2 The commonInit method of RuntimeInit
/frameworks/base/core/java/com/android/internal/os/RuntimeInit.java
protected static final void commonInit() {
    if (DEBUG) Slog.d(TAG, "Entered RuntimeInit!");

    /*
     * set handlers; these apply to all threads in the VM. Apps can replace
     * the default handler, but not the pre handler.
     */
    // Install the process's uncaught-exception pre-handler; by default this is
    // LoggingHandler, which prints the stack trace of the failure. See section 3.5.2.1
    Thread.setUncaughtExceptionPreHandler(new LoggingHandler());
    // Enter the crash-handling flow and notify AMS to show a dialog, see section 3.5.2.2
    Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler());

    /*
     * Install a TimezoneGetter subclass for ZoneInfo.db (sets the time zone)
     */
    TimezoneGetter.setInstance(new TimezoneGetter() {
        @Override
        public String getId() {
            return SystemProperties.get("persist.sys.timezone");
        }
    });
    TimeZone.setDefault(null);

    /*
     * Sets handler for java.util.logging to use Android log facilities.
     * The odd "new instance-and-then-throw-away" is a mirror of how
     * the "java.util.logging.config.class" system property works. We
     * can't use the system property here since the logger has almost
     * certainly already been initialized.
     */
    LogManager.getLogManager().reset();
    new AndroidConfig();

    /*
     * Sets the default HTTP User-Agent used by HttpURLConnection.
     */
    String userAgent = getDefaultUserAgent();
    System.setProperty("http.agent", userAgent);

    /*
     * Wire socket tagging to traffic stats.
     */
    NetworkManagementSocketTagger.install();

    /*
     * If we're running in an emulator launched with "-trace", put the
     * VM into emulator trace profiling mode so that the user can hit
     * F9/F10 at any time to capture traces.  This has performance
     * consequences, so it's not something you want to do always.
     */
    String trace = SystemProperties.get("ro.kernel.android.tracing");
    if (trace.equals("1")) {
        Slog.i(TAG, "NOTE: emulator trace profiling enabled");
        Debug.enableEmulatorTraceOutput();
    }

    initialized = true;
}
3.5.2.1 How the stack trace of a crashing process is captured
/frameworks/base/core/java/com/android/internal/os/RuntimeInit.java
/**
 * Logs a message when a thread encounters an uncaught exception. By
 * default, {@link KillApplicationHandler} will terminate this process later,
 * but apps can override that behavior.
 */
private static class LoggingHandler implements Thread.UncaughtExceptionHandler {
    @Override
    public void uncaughtException(Thread t, Throwable e) {
        // Don't re-enter if KillApplicationHandler has already run
        if (mCrashing) return;
        if (mApplicationObject == null) {
            // The "FATAL EXCEPTION" string is still used on Android even though
            // apps can set a custom UncaughtExceptionHandler that renders uncaught
            // exceptions non-fatal.
            Clog_e(TAG, "*** FATAL EXCEPTION IN SYSTEM PROCESS: " + t.getName(), e);
        } else {
            StringBuilder message = new StringBuilder();
            // The "FATAL EXCEPTION" string is still used on Android even though
            // apps can set a custom UncaughtExceptionHandler that renders uncaught
            // exceptions non-fatal.
            message.append("FATAL EXCEPTION: ").append(t.getName()).append("\n");
            final String processName = ActivityThread.currentProcessName();
            if (processName != null) {
                message.append("Process: ").append(processName).append(", ");
            }
            message.append("PID: ").append(Process.myPid());
            Clog_e(TAG, message.toString(), e);
        }
    }
}
Java crashes in applications start with FATAL EXCEPTION, for example:
01-16 11:48:50.525 10026 17635 17635 E AndroidRuntime: FATAL EXCEPTION: main
01-16 11:48:50.525 10026 17635 17635 E AndroidRuntime: Process: com.xiaomi.scanner, PID: 17635
01-16 11:48:50.525 10026 17635 17635 E AndroidRuntime: java.lang.IllegalArgumentException: View=DecorView@77ff3a0[] not attached to window manager
01-16 11:48:50.525 10026 17635 17635 E AndroidRuntime: at android.view.WindowManagerGlobal.findViewLocked(WindowManagerGlobal.java:491)
01-16 11:48:50.525 10026 17635 17635 E AndroidRuntime: at android.view.WindowManagerGlobal.removeView(WindowManagerGlobal.java:400)
01-16 11:48:50.525 10026 17635 17635 E AndroidRuntime: at android.view.WindowManagerImpl.removeViewImmediate(WindowManagerImpl.java:125)
01-16 11:48:50.525 10026 17635 17635 E AndroidRuntime: at android.app.Dialog.dismissDialog(Dialog.java:374)
01-16 11:48:50.525 10026 17635 17635 E AndroidRuntime: at android.app.Dialog.dismiss(Dialog.java:357)
01-16 11:48:50.525 10026 17635 17635 E AndroidRuntime: at com.alibaba.imagesearch.ui.SearchResultActivity.b(Unknown Source:14)
01-16 11:48:50.525 10026 17635 17635 E AndroidRuntime: at com.alibaba.imagesearch.a.d.c(Unknown Source:39)
01-16 11:48:50.525 10026 17635 17635 E AndroidRuntime: at com.alibaba.imagesearch.a.d.a(Unknown Source:53)
01-16 11:48:50.525 10026 17635 17635 E AndroidRuntime: at com.alibaba.imagesearch.ui.SearchResultActivity.b(Unknown Source:30)
01-16 11:48:50.525 10026 17635 17635 E AndroidRuntime: at com.alibaba.imagesearch.ui.SearchResultActivity.b(Unknown Source:0)
01-16 11:48:50.525 10026 17635 17635 E AndroidRuntime: at com.alibaba.imagesearch.ui.SearchResultActivity$6.onJsPrompt(Unknown
Java crashes in the system process start with FATAL EXCEPTION IN SYSTEM PROCESS, for example:
logcat.log.01:2211: 08-27 16:41:16.664 2999 3026 E AndroidRuntime: *** FATAL EXCEPTION IN SYSTEM PROCESS: android.bg
logcat.log.01:2212: 08-27 16:41:16.664 2999 3026 E AndroidRuntime: java.lang.NullPointerException: Attempt to get length of null array
logcat.log.01:2213: 08-27 16:41:16.664 2999 3026 E AndroidRuntime: at com.android.server.net.NetworkPolicyManagerService.isUidIdle(NetworkPolicyManagerService.java:2318)
logcat.log.01:2214: 08-27 16:41:16.664 2999 3026 E AndroidRuntime: at com.android.server.net.NetworkPolicyManagerService.updateRuleForAppIdleLocked(NetworkPolicyManagerService.java:2244)
logcat.log.01:2215: 08-27 16:41:16.664 2999 3026 E AndroidRuntime: at com.android.server.net.NetworkPolicyManagerService.updateRulesForTempWhitelistChangeLocked(NetworkPolicyManagerService.java:2298)
logcat.log.01:2216: 08-27 16:41:16.664 2999 3026 E AndroidRuntime: at com.android.server.net.NetworkPolicyManagerService$3.run(NetworkPolicyManagerService.java:572)
logcat.log.01:2217: 08-27 16:41:16.664 2999 3026 E AndroidRuntime: at android.os.Handler.handleCallback(Handler.java:739)
logcat.log.01:2218: 08-27 16:41:16.664 2999 3026 E AndroidRuntime: at android.os.Handler.dispatchMessage(Handler.java:95)
logcat.log.01:2219: 08-27 16:41:16.664 2999 3026 E AndroidRuntime: at android.os.Looper.loop(Looper.java:148)
logcat.log.01:2220: 08-27 16:41:16.664 2999 3026 E AndroidRuntime: at android.os.HandlerThread.run(HandlerThread.java:61)
logcat.log.01:2221: 08-27 16:41:16.665 2999 3026 I am_crash: [2999,0,system_server,-1,java.lang.NullPointerException,Attempt to get length of null array,NetworkPolicyManagerService.java,2318]
logcat.log.01:2224: 08-27 16:41:16.696 2999 3026 I MitvActivityManagerService: handleApplicationCrash, processName: system_server
logcat.log.01:2225: 08-27 16:41:16.696 2999 3026 I Process : Sending signal. PID: 2999 SIG: 9
3.5.2.2 When a Java exception (JE) occurs, show a dialog to notify the user
private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {
    public void uncaughtException(Thread t, Throwable e) {
        try {
            // Don't re-enter -- avoid infinite loops if crash-reporting crashes.
            if (mCrashing) return;
            mCrashing = true;

            // Try to end profiling. If a profiler is running at this point, and we kill the
            // process (below), the in-memory buffer will be lost. So try to stop, which will
            // flush the buffer. (This makes method trace profiling useful to debug crashes.)
            if (ActivityThread.currentActivityThread() != null) {
                ActivityThread.currentActivityThread().stopProfiling();
            }

            // Bring up crash dialog, wait for it to be dismissed (this notifies AMS to show the dialog)
            ActivityManager.getService().handleApplicationCrash(
                    mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
        } catch (Throwable t2) {
            if (t2 instanceof DeadObjectException) {
                // System process is dead; ignore
            } else {
                try {
                    Clog_e(TAG, "Error reporting crash", t2);
                } catch (Throwable t3) {
                    // Even Clog_e() fails!  Oh well.
                }
            }
        } finally {
            // Try everything to make sure this process goes away.
            Process.killProcess(Process.myPid());
            System.exit(10);
        }
    }
}
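The log-first, kill-second ordering of the two handlers can be imitated on a desktop JVM. Thread.setUncaughtExceptionPreHandler is not part of the standard Java API, so this sketch chains two handlers by hand and installs them on a single thread. It records events instead of killing the process, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Chains a logging handler and a "kill" handler in the same order as
// LoggingHandler and KillApplicationHandler; the kill step is only recorded.
final class CrashHandlerSketch {
    static final List<String> EVENTS = Collections.synchronizedList(new ArrayList<>());

    static Thread.UncaughtExceptionHandler loggingThenKill() {
        Thread.UncaughtExceptionHandler logging = (t, e) ->
                EVENTS.add("FATAL EXCEPTION: " + t.getName() + ": " + e.getMessage());
        Thread.UncaughtExceptionHandler kill = (t, e) ->
                EVENTS.add("would report the crash and kill the process");
        return (t, e) -> {
            logging.uncaughtException(t, e); // pre-handler role: log first
            kill.uncaughtException(t, e);    // default-handler role: then terminate
        };
    }

    static void runCrashingThread(String name) {
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("boom");
        }, name);
        worker.setUncaughtExceptionHandler(loggingThenKill());
        worker.start();
        try {
            worker.join(); // the handler has finished once the thread has terminated
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        runCrashingThread("android.bg");
        EVENTS.forEach(System.out::println);
    }
}
```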
3.5.3 The nativeZygoteInit method of ZygoteInit
nativeZygoteInit is a JNI method. It is registered in AndroidRuntime.cpp.
/frameworks/base/core/jni/AndroidRuntime.cpp

static const RegJNIRec gRegJNI[] = {
    REG_JNI(register_com_android_internal_os_RuntimeInit),
    REG_JNI(register_com_android_internal_os_ZygoteInit),
    .....
/frameworks/base/core/jni/AndroidRuntime.cpp
int register_com_android_internal_os_ZygoteInit(JNIEnv* env)
{
    const JNINativeMethod methods[] = {
        { "nativeZygoteInit", "()V",
            (void*) com_android_internal_os_ZygoteInit_nativeZygoteInit },
    };
    return jniRegisterNativeMethods(env, "com/android/internal/os/ZygoteInit",
        methods, NELEM(methods));
}
So the function that actually gets called is com_android_internal_os_ZygoteInit_nativeZygoteInit.
/frameworks/base/core/jni/AndroidRuntime.cpp
static void com_android_internal_os_ZygoteInit_nativeZygoteInit(JNIEnv* env, jobject clazz)
{
    gCurRuntime->onZygoteInit();
}
com_android_internal_os_ZygoteInit_nativeZygoteInit calls the onZygoteInit function of AndroidRuntime. onZygoteInit is a virtual function, however, and its implementation is in app_main.cpp.
/frameworks/base/cmds/app_process/app_main.cpp
    virtual void onZygoteInit()
    {
        sp<ProcessState> proc = ProcessState::self();
        ALOGV("App process: starting thread pool.\n");
        // Start the Binder thread pool
        proc->startThreadPool();
    }
/frameworks/native/libs/binder/ProcessState.cpp
void ProcessState::startThreadPool()
{
    AutoMutex _l(mLock);
    if (!mThreadPoolStarted) {
        mThreadPoolStarted = true;
        spawnPooledThread(true);
    }
}

/frameworks/native/libs/binder/ProcessState.cpp
void ProcessState::spawnPooledThread(bool isMain)
{
    if (mThreadPoolStarted) {
        String8 name = makeBinderThreadName();
        ALOGV("Spawning new pooled thread, name=%s\n", name.string());
        sp<Thread> t = new PoolThread(isMain);
        t->run(name.string());
    }
}
/frameworks/native/libs/binder/ProcessState.cpp
String8 ProcessState::makeBinderThreadName() {
    int32_t s = android_atomic_add(1, &mThreadPoolSeq);
    pid_t pid = getpid();
    String8 name;
    name.appendFormat("Binder:%d_%X", pid, s);
    return name;
}
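makeBinderThreadName explains the Binder:&lt;pid&gt;_&lt;n&gt; thread names that show up in traces: the pid in decimal followed by a sequence number in uppercase hex. A small Java sketch of the same naming scheme follows. The starting sequence value of 1 is an assumption (the initializer of mThreadPoolSeq is not shown above), getAndAdd is used because android_atomic_add hands back the value from before the increment, and the class name is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Java rendering of the "Binder:%d_%X" naming scheme; names are illustrative.
final class BinderThreadNameSketch {
    // Assumed starting value; the real initializer is not shown in this article.
    private final AtomicInteger seq = new AtomicInteger(1);
    private final int pid;

    BinderThreadNameSketch(int pid) {
        this.pid = pid;
    }

    String next() {
        int s = seq.getAndAdd(1); // value before the increment
        return String.format("Binder:%d_%X", pid, s);
    }

    public static void main(String[] args) {
        BinderThreadNameSketch names = new BinderThreadNameSketch(2999);
        for (int i = 0; i < 12; i++) {
            System.out.println(names.next());
        }
    }
}
```

Because of the %X format, the tenth thread in this sketch gets the suffix A rather than 10.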
3.5.4 The applicationInit method of RuntimeInit
/frameworks/base/core/java/com/android/internal/os/RuntimeInit.java
protected static void applicationInit(int targetSdkVersion, String[] argv, ClassLoader classLoader)
        throws Zygote.MethodAndArgsCaller {
    // If the application calls System.exit(), terminate the process
    // immediately without running any shutdown hooks.  It is not possible to
    // shutdown an Android application gracefully.  Among other things, the
    // Android runtime shutdown hooks close the Binder driver, which can cause
    // leftover running threads to crash before the process actually exits.
    nativeSetExitWithoutCleanup(true);

    // We want to be fairly aggressive about heap utilization, to avoid
    // holding on to a lot of memory that isn't needed.
    VMRuntime.getRuntime().setTargetHeapUtilization(0.75f);
    VMRuntime.getRuntime().setTargetSdkVersion(targetSdkVersion);

    final Arguments args;
    try {
        // Parsing assigns com.android.server.SystemServer to startClass
        args = new Arguments(argv);
    } catch (IllegalArgumentException ex) {
        Slog.e(TAG, ex.getMessage());
        // let the process exit
        return;
    }

    // The end of of the RuntimeInit event (see #zygoteInit).
    Trace.traceEnd(Trace.TRACE_TAG_ACTIVITY_MANAGER);

    // Remaining arguments are passed to the start class's static main
    invokeStaticMain(args.startClass, args.startArgs, classLoader);
}
After the Arguments constructor in applicationInit has run, the value of args.startClass is com.android.server.SystemServer.
/frameworks/base/core/java/com/android/internal/os/RuntimeInit.java
private static void invokeStaticMain(String className, String[] argv, ClassLoader classLoader)
        throws Zygote.MethodAndArgsCaller {
    Class<?> cl;

    try {
        cl = Class.forName(className, true, classLoader);
    } catch (ClassNotFoundException ex) {
        throw new RuntimeException(
                "Missing class when invoking static main " + className,
                ex);
    }

    Method m;
    try {
        m = cl.getMethod("main", new Class[] { String[].class });
    } catch (NoSuchMethodException ex) {
        throw new RuntimeException(
                "Missing static main on " + className, ex);
    } catch (SecurityException ex) {
        throw new RuntimeException(
                "Problem getting static main on " + className, ex);
    }

    int modifiers = m.getModifiers();
    if (! (Modifier.isStatic(modifiers) && Modifier.isPublic(modifiers))) {
        throw new RuntimeException(
                "Main method is not public and static on " + className);
    }

    /*
     * This throw gets caught in ZygoteInit.main(), which responds
     * by invoking the exception's run() method. This arrangement
     * clears up all the stack frames that were required in setting
     * up the process.
     */
    throw new Zygote.MethodAndArgsCaller(m, argv);
}
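The lookup-and-validate part of invokeStaticMain uses only standard reflection, so it can be tried against any class. In the sketch below, FakeServer is a hypothetical stand-in for com.android.server.SystemServer, and the method is invoked directly rather than through an exception, which keeps this example focused on the reflection step.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Reflection steps of invokeStaticMain on a desktop JVM; names are illustrative.
final class StaticMainLookupSketch {
    // Hypothetical target class standing in for com.android.server.SystemServer.
    public static final class FakeServer {
        public static String lastArg;

        public static void main(String[] args) {
            lastArg = args.length > 0 ? args[0] : null;
        }
    }

    static Method findStaticMain(String className, ClassLoader loader) {
        Class<?> cl;
        try {
            cl = Class.forName(className, true, loader);
        } catch (ClassNotFoundException ex) {
            throw new RuntimeException("Missing class when invoking static main " + className, ex);
        }
        Method m;
        try {
            m = cl.getMethod("main", String[].class);
        } catch (NoSuchMethodException ex) {
            throw new RuntimeException("Missing static main on " + className, ex);
        }
        int modifiers = m.getModifiers();
        if (!(Modifier.isStatic(modifiers) && Modifier.isPublic(modifiers))) {
            throw new RuntimeException("Main method is not public and static on " + className);
        }
        return m;
    }

    static void invokeMain(Method m, String[] args) {
        try {
            m.invoke(null, (Object) args); // the cast passes the array as one argument
        } catch (ReflectiveOperationException ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] argv) {
        Method m = findStaticMain(FakeServer.class.getName(),
                StaticMainLookupSketch.class.getClassLoader());
        invokeMain(m, new String[] {"hello"});
        System.out.println(FakeServer.lastArg);
    }
}
```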
This loads the class com.android.server.SystemServer, looks up its main method through reflection to obtain a Method object, and throws a Zygote.MethodAndArgsCaller exception, which takes us back to the ZygoteInit main method where we started. The full call chain is ZygoteInit.main --> ZygoteInit.startSystemServer --> Zygote.forkSystemServer --> com_android_internal_os_Zygote_nativeForkSystemServer --> ForkAndSpecializeCommon --> fork --> ZygoteInit.handleSystemServerProcess --> ZygoteInit.zygoteInit --> RuntimeInit.applicationInit --> RuntimeInit.invokeStaticMain. At the end of the chain, invokeStaticMain throws a Zygote.MethodAndArgsCaller exception that is caught by ZygoteInit.main.
frameworks/base/core/java/com/android/internal/os/ZygoteInit.java
public static void main(String argv[]) {
    // 1. Create the ZygoteServer
    ZygoteServer zygoteServer = new ZygoteServer();
    try {
        // 2. Create the server-side socket
        zygoteServer.registerServerSocket(socketName);
        // 3. Preload the classes and resources that processes share
        preload(bootTimingsTraceLog);
        if (startSystemServer) {
            // 4. Start the SystemServer process; this is the zygote's first "cell division"
            startSystemServer(abiList, socketName, zygoteServer);
        }
        // 5. Enter an endless loop that listens for messages from clients
        zygoteServer.runSelectLoop(abiList);
        // 6. Close the Zygote server socket
        zygoteServer.closeServerSocket();
    } catch (Zygote.MethodAndArgsCaller caller) {
        // 7. Catch the exception and call the run method of MethodAndArgsCaller
        caller.run();
    } catch (Throwable ex) {
        Log.e(TAG, "System zygote died with exception", ex);
        zygoteServer.closeServerSocket();
        throw ex;
    }
}
/frameworks/base/core/java/com/android/internal/os/Zygote.java
    public static class MethodAndArgsCaller extends Exception
            implements Runnable {
        /** method to call */
        private final Method mMethod;

        /** argument array */
        private final String[] mArgs;

        public MethodAndArgsCaller(Method method, String[] args) {
            mMethod = method; // the constructor stores SystemServer's main method in mMethod
            mArgs = args;
        }

        public void run() {
            try {
                // Invoke SystemServer's main method, which enters SystemServer.main
                mMethod.invoke(null, new Object[] { mArgs });
            } catch (IllegalAccessException ex) {
                throw new RuntimeException(ex);
            } catch (InvocationTargetException ex) {
                Throwable cause = ex.getCause();
                if (cause instanceof RuntimeException) {
                    throw (RuntimeException) cause;
                } else if (cause instanceof Error) {
                    throw (Error) cause;
                }
                throw new RuntimeException(ex);
            }
        }
    }
}
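- Question: why is SystemServer's main method invoked by throwing an exception?
Forking a process from ZygoteInit's main goes through many layers of calls, so quite a few stack frames have accumulated by that point. To give the new process a clean start, those frames need to be cleared. Throwing the exception unwinds the stack back to ZygoteInit.main, which then invokes SystemServer's main from there.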
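The effect on the stack can be measured directly. The following sketch is a simplified imitation of the pattern, not the Zygote code: Caller plays the role of MethodAndArgsCaller, nested stands in for the chain of setup calls, and the stack depth seen by the final callback is compared with the depth seen when the same work runs at the bottom of the call chain. All names are hypothetical.

```java
// Demonstrates how throwing a Runnable-carrying exception discards setup frames.
final class TrampolineSketch {
    // Checked exception that carries the work to run after the stack has unwound.
    static final class Caller extends Exception implements Runnable {
        private final Runnable body;

        Caller(Runnable body) {
            super(null, null, false, false); // no message, cause, or stack trace needed
            this.body = body;
        }

        @Override
        public void run() {
            body.run();
        }
    }

    static int stackDepth() {
        return new Throwable().getStackTrace().length;
    }

    // Stands in for the chain of setup calls; throws once the bottom is reached.
    static void nested(int levels, Runnable body) throws Caller {
        if (levels == 0) {
            throw new Caller(body);
        }
        nested(levels - 1, body);
    }

    // Depth seen by the body when it runs from the catch block at the top.
    static int depthViaTrampoline(int levels) {
        int[] seen = new int[1];
        try {
            nested(levels, () -> seen[0] = stackDepth());
        } catch (Caller caller) {
            caller.run();
        }
        return seen[0];
    }

    // Depth seen when the same measurement happens at the bottom of the chain.
    static int depthWhenCalledDirectly(int levels) {
        return levels == 0 ? stackDepth() : depthWhenCalledDirectly(levels - 1);
    }

    public static void main(String[] args) {
        System.out.println("direct:     " + depthWhenCalledDirectly(50));
        System.out.println("trampoline: " + depthViaTrampoline(50));
    }
}
```

With 50 nested setup calls, the trampoline variant sees only the handful of frames between main and the callback, which is the "clean" stack the paragraph above refers to.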
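4. Summary
This article walked through the startup of the SystemServer process, the zygote's first "cell division". A few key points are worth remembering.
- 1. The special use of waitpid
- 2. SystemServer and Zygote live and die together
- 3. How the stack trace of a crashing process is printed, and how the error dialog is brought up
- 4. Why SystemServer's main method is invoked by throwing an exception
The next article will go through what SystemServer's main method does.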