一、什么是ANR
ANR:Application Not Responding,即应用无响应
ANR一般有三种类型
- KeyDispatchTimeout(5 seconds) --主要类型按键或触摸事件在特定时间内无响应
- BroadcastTimeout(10 seconds) --BroadcastReceiver在特定时间内无法处理完成
- ServiceTimeout(20 seconds) --小概率类型 Service在特定的时间内无法处理完成
二、如何分析
- 首先查找 main.log,MTK平台通过SYS_ANDROID_EVENT_LOG来进行分析或者直接看_exp_main.txt
04-02 11:52:15.355 862 1517 I watchdog: Blocked in handler on main thread (main)
(从上面的log看,是wachtdog出现问题,是什么原因导致watchdog出现问题,要从其他的方面进行分析)
- 然后就是查看trace文件,看主进程文件异常。
kernel: (couldn't read /proc/self/task/862/stack)
native: #00 pc 0005fc88 /system/lib64/libc.so (__ioctl+4)
native: #01 pc 00085a98 /system/lib64/libc.so (ioctl+100)
native: #02 pc 0002a8b4 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+164)
native: #03 pc 0002b3cc /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+112)
native: #04 pc 0002b640 /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+176)
native: #05 pc 000235bc /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+108)
native: #06 pc 000e7228 /system/lib64/libandroid_runtime.so (???)
native: #07 pc 00f0d2a0 /data/dalvik-cache/arm64/system@framework@boot.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+212)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(Binder.java:504)
at com.android.internal.telephony.ISub$Stub$Proxy.getDefaultDataSubId(ISub.java:1078)
at android.telephony.SubscriptionManager.getDefaultDataSubId(SubscriptionManager.java:1119)
at android.telephony.TelephonyManager.getSimOperatorNumeric(TelephonyManager.java:1745)
at android.telephony.TelephonyManager.getSimOperator(TelephonyManager.java:1717)
at com.android.server.location.GpsLocationProvider.subscriptionOrSimChanged(GpsLocationProvider.java:567)
at com.android.server.location.GpsLocationProvider.access$800(GpsLocationProvider.java:124)
at com.android.server.location.GpsLocationProvider$3.onSubscriptionsChanged(GpsLocationProvider.java:559)
从上面卡住的CallStack可以看出是SystemServer binder到com.android.phone 进程卡住
然后查看binder调用信息,MTK可以从文件 SYS_BINDER_INFO中寻找,重点搜索关键字“outgoing transaction“
outgoing transaction 1755759: ffffffc048accb00 from 862:862 to 1762:0 code 14 flags 10 pri -2 r1 node 7969 size 80:0 data ffffff802b998878”
主要是看后面有一些长的log地方,找到 ”to 1762:0"后面的 1762 这个就是pid
从binder调用情况看,应该是对端进程的binder线程被耗尽,需要查看对端进程的trace
3.在经过上面的分析后,我们需要从trace文件找到 1762 这个PID所对应的相关进程,然后列出 Block以及Waiting相关的log,从而定位问题
outgoing transaction 1605208: ffffffc09a83ae00 from 1206:1237 to 9957:0 code 39 flags 10 pri 0 r1 node 661390 size 68:0 data ffffff80088001c8
意思是从1206进程掉9957进程
incoming transaction 1605143: ffffffc06b783980 from 2736:2801 to 1206:1237 code f flags 10 pri 0 r1 node 5688 size 88:0 data ffffff800a582d50
意思是从2736进程掉1206进程
pending async transaction 1602114: ffffffc0073af580 from 0:0 to 1206:0 code 1b flags 11 pri -4 r0 node 9259 size 88:0 data ffffff800a580b28
意思是等待调 1206 进程,未进入线程
三、案例分析
问题描述:
同步账号数据时接收短信手机卡死,phone(和systemui )进程无响应(出现一次)
【操作步骤】:
1、登录 XXX 账户,同步数据
2、同步数据的同时本机接收一条短信
分析步骤:
1) 通过event_log 找到 ANR time, PID, ANR type
06-15 15:15:23.774827 1025 1092 I am_anr : [0,1877,com.android.phone,952647245,Broadcast of Intent { act=android.intent.action.SCREEN_OFF flg=0x50000010 }]
从上面的log可以得到如下信息:
ANR发送的时间,ANR time:15:15:23.774827
进程ID, PID:1877
ANR类型,ANR type:broadcast timeout
2) 查看main.log, MTK平台查看sys_log, 这一步主要是查看cpu的使用率
06-15 15:14:57.667196 1025 1092 I ANRManager: getProcessState
06-15 15:14:57.668209 1025 1092 I ActivityManager: Android time :[2016-06-15 15:14:57.66] [2538.692]
06-15 15:14:57.668209 1025 1092 I ActivityManager: CPU usage from 3177ms to 33ms ago:
06-15 15:14:57.668209 1025 1092 I ActivityManager: 133% 12180/com.android.mms: 110% user + 23% kernel / faults: 23045 minor
06-15 15:14:57.668209 1025 1092 I ActivityManager: 81% 2061/android.process.acore: 80% user + 0.9% kernel / faults: 16 minor
06-15 15:14:57.668209 1025 1092 I ActivityManager: 20% 1025/system_server: 11% user + 8.9% kernel / faults: 1483 minor 22 major
06-15 15:14:57.668209 1025 1092 I ActivityManager: 13% 1877/com.android.phone: 13% user + 0% kernel / faults: 553 minor
06-15 15:14:57.668209 1025 1092 I ActivityManager: 10% 223/hps_main: 0% user + 10% kernel
06-15 15:14:57.668209 1025 1092 I ActivityManager: 3.1% 195/mmcqd/0: 0% user + 3.1% kernel
06-15 15:14:57.668209 1025 1092 I ActivityManager: 3.1% 12039/com.meizu.safe:Remote: 1.9% user + 1.2% kernel / faults: 15 minor
06-15 15:14:57.668209 1025 1092 I ActivityManager: 2.8% 194/exe_cq: 0% user + 2.8% kernel
06-15 15:14:57.668209 1025 1092 I ActivityManager: 1.9% 10/migration/0: 0% user + 1.9% kernel
06-15 15:14:57.668209 1025 1092 I ActivityManager: 1.9% 11/migration/1: 0% user + 1.9% kernel
06-15 15:14:57.668209 1025 1092 I ActivityManager: 1.5% 15/migration/2: 0% user + 1.5% kernel
06-15 15:14:57.668209 1025 1092 I ActivityManager: 1.5% 253/migration/4: 0% user + 1.5% kernel
06-15 15:14:57.668209 1025 1092 I ActivityManager: 1.5% 382/mobile_log_d: 0.3% user + 1.2% kernel / faults: 65 minor
06-15 15:14:57.668209 1025 1092 I ActivityManager: 1.2% 61/cfinteractive: 0% user + 1.2% kernel
06-15 15:14:57.668209 1025 1092 I ActivityManager: 1.2% 291/logd: 1.2% user + 0% kernel
06-15 15:14:57.668209 1025 1092 I ActivityManager: 0.9% 176/krtatm: 0% user + 0.9% kernel
06-15 15:14:57.668209 1025 1092 I ActivityManager: 0.6% 19/migration/3: 0% user + 0.6% kernel
06-15 15:14:57.668209 1025 1092 I ActivityManager: 0.6% 315/migration/5: 0% user + 0.6% kernel
06-15 15:14:57.668209 1025 1092 I ActivityManager: 0.6% 377/mediaserver: 0.3% user + 0.3% kernel
06-15 15:14:57.668209 1025 1092 I ActivityManager: 0% 9996/com.tencent.mm: 0% user + 0% kernel / faults: 9 minor
06-15 15:14:57.668209 1025 1092 I ActivityManager: 77% TOTAL: 61% user + 16% kernel + 0.1% iowait + 0% softirq
从log可以看出mms进程user使用率非常高
从CPU的使用情况,如果CPU使用量接近100%,说明当前设备很忙,有可能是CPU饥饿导致了ANR
如果CPU使用量很少,说明主线程被BLOCK了
如果IOwait很高,说明ANR有可能是主线程在进行I/O操作造成的
3)checke trace.txt(MTK平台打卡AEE DB log, 找到SWT_JBT_TRACES), 找到发生anr的进程ID,以及时间点
找到时间在15:15:23左右,PID为1877的主线程执行情况
----- pid 1877 at 2016-06-15 15:15:24 -----
Cmd line: com.android.phone
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 obj=0x76297f90 self=0x7f9137c400
| sysTid=1877 nice=0 cgrp=default sched=0/0 handle=0x7f951e92c0
| state=S schedstat=( 8028180262 2904544285 18378 ) utm=498 stm=304 core=6 HZ=100
| stack=0x7fcf303000-0x7fcf305000 stackSize=8MB
| held mutexes=
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(Binder.java:514)
at com.android.internal.telephony.ITelephonyRegistry$Stub$Proxy.notifySignalStrengthForSubscriber(ITelephonyRegistry.java:718)
at com.android.internal.telephony.DefaultPhoneNotifier.notifySignalStrength(DefaultPhoneNotifier.java:131)
at com.android.internal.telephony.PhoneBase.notifySignalStrength(PhoneBase.java:1846)
at com.android.internal.telephony.ServiceStateTracker.notifySignalStrength(ServiceStateTracker.java:446)
- locked <0x04779908> (a android.telephony.CellInfoGsm)
at com.android.internal.telephony.ServiceStateTracker.onSignalStrengthResult(ServiceStateTracker.java:980)
at com.android.internal.telephony.gsm.GsmServiceStateTracker.handleMessage(GsmServiceStateTracker.java:869)
at android.os.Handler.dispatchMessage(Handler.java:111)
at android.os.Looper.loop(Looper.java:207)
at android.app.ActivityThread.main(ActivityThread.java:5940)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:929)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:790)
从上面卡住的CallStack可以看出是SystemServer binder到com.android.phone 进程卡住
看Binder对端:查看对应时间点的log:
----- pid 1025 at 2016-06-15 15:15:24 -----
Cmd line: system_server
"Binder_C" prio=5 tid=96 Blocked
| group="main" sCount=1 dsCount=0 obj=0x135b10a0 self=0x7f7079dc00
| sysTid=1961 nice=0 cgrp=default sched=0/0 handle=0x7f6ef45440
| state=S schedstat=( 18431909483 6382895667 48227 ) utm=1317 stm=526 core=0 HZ=100
| stack=0x7f6ee49000-0x7f6ee4b000 stackSize=1013KB
| held mutexes=
at com.android.server.TelephonyRegistry.notifySignalStrengthForSubscriber(TelephonyRegistry.java:908)
- waiting to lock <0x0aec94d6> (a java.util.ArrayList) held by thread 80
at com.android.internal.telephony.ITelephonyRegistry$Stub.onTransact(ITelephonyRegistry.java:184)
at android.os.Binder.execTransact(Binder.java:461)
被tid=80锁住
"Binder_4" prio=5 tid=80 Native
| group="main" sCount=1 dsCount=0 obj=0x12fb44c0 self=0x7f72404800
| sysTid=1365 nice=0 cgrp=default sched=0/0 handle=0x7f70a7f440
| state=S schedstat=( 18852901213 6143285031 48849 ) utm=1386 stm=499 core=7 HZ=100
| stack=0x7f70983000-0x7f70985000 stackSize=1013KB
| held mutexes=
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(Binder.java:514)
at com.android.internal.telephony.ISub$Stub$Proxy.getPhoneId(ISub.java:915)
at android.telephony.SubscriptionManager.getPhoneId(SubscriptionManager.java:963)
at com.android.server.TelephonyRegistry.listen(TelephonyRegistry.java:582)
- locked <0x0aec94d6> (a java.util.ArrayList)
at com.android.server.TelephonyRegistry.listenForSubscriber(TelephonyRegistry.java:521)
at com.android.internal.telephony.ITelephonyRegistry$Stub.onTransact(ITelephonyRegistry.java:95)
at android.os.Binder.execTransact(Binder.java:461)
查看binder info信息(MTK平台查看SYS_BINDER_INFO文件),重点搜索关键字“outgoing transaction“
outgoing transaction 1465421: ffffffc06a140500 from 12180:12197 to 1877:3042 code 1 flags 10 pri 19 r1 node 18222 size 316:8 data ffffff800f145638
主要是看后面有一些长的log地方,找到 ”to 1877:"后面的1877这个就是pid
调用者是12180这个进程
应该是对端phone进程的binder线程被耗尽,需要查看对端进程com.android.phone 的trace
4) 在经过上面的分析后,我们需要从trace 文件中找到1877 这个PID所对应的相关进程,然后列出 Block以及Waiting相关的log,从而可以定位问题。
"Binder_1" prio=5 tid=7 TimedWaiting
| group="main" sCount=1 dsCount=0 obj=0x12c500a0 self=0x7f9137ce00
| sysTid=1889 nice=0 cgrp=default sched=0/0 handle=0x7f8fff8440
| state=S schedstat=( 26096548648 3199693554 22007 ) utm=2417 stm=192 core=0 HZ=100
| stack=0x7f8fefc000-0x7f8fefe000 stackSize=1013KB
| held mutexes=
at java.lang.Object.wait!(Native method)
- waiting on <0x06b95eb5> (a java.lang.Object)
at java.lang.Thread.parkFor$(Thread.java:1220)
- locked <0x06b95eb5> (a java.lang.Object)
at sun.misc.Unsafe.park(Unsafe.java:299)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
at android.database.sqlite.SQLiteConnectionPool.waitForConnection(SQLiteConnectionPool.java:670)
at android.database.sqlite.SQLiteConnectionPool.acquireConnection(SQLiteConnectionPool.java:348)
at android.database.sqlite.SQLiteSession.acquireConnection(SQLiteSession.java:894)
at android.database.sqlite.SQLiteSession.prepare(SQLiteSession.java:586)
at android.database.sqlite.SQLiteProgram.<init>(SQLiteProgram.java:58)
at android.database.sqlite.SQLiteQuery.<init>(SQLiteQuery.java:37)
at android.database.sqlite.SQLiteDirectCursorDriver.query(SQLiteDirectCursorDriver.java:44)
at android.database.sqlite.SQLiteDatabase.rawQueryWithFactory(SQLiteDatabase.java:1350)
at android.database.sqlite.SQLiteQueryBuilder.query(SQLiteQueryBuilder.java:400)
at android.database.sqlite.SQLiteQueryBuilder.query(SQLiteQueryBuilder.java:294)
at com.android.providers.telephony.MmsProvider.query(MmsProvider.java:244)
at android.content.ContentProvider.query(ContentProvider.java:1140)
at android.content.ContentProvider$Transport.query(ContentProvider.java:262)
at android.content.ContentProviderNative.onTransact(ContentProviderNative.java:112)
at android.os.Binder.execTransact(Binder.java:461)
问题原因就是xxx同步MmsProvider查询超时导致。