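[Android] ANR: Introductory Study Notes
ANR stands for Application Not Responding. Android designed the ANR mechanism so that the system can monitor timeouts on the components it interacts with and on user interaction, and use them to judge whether an application process is stuck or responding too slowly.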
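Causes of ANR
Causes at the application layer:
- Blocking functions: for example infinite loops, IO on the main thread, or processing large amounts of data
- Lock problems: the main thread waits for a lock held by a worker thread
- Memory pressure: the system caps the memory it allocates to each app; sustained memory pressure leads to frequent memory swapping, which in turn makes some of the app's operations time out
Causes at the system level:
- CPU preemption: typically, a game running in the foreground may starve your background broadcast of CPU time
- System services failing to respond in time: for example when fetching system contacts. System services are all reached through Binder and have limited capacity, so a system service may stay unresponsive for a long time and cause an ANR
- Other apps consuming large amounts of memory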
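ANR trigger criteria
Simply put, an ANR is a response timeout as judged by the system. Android uses the following criteria to decide whether a response has timed out:
- Service ANR (Service Timeout): for example, a foreground service that does not finish executing within 20s; the timeout for a background service is 10 times that of a foreground service, i.e. 200s;
- Broadcast ANR (BroadcastQueue Timeout): for example, a foreground broadcast that does not finish within 10s; 60s for a background broadcast;
- Provider ANR (ContentProvider Timeout): a content provider whose publish step exceeds the 10s timeout;
- Input ANR (InputDispatching Timeout): input event dispatching times out after 5s; this covers key and touch events.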
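// ActiveServices.java
// How long we wait for a service to finish executing.
static final int SERVICE_TIMEOUT = 20*1000;
// How long we wait for a service to finish executing.
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;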
// How long the startForegroundService() grace period is to get around to
// calling startForeground() before we ANR + stop it.
static final int SERVICE_START_FOREGROUND_TIMEOUT = 10*1000;
//ActivityManagerService.java
// How long we allow a receiver to run before giving up on it.
static final int BROADCAST_FG_TIMEOUT = 10*1000;
static final int BROADCAST_BG_TIMEOUT = 60*1000;
// How long we wait until we timeout on key dispatching.
static final int KEY_DISPATCHING_TIMEOUT = 5*1000;
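ANR trigger flow
For ANRs triggered by Service, Broadcast, and Provider, the process resembles planting a bomb and then either defusing it or letting it detonate. For example, on startService:
Normal case: plant the bomb, then defuse it
App->AMS: start Service
App->Handler: start the standard 20s countdown (plant the bomb)
AMS-->App: start completed
App->Handler: cancel the countdown (defuse the bomb)
ANR case: plant the bomb, then it detonates
App->AMS: start Service
App->Handler: start the standard 20s countdown (plant the bomb)
Note right of AMS: Service start not completed within 20s
Note right of Handler: countdown not cancelled within 20s
Handler->AMS: trigger ANR (the bomb detonates)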
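The plant/defuse/detonate pattern can be sketched in plain Java. This is only an illustration of the idea, not AMS code: the class TimeoutBombSketch, the method runWithBomb, and the use of a ScheduledExecutorService in place of the Handler delayed message are all assumptions made for this example.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class TimeoutBombSketch {

    /** Sleeps without a checked exception, so it can be used inside a lambda. */
    public static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /**
     * Runs the work under a timeout and reports whether the timeout fired
     * (the "bomb detonated") before the work finished.
     */
    public static boolean runWithBomb(Runnable work, long timeoutMs) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean detonated = new AtomicBoolean(false);
        // Plant the bomb: schedule the timeout action.
        ScheduledFuture<?> bomb =
                timer.schedule(() -> detonated.set(true), timeoutMs, TimeUnit.MILLISECONDS);
        try {
            work.run();
        } finally {
            // Defuse the bomb: cancel() has no effect if it has already fired.
            bomb.cancel(false);
            timer.shutdownNow();
        }
        return detonated.get();
    }

    public static void main(String[] args) {
        System.out.println("fast work detonated: " + runWithBomb(() -> { }, 500));
        System.out.println("slow work detonated: " + runWithBomb(() -> sleepQuietly(300), 50));
    }
}
```

The fast task finishes well before its 500 ms deadline, so the bomb is defused; the slow task outlives its 50 ms deadline, so the timeout action runs first, which is the analogue of the ANR case.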
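ANRs triggered by Input differ from those triggered by Service, Broadcast, and Provider: the bomb does not necessarily detonate (trigger an ANR) the moment the time is up. Instead, only when a subsequent input event is reported does the system actively check whether the previous event, still being processed, has timed out. If it has, an ANR is triggered; otherwise the timer is reset.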
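A toy model of this lazy check is given here. It is a deliberate simplification and not the InputDispatcher implementation: the class InputTimeoutSketch and its methods are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```java
public class InputTimeoutSketch {
    private final long timeoutMs;
    // Dispatch time of the event still being handled, or -1 if none is in flight.
    private long pendingSince = -1;

    public InputTimeoutSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /**
     * A new input event arrives at nowMs. Returns true if this arrival
     * reveals that the previous event has been pending for too long.
     */
    public boolean onNewEvent(long nowMs) {
        if (pendingSince >= 0) {
            // The previous event is unfinished; its age is only examined now.
            return nowMs - pendingSince > timeoutMs;
        }
        // Nothing in flight: start timing the new event.
        pendingSince = nowMs;
        return false;
    }

    /** The app finished handling the in-flight event. */
    public void onEventFinished() {
        pendingSince = -1;
    }

    public static void main(String[] args) {
        InputTimeoutSketch stuck = new InputTimeoutSketch(5000);
        stuck.onNewEvent(0);
        System.out.println("second event at 6000 ms reveals timeout: " + stuck.onNewEvent(6000));
    }
}
```

In this model, if no second event ever arrives, nothing checks the first one, which mirrors the point that an input timeout is only noticed when a later event comes in.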
ANR dump flow
No matter how an ANR arises, the logic eventually calls the appNotResponding function:
graph TD
ActivityManagerService#inputDispatchingTimedOut
--> AnrHelper#appNotResponding
--> AnrConsumerThread#run
--> AnrRecord#appNotResponding
--> ProcessRecord#appNotResponding
//com.android.server.am.ProcessRecord.java
void appNotResponding(String activityShortComponentName, ApplicationInfo aInfo,
        String parentShortComponentName, WindowProcessController parentProcess,
        boolean aboveSystem, String annotation, boolean onlyDumpSelf) {
    ArrayList<Integer> firstPids = new ArrayList<>(5);
    SparseArray<Boolean> lastPids = new SparseArray<>(20);
    mWindowProcessController.appEarlyNotResponding(annotation, () -> kill("anr",
            ApplicationExitInfo.REASON_ANR, true));
    long anrTime = SystemClock.uptimeMillis();
    if (isMonitorCpuUsage()) {
        mService.updateCpuStatsNow();
    }
    final boolean isSilentAnr;
    synchronized (mService) {
        // Note 1
        // PowerManager.reboot() can block for a long time, so ignore ANRs while shutting down.
        // The system is shutting down or rebooting
        if (mService.mAtmInternal.isShuttingDown()) {
            Slog.i(TAG, "During shutdown skipping ANR:" + this + " " + annotation);
            return;
        } else if (isNotResponding()) {
            // An ANR is already being handled for this process
            Slog.i(TAG, "Skipping duplicate ANR:" + this + " " + annotation);
            return;
        } else if (isCrashing()) {
            // The process is crashing
            Slog.i(TAG, "Crashing app skipping ANR:" + this + " " + annotation);
            return;
        } else if (killedByAm) {
            // The app has already been killed by AM
            Slog.i(TAG, "App already killed by AM skipping ANR:" + this + " " + annotation);
            return;
        } else if (killed) {
            // The app has already died
            Slog.i(TAG, "Skipping died app ANR:" + this + " " + annotation);
            return;
        }
        // In case we come through here for the same app before completing
        // this one, mark as anring now so we will bail out.
        // Set the marker
        setNotResponding(true);
        // Log the ANR to the event log.
        EventLog.writeEvent(EventLogTags.AM_ANR, userId, pid, processName, info.flags,
                annotation);
        // Dump thread traces as quickly as we can, starting with "interesting" processes.
        firstPids.add(pid);
        // Don't dump other PIDs if it's a background ANR or is requested to only dump self.
        // Note 2
        // Silent ANR: here this means a background ANR
        isSilentAnr = isSilentAnr();
        if (!isSilentAnr && !onlyDumpSelf) {
            int parentPid = pid;
            if (parentProcess != null && parentProcess.getPid() > 0) {
                parentPid = parentProcess.getPid();
            }
            if (parentPid != pid) firstPids.add(parentPid);
            if (MY_PID != pid && MY_PID != parentPid) firstPids.add(MY_PID);
            // Note 3
            // Select the processes that need to be dumped
            for (int i = getLruProcessList().size() - 1; i >= 0; i--) {
                ProcessRecord r = getLruProcessList().get(i);
                if (r != null && r.thread != null) {
                    int myPid = r.pid;
                    if (myPid > 0 && myPid != pid && myPid != parentPid && myPid != MY_PID) {
                        if (r.isPersistent()) {
                            firstPids.add(myPid);
                            if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc:" + r);
                        } else if (r.treatLikeActivity) {
                            firstPids.add(myPid);
                            if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME:" + r);
                        } else {
                            lastPids.put(myPid, Boolean.TRUE);
                            if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc:" + r);
                        }
                    }
                }
            }
        }
    }
    ......
    int[] pids = nativeProcs == null ? null : Process.getPidsForCommands(nativeProcs);
    ArrayList<Integer> nativePids = null;
    if (pids != null) {
        nativePids = new ArrayList<>(pids.length);
        for (int i : pids) {
            nativePids.add(i);
        }
    }
    // For background ANRs, don't pass the ProcessCpuTracker to
    // avoid spending 1/2 second collecting stats to rank lastPids.
    StringWriter tracesFileException = new StringWriter();
    // To hold the start and end offset to the ANR trace file respectively.
    final long[] offsets = new long[2];
    // Note 4
    File tracesFile = ActivityManagerService.dumpStackTraces(firstPids,
            isSilentAnr ? null : processCpuTracker, isSilentAnr ? null : lastPids,
            nativePids, tracesFileException, offsets);
    ......
}
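Note 1 handles several special cases: the system is shutting down or rebooting, an ANR is already being handled for the process, the process is crashing, the app has already been killed, or the app has already died. In these cases the ANR does not need to be handled and the function returns immediately.
At Note 2, isSilentAnr indicates whether the current ANR is a background ANR. A foreground ANR pops up a "not responding" dialog, while a background ANR kills the process directly. What counts as a foreground ANR? If the process that ANR'd is perceptible to the user, it is a foreground ANR; otherwise it is a background ANR.
Note 3 selects the processes to dump. When an ANR occurs, a lot of information is dumped to the trace file to make the problem easier to locate. The trace file contains trace information for the processes related to the ANR, because the ANR may have been caused by other processes hogging resources, or by an IPC call into another process getting stuck. The processes to dump fall into 3 categories:
- firstPids: the important processes that must be dumped first. The process that ANR'd is always dumped, and dumped first, so it is the first entry in firstPids. For a silent ANR (i.e. a background ANR), no other process is added. Otherwise more processes are added: if the ANR'd process is not system_server, system_server is added; then the LRU process list maintained by AMS is walked, and any recently used process that is persistent, or that is bound with the BIND_TREAT_LIKE_ACTIVITY flag, is added to firstPids as well.
- extraPids: the remaining processes in the LRU list are first put into lastPids; the ones with the highest recent CPU usage are then picked from lastPids to form extraPids;
- nativePids: the simplest category, a fixed set of native system processes defined in Watchdog.java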
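The selection rules for firstPids and lastPids can be restated as a small standalone sketch. It is not the AOSP code: Proc and Selection are hypothetical stand-ins for the few ProcessRecord fields the rules read, and the parent-process handling from the real method is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DumpPidSelectionSketch {

    /** Hypothetical stand-in for the ProcessRecord fields used by the selection. */
    public static final class Proc {
        public final int pid;
        public final boolean persistent;
        public final boolean treatLikeActivity;

        public Proc(int pid, boolean persistent, boolean treatLikeActivity) {
            this.pid = pid;
            this.persistent = persistent;
            this.treatLikeActivity = treatLikeActivity;
        }
    }

    /** Result holder: pids to dump first, and the rest that may become extraPids. */
    public static final class Selection {
        public final List<Integer> firstPids = new ArrayList<>();
        public final List<Integer> lastPids = new ArrayList<>();
    }

    public static Selection select(int anrPid, int systemServerPid,
                                   boolean silentAnr, List<Proc> lru) {
        Selection s = new Selection();
        s.firstPids.add(anrPid);              // the ANR'd process always comes first
        if (silentAnr) {
            return s;                         // background ANR: dump nothing else
        }
        if (systemServerPid != anrPid) {
            s.firstPids.add(systemServerPid);
        }
        // Walk the LRU list from the most recently used end.
        for (int i = lru.size() - 1; i >= 0; i--) {
            Proc p = lru.get(i);
            if (p.pid <= 0 || p.pid == anrPid || p.pid == systemServerPid) {
                continue;
            }
            if (p.persistent || p.treatLikeActivity) {
                s.firstPids.add(p.pid);
            } else {
                s.lastPids.add(p.pid);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        List<Proc> lru = Arrays.asList(
                new Proc(100, false, false),
                new Proc(200, true, false),
                new Proc(300, false, true),
                new Proc(42, false, false));
        Selection s = select(42, 1000, false, lru);
        System.out.println("firstPids=" + s.firstPids + " lastPids=" + s.lastPids);
    }
}
```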
At Note 4, once the pids of all processes to dump have been collected, AMS dumps their stacks in the order firstPids, nativePids, extraPids:
public static Pair<Long, Long> dumpStackTraces(String tracesFile, ArrayList<Integer> firstPids,
        ArrayList<Integer> nativePids, ArrayList<Integer> extraPids) {
    // Dump for at most 20 seconds
    long remainingTime = 20 * 1000;
    // First collect all of the stacks of the most important pids.
    if (firstPids != null) {
        int num = firstPids.size();
        for (int i = 0; i < num; i++) {
            final int pid = firstPids.get(i);
            final long timeTaken = dumpJavaTracesTombstoned(pid, tracesFile, remainingTime);
            remainingTime -= timeTaken;
            if (remainingTime <= 0) {
                Slog.e(TAG, "Aborting stack trace dump (current firstPid=" + pid
                        + "); deadline exceeded.");
                return firstPidStart >= 0 ? new Pair<>(firstPidStart, firstPidEnd) : null;
            }
        }
    }
    ......
}
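The pids passed in earlier are taken in the order firstPids, nativePids, extraPids, and all threads of each process are dumped one process at a time. Because there are several processes and each usually has many threads, a maximum dump time of 20 seconds is imposed; once it is exceeded the function returns promptly, which ensures the ANR dialog can be shown (or the process killed) in time.
The call chain then continues as follows: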
graph TD
ActivityManagerService#dumpJavaTracesTombstoned
--> Debug#dumpJavaBacktraceToFileTimeout
--> android_os_Debug#android_os_Debug_dumpJavaBacktraceToFileTimeout
--> android_os_Debug#dumpTraces
--> debuggerd_client#dump_backtrace_to_file_timeout
--> debuggerd_client#debuggerd_trigger_dump
bool debuggerd_trigger_dump(pid_t tid, DebuggerdDumpType dump_type, unsigned int timeout_ms, unique_fd output_fd) {
  // pid is passed in from AMS: the process whose stack needs to be dumped
  pid_t pid = tid;
  //......
  // Send the signal.
  // Called via android_os_Debug_dumpJavaBacktraceToFileTimeout, so dump_type is kDebuggerdJavaBacktrace
  const int signal = (dump_type == kDebuggerdJavaBacktrace) ? SIGQUIT : BIONIC_SIGNAL_DEBUGGER;
  sigval val = {.sival_int = (dump_type == kDebuggerdNativeBacktrace) ? 1 : 0};
  // sigqueue: queues a signal plus data to the given process; returns 0 on success
  if (sigqueue(pid, signal, val) != 0) {
    log_error(output_fd, errno, "failed to send signal to pid %d", pid);
    return false;
  }
  //......
  LOG(INFO) << TAG "done dumping process" << pid;
  return true;
}
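Except for the Zygote process, every process creates a SignalCatcher daemon thread that catches the SIGQUIT and SIGUSR1 signals and acts on them.
The AMS process sends a SIGQUIT signal directly to each process whose stack needs to be dumped, and the process starts dumping once it receives SIGQUIT. In other words, whenever a process ANRs it receives SIGQUIT, so if you can observe the SIGQUIT sent by the system, you can detect that an ANR has occurred. Note that, as described above, related processes are also asked to dump, so receiving SIGQUIT does not by itself prove that the receiving process is the one that ANR'd.
Summary of the ANR dump flow:
After the system detects that an app has ANR'd, it collects the pids of the relevant processes (including the one that ANR'd) and prepares to have them dump their stacks in order to produce the ANR trace file. The system then sends SIGQUIT to these processes, and each process starts dumping its stacks when it receives the signal.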
Analyzing ANRs
Ways to obtain ANR logs:
adb pull /data/anr/
adb bugreport
Analyzing the trace file
----- pid 7761 at 2022-11-02 07:02:26 -----
Cmd line: com.xfhy.watchsignaldemo
Build fingerprint: 'HUAWEI/LYA-AL00/HWLYA:10/HUAWEILYA-AL00/10.1.0.163C00:user/release-keys'
ABI: 'arm64'
Build type: optimized
Zygote loaded classes=11918 post zygote classes=729
Dumping registered class loaders
#0 dalvik.system.PathClassLoader: [], parent #1
#1 java.lang.BootClassLoader: [], no parent
#2 dalvik.system.PathClassLoader: [/system/app/FeatureFramework/FeatureFramework.apk], no parent
#3 dalvik.system.PathClassLoader: [/data/app/com.xfhy.watchsignaldemo-4tkKMWojrpHAf-Q3iecaHQ==/base.apk:/data/app/com.xfhy.watchsignaldemo-4tkKMWojrpHAf-Q3iecaHQ==/base.apk!classes2.dex:/data/app/com.xfhy.watchsignaldemo-4tkKMWojrpHAf-Q3iecaHQ==/base.apk!classes4.dex:/data/app/com.xfhy.watchsignaldemo-4tkKMWojrpHAf-Q3iecaHQ==/base.apk!classes3.dex], parent #1
Done dumping class loaders
Intern table: 44132 strong; 436 weak
JNI: CheckJNI is off; globals=681 (plus 67 weak)
Libraries: /data/app/com.xfhy.watchsignaldemo-4tkKMWojrpHAf-Q3iecaHQ==/lib/arm64/libwatchsignaldemo.so libandroid.so libcompiler_rt.so libhitrace_jni.so libhiview_jni.so libhwapsimpl_jni.so libiAwareSdk_jni.so libimonitor_jni.so libjavacore.so libjavacrypto.so libjnigraphics.so libmedia_jni.so libopenjdk.so libsoundpool.so libwebviewchromium_loader.so (15)
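// Heap size is 26MB, of which 2442KB is in use; 74512 objects are currently allocated
Heap: 90% free, 2442KB/26MB; 74512 objects
Total number of allocations 120222 // objects created since the process started
Total bytes allocated 10MB // memory requested since the process started
Total bytes freed 8173KB // memory released since the process started
Free memory 23MB // memory available without growing the heap
Free memory until GC 23MB // memory available before a GC is triggered
Free memory until OOME 381MB // memory available before an OOM; a very small value means memory is already tight and the app may be using too much memory
Total memory 26MB // current total memory (used + free)
Max memory 384MB // the most memory the process can request
..... // GC-related information omitted
// The current process has 17 threads
DALVIK THREADS (17):
// Call stack of the Signal Catcher thread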
"Signal Catcher" daemon prio=5 tid=4 Runnable
| group="system" sCount=0 dsCount=0 flags=0 obj=0x18c84570 self=0x7252417800
| sysTid=7772 nice=0 cgrp=default sched=0/0 handle=0x725354ad50
| state=R schedstat=(16273959 1085938 5) utm=0 stm=1 core=4 HZ=100
| stack=0x7253454000-0x7253456000 stackSize=991KB
| held mutexes= "mutator lock"(shared held)
native: #00 pc 000000000042f8e8 /apex/com.android.runtime/lib64/libart.so (art::DumpNativeStack(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, int, BacktraceMap*, char const*, art::ArtMethod*, void*, bool)+140)
native: #01 pc 0000000000523590 /apex/com.android.runtime/lib64/libart.so (art::Thread::DumpStack(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, bool, BacktraceMap*, bool) const+508)
native: #02 pc 000000000053e75c /apex/com.android.runtime/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+844)
native: #03 pc 000000000053735c /apex/com.android.runtime/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*)+504)
native: #04 pc 0000000000536744 /apex/com.android.runtime/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, bool)+1048)
native: #05 pc 0000000000536228 /apex/com.android.runtime/lib64/libart.so (art::ThreadList::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char>>&)+884)
native: #06 pc 00000000004ee4d8 /apex/com.android.runtime/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char>>&)+196)
native: #07 pc 000000000050250c /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+1356)
native: #08 pc 0000000000501558 /apex/com.android.runtime/lib64/libart.so (art::SignalCatcher::Run(void*)+268)
native: #09 pc 00000000000cf7c0 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+36)
native: #10 pc 00000000000721a8 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)
(no managed stack frames)
"main" prio=5 tid=1 Sleeping
| group="main" sCount=1 dsCount=0 flags=1 obj=0x73907540 self=0x725f010800
| sysTid=7761 nice=-10 cgrp=default sched=1073741825/2 handle=0x72e60080d0
| state=S schedstat=(281909898 5919799 311) utm=20 stm=7 core=4 HZ=100
| stack=0x7fca180000-0x7fca182000 stackSize=8192KB
| held mutexes=
at java.lang.Thread.sleep(Native method)
- sleeping on <0x00f895d9> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:443)
- locked <0x00f895d9> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:359)
at android.os.SystemClock.sleep(SystemClock.java:131)
at com.xfhy.watchsignaldemo.MainActivity.makeAnr(MainActivity.kt:35)
at java.lang.reflect.Method.invoke(Native method)
at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:441)
at android.view.View.performClick(View.java:7317)
at com.google.android.material.button.MaterialButton.performClick(MaterialButton.java:1219)
at android.view.View.performClickInternal(View.java:7291)
at android.view.View.access$3600(View.java:838)
at android.view.View$PerformClick.run(View.java:28247)
at android.os.Handler.handleCallback(Handler.java:900)
at android.os.Handler.dispatchMessage(Handler.java:103)
at android.os.Looper.loop(Looper.java:219)
at android.app.ActivityThread.main(ActivityThread.java:8668)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:513)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1109)
... // the remaining N threads are omitted here
Detailed explanation of the trace fields:
"Signal Catcher" daemon prio=5 tid=4 Runnable
| group="system" sCount=0 dsCount=0 flags=0 obj=0x18c84570 self=0x7252417800
| sysTid=7772 nice=0 cgrp=default sched=0/0 handle=0x725354ad50
| state=R schedstat=(16273959 1085938 5) utm=0 stm=1 core=4 HZ=100
| stack=0x7253454000-0x7253456000 stackSize=991KB
| held mutexes= "mutator lock"(shared held)
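Line 1: "Signal Catcher" daemon prio=5 tid=4 Runnable
- "Signal Catcher" daemon: the thread name; daemon marks a daemon thread
- prio: thread priority
- tid: the thread id internal to the runtime
- Thread state: Runnable
Generally, if the main thread is in the BLOCKED, WAITING, or TIMED_WAITING state, the ANR is most likely caused by a blocking call; if the main thread looks normal, investigate CPU load and the memory environment instead.
Line 2: | group="system" sCount=0 dsCount=0 flags=0 obj=0x18c84570 self=0x7252417800
- group: the thread group the thread belongs to
- sCount: the thread's suspend count
- dsCount: the suspend count requested for debugging
- obj: the Java thread object associated with this thread
- self: the address of the native thread object
Line 3: | sysTid=7772 nice=0 cgrp=default sched=0/0 handle=0x725354ad50
- sysTid: the thread's real tid, i.e. the kernel thread id
- nice: scheduling priority; the smaller the value, the higher the priority
- cgrp: the scheduling group the thread belongs to
- sched: scheduling policy and priority
- handle: the address of the thread's native pthread handle
Line 4: | state=R schedstat=(16273959 1085938 5) utm=0 stm=1 core=4 HZ=100
- state: thread state as seen by the kernel
- schedstat: CPU scheduling statistics. The three numbers in parentheses are Running, Runnable, and Switch: Running is the time spent executing on a CPU, in ns; Runnable is the time spent waiting in the run queue, in ns; Switch is the number of CPU context switches
- utm/stm: CPU time spent in user mode / kernel mode, counted in clock ticks (see HZ)
- core: the core the thread last ran on
- HZ: clock tick frequency, in ticks per second
Line 5: | stack=0x7253454000-0x7253456000 stackSize=991KB
- stack: the address range of the thread's stack
- stackSize: the size of the stack
Line 6: | held mutexes= "mutator lock"(shared held)
- mutex: the mutexes held and how they are held; there are two kinds, exclusive and shared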
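When many traces need to be scanned, the fields above can be pulled out with a couple of regular expressions. The sketch below is one possible way to do it; TraceHeaderParser and its method names are made up for this example, and the patterns only cover the line shapes shown in this article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TraceHeaderParser {
    // Matches e.g.: "Signal Catcher" daemon prio=5 tid=4 Runnable
    private static final Pattern HEADER = Pattern.compile(
            "^\"(.+)\"\\s+(daemon\\s+)?prio=(\\d+)\\s+tid=(\\d+)\\s+(\\w+)$");
    // Matches e.g.: schedstat=( 16273959 1085938 5 )
    private static final Pattern SCHEDSTAT = Pattern.compile(
            "schedstat=\\(\\s*(\\d+)\\s+(\\d+)\\s+(\\d+)\\s*\\)");

    /** Returns {name, isDaemon, prio, tid, state}, or null if the line is not a thread header. */
    public static String[] parseHeader(String line) {
        Matcher m = HEADER.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group(1), String.valueOf(m.group(2) != null), m.group(3), m.group(4), m.group(5)
        };
    }

    /** Returns {runningNs, runnableNs, switchCount}, or null if no schedstat field is present. */
    public static long[] parseSchedstat(String line) {
        Matcher m = SCHEDSTAT.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)), Long.parseLong(m.group(2)), Long.parseLong(m.group(3))
        };
    }

    /** Converts utm/stm clock ticks to milliseconds using the HZ value from the same line. */
    public static long ticksToMillis(long ticks, int hz) {
        return ticks * 1000 / hz;
    }

    public static void main(String[] args) {
        String[] h = parseHeader("\"Signal Catcher\" daemon prio=5 tid=4 Runnable");
        System.out.println(h[0] + " tid=" + h[3] + " state=" + h[4]);
        long[] s = parseSchedstat("| state=R schedstat=( 16273959 1085938 5 ) utm=0 stm=1 core=4 HZ=100");
        System.out.println("running ns=" + s[0] + ", runnable ns=" + s[1] + ", switches=" + s[2]);
    }
}
```

As a sanity check on the units, utm=670 and stm=4 at HZ=100 give 6740 ms of CPU time, which is close to a Running value of 6746757297 ns.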
Case studies
Main thread not stuck: a normal-state stack
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x74b38080 self=0x7ad9014c00
| sysTid=23081 nice=0 cgrp=default sched=0/0 handle=0x7b5fdc5548
| state=S schedstat=(284838633 166738594 505) utm=21 stm=7 core=1 HZ=100
| stack=0x7fc95da000-0x7fc95dc000 stackSize=8MB
| held mutexes=
kernel: __switch_to+0xb0/0xbc
kernel: SyS_epoll_wait+0x288/0x364
kernel: SyS_epoll_pwait+0xb0/0x124
kernel: cpu_switch_to+0x38c/0x2258
native: #00 pc 000000000007cd8c /system/lib64/libc.so (__epoll_pwait+8)
native: #01 pc 0000000000014d48 /system/lib64/libutils.so (android::Looper::pollInner(int)+148)
native: #02 pc 0000000000014c18 /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+60)
native: #03 pc 00000000001275f4 /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
at android.os.MessageQueue.nativePollOnce(Native method)
at android.os.MessageQueue.next(MessageQueue.java:330)
at android.os.Looper.loop(Looper.java:169)
at android.app.ActivityThread.main(ActivityThread.java:7073)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:536)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:876)
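The main thread is idle: it is in nativePollOnce, waiting for new messages. If an ANR was still reported in this state, there are two likely reasons:
- The stack was dumped too late: the ANR had already happened, and by the time of the dump the main thread was back to normal
- Other factors such as CPU preemption or memory pressure
In this situation, first analyze CPU and memory usage. Then check whether the time the log was captured is far from the time the ANR occurred; if the gap is too large, the stack is no longer worth analyzing.
Main thread performing a time-consuming operation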
suspend all histogram: Sum: 206us 99% C.I. 0.098us-46us Avg: 7.629us Max: 46us
DALVIK THREADS (16):
"main" prio=5 tid=1 Runnable
| group="main" sCount=0 dsCount=0 flags=0 obj=0x73907540 self=0x725f010800
| sysTid=32298 nice=-10 cgrp=default sched=1073741825/2 handle=0x72e60080d0
| state=R schedstat=(6746757297 5887495 256) utm=670 stm=4 core=6 HZ=100
| stack=0x7fca180000-0x7fca182000 stackSize=8192KB
| held mutexes= "mutator lock"(shared held)
at com.xfhy.watchsignaldemo.MainActivity.makeAnr(MainActivity.kt:58)
at java.lang.reflect.Method.invoke(Native method)
at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:441)
at android.view.View.performClick(View.java:7317)
at com.google.android.material.button.MaterialButton.performClick(MaterialButton.java:1219)
at android.view.View.performClickInternal(View.java:7291)
at android.view.View.access$3600(View.java:838)
at android.view.View$PerformClick.run(View.java:28247)
at android.os.Handler.handleCallback(Handler.java:900)
at android.os.Handler.dispatchMessage(Handler.java:103)
at android.os.Looper.loop(Looper.java:219)
at android.app.ActivityThread.main(ActivityThread.java:8668)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:513)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1109)
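The main thread is running rather than idle, yet an ANR occurred, which indicates that com.xfhy.watchsignaldemo.MainActivity.makeAnr performs a time-consuming operation.
Main thread blocked on a lock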
"main" prio=5 tid=1 Blocked
| group="main" sCount=1 dsCount=0 flags=1 obj=0x73907540 self=0x725f010800
| sysTid=19900 nice=-10 cgrp=default sched=0/0 handle=0x72e60080d0
| state=S schedstat=(542745832 9516666 182) utm=48 stm=5 core=4 HZ=100
| stack=0x7fca180000-0x7fca182000 stackSize=8192KB
| held mutexes=
at com.xfhy.watchsignaldemo.MainActivity.makeAnr(MainActivity.kt:59)
- waiting to lock <0x0c6f8c52> (a java.lang.Object) held by thread 22 // Note 1
- locked <0x01abeb23> (a java.lang.Object)
at java.lang.reflect.Method.invoke(Native method)
at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:441)
at android.view.View.performClick(View.java:7317)
at com.google.android.material.button.MaterialButton.performClick(MaterialButton.java:1219)
at android.view.View.performClickInternal(View.java:7291)
at android.view.View.access$3600(View.java:838)
at android.view.View$PerformClick.run(View.java:28247)
at android.os.Handler.handleCallback(Handler.java:900)
at android.os.Handler.dispatchMessage(Handler.java:103)
at android.os.Looper.loop(Looper.java:219)
at android.app.ActivityThread.main(ActivityThread.java:8668)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:513)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1109)
"卧槽" prio=5 tid=22 Blocked // Note 2
| group="main" sCount=1 dsCount=0 flags=1 obj=0x12c8a118 self=0x71d625f800
| sysTid=20611 nice=0 cgrp=default sched=0/0 handle=0x71d4513d50
| state=S schedstat=(486459 0 3) utm=0 stm=0 core=4 HZ=100
| stack=0x71d4411000-0x71d4413000 stackSize=1039KB
| held mutexes=
at com.xfhy.watchsignaldemo.MainActivity$makeAnr$1.invoke(MainActivity.kt:52)
- waiting to lock <0x01abeb23> (a java.lang.Object) held by thread 1
- locked <0x0c6f8c52> (a java.lang.Object)
at com.xfhy.watchsignaldemo.MainActivity$makeAnr$1.invoke(MainActivity.kt:49)
at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)
......
The key lines are:
"main" prio=5 tid=1 Blocked
- waiting to lock <0x0c6f8c52> (a java.lang.Object) held by thread 22
- locked <0x01abeb23> (a java.lang.Object)
"卧槽" prio=5 tid=22 Blocked
- waiting to lock <0x01abeb23> (a java.lang.Object) held by thread 1
- locked <0x0c6f8c52> (a java.lang.Object)
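The main thread has tid 1 and is in the Blocked state. It is waiting for the object 0x0c6f8c52, which is held by thread 22, while the main thread itself holds the lock on 0x01abeb23. The thread named "卧槽" has tid 22 and is also Blocked; the lock it wants and the lock it holds are exactly the reverse of the main thread's. That pins down the cause of the ANR: the two threads are deadlocked. Thread 22 holds a lock and never releases it, so the main thread waits for that lock until it times out. In production, a common lock-related ANR scenario is writing to SharedPreferences.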
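The manual reasoning above (follow "held by thread N" from one thread to the next until a thread repeats) can be automated. The following is a rough sketch under the assumption that the trace uses exactly the line shapes shown in this article; LockCycleFinder and its methods are hypothetical names, and the sample thread name is replaced by "worker".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LockCycleFinder {
    private static final Pattern HEADER =
            Pattern.compile("^\"(.+)\".*\\btid=(\\d+)\\b.*$");
    private static final Pattern WAITING =
            Pattern.compile("waiting to lock <0x[0-9a-f]+>.*held by thread (\\d+)");

    public static final String SAMPLE = String.join("\n",
            "\"main\" prio=5 tid=1 Blocked",
            "  at com.xfhy.watchsignaldemo.MainActivity.makeAnr(MainActivity.kt:59)",
            "  - waiting to lock <0x0c6f8c52> (a java.lang.Object) held by thread 22",
            "  - locked <0x01abeb23> (a java.lang.Object)",
            "\"worker\" prio=5 tid=22 Blocked",
            "  - waiting to lock <0x01abeb23> (a java.lang.Object) held by thread 1",
            "  - locked <0x0c6f8c52> (a java.lang.Object)");

    /** Builds a map from a waiting thread's tid to the tid of the thread holding the lock. */
    public static Map<Integer, Integer> waitGraph(String trace) {
        Map<Integer, Integer> graph = new LinkedHashMap<>();
        int current = -1;
        for (String raw : trace.split("\n")) {
            String line = raw.trim();
            Matcher h = HEADER.matcher(line);
            if (h.matches()) {
                current = Integer.parseInt(h.group(2));
                continue;
            }
            Matcher w = WAITING.matcher(line);
            if (current >= 0 && w.find()) {
                graph.putIfAbsent(current, Integer.parseInt(w.group(1)));
            }
        }
        return graph;
    }

    /** Follows the wait chain from start; returns the tids forming a cycle, or an empty list. */
    public static List<Integer> cycleFrom(Map<Integer, Integer> graph, int start) {
        List<Integer> path = new ArrayList<>();
        Integer tid = start;
        while (tid != null && !path.contains(tid)) {
            path.add(tid);
            tid = graph.get(tid);
        }
        if (tid == null) {
            return Collections.emptyList(); // the chain ended at a thread that is not waiting
        }
        return new ArrayList<>(path.subList(path.indexOf(tid), path.size()));
    }

    public static void main(String[] args) {
        System.out.println("deadlock cycle from main: " + cycleFrom(waitGraph(SAMPLE), 1));
    }
}
```

An empty result does not rule out a lock problem; it only means the main thread's wait chain ends at a thread that is not itself waiting on a monitor, for example one that is busy or stuck in IO while holding the lock.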
CPU preempted
CPU usage from 0ms to 10625ms later (2020-03-09 14:38:31.633 to 2020-03-09 14:38:42.257):
543% 2045/com.test.demo: 54% user + 89% kernel / faults: 4608 minor 1 major // pay attention to this line
99% 674/android.hardware.camera.provider@2.4-service: 81% user + 18% kernel / faults: 403 minor
24% 32589/com.wang.test: 22% user + 1.4% kernel / faults: 7432 minor 1 major
......
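This process uses as much as 543% CPU and takes most of the CPU resources, which is what leads to the ANR. This kind of ANR is usually not caused by the app that reported it.
ANR caused by memory pressure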
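To spot such a process quickly in a long CPU usage section, the per-process lines can be parsed and filtered by a threshold. This is a small sketch, assuming the line format shown above; CpuUsageLineParser, Entry, and above are names invented for the example. Since one fully busy core accounts for 100%, anything above that means the process kept more than one core busy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CpuUsageLineParser {
    // Matches the start of e.g.: 543% 2045/com.test.demo: 54% user + 89% kernel ...
    private static final Pattern LINE =
            Pattern.compile("^\\s*([0-9]+(?:\\.[0-9]+)?)%\\s+(\\d+)/([^:]+):");

    public static final String SAMPLE = String.join("\n",
            "CPU usage from 0ms to 10625ms later (2020-03-09 14:38:31.633 to 2020-03-09 14:38:42.257):",
            "543% 2045/com.test.demo: 54% user + 89% kernel / faults: 4608 minor 1 major",
            "99% 674/android.hardware.camera.provider@2.4-service: 81% user + 18% kernel / faults: 403 minor",
            "24% 32589/com.wang.test: 22% user + 1.4% kernel / faults: 7432 minor 1 major");

    /** One per-process line of the CPU usage section. */
    public static final class Entry {
        public final double percent;
        public final int pid;
        public final String name;

        Entry(double percent, int pid, String name) {
            this.percent = percent;
            this.pid = pid;
            this.name = name;
        }
    }

    /** Parses one line; returns null for lines that are not per-process entries. */
    public static Entry parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new Entry(Double.parseDouble(m.group(1)),
                Integer.parseInt(m.group(2)), m.group(3).trim());
    }

    /** Returns the entries whose total CPU percentage is strictly above the threshold. */
    public static List<Entry> above(String section, double thresholdPercent) {
        List<Entry> result = new ArrayList<>();
        for (String line : section.split("\n")) {
            Entry e = parse(line);
            if (e != null && e.percent > thresholdPercent) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Entry e : above(SAMPLE, 100.0)) {
            System.out.println(e.pid + " " + e.name + " " + e.percent + "%");
        }
    }
}
```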
10-31 22:37:19.749 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
10-31 22:37:33.458 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
10-31 22:38:00.153 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
10-31 22:38:58.731 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
10-31 22:39:02.816 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
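If both the CPU usage and the stacks in an ANR log look normal, consider memory pressure. Look at the memory-related part of the ANR log. You can also search the logs for onTrimMemory (level 80 corresponds to TRIM_MEMORY_COMPLETE); if such entries appear around the time the ANR log was dumped, memory was probably tight.
ANR caused by a system service timeout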
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x727851e8 self=0x78d7060e00
| sysTid=4894 nice=0 cgrp=default sched=0/0 handle=0x795cc1e9a8
| state=S schedstat=(8292806752 1621087524 7167) utm=707 stm=122 core=5 HZ=100
| stack=0x7febb64000-0x7febb66000 stackSize=8MB
| held mutexes=
kernel: __switch_to+0x90/0xc4
kernel: binder_thread_read+0xbd8/0x144c
kernel: binder_ioctl_write_read.constprop.58+0x20c/0x348
kernel: binder_ioctl+0x5d4/0x88c
kernel: do_vfs_ioctl+0xb8/0xb1c
kernel: SyS_ioctl+0x84/0x98
kernel: cpu_switch_to+0x34c/0x22c0
native: #00 pc 000000000007a2ac /system/lib64/libc.so (__ioctl+4)
native: #01 pc 00000000000276ec /system/lib64/libc.so (ioctl+132)
native: #02 pc 00000000000557d4 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+252)
native: #03 pc 0000000000056494 /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+60)
native: #04 pc 00000000000562d0 /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+216)
native: #05 pc 000000000004ce1c /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+72)
native: #06 pc 00000000001281c8 /system/lib64/libandroid_runtime.so (???)
native: #07 pc 0000000000947ed4 /system/framework/arm64/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+196)
at android.os.BinderProxy.transactNative(Native method) <-- key line!
at android.os.BinderProxy.transact(Binder.java:804)
at android.net.IConnectivityManager$Stub$Proxy.getActiveNetworkInfo(IConnectivityManager.java:1204) <-- key line!
at android.net.ConnectivityManager.getActiveNetworkInfo(ConnectivityManager.java:800)
at com.xiaomi.NetworkUtils.getNetworkInfo(NetworkUtils.java:2)
at com.xiaomi.frameworkbase.utils.NetworkUtils.getNetWorkType(NetworkUtils.java:1)
at com.xiaomi.frameworkbase.utils.NetworkUtils.isWifiConnected(NetworkUtils.java:1)
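A system service timeout usually shows the keyword BinderProxy.transactNative in the stack.
The stack in the log shows that the ANR occurred while fetching network information: getActiveNetworkInfo. System services are all reached through Binder (16 threads) and have limited capacity, so a system service may stay unresponsive for a long time and cause an ANR. If other apps occupy all the Binder threads, the current app can only wait. You can search further for the keyword blockUntilThreadAvailable, i.e. at android.os.Binder.blockUntilThreadAvailable(Native method).
If some thread's stack contains this string, look further at that stack to determine which system service was being called. This kind of ANR is also a system-environment problem; if it occurs frequently on a particular phone model, the application layer can consider a workaround.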
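The case studies above amount to a first-pass triage of the main thread's state and top Java frame. One way to express that rule of thumb is sketched below; MainThreadTriage and Verdict are hypothetical names, and the rules are only the heuristics from this article, not an exhaustive classifier.

```java
import java.util.Arrays;
import java.util.List;

public class MainThreadTriage {

    public enum Verdict { IDLE, BINDER_WAIT, LOCK_BLOCKED, OTHER }

    /**
     * Classifies the main thread from its state on the header line and its
     * Java frames, listed top of stack first (without the leading "at ").
     */
    public static Verdict classify(String state, List<String> javaFrames) {
        if ("Blocked".equals(state)) {
            return Verdict.LOCK_BLOCKED;   // waiting on a monitor: look for "held by thread N"
        }
        String top = javaFrames.isEmpty() ? "" : javaFrames.get(0);
        if (top.contains("android.os.MessageQueue.nativePollOnce")) {
            return Verdict.IDLE;           // waiting for messages: check CPU, memory, dump timing
        }
        if (top.contains("android.os.BinderProxy.transactNative")) {
            return Verdict.BINDER_WAIT;    // waiting on a Binder call to another process
        }
        return Verdict.OTHER;              // e.g. Runnable in app code: suspect the top app frame
    }

    public static void main(String[] args) {
        System.out.println(classify("Native", Arrays.asList(
                "android.os.BinderProxy.transactNative(Native method)",
                "android.os.BinderProxy.transact(Binder.java:804)")));
    }
}
```

A verdict of OTHER with a Runnable main thread corresponds to the time-consuming-operation case, where the topmost app frame is the place to start reading.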