本系列是 我 TM 人傻了 系列第三期[捂脸],往期精彩回顾:
- 降级到 Spring 5.3.x 之后,GC 次数急剧减少,我 TM 人傻了
- 这个大表走索引字段查问的 SQL 怎么就成全扫描了,我 TM 人傻了
最近组里用第三方给的 SDK 搞了点开发,最近线上忽然开始报错,并且 发现一个特地奇怪的问题 ,组员和我说, 代码运行到一半不走了,跳过了一段(这代码是刚加入东奥会加入跳远么???)。
代码如下,逻辑非常简单:
try {log.info("initiate client with conf: {}", conf);
SDKClient client = new SDKClient(conf);
client.init();
log.info("client initiated");
} catch (Exception e) {log.error("initiate client failed", e);
}
log.info("start to manipulate...");
咱们发现 client 实际上没有初始化胜利,前面的业务解决始终在报错。查看日志,发现:
initiate client with conf: xxxxx
start to manipulate...
这就是组员说的代码产生了跳跃。因为既没有打印 client initiated
,也没有打印 initiate client failed
… 就间接 start to manipulate...
了。
老读者晓得,咱们的线上是 k8s + Docker,并且每个 镜像中内置了 Arthas,并且 Java 版本是 Java 16,并且启用了 JFR。日志中具备链路信息,通过 ELK Agent 拉取到对立日志服务器。
这个 SDK 外面要拜访的近程地址都有 IP 白名单,咱们为了平安本地并不能间接应用 SDK 拜访对方的线上环境。在本地测试连贯的是对方的测试环境,是没有问题的。所以这里,咱们还是得 通过 Arthas 进行定位。
首先得看看线上运行的源码是否和本地咱们看到的统一呢?这个能够通过 jad 命令:
jad 要看的类全限定名称
查看后发现,反编译后的代码,和咱们的源码统一诶。
而后咱们看看代码的理论执行:
trace 要看的类全限定名称 办法
之后从新执行这个办法,查看 trace 发现,初始化的时候的确抛出异样了:
# 省略咱们这里不关怀的
+---[min=0.010174ms,max=0.01184ms,total=0.022014ms,count=2] org.apache.logging.log4j.Logger:info() #130
+---[min=599.388978ms,max=630.23967ms,total=1229.628648ms,count=2] com.dasha13.sdk.SDKClient:<init>() #131
+---[min=203.617545ms,max=221.785512ms,total=425.403057ms,count=2] com.dasha13.sdk.SDKClient:init() #132 [throws Exception,2]
+---[min=0.034798ms,max=0.084505ms,total=0.119303ms,count=2] org.apache.logging.log4j.Logger:error() #136
+---[min=0.010174ms,max=0.01184ms,total=0.022014ms,count=2] org.apache.logging.log4j.Logger:info() #138
然而,这个异样日志,为何没有打印进去呢 ?咱们持续查看下这个异样,应用 watch 办法,并且指定查看深度为 2,这样 冀望能打印出堆栈以及 Message:
watch com.dasha13.sdk.SDKClient init {throwExp} -x 2
然而,这里 只打印了一个看似是 Message 的信息:
method=com.dasha13.sdk.SDKClient init location=AtExceptionExit
ts=2021-08-10 02:58:15; [cost=131.20209ms] result=ERROR DATA!!! object class: class java.util.ArrayList, exception class: class com.google.common.util.concurrent.UncheckedExecutionException, exception message: java.lang.IllegalArgumentException
这很奇怪,失常来说,指定深度为 2,如果有异样抛出,那么这个输入信息,会蕴含异样的 Message 以及堆栈信息的。这是怎么回事呢?咱们来别离获取堆栈以及信息试试:
首先获取堆栈:
watch com.dasha13.sdk.SDKClient init {throwExp.getStackTrace()} -x 2
从新执行出问题的办法,堆栈失常输入,没啥问题,不过看堆栈应该问题和 Google 的依赖翻转 Bean 治理框架(相似于 Spring)Guice 载入某个 Bean 出异样无关:
ts=2021-08-10 03:03:37; [cost=146.644563ms] result=@ArrayList[@StackTraceElement[][@StackTraceElement[com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1025)],
@StackTraceElement[com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1051)],
@StackTraceElement[com.dasha13.sdk.SDKClient.init(SDKClient.java:482)],
# 省略之后的
再来看异样信息:
watch com.dasha13.sdk.SDKClient init {throwExp.getMessage()} -x 2
从新执行出问题的办法,这时候发现 watch 失败:
watch failed, condition is: null, express is: {throwExp.getMessage()}, com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException, visit /app/arthas/arthas.log for more details.
咱们依照提醒,查看 arthas 日志,发现的异样堆栈:
2021-08-10 03:07:11 [XNIO-2 task-3] ERROR c.t.a.c.command.express.OgnlExpress -Error during evaluating the expression:
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4830)
at com.google.inject.internal.util.StackTraceElements.forMember(StackTraceElements.java:66)
at com.google.inject.internal.Errors.formatSource(Errors.java:806)
at com.google.inject.internal.Errors.formatSource(Errors.java:785)
at com.google.inject.internal.Errors.formatInjectionPoint(Errors.java:839)
at com.google.inject.internal.Errors.formatSource(Errors.java:800)
at com.google.inject.internal.Errors.formatSource(Errors.java:785)
at com.google.inject.internal.Errors.format(Errors.java:584)
at com.google.inject.ProvisionException.getMessage(ProvisionException.java:60)
cause by: MethodNotFoundException: Method not found: class com.google.common.xxxxxxxxx
咱们发现,竟然是 ProvisionException 的 getMessage()
产生了异样,也就是 异样的 getMessage()
产生了异样 . 查看 异样的 Cause 咱们也定位进去,是 Guava 版本与 guice 版本不兼容导致 ,其根本原因是三方接口超时,导致初始化异样,有异样抛出被封装成 ProvisionException
,ProvisionException
异样的 getMessage
依赖 Guava Cache 缓存一些异样信息,然而咱们我的项目中 Guava 版本与 guice 版本不兼容,导致某些办法不存在,所以 ProvisionException
异样的 getMessage
也会有异样。 之前运行没问题是因为三方没有还没有过初始化的时候接口超时抛异样。。。
咱们应用的 log4j2 异步日志配置,并且将异样作为最初一个参数传入日志办法中,失常状况下,会输入这个 异样的 Message 以及异样堆栈 . 但从下面的剖析咱们晓得,获取 Message 的时候,抛出了异样。Log4j 的设计是应用了 日志事件的生产生产 这种架构。这里是消费者获取异样的 Message 以及异样堆栈,并且在获取 Message 的时候,发现有异样。对于 Log4j2 异步日志,发现有异样的时候,原有日志事件会被间接摈弃,并将异样输入到 StatusLogger 中(底层其实就是规范异样输入)中,这里对应 log4j 的源码:
AppenderControl
private void tryCallAppender(final LogEvent event) {
try {
// 调用 appender 输入日志
appender.append(event);
} catch (final RuntimeException error) {
// 解决 RuntimeException
handleAppenderError(event, error);
} catch (final Exception error) {
// 解决其余 Exception
handleAppenderError(event, new AppenderLoggingException(error));
}
}
private void handleAppenderError(final LogEvent event, final RuntimeException ex) {appender.getHandler().error(createErrorMsg("An exception occurred processing Appender"), event, ex);
if (!appender.ignoreExceptions()) {throw ex;}
}
ErrorHandler 个别都是默认实现,即 DefaultErrorHandler;DefaultErrorHandler 是输入到一个 StatusLogger:
DefaultErrorHandler
private static final Logger LOGGER = StatusLogger.getLogger();
public void error(final String msg, final LogEvent event, final Throwable t) {final long current = System.nanoTime();
if (current - lastException > EXCEPTION_INTERVAL || exceptionCount++ < MAX_EXCEPTIONS) {LOGGER.error(msg, t);
}
lastException = current;
if (!appender.ignoreExceptions() && t != null && !(t instanceof AppenderLoggingException)) {throw new AppenderLoggingException(msg, t);
}
}
StatusLogger 其实就是规范异样输入 System.err
:
StatusLogger
this.logger = new SimpleLogger("StatusLogger", Level.ERROR, false, true, showDateTime, false,
dateFormat, messageFactory, PROPS,
// 规范异样输入
System.err);
咱们部署架构中,将规范异样输入放到了一个很偏远的地位,根本没有人看,所以没留神到。。。查看规范异样输入,会发现确实有异样:
2021-08-10 03:30:29,810 Log4j2-TF-10-AsyncLoggerConfig-3 ERROR An exception occurred processing Appender file com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4830)
at com.google.inject.internal.util.StackTraceElements.forMember(StackTraceElements.java:66)
at com.google.inject.internal.Errors.formatSource(Errors.java:806)
at com.google.inject.internal.Errors.formatSource(Errors.java:785)
at com.google.inject.internal.Errors.formatInjectionPoint(Errors.java:839)
at com.google.inject.internal.Errors.formatSource(Errors.java:800)
at com.google.inject.internal.Errors.formatSource(Errors.java:785)
at com.google.inject.internal.Errors.format(Errors.java:584)
at com.google.inject.ProvisionException.getMessage(ProvisionException.java:60)
at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:105)
at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:93)
at org.apache.logging.log4j.core.impl.Log4jLogEvent.getThrownProxy(Log4jLogEvent.java:629)
at org.apache.logging.log4j.core.pattern.ExtendedThrowablePatternConverter.format(ExtendedThrowablePatternConverter.java:63)
at org.springframework.boot.logging.log4j2.ExtendedWhitespaceThrowablePatternConverter.format(ExtendedWhitespaceThrowablePatternConverter.java:50)
at org.apache.logging.log4j.core.pattern.PatternFormatter.format(PatternFormatter.java:38)
at org.apache.logging.log4j.core.layout.PatternLayout$PatternSerializer.toSerializable(PatternLayout.java:345)
at org.apache.logging.log4j.core.layout.PatternLayout.toText(PatternLayout.java:244)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:229)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:59)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:197)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:181)
at org.apache.logging.log4j.core.appender.RollingFileAppender.append(RollingFileAppender.java:312)
at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129)
at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:543)
at org.apache.logging.log4j.core.async.AsyncLoggerConfig.callAppenders(AsyncLoggerConfig.java:127)
at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:502)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:485)
at org.apache.logging.log4j.core.async.AsyncLoggerConfig.log(AsyncLoggerConfig.java:121)
at org.apache.logging.log4j.core.async.AsyncLoggerConfig.logToAsyncLoggerConfigsOnCurrentThread(AsyncLoggerConfig.java:169)
at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor$Log4jEventWrapperHandler.onEvent(AsyncLoggerConfigDisruptor.java:111)
at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor$Log4jEventWrapperHandler.onEvent(AsyncLoggerConfigDisruptor.java:97)
at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:168)
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IllegalArgumentException
at com.google.inject.internal.asm.$ClassReader.<init>(Unknown Source)
at com.google.inject.internal.asm.$ClassReader.<init>(Unknown Source)
at com.google.inject.internal.asm.$ClassReader.<init>(Unknown Source)
at com.google.inject.internal.util.LineNumbers.<init>(LineNumbers.java:65)
at com.google.inject.internal.util.StackTraceElements$1.load(StackTraceElements.java:46)
at com.google.inject.internal.util.StackTraceElements$1.load(StackTraceElements.java:43)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
... 41 more
并且,在这之后,会依据 Appender 的 ignoreExceptions 配置(默认都是 true),决定调用日志办法的中央是否会抛出异样,但这个是 针对同步日志的 ,异步日志行将异样抛到 Disruptor 的异样处理器,Log4j2 Disruptor 的异样解决也是将异样输入到 System.err
也就是规范异样输入。 默认状况下是不抛出的,毕竟对于同步日志没人心愿因为日志有异样就让业务不能失常进行,异步日志因为后面的解决曾经输入到规范异样输入这里就没必要多此一举了。
微信搜寻“我的编程喵”关注公众号,每日一刷,轻松晋升技术,斩获各种 offer: