
Java: Troubleshooting a thread pool that appears hung under high concurrency


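Problem description

The project uses a thread pool whose main job is to send HTTP requests through HttpClient. In the NetEase Cloud Music environment the problem showed up occasionally; in the Yanxuan environment, where concurrency is somewhat higher, it was more severe. The only workaround was to have Sentinel (哨兵) continuously monitor the thread pool size and restart the service whenever it exceeded a threshold. Each restart lost a large number of SQL queries that still had to be persisted. Yanxuan was hit hardest and needed a restart every few days.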

Initial suspicions

  • Thread deadlock
  • HttpClient has no connection timeout set

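Verifying the suspicions

First, check the log files

The log files contained no errors, yet the number of tasks in the thread pool kept growing, so the service appeared to be hung.

Next, read the project code to find what could cause the hang and to check for deadlocks

Reading the code ruled out both a deadlock and a missing connection timeout.

Inspect the stack with jstack

Sentinel monitored the thread pool size; whenever it exceeded the threshold, a jstack dump was exported first and the service was restarted afterwards.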

"receive-thread-pool-79962" #275147 prio=5 os_prio=0 tid=0x0000000033134000 nid=0x60e28 in Object.wait() [0x00007fc3e63d8000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:509)
- locked <0x00000000b95165b0> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:394)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:152)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
at com.netease.impala.util.HttpUtil.httpRequest(HttpUtil.java:55)
at com.netease.impala.util.HttpUtil.getRequest(HttpUtil.java:38)
at com.netease.impala.service.ImpalaService.getThriftInfo(ImpalaService.java:96)
at com.netease.impala.service.ImpalaService.getDetailInfo(ImpalaService.java:64)
at com.netease.impala.service.RecordService.getQueryInfo(RecordService.java:294)
at com.netease.impala.service.RecordService.access$000(RecordService.java:27)
at com.netease.impala.service.RecordService$1.run(RecordService.java:60)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

The dump showed that the threads were all blocked in the MultiThreadedHttpConnectionManager.doGetConnection method.
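When a dump contains hundreds of pool threads, tallying where they are parked is quicker than reading each trace. The sketch below does this in-process with `Thread.getAllStackTraces()`, grouping threads by state and by the first frame outside the JDK (in the trace above that frame would be `doGetConnection`). The class name `StackTally` and the name-prefix filter are illustrative assumptions, not part of the original project.

```java
import java.util.Map;
import java.util.TreeMap;

final class StackTally {

    /** First frame outside the JDK, i.e. the library or application call that is blocked. */
    static String firstNonJdkFrame(StackTraceElement[] frames) {
        for (StackTraceElement f : frames) {
            String c = f.getClassName();
            if (!c.startsWith("java.") && !c.startsWith("javax.")
                    && !c.startsWith("jdk.") && !c.startsWith("sun.")) {
                return c + "." + f.getMethodName();
            }
        }
        return "(no non-JDK frame)";
    }

    /** Groups live threads whose name starts with namePrefix by state and blocked call. */
    static Map<String, Integer> tally(String namePrefix) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (t.getName().startsWith(namePrefix)) {
                String key = t.getState() + " @ " + firstNonJdkFrame(e.getValue());
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Thread name prefix taken from the dump above.
        tally("receive-thread-pool-").forEach((k, v) -> System.out.println(v + "  " + k));
    }
}
```

If nearly every pool thread lands in a single `WAITING @ ...` bucket, that frame is the place to start reading source code.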

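Reading the source of doGetConnection shows that when no timeout is set for obtaining a connection from the pool, the method keeps waiting in its loop. Once MultiThreadedHttpConnectionManager manages the connection pool, setting the connection timeout and the socket timeout is not enough; the timeout for obtaining a connection from the pool must be set as well. This explains why the project hung without logging any error: threads could not get a connection, and the wait never timed out. Going back to the project code, I found that connections were never manually returned to the pool.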

The code that returns a connection manually is:

  method.releaseConnection();

The source of doGetConnection is as follows:

private HttpConnection doGetConnection(HostConfiguration hostConfiguration, 
    long timeout) throws ConnectionPoolTimeoutException {

    HttpConnection connection = null;

    int maxHostConnections = this.params.getMaxConnectionsPerHost(hostConfiguration);
    int maxTotalConnections = this.params.getMaxTotalConnections();

    synchronized (connectionPool) {

        // we clone the hostConfiguration
        // so that it cannot be changed once the connection has been retrieved
        hostConfiguration = new HostConfiguration(hostConfiguration);
        HostConnectionPool hostPool = connectionPool.getHostPool(hostConfiguration, true);
        WaitingThread waitingThread = null;

        boolean useTimeout = (timeout > 0);
        long timeToWait = timeout;
        long startWait = 0;
        long endWait = 0;

        while (connection == null) {

            if (shutdown) {
                throw new IllegalStateException("Connection factory has been shutdown.");
            }

            // happen to have a free connection with the right specs
            //
            if (hostPool.freeConnections.size() > 0) {
                connection = connectionPool.getFreeConnection(hostConfiguration);

            // have room to make more
            //
            } else if ((hostPool.numConnections < maxHostConnections) 
                && (connectionPool.numConnections < maxTotalConnections)) {

                connection = connectionPool.createConnection(hostConfiguration);

            // have room to add host connection, and there is at least one free
            // connection that can be liberated to make overall room
            //
            } else if ((hostPool.numConnections < maxHostConnections) 
                && (connectionPool.freeConnections.size() > 0)) {

                connectionPool.deleteLeastUsedConnection();
                connection = connectionPool.createConnection(hostConfiguration);

            // otherwise, we have to wait for one of the above conditions to
            // become true
            //
            } else {
                // TODO: keep track of which hostConfigurations have waiting
                // threads, so they avoid being sacrificed before necessary

                try {

                    if (useTimeout && timeToWait <= 0) {
                        throw new ConnectionPoolTimeoutException("Timeout waiting for connection");
                    }

                    if (LOG.isDebugEnabled()) {
                        LOG.debug("Unable to get a connection, waiting..., hostConfig=" + hostConfiguration);
                    }

                    if (waitingThread == null) {
                        waitingThread = new WaitingThread();
                        waitingThread.hostConnectionPool = hostPool;
                        waitingThread.thread = Thread.currentThread();
                    } else {
                        waitingThread.interruptedByConnectionPool = false;
                    }

                    if (useTimeout) {
                        startWait = System.currentTimeMillis();
                    }

                    hostPool.waitingThreads.addLast(waitingThread);
                    connectionPool.waitingThreads.addLast(waitingThread);
                    connectionPool.wait(timeToWait);
                } catch (InterruptedException e) {
                    if (!waitingThread.interruptedByConnectionPool) {
                        LOG.debug("Interrupted while waiting for connection", e);
                        throw new IllegalThreadStateException("Interrupted while waiting in MultiThreadedHttpConnectionManager");
                    }
                    // Else, do nothing, we were interrupted by the connection pool
                    // and should now have a connection waiting for us, continue
                    // in the loop and let's get it.
                } finally {
                    if (!waitingThread.interruptedByConnectionPool) {
                        // Either we timed out, experienced a "spurious wakeup", or were
                        // interrupted by an external thread.  Regardless we need to 
                        // cleanup for ourselves in the wait queue.
                        hostPool.waitingThreads.remove(waitingThread);
                        connectionPool.waitingThreads.remove(waitingThread);
                    }

                    if (useTimeout) {
                        endWait = System.currentTimeMillis();
                        timeToWait -= (endWait - startWait);
                    }
                }
            }
        }
    }
    return connection;
}
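The loop above throws `ConnectionPoolTimeoutException` only when `timeout > 0`; with a timeout of 0, `connectionPool.wait(0)` waits until another thread returns a connection. The following stdlib-only model reproduces that behavior. `TinyPool` is a hypothetical class written for this illustration, not HttpClient code: once a permit is leaked, the next caller either fails after the timeout or, with a timeout of 0, blocks indefinitely.

```java
import java.util.concurrent.TimeoutException;

final class TinyPool {
    private final int max;
    private int leased = 0;

    TinyPool(int max) {
        this.max = max;
    }

    /** timeoutMs <= 0 means wait forever, mirroring useTimeout = (timeout > 0). */
    synchronized void acquire(long timeoutMs) throws InterruptedException, TimeoutException {
        boolean useTimeout = timeoutMs > 0;
        long timeToWait = timeoutMs;
        while (leased >= max) {
            if (useTimeout && timeToWait <= 0) {
                throw new TimeoutException("Timeout waiting for connection");
            }
            long start = System.currentTimeMillis();
            wait(useTimeout ? timeToWait : 0);   // wait(0) blocks until notified
            if (useTimeout) {
                timeToWait -= System.currentTimeMillis() - start;
            }
        }
        leased++;
    }

    synchronized void release() {
        leased--;
        notifyAll();
    }

    /** Leaks the only permit, then acquires again; returns true if the second call timed out. */
    static boolean leakThenAcquire(long timeoutMs) {
        TinyPool pool = new TinyPool(1);
        try {
            pool.acquire(timeoutMs);   // takes the only permit and never releases it
            pool.acquire(timeoutMs);   // can only succeed if someone calls release()
            return false;
        } catch (TimeoutException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + leakThenAcquire(200));
    }
}
```

Calling `leakThenAcquire(0)` instead would never return, which is the same silent hang seen in the thread dump: no exception, no log line, just threads in `WAITING`.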

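Reading the connection-return logic shows that a connection is returned to the pool automatically only when the response body has been consumed normally. The relevant logic is as follows.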

private InputStream readResponseBody(HttpConnection conn)
    throws HttpException, IOException {

    LOG.trace("enter HttpMethodBase.readResponseBody(HttpConnection)");

    responseBody = null;
    InputStream is = conn.getResponseInputStream();
    if (Wire.CONTENT_WIRE.enabled()) {
        is = new WireLogInputStream(is, Wire.CONTENT_WIRE);
    }
    boolean canHaveBody = canResponseHaveBody(statusLine.getStatusCode());
    InputStream result = null;
    Header transferEncodingHeader = responseHeaders.getFirstHeader("Transfer-Encoding");
    // We use Transfer-Encoding if present and ignore Content-Length.
    // RFC2616, 4.4 item number 3
    if (transferEncodingHeader != null) {

        String transferEncoding = transferEncodingHeader.getValue();
        if (!"chunked".equalsIgnoreCase(transferEncoding) 
            && !"identity".equalsIgnoreCase(transferEncoding)) {
            if (LOG.isWarnEnabled()) {
                LOG.warn("Unsupported transfer encoding:" + transferEncoding);
            }
        }
        HeaderElement[] encodings = transferEncodingHeader.getElements();
        // The chunked encoding must be the last one applied
        // RFC2616, 14.41
        int len = encodings.length;
        if ((len > 0) && ("chunked".equalsIgnoreCase(encodings[len - 1].getName()))) { 
            // if response body is empty
            if (conn.isResponseAvailable(conn.getParams().getSoTimeout())) {
                result = new ChunkedInputStream(is, this);
            } else {
                if (getParams().isParameterTrue(HttpMethodParams.STRICT_TRANSFER_ENCODING)) {
                    throw new ProtocolException("Chunk-encoded body declared but not sent");
                } else {
                    LOG.warn("Chunk-encoded body missing");
                }
            }
        } else {
            LOG.info("Response content is not chunk-encoded");
            // The connection must be terminated by closing 
            // the socket as per RFC 2616, 3.6
            setConnectionCloseForced(true);
            result = is;
        }
    } else {
        long expectedLength = getResponseContentLength();
        if (expectedLength == -1) {
            if (canHaveBody && this.effectiveVersion.greaterEquals(HttpVersion.HTTP_1_1)) {
                Header connectionHeader = responseHeaders.getFirstHeader("Connection");
                String connectionDirective = null;
                if (connectionHeader != null) {
                    connectionDirective = connectionHeader.getValue();
                }
                if (!"close".equalsIgnoreCase(connectionDirective)) {
                    LOG.info("Response content length is not known");
                    setConnectionCloseForced(true);
                }
            }
            result = is;
        } else {
            result = new ContentLengthInputStream(is, expectedLength);
        }
    }

    // See if the response is supposed to have a response body
    if (!canHaveBody) {
        result = null;
    }
    // if there is a result - ALWAYS wrap it in an observer which will
    // close the underlying stream as soon as it is consumed, and notify
    // the watcher that the stream has been consumed.
    if (result != null) {

        result = new AutoCloseInputStream(
            result,
            new ResponseConsumedWatcher() {
                public void responseConsumed() {
                    responseBodyConsumed();
                }
            }
        );
    }

    return result;
}
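The wrapper at the end of `readResponseBody` is what ties the automatic return of a connection to full consumption of the body. As a stdlib-only illustration, the class below imitates that idea; `NotifyOnEofStream` is a hypothetical name and a simplified stand-in, not the library's `AutoCloseInputStream`. The callback fires only when a read reaches end of stream, so a caller that stops early, or never reads the body at all, never triggers it.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class NotifyOnEofStream extends FilterInputStream {
    private final Runnable onConsumed;
    private boolean notified = false;

    NotifyOnEofStream(InputStream in, Runnable onConsumed) {
        super(in);
        this.onConsumed = onConsumed;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b == -1) {
            notifyConsumed();
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n == -1) {
            notifyConsumed();
        }
        return n;
    }

    private void notifyConsumed() {
        if (!notified) {
            notified = true;
            onConsumed.run();   // stands in for "return the connection to the pool"
        }
    }

    /** Makes at most maxReads single-byte reads of a 4-byte body; reports whether the callback fired. */
    static boolean releasedAfterReads(int maxReads) {
        boolean[] released = {false};
        InputStream body = new NotifyOnEofStream(
                new ByteArrayInputStream(new byte[] {1, 2, 3, 4}), () -> released[0] = true);
        try {
            for (int i = 0; i < maxReads && body.read() != -1; i++) {
                // discard the byte; only the end-of-stream signal matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return released[0];
    }

    public static void main(String[] args) {
        System.out.println("partial read released: " + releasedAfterReads(2));
        System.out.println("full read released:    " + releasedAfterReads(10));
    }
}
```

This is why an explicit `method.releaseConnection()` is needed on every code path: a request that fails, or whose body is never read to the end, would otherwise keep its connection checked out of the pool.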

Answers found online

While troubleshooting, I also found a similar question online; the answer suggests increasing maxHostConnections and maxTotalConnections.

Your threads are waiting on synchronized (connectionPool) monitor in MultiThreadedHttpConnectionManager.doGetConnection which isn’t responsible for interruption. According to the documentation of getConnectionWithTimeout increasing number of maxHostConnections and maxTotalConnections can help. It’s also possible to specify timeout value in http.connection-manager.timeout which is 0 by default so threads are waiting for connection indefinitely.
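Putting the findings together, a hedged sketch of the fix with Commons HttpClient 3.x (it needs the commons-httpclient 3.x jar on the classpath): bound the wait for a pooled connection, and always return the connection in `finally`. The class name `PooledHttp` and all timeout and pool-size values are illustrative assumptions, not the project's actual settings.

```java
import java.io.IOException;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.params.HttpConnectionManagerParams;

final class PooledHttp {

    /** Builds a pooled client with all three timeouts set (values are assumptions). */
    static HttpClient newClient() {
        MultiThreadedHttpConnectionManager manager = new MultiThreadedHttpConnectionManager();
        HttpConnectionManagerParams params = manager.getParams();
        params.setConnectionTimeout(3_000);           // TCP connect timeout, in ms
        params.setSoTimeout(5_000);                   // socket read timeout, in ms
        params.setDefaultMaxConnectionsPerHost(20);   // per-host pool limit
        params.setMaxTotalConnections(100);           // overall pool limit

        HttpClient client = new HttpClient(manager);
        // Sets http.connection-manager.timeout; the default of 0 means wait forever.
        client.getParams().setConnectionManagerTimeout(2_000L);
        return client;
    }

    /** Executes a GET and returns the body, releasing the connection on every path. */
    static String get(HttpClient client, String url) throws IOException {
        GetMethod method = new GetMethod(url);
        try {
            client.executeMethod(method);
            return method.getResponseBodyAsString();
        } finally {
            method.releaseConnection();   // return the connection even when an exception is thrown
        }
    }
}
```

With the connection manager timeout set, an exhausted pool surfaces as a `ConnectionPoolTimeoutException` in the logs instead of a silent pile-up of waiting threads. Raising the pool limits alone only delays the hang if connections are still being leaked.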

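Summary

This investigation fixed a long-standing problem in the project at its root and made it possible to run the management server stably in the Yanxuan environment. Along the way I gained a better understanding of jstack, thread pools, connection pools, and the HTTP protocol.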
