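Overview

Cowboy is a web server framework for Erlang. When a client disconnects early (which nginx logs as HTTP status 499), Cowboy kills the handler process outright. This can easily cause bugs.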

Example code

See https://ninenines.eu/docs/en/...
Consider the following handler:

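    -module(hello_handler).
    -behavior(cowboy_handler).

    -export([init/2]).

    init(Req0, State) ->
        erlang:display("before_sleep"),
        %% Simulate slow work so the client can disconnect mid-request.
        timer:sleep(3000),
        erlang:display("after_sleep"),
        %% cowboy_req:reply/4 returns an updated request object, so it is
        %% bound to a new variable rather than matched against Req0.
        Req = cowboy_req:reply(
            200,
            #{<<"content-type">> => <<"text/plain">>},
            <<"Hello Erlang!">>,
            Req0
        ),
        {ok, Req, State}.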

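A normal request:

    curl http://localhost:8080

prints the following in the Erlang shell:

    ([email protected])1> "before_sleep"
    "after_sleep"

If the client gives up almost immediately:

    curl http://localhost:8080 --max-time 0.001
    curl: (28) Resolving timed out after 4 milliseconds

the shell prints only:

    ([email protected])1> "before_sleep"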

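This shows that the handler process was forcibly cut off in the middle of its execution. If the handler touches resources held outside the process, for example by taking a lock, the lock will obviously never be released.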
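
To make the lock problem concrete, here is a minimal sketch in plain Erlang, with no Cowboy involved. It assumes a hypothetical lock implemented as a row in an ETS table owned by the caller. The module name lock_leak_demo, the key my_lock, and the use of exit(Pid, shutdown) to stand in for the client disconnect are illustrative assumptions, not code from any real application.

```erlang
-module(lock_leak_demo).
-export([run/0]).

%% Returns true if the (hypothetical) lock is still held after its
%% holder has been killed in the middle of its work.
run() ->
    %% The table belongs to the caller, so it outlives the worker.
    Tab = ets:new(locks, [public, set]),
    Parent = self(),
    {Pid, Ref} = spawn_monitor(fun() ->
        %% "Acquire" the lock.
        true = ets:insert_new(Tab, {my_lock, self()}),
        Parent ! locked,
        %% Simulate slow work, like timer:sleep/1 in the handler.
        timer:sleep(3000),
        %% "Release" the lock. Never reached if the process is killed.
        ets:delete(Tab, my_lock)
    end),
    receive locked -> ok end,
    %% Stand-in for what happens when the client disconnects.
    exit(Pid, shutdown),
    receive {'DOWN', Ref, process, Pid, _Reason} -> ok end,
    StillHeld = ets:member(Tab, my_lock),
    ets:delete(Tab),
    StillHeld.
```

Calling lock_leak_demo:run() should return true: the row is still present after the holder died, because the line that deletes it never ran.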

Cause

See loop in cowboy_http.erl:

    loop(State=#state{parent=Parent, socket=Socket, transport=Transport, opts=Opts,
            buffer=Buffer, timer=TimerRef, children=Children, in_streamid=InStreamID,
            last_streamid=LastStreamID}) ->
        Messages = Transport:messages(),
        InactivityTimeout = maps:get(inactivity_timeout, Opts, 300000),
        receive
            %% Discard data coming in after the last request
            %% we want to process was received fully.
            {OK, Socket, _} when OK =:= element(1, Messages), InStreamID > LastStreamID ->
                loop(State);
            %% Socket messages.
            {OK, Socket, Data} when OK =:= element(1, Messages) ->
                parse(<< Buffer/binary, Data/binary >>, State);
            {Closed, Socket} when Closed =:= element(2, Messages) ->
                terminate(State, {socket_error, closed, 'The socket has been closed.'});
            {Error, Socket, Reason} when Error =:= element(3, Messages) ->
                terminate(State, {socket_error, Reason, 'An error has occurred on the socket.'});
            {Passive, Socket} when Passive =:= element(4, Messages);
                    %% Hardcoded for compatibility with Ranch 1.x.
                    Passive =:= tcp_passive; Passive =:= ssl_passive ->
                setopts_active(State),
                loop(State);
            %% Timeouts.

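When the socket is closed, the {Closed, Socket} clause calls terminate, which eventually kills the child processes by sending them exit signals: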

    -spec terminate(children()) -> ok.
    terminate(Children) ->
        %% For each child, either ask for it to shut down,
        %% or cancel its shutdown timer if it already is.
        %%
        %% We do not need to flush stray timeout messages out because
        %% we are either terminating or switching protocols,
        %% and in the latter case we flush all messages.
        _ = [case TRef of
            undefined -> exit(Pid, shutdown);
            _ -> erlang:cancel_timer(TRef, [{async, true}, {info, false}])
        end || #child{pid=Pid, timer=TRef} <- Children],
        before_terminate_loop(Children).

Because the child processes do not trap exits, they exit without writing any log output and without any chance to handle the signal.
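
The difference that trapping exits makes can be seen in a small sketch in plain Erlang. It sends the same exit(Pid, shutdown) signal to two workers, one with the default settings and one that has called process_flag(trap_exit, true). The module exit_demo and its functions are hypothetical names for illustration only. This is not Cowboy code, and whether trapping exits is appropriate inside a real handler is a separate question that the sketch does not settle.

```erlang
-module(exit_demo).
-export([run/0]).

%% Returns {ResultWithoutTrapExit, ResultWithTrapExit}.
run() ->
    {probe(false), probe(true)}.

%% Starts a worker, sends it a shutdown exit signal, waits for it to die,
%% and reports whether it had a chance to run its cleanup branch.
probe(TrapExit) ->
    Parent = self(),
    {Pid, Ref} = spawn_monitor(fun() -> worker(Parent, TrapExit) end),
    receive {started, Pid} -> ok end,
    exit(Pid, shutdown),
    receive {'DOWN', Ref, process, Pid, _Reason} -> ok end,
    %% Any message the worker sent before dying is already in the mailbox.
    receive
        {cleaned_up, Pid} -> cleaned_up
    after 0 ->
        killed_silently
    end.

worker(Parent, TrapExit) ->
    process_flag(trap_exit, TrapExit),
    Parent ! {started, self()},
    receive
        {'EXIT', _From, shutdown} ->
            %% Only reachable when trapping exits: the signal arrives as
            %% a message, so locks and similar resources can be released.
            Parent ! {cleaned_up, self()}
    after 3000 ->
        ok
    end.
```

Calling exit_demo:run() should return {killed_silently, cleaned_up}: the default worker dies inside its receive without running any further code, while the trapping worker receives the signal as an ordinary message and can react to it.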

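Summary

Because Cowboy kills the handler process outright when the peer disconnects, bugs are easy to introduce. One way to avoid the problem is to set proxy_ignore_client_abort on in nginx, so that client disconnects are not propagated to the backend.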
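
As a rough illustration, the directive could be placed in the location block that proxies to Cowboy. The upstream address below is an assumption that reuses the port from the curl example; adjust it to the actual deployment.

```nginx
location / {
    # Assumed upstream: the Cowboy listener from the example above.
    proxy_pass http://127.0.0.1:8080;

    # Do not close the upstream connection when the client aborts,
    # so the handler can run to completion.
    proxy_ignore_client_abort on;
}
```

With this setting nginx keeps the upstream connection open until Cowboy responds, so a slow handler still finishes even if nobody is left to receive the response.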