Overview
Cowboy is a web server framework for Erlang. When a client disconnects early (nginx logs this as HTTP code 499), cowboy directly kills the handler process. This can easily cause bugs.
Example code
See https://ninenines.eu/docs/en/...
Consider the following handler:
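-module(hello_handler).
-behavior(cowboy_handler).

-export([init/2]).

init(Req0, State) ->
    erlang:display("before_sleep"),
    %% Simulate slow work so the client can disconnect in the middle of it.
    timer:sleep(3000),
    erlang:display("after_sleep"),
    %% reply/4 returns an updated request, so bind it to a new variable
    %% instead of matching against the already bound one.
    Req = cowboy_req:reply(
        200,
        #{<<"content-type">> => <<"text/plain">>},
        <<"Hello Erlang!">>,
        Req0
    ),
    {ok, Req, State}.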
Running
curl http://localhost:8080
produces this output on the server:
([email protected])1> "before_sleep"
"after_sleep"
Whereas running
curl http://localhost:8080 --max-time 0.001
curl: (28) Resolving timed out after 4 milliseconds
produces this output on the server:
([email protected])1> "before_sleep"
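This shows that the handler process was forcibly cut off in the middle of its execution. If the handler is using a resource at that point, for example holding a lock, the lock is clearly never released.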
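The lock problem can be reproduced without cowboy. The following is a minimal sketch, not cowboy code: the module name lock_leak_demo, the ETS table and the my_lock key are all made up for illustration, and the ETS entry merely stands in for a lock. It shows that a process killed by an exit signal does not run the after clause of a try block, so the usual cleanup idiom does not help.

```erlang
-module(lock_leak_demo).
-export([run/0]).

%% Returns true if the "lock" is still held after the holder was killed.
run() ->
    %% The table is owned by the caller, so it survives the holder's death.
    Tab = ets:new(locks, [public, set]),
    Parent = self(),
    {Pid, Ref} = spawn_monitor(fun() ->
        %% Take the hypothetical lock.
        true = ets:insert_new(Tab, {my_lock, self()}),
        Parent ! locked,
        try
            %% Simulate slow work, like the sleeping handler.
            timer:sleep(3000)
        after
            %% Intended cleanup. Not reached when the process is killed
            %% by an exit signal it does not trap.
            ets:delete(Tab, my_lock)
        end
    end),
    receive locked -> ok end,
    %% The same kind of signal that cowboy sends to its children.
    exit(Pid, shutdown),
    receive {'DOWN', Ref, process, Pid, _Reason} -> ok end,
    StillLocked = ets:member(Tab, my_lock),
    ets:delete(Tab),
    StillLocked.
```

Calling lock_leak_demo:run() returns true, meaning the entry is still present once the holder is gone.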
Cause
See cowboy_http.erl:loop
loop(State=#state{parent=Parent, socket=Socket, transport=Transport, opts=Opts,
        buffer=Buffer, timer=TimerRef, children=Children, in_streamid=InStreamID,
        last_streamid=LastStreamID}) ->
    Messages = Transport:messages(),
    InactivityTimeout = maps:get(inactivity_timeout, Opts, 300000),
    receive
        %% Discard data coming in after the last request
        %% we want to process was received fully.
        {OK, Socket, _} when OK =:= element(1, Messages), InStreamID > LastStreamID ->
            loop(State);
        %% Socket messages.
        {OK, Socket, Data} when OK =:= element(1, Messages) ->
            parse(<< Buffer/binary, Data/binary >>, State);
        {Closed, Socket} when Closed =:= element(2, Messages) ->
            terminate(State, {socket_error, closed, 'The socket has been closed.'});
        {Error, Socket, Reason} when Error =:= element(3, Messages) ->
            terminate(State, {socket_error, Reason, 'An error has occurred on the socket.'});
        {Passive, Socket} when Passive =:= element(4, Messages);
                %% Hardcoded for compatibility with Ranch 1.x.
                Passive =:= tcp_passive; Passive =:= ssl_passive ->
            setopts_active(State),
            loop(State);
        %% Timeouts.
When the socket is closed, the loop calls terminate, which eventually kills the child processes by sending them an exit signal:
-spec terminate(children()) -> ok.
terminate(Children) ->
    %% For each child, either ask for it to shut down,
    %% or cancel its shutdown timer if it already is.
    %%
    %% We do not need to flush stray timeout messages out because
    %% we are either terminating or switching protocols,
    %% and in the latter case we flush all messages.
    _ = [case TRef of
        undefined -> exit(Pid, shutdown);
        _ -> erlang:cancel_timer(TRef, [{async, true}, {info, false}])
    end || #child{pid=Pid, timer=TRef} <- Children],
    before_terminate_loop(Children).
Because the child processes do not trap exits, they terminate without any log output and without any chance to handle the shutdown.
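The difference that trapping exits makes can be seen in isolation. This is a sketch using plain processes rather than cowboy itself: the module exit_signal_demo and its functions are hypothetical names, and the worker only imitates a handler that is busy for three seconds. It sends the same exit(Pid, shutdown) signal to a worker that does not trap exits and to one that does.

```erlang
-module(exit_signal_demo).
-export([run/0]).

%% Worker that waits like the sleeping handler. With TrapExit = true the
%% exit signal arrives as a message and the worker can react to it.
worker(Parent, TrapExit) ->
    process_flag(trap_exit, TrapExit),
    Parent ! {ready, self()},
    receive
        {'EXIT', _From, Reason} ->
            %% A real handler would release its resources here.
            Parent ! {cleaned_up, self(), Reason}
    after 3000 ->
        Parent ! {finished, self()}
    end.

%% Start a worker, send it a shutdown signal and report what happened.
observe(TrapExit) ->
    Parent = self(),
    {Pid, Ref} = spawn_monitor(fun() -> worker(Parent, TrapExit) end),
    receive {ready, Pid} -> ok end,
    exit(Pid, shutdown),
    Result = receive
        {cleaned_up, Pid, Reason} -> {cleaned_up, Reason};
        {'DOWN', Ref, process, Pid, Reason} -> {killed_silently, Reason}
    end,
    %% Drop the monitor and any pending 'DOWN' message.
    erlang:demonitor(Ref, [flush]),
    Result.

run() ->
    [{no_trap, observe(false)}, {trap, observe(true)}].
```

exit_signal_demo:run() returns [{no_trap,{killed_silently,shutdown}},{trap,{cleaned_up,shutdown}}]. Whether setting trap_exit inside a real cowboy handler is a suitable fix is not something this sketch establishes; it only illustrates why a non-trapping process gets no chance to react.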
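Summary
Because cowboy directly kills the handler process when the peer disconnects, it is easy to introduce bugs. One workaround is to set nginx's proxy_ignore_client_abort on, so that a client disconnect is not propagated to the backend, which sidesteps the problem.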