Motivation

I recently worked on two things. One is an entity cluster library, which lets you call an entity on any node by its entity_id. The other is a name service, which lets you register names for a set of pids and then call those pids by name.

Both ran into the same questions: when we use GenServer.call/2, what actually happens, and what can go wrong? Which exits should be caught? How should those exits/errors be handled?

- When the node hosting the called pid crashes, what exit do we see? Does it matter whether the crash happens before the call starts or while it is in progress?
- What if the network to the node hosting the called pid suddenly drops? What does that look like?
- What if the called pid itself crashes?
- Should timeout be caught?

The documentation does not answer these questions, so let's explore.

Digging into the implementation

Source version

erlang: OTP-21.0.9

inside gen_server.erl call

gen_server.erl:203

```erlang
%% -----------------------------------------------------------------
%% Make a call to a generic server.
%% If the server is located at another node, that node will
%% be monitored.
%% If the client is trapping exits and is linked server termination
%% is handled here (? Shall we do that here (or rely on timeouts) ?).
%% -----------------------------------------------------------------
call(Name, Request) ->
    case catch gen:call(Name, '$gen_call', Request) of
        {ok,Res} ->
            Res;
        {'EXIT',Reason} ->
            exit({Reason, {?MODULE, call, [Name, Request]}})
    end.

call(Name, Request, Timeout) ->
    case catch gen:call(Name, '$gen_call', Request, Timeout) of
        {ok,Res} ->
            Res;
        {'EXIT',Reason} ->
            exit({Reason, {?MODULE, call, [Name, Request, Timeout]}})
    end.
```

gen.erl:160

```erlang
do_call(Process, Label, Request, Timeout) when is_atom(Process) =:= false ->
    Mref = erlang:monitor(process, Process),

    %% OTP-21:
    %% Auto-connect is asynchronous. But we still use 'noconnect' to make sure
    %% we send on the monitored connection, and not trigger a new auto-connect.
    %%
    erlang:send(Process, {Label, {self(), Mref}, Request}, [noconnect]),

    receive
        {Mref, Reply} ->
            erlang:demonitor(Mref, [flush]),
            {ok, Reply};
        {'DOWN', Mref, _, _, noconnection} ->
            Node = get_node(Process),
            exit({nodedown, Node});
        {'DOWN', Mref, _, _, Reason} ->
            exit(Reason)
    after Timeout ->
            erlang:demonitor(Mref, [flush]),
            exit(timeout)
    end.
```

As the code shows, calling a process goes through these steps:

1. monitor the process
2. send the message to the process and receive the reply

The possible outcomes are:

- a normal reply, followed by demonitor
- noconnection
- the pid going down for any reason
- timeout

So which of these cases does each of the failures listed earlier map to?
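To make the four outcomes listed above concrete from the caller's side, here is a minimal sketch. The module name `call_outcomes`, the wrapper `safe_call/3`, and the request names are made up for illustration and are not part of OTP. The exit-reason shapes it matches on follow from the `gen_server.erl` and `gen.erl` excerpts quoted above, where every exit is wrapped as `{Reason, {gen_server, call, Args}}`. The `nodedown` branch cannot be triggered on a single node, so it is only shown as a clause.

```erlang
-module(call_outcomes).
-behaviour(gen_server).

-export([demo/0, safe_call/3]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical wrapper: convert the exits raised by gen_server:call/3
%% into tagged tuples so the caller can pattern match on them.
safe_call(Server, Request, Timeout) ->
    try gen_server:call(Server, Request, Timeout) of
        Reply -> {ok, Reply}
    catch
        %% No reply arrived within Timeout milliseconds.
        exit:{timeout, _Where} -> {error, timeout};
        %% The monitor reported 'noconnection' (remote node unreachable).
        exit:{{nodedown, Node}, _Where} -> {error, {nodedown, Node}};
        %% The process did not exist when it was monitored.
        exit:{noproc, _Where} -> {error, noproc};
        %% The process went down for any other reason during the call.
        exit:{Reason, _Where} -> {error, {server_down, Reason}}
    end.

%% Runs one call per outcome against an unlinked local server.
demo() ->
    %% start/3 rather than start_link/3, so the server going down
    %% does not send an exit signal to the caller.
    {ok, Pid} = gen_server:start(?MODULE, [], []),
    R1 = safe_call(Pid, ping, 1000),
    R2 = safe_call(Pid, {sleep, 200}, 50),
    R3 = safe_call(Pid, stop_without_reply, 1000),
    R4 = safe_call(Pid, ping, 1000),
    [R1, R2, R3, R4].

init([]) ->
    {ok, nostate}.

handle_call(ping, _From, State) ->
    {reply, pong, State};
handle_call({sleep, Ms}, _From, State) ->
    timer:sleep(Ms),
    {reply, ok, State};
handle_call(stop_without_reply, _From, State) ->
    %% Terminate without replying; a real crash reason would show up
    %% in the same position as 'shutdown' does here.
    {stop, shutdown, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Calling `call_outcomes:demo()` should return one result per outcome, in the order normal reply, timeout, server down, no such process. Whether a wrapper like this should swallow `timeout` at all is exactly the open question raised earlier, so treat the classification as a starting point rather than a recommendation.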
Are there any surprises? First, let's look at what monitoring a process actually does.

inside monitor

erlang.erl:1291

```erlang
-type registered_name() :: atom().
-type registered_process_identifier() :: registered_name() | {registered_name(), node()}.
-type monitor_process_identifier() :: pid() | registered_process_identifier().
-type monitor_port_identifier() :: port() | registered_name().

%% monitor/2
-spec monitor
      (process, monitor_process_identifier()) -> MonitorRef
          when MonitorRef :: reference();
      (port, monitor_port_identifier()) -> MonitorRef
          when MonitorRef :: reference();
      (time_offset, clock_service) -> MonitorRef
          when MonitorRef :: reference().

monitor(_Type, _Item) ->
    erlang:nif_error(undefined).
```

When monitoring a process, the target can be a pid, a registered name, or a {name, node} tuple. There is no actual implementation here, though, only a stub, so we need to go find the NIF.

inside receive
...