
Hadoop: YARN node address shows localhost

Problem description

After the Hadoop cluster was installed, the YARN console showed both the node ID and the node address as localhost:

[hadoop@master sbin]$ yarn node -list
20/12/17 12:21:19 INFO client.RMProxy: Connecting to ResourceManager at master/172.16.8.42:18040
Total Nodes:1
         Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
 localhost:43141                RUNNING    localhost:8042                                  0

When a job is submitted, the YARN log also prints the node address as 127.0.0.1, and since that address is used as the node IP, the connection is bound to fail:

2020-12-17 00:53:30,721 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1607916354082_0008_01_000001, AllocationRequestId: 0, Version: 0, NodeId: localhost:43141, NodeHttpAddress: localhost:8042, Resource: <memory:2048, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 127.0.0.1:35845}, ExecutionType: GUARANTEED, ] for AM appattempt_1607916354082_0008_000001

2020-12-17 00:56:30,801 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1607916354082_0008_000001. Got exception: java.net.ConnectException: Call From master/172.16.8.42 to localhost:43141 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
       at sun.reflect.GeneratedConstructorAccessor46.newInstance(Unknown Source)
       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
       at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:827)
       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:757)
       at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1553)
       at org.apache.hadoop.ipc.Client.call(Client.java:1495)
       at org.apache.hadoop.ipc.Client.call(Client.java:1394)
       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)

Root cause

In the Hadoop source, the node ID is built as follows:

private NodeId buildNodeId(InetSocketAddress connectAddress, String hostOverride) {
    if (hostOverride != null) {
        connectAddress = NetUtils.getConnectAddress(
                new InetSocketAddress(hostOverride, connectAddress.getPort()));
    }
    return NodeId.newInstance(connectAddress.getAddress().getCanonicalHostName(),
            connectAddress.getPort());
}

The hostname here is obtained via connectAddress.getAddress().getCanonicalHostName(). A hostname can also be obtained with getHostName(), so what is the difference between the two? getHostName() returns the plain host name, while getCanonicalHostName() performs a reverse DNS (or hosts-file) lookup and returns the fully qualified domain name. For example, the host name might be definesys while DNS maps it to definesys.com. In our case, getAddress() actually returns 127.0.0.1, and the hosts file is configured like this:
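A minimal sketch of the difference, using the same loopback address and port 43141 seen in the logs above (the exact names printed depend on the local hosts file and DNS, so no specific output is assumed):

```java
import java.net.InetSocketAddress;

public class HostNameDemo {
    public static void main(String[] args) {
        // Simulate what buildNodeId sees when the NodeManager binds to the
        // loopback address. Both calls do a reverse lookup for a literal IP;
        // getCanonicalHostName additionally tries to return the FQDN.
        InetSocketAddress addr = new InetSocketAddress("127.0.0.1", 43141);
        System.out.println("getHostName:          " + addr.getAddress().getHostName());
        System.out.println("getCanonicalHostName: " + addr.getAddress().getCanonicalHostName());
        // With a typical hosts file, both resolve to "localhost" — which is
        // exactly the value that ends up in the YARN NodeId.
    }
}
```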

127.0.0.1     localhost       localhost.localdomain

so the reverse lookup resolves the address to localhost.
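You can confirm this locally: reverse resolution consults /etc/hosts before DNS, so with the configuration above the loopback address maps back to localhost (output depends on the local hosts file, so none is shown here):

```shell
# Reverse-resolve the loopback address the same way getCanonicalHostName does:
# NSS checks /etc/hosts first, and the first name on the 127.0.0.1 line wins.
getent hosts 127.0.0.1
```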

Solution

The Hadoop wiki's recommendation reads:

  • If the error message says the remote service is on “127.0.0.1” or “localhost” that means the configuration file is telling the client that the service is on the local server. If your client is trying to talk to a remote system, then your configuration is broken.
  • Check that there isn’t an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this).

In other words, remove any entry in /etc/hosts that maps the machine's hostname to 127.0.0.1 or 127.0.1.1. After removing it, the node registered with its real hostname and the problem was solved.
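As a concrete sketch of the fixed configuration (the hostname `master` and the address `172.16.8.42` are taken from the logs above; adjust for your own cluster):

```
# /etc/hosts — keep localhost on loopback, but do NOT map the
# cluster hostname to 127.0.0.1 or 127.0.1.1
127.0.0.1     localhost       localhost.localdomain
172.16.8.42   master
```

With this mapping, the reverse lookup in buildNodeId resolves the node's real address to `master` instead of `localhost`.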
