关于hadoop:深入浅出-Yarn-架构与实现33-Yarn-Application-Master-编写

434次阅读

共计 4659 个字符，预计需要花费 12 分钟才能阅读完成。

本篇文章持续介绍 Yarn Application 中 ApplicationMaster 局部的编写办法。

上一节讲了 Client 提交工作给 RM 的全流程，RM 收到工作后，由 ApplicationsManager 向 NM 申请 Container，并依据 Client 提供的 ContainerLaunchContext 启动 ApplicationMaster。
本篇代码已上传 Github：
Github – MyApplicationMaster

在 AM 中须要别离启动 NMClient 和 RMClient 进行通信。
两个客户端中都注册了咱们自定义的 eventHandler，将会在前面进行介绍。
在 amRMClient 中会定义 AM 向 RM 定时发送心跳的距离。（在 RM 中会有心跳容忍工夫，留神不要超过 RM 配置的工夫）

 // logInformation();
Configuration conf = new Configuration();
 
// 1 create amRMClient
// 第一个参数是心跳工夫 ms
amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, new RMCallbackHandler());
amRMClient.init(conf);
amRMClient.start();
 
// 2 Create nmClientAsync
amNMClient = new NMClientAsyncImpl(new NMCallbackHandler());
amNMClient.init(conf);
amNMClient.start();

 // 3 register with RM and this will heart beating to RM
RegisterApplicationMasterResponse response = amRMClient
                .registerApplicationMaster(NetUtils.getHostname(), -1, "");

首先须要从 response 中确认资源池残余资源，而后再依据需要申请 container

 // 4 Request containers
response.getContainersFromPreviousAttempts();
 
// 4.1 check resource
long maxMem = response.getMaximumResourceCapability().getMemorySize();
int maxVCores = response.getMaximumResourceCapability().getVirtualCores();
 
// 4.2 request containers base on avail resource
for (int i = 0; i < numTotalContainers.get(); i++) {
    ContainerRequest containerAsk = new ContainerRequest(
            //100*10M + 1vcpu
            Resource.newInstance(100, 1), null, null,
            Priority.newInstance(0));
    amRMClient.addContainerRequest(containerAsk);
}

将在 RMCallbackHandler 中的 onContainersAllocated 回调函数中解决，并在其中调用 NMCallbackHandler 的办法，执行对应的 task。
（RMCallbackHandler、NMCallbackHandler 将在前面进行具体介绍。）

 // RMCallbackHandler
public void onContainersAllocated(List<Container> containers) {for (Container c : containers) {log.info("Container Allocated, id =" + c.getId() + ", containerNode =" + c.getNodeId());
        // LaunchContainerTask 实现在上面
        exeService.submit(new LaunchContainerTask(c));
    }
}
 
private class LaunchContainerTask implements Runnable {
    @Override
    public void run() {
        // ……
        // 发送事件交给 nm 解决
        amNMClient.startContainerAsync(container, ctx);
    }
}

当全副子工作实现后，须要做收尾工作，将 amNMClient 和 amRMClient 进行。

 while(numTotalContainers.get() != numCompletedContainers.get()){
    try{Thread.sleep(1000);
        log.info("waitComplete" +
                ", numTotalContainers=" + numTotalContainers.get() +
                ", numCompletedConatiners=" + numCompletedContainers.get());
    } catch (InterruptedException ex){}}
log.info("ShutDown exeService Start");
exeService.shutdown();
log.info("ShutDown exeService Complete");
amNMClient.stop();
log.info("amNMClient stop Complete");
amRMClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "dummy Message", null);
log.info("unregisterApplicationMaster Complete");
amRMClient.stop();
log.info("amRMClient stop Complete");

实质是个 eventHandler，对事件库不相熟的同学能够翻之前的文章「2-3 Yarn 根底库 – 服务库与事件库」进行学习。
其会解决 Container 启动、进行、更新等事件。
收到不同的事件时，会执行相应的回调函数。这里仅给出两个函数的实现。

💡 思考：之前版本中（2.6 之前）还是实现 CallbackHandler 接口，为何前面改为了抽象类？
A：对原接口有了扩大减少了办法 onContainersUpdated。揣测是因为防止应用接口继承。

 private class RMCallbackHandler extends AMRMClientAsync.AbstractCallbackHandler {
    @Override
    public void onContainersCompleted(List<ContainerStatus> statuses) {for (ContainerStatus status : statuses) {log.info("Container completed:" + status.getContainerId().toString()
                    + "exitStatus=" + status.getExitStatus());
            if (status.getExitStatus() != 0) {log.error("Container return error status:" + status.getExitStatus());
                log.warn("Need rerun container!");
                // do something restart container
                continue;
            }
            ContainerId containerId = status.getContainerId();
            runningContainers.remove(containerId);
            numCompletedContainers.addAndGet(1);
        }
    }
    
    @Override
    // 这里在 container 中启动相应的 task
    public void onContainersAllocated(List<Container> containers) {for (Container c : containers) {log.info("Container Allocated, id =" + c.getId() + ", containerNode =" + c.getNodeId());
            // LaunchContainerTask 实现在上面
            exeService.submit(new LaunchContainerTask(c));
        }
    }
    // 其余办法实现…… 
}
        
 
private class LaunchContainerTask implements Runnable {
    Container container;
    public LaunchContainerTask(Container container) {this.container = container;}
    
    @Override
    public void run() {LinkedList<String> commands = new LinkedList<>();
        commands.add("sleep" + sleepSeconds.addAndGet(1));
        ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(null, null, commands, null, null, null);
        // 这里去执行 amNMClient 的回调
        amNMClient.startContainerAsync(container, ctx);
    }
}

定义 nm container 须要执行的各种事件处理。

 private class NMCallbackHandler extends NMClientAsync.AbstractCallbackHandler {
    @Override
    public void onContainerStarted(ContainerId containerId, Map<String, ByteBuffer> allServiceResponse) {log.info("Container Stared" + containerId.toString());
    }
    
    // ……

AM 与 RM

AM 与 NM

至此咱们学习了编写 Yarn Application 的整体流程和实现办法，置信各位同学对其有了更深的意识。之后能够从 hadoop 提供的 DistributedShell 动手，再到其余框架（Hive、Flink）等探索工业级框架是如何提交 Application 的。

参考文章：
Hadoop Doc: Writing an ApplicationMaster (AM)
《Hadoop 技术底细 – 深刻解析 Yarn 结构设计与实现原理》第四章

正文完

hadoop

发表至： hadoop

2022-11-18

0

关于hadoop:Hadoop集群的部署二

关于hadoop:Hadoop-入门笔记-十-二-HDFS-Federation联邦机制

PySpark-SQL-相关知识介绍

hive小结

关于vue.js:P14-vue3-向子组件传值-Prop

关于hadoop:深入浅出-Yarn-架构与实现33-Yarn-Application-Master-编写

一、Application Master 编写办法

一）整体流程

1&2、启动 NMClient 和 RMClient

3、向 RM 注册 ApplicationMaster

4、申请 Containers

5、运行工作

6、结束任务

二）NMClient 和 RMClient Callback Handler 编写

1、RMCallbackHandler

2、NMCallbackHandler

三）波及的通信协议

二、小结

Just My Socks（注册教程内含优惠码）

	// logInformation();
	Configuration conf = new Configuration();

	// 1 create amRMClient
	// 第一个参数是心跳工夫 ms
	amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, new RMCallbackHandler());
	amRMClient.init(conf);
	amRMClient.start();

	// 2 Create nmClientAsync
	amNMClient = new NMClientAsyncImpl(new NMCallbackHandler());
	amNMClient.init(conf);
	amNMClient.start();

	// 3 register with RM and this will heart beating to RM
	RegisterApplicationMasterResponse response = amRMClient
	.registerApplicationMaster(NetUtils.getHostname(), -1, "");

	// 4 Request containers
	response.getContainersFromPreviousAttempts();

	// 4.1 check resource
	long maxMem = response.getMaximumResourceCapability().getMemorySize();
	int maxVCores = response.getMaximumResourceCapability().getVirtualCores();

	// 4.2 request containers base on avail resource
	for (int i = 0; i < numTotalContainers.get(); i++) {
	ContainerRequest containerAsk = new ContainerRequest(
	//100*10M + 1vcpu
	Resource.newInstance(100, 1), null, null,
	Priority.newInstance(0));
	amRMClient.addContainerRequest(containerAsk);
	}

	// RMCallbackHandler
	public void onContainersAllocated(List<Container> containers) {for (Container c : containers) {log.info("Container Allocated, id =" + c.getId() + ", containerNode =" + c.getNodeId());
	// LaunchContainerTask 实现在上面
	exeService.submit(new LaunchContainerTask(c));
	}
	}

	private class LaunchContainerTask implements Runnable {
	@Override
	public void run() {
	// ……
	// 发送事件交给 nm 解决
	amNMClient.startContainerAsync(container, ctx);
	}
	}

	while(numTotalContainers.get() != numCompletedContainers.get()){
	try{Thread.sleep(1000);
	log.info("waitComplete" +
	", numTotalContainers=" + numTotalContainers.get() +
	", numCompletedConatiners=" + numCompletedContainers.get());
	} catch (InterruptedException ex){}}
	log.info("ShutDown exeService Start");
	exeService.shutdown();
	log.info("ShutDown exeService Complete");
	amNMClient.stop();
	log.info("amNMClient stop Complete");
	amRMClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "dummy Message", null);
	log.info("unregisterApplicationMaster Complete");
	amRMClient.stop();
	log.info("amRMClient stop Complete");

	private class RMCallbackHandler extends AMRMClientAsync.AbstractCallbackHandler {
	@Override
	public void onContainersCompleted(List<ContainerStatus> statuses) {for (ContainerStatus status : statuses) {log.info("Container completed:" + status.getContainerId().toString()
	+ "exitStatus=" + status.getExitStatus());
	if (status.getExitStatus() != 0) {log.error("Container return error status:" + status.getExitStatus());
	log.warn("Need rerun container!");
	// do something restart container
	continue;
	}
	ContainerId containerId = status.getContainerId();
	runningContainers.remove(containerId);
	numCompletedContainers.addAndGet(1);
	}
	}

	@Override
	// 这里在 container 中启动相应的 task
	public void onContainersAllocated(List<Container> containers) {for (Container c : containers) {log.info("Container Allocated, id =" + c.getId() + ", containerNode =" + c.getNodeId());
	// LaunchContainerTask 实现在上面
	exeService.submit(new LaunchContainerTask(c));
	}
	}
	// 其余办法实现……
	}


	private class LaunchContainerTask implements Runnable {
	Container container;
	public LaunchContainerTask(Container container) {this.container = container;}

	@Override
	public void run() {LinkedList<String> commands = new LinkedList<>();
	commands.add("sleep" + sleepSeconds.addAndGet(1));
	ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(null, null, commands, null, null, null);
	// 这里去执行 amNMClient 的回调
	amNMClient.startContainerAsync(container, ctx);
	}
	}

	private class NMCallbackHandler extends NMClientAsync.AbstractCallbackHandler {
	@Override
	public void onContainerStarted(ContainerId containerId, Map<String, ByteBuffer> allServiceResponse) {log.info("Container Stared" + containerId.toString());
	}

	// ……

关于hadoop:深入浅出-Yarn-架构与实现33-Yarn-Application-Master-编写

一、Application Master 编写办法

一）整体流程

1&2、启动 NMClient 和 RMClient

3、向 RM 注册 ApplicationMaster

4、申请 Containers

5、运行工作

6、结束任务

二）NMClient 和 RMClient Callback Handler 编写

1、RMCallbackHandler

2、NMCallbackHandler

三）波及的通信协议

二、小结

Just My Socks（注册教程 内含优惠码）

Just My Socks（注册教程内含优惠码）