I. Preparation
This post builds on the setup from the previous post: https://segmentfault.com/a/11…
1.1 Changes and additions
# Change the hostname
hostnamectl set-hostname hadoop104
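# Configure passwordless SSH login
## Delete the existing SSH data (this also removes any authorized_keys and known_hosts files)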
[admin@hadoop104 ~]$ cd ~/.ssh
[admin@hadoop104 .ssh]$ rm -rf *
## Generate the private and public keys with an empty passphrase (press Enter three times)
[admin@hadoop104 .ssh]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/admin/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/admin/.ssh/id_rsa.
Your public key has been saved in /home/admin/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:kjL6k939tz4wgrdYIlA7/r5EgzGVJ12YlQB7BfkNMSU admin@hadoop104
The key's randomart image is:
+---[RSA 2048]----+
| o+oOE+. |
| ..o.*.oo |
| .o..o.. o |
| . o= . . . |
| oo+.S. |
| . oooo.+ o |
| . o +.* o o |
| .o ..+ o o |
| .. .o. ..ooo |
+----[SHA256]-----+
[admin@hadoop104 .ssh]$ ll
total 8
-rw------- 1 admin admin 1675 Apr 2 21:26 id_rsa #id_rsa is the private key file
-rw-r--r-- 1 admin admin 397 Apr 2 21:26 id_rsa.pub #id_rsa.pub is the public key file
## Copy the public key to the node hadoop104
[admin@hadoop104 .ssh]$ ssh-copy-id hadoop104
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/admin/.ssh/id_rsa.pub"
The authenticity of host 'hadoop104 (192.168.119.104)' can't be established.
ECDSA key fingerprint is SHA256:X25gXFFr2vsKVxn7LLOpQtYBb1OHOmRGj9XmJpQQ9Vs.
ECDSA key fingerprint is MD5:d6:55:be:36:9b:b6:33:f7:4d:75:5a:c5:40:89:a1:7c.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
admin@hadoop104's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop104'"
and check to make sure that only the key(s) you wanted were added.
## Passwordless login now works
[admin@hadoop104 .ssh]$ ssh hadoop104
Last login: Thu Apr 4 10:48:25 2019 from hadoop104
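To confirm that the key-based login really works without any prompt, a minimal sketch follows. It assumes the host name hadoop104 from above; the function name check_passwordless_ssh is hypothetical. BatchMode=yes makes ssh fail instead of asking for a password.

```shell
# Hypothetical helper: succeeds only if key-based login works without a prompt.
check_passwordless_ssh() {
    host="$1"
    # BatchMode=yes disables password prompts, so a missing key makes ssh fail
    # instead of waiting for input; ConnectTimeout bounds the wait in seconds.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
        echo "passwordless login to $host: OK"
    else
        echo "passwordless login to $host: FAILED" >&2
        return 1
    fi
}

# Usage: check_passwordless_ssh hadoop104
```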
Configuration for "II. Hands-on Operations - 2.1 Running a MapReduce program on HDFS". After completing it, carry out section 2.1 first.
# core-site.xml
vi /opt/module/hadoop-3.1.1/etc/hadoop/core-site.xml
<!-- Address of the NameNode in HDFS -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop104:9000</value>
</property>
<!-- Directory for files that Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.1.1/data/tmp</value>
</property>
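The property entries above go inside the root configuration element of core-site.xml. As a sketch, the following writes a complete file with those two properties to a scratch directory so it can be reviewed first; the scratch location and the variable name scratch are assumptions, not part of the original setup.

```shell
# Write a complete core-site.xml to a scratch directory (assumed location),
# so it can be reviewed before replacing etc/hadoop/core-site.xml.
scratch="$(mktemp -d)"
cat > "$scratch/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Address of the NameNode in HDFS -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop104:9000</value>
  </property>
  <!-- Directory for files that Hadoop generates at runtime -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-3.1.1/data/tmp</value>
  </property>
</configuration>
EOF
echo "wrote $scratch/core-site.xml"
```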
# hdfs-site.xml
vi /opt/module/hadoop-3.1.1/etc/hadoop/hdfs-site.xml
<!-- Number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
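After editing the site files, it can help to check which values Hadoop actually picks up. The sketch below wraps the hdfs getconf -confKey command; the function name show_conf is hypothetical, and it assumes the hdfs command is on PATH.

```shell
# Hypothetical helper: print the effective value of each configuration key.
# `hdfs getconf -confKey KEY` reads the value from the loaded site files.
show_conf() {
    for key in "$@"; do
        printf '%s=%s\n' "$key" "$(hdfs getconf -confKey "$key")"
    done
}

# Usage: show_conf fs.defaultFS hadoop.tmp.dir dfs.replication
```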
Configuration for "II. Hands-on Operations - 2.2 Running a MapReduce program on YARN". After completing it, carry out section 2.2.
# Configure yarn-site.xml
<!-- How reducers fetch data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Hostname of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop104</value>
</property>
# Configure mapred-site.xml
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
II. Hands-on Operations
2.1 Running a MapReduce program on HDFS
# Format the NameNode (only before the first start; do not reformat on later starts)
[admin@hadoop104 hadoop-3.1.1]$ bin/hdfs namenode -format
# Start HDFS
[admin@hadoop104 hadoop-3.1.1]$ sbin/start-dfs.sh
# Check the running processes
[admin@hadoop104 hadoop-3.1.1]$ jps
14448 NameNode
14769 SecondaryNameNode
14571 DataNode
14892 Jps
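Reading the jps list by eye is easy to get wrong once more daemons are running. A small sketch that reports which expected daemons are absent from the jps output follows; the function name missing_daemons is hypothetical.

```shell
# Hypothetical helper: read `jps` output on stdin and print every expected
# daemon name (given as arguments) that does not appear in it.
missing_daemons() {
    running="$(awk '{print $2}')"   # second column of jps output is the class name
    status=0
    for daemon in "$@"; do
        if ! printf '%s\n' "$running" | grep -qxF "$daemon"; then
            echo "$daemon"
            status=1
        fi
    done
    return "$status"
}

# Usage: jps | missing_daemons NameNode DataNode SecondaryNameNode
```

The function prints nothing and returns 0 when every expected daemon is listed.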
# View the HDFS file system in a browser. In Hadoop 3.0 the default port of the NameNode web UI changed from 50070 to 9870
http://192.168.119.104:9870/dfshealth.html#tab-overview or
http://hadoop104:9870/dfshealth.html#tab-overview (the local Windows machine needs a hosts entry for hadoop104)
# Create an input directory on HDFS
[admin@hadoop104 hadoop-3.1.1]$ bin/hdfs dfs -mkdir -p /user/qianxkun/mapreduce/wordcount/input
# Upload the test file to HDFS
[admin@hadoop104 hadoop-3.1.1]$ bin/hdfs dfs -put wcinput/wc.input /user/qianxkun/mapreduce/wordcount/input/
# Check the uploaded file
[admin@hadoop104 hadoop-3.1.1]$ bin/hdfs dfs -ls /user/qianxkun/mapreduce/wordcount/input/
Found 1 items
-rw-r--r-- 1 admin supergroup 47 2019-04-04 16:07 /user/qianxkun/mapreduce/wordcount/input/wc.input
[admin@hadoop104 hadoop-3.1.1]$ bin/hdfs dfs -cat /user/qianxkun/mapreduce/wordcount/input/wc.input
hadoop yarn
hadoop mapreduce
qianxkun
qianxkun
# Run the MapReduce program on HDFS
[admin@hadoop104 hadoop-3.1.1]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar wordcount /user/qianxkun/mapreduce/wordcount/input/ /user/qianxkun/mapreduce/wordcount/output
View the output
From the command line
[admin@hadoop104 hadoop-3.1.1]$ bin/hdfs dfs -cat /user/qianxkun/mapreduce/wordcount/output/*
hadoop 2
mapreduce 1
qianxkun 2
yarn 1
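As a sanity check on these counts, the same result can be reproduced locally with standard tools. This is only a sketch of what wordcount computes, not how MapReduce runs it; the function name local_wordcount is hypothetical, and the input repeats the four lines of wc.input shown above.

```shell
# Hypothetical helper: count words from stdin with standard tools and print
# "word<TAB>count" lines sorted by word.
local_wordcount() {
    tr -s '[:space:]' '\n' |      # one word per line
        sed '/^$/d' |             # drop empty lines
        LC_ALL=C sort |           # group identical words together
        uniq -c |                 # prefix each word with its count
        awk '{print $2 "\t" $1}'  # reorder to word, tab, count
}

printf 'hadoop yarn\nhadoop mapreduce\nqianxkun\nqianxkun\n' | local_wordcount
```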
In the browser
# Download the output file to the local file system
[admin@hadoop104 hadoop-3.1.1]$ hadoop fs -get /user/qianxkun/mapreduce/wordcount/output/part-r-00000 ./wcoutput/
# Delete the output directory
[admin@hadoop104 hadoop-3.1.1]$ hdfs dfs -rm -r /user/qianxkun/mapreduce/wordcount/output
2.2 Running a MapReduce program on YARN
# Start YARN
[admin@hadoop104 hadoop-3.1.1]$ sbin/start-yarn.sh
# Check the running processes
[admin@hadoop104 hadoop-3.1.1]$ jps
14448 NameNode
14769 SecondaryNameNode
15939 ResourceManager
16374 Jps
14571 DataNode
16063 NodeManager
# View the YARN web UI in a browser
http://192.168.119.104:8088/cluster or
http://hadoop104:8088/cluster
# Delete the output directory on HDFS (the job fails if it already exists)
[admin@hadoop104 hadoop-3.1.1]$ bin/hdfs dfs -rm -R /user/qianxkun/mapreduce/wordcount/output
Deleted /user/qianxkun/mapreduce/wordcount/output
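Because this cleanup is repeated before every run, it can be wrapped so that it does not complain when the directory is absent. A minimal sketch, assuming the hdfs command is on PATH; the function name remove_output_dir is hypothetical.

```shell
# Hypothetical helper: remove an HDFS output directory only if it exists.
remove_output_dir() {
    dir="$1"
    # `hdfs dfs -test -d` exits 0 when the path exists and is a directory.
    if hdfs dfs -test -d "$dir"; then
        hdfs dfs -rm -r "$dir"
    else
        echo "nothing to delete: $dir"
    fi
}

# Usage: remove_output_dir /user/qianxkun/mapreduce/wordcount/output
```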
# Run the MapReduce program
[admin@hadoop104 hadoop-3.1.1]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar wordcount /user/qianxkun/mapreduce/wordcount/input /user/qianxkun/mapreduce/wordcount/output
## The job fails with "Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster"
## Fix
### Stop YARN
[admin@hadoop104 hadoop-3.1.1]$ sbin/stop-yarn.sh
### Add the output of hadoop classpath to yarn-site.xml
[admin@hadoop104 hadoop-3.1.1]$ hadoop classpath
/opt/module/hadoop-3.1.1/etc/hadoop:/opt/module/hadoop-3.1.1/share/hadoop/common/lib/*:/opt/module/hadoop-3.1.1/share/hadoop/common/*:/opt/module/hadoop-3.1.1/share/hadoop/hdfs:/opt/module/hadoop-3.1.1/share/hadoop/hdfs/lib/*:/opt/module/hadoop-3.1.1/share/hadoop/hdfs/*:/opt/module/hadoop-3.1.1/share/hadoop/mapreduce/lib/*:/opt/module/hadoop-3.1.1/share/hadoop/mapreduce/*:/opt/module/hadoop-3.1.1/share/hadoop/yarn:/opt/module/hadoop-3.1.1/share/hadoop/yarn/lib/*:/opt/module/hadoop-3.1.1/share/hadoop/yarn/*
[admin@hadoop104 hadoop-3.1.1]$ vi etc/hadoop/yarn-site.xml
<property>
<name>yarn.application.classpath</name>
<value>
/opt/module/hadoop-3.1.1/etc/hadoop:/opt/module/hadoop-3.1.1/share/hadoop/common/lib/*:/opt/module/hadoop-3.1.1/share/hadoop/common/*:/opt/module/hadoop-3.1.1/share/hadoop/hdfs:/opt/module/hadoop-3.1.1/share/hadoop/hdfs/lib/*:/opt/module/hadoop-3.1.1/share/hadoop/hdfs/*:/opt/module/hadoop-3.1.1/share/hadoop/mapreduce/lib/*:/opt/module/hadoop-3.1.1/share/hadoop/mapreduce/*:/opt/module/hadoop-3.1.1/share/hadoop/yarn:/opt/module/hadoop-3.1.1/share/hadoop/yarn/lib/*:/opt/module/hadoop-3.1.1/share/hadoop/yarn/*
</value>
</property>
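Copying the long classpath by hand is error-prone. As a sketch, the property can be generated directly from the hadoop classpath output, with the value on a single line and no surrounding whitespace; the function name classpath_property is hypothetical.

```shell
# Hypothetical helper: print a yarn.application.classpath property whose value
# is the current output of `hadoop classpath`, on one line without whitespace.
classpath_property() {
    printf '<property>\n'
    printf '<name>yarn.application.classpath</name>\n'
    printf '<value>%s</value>\n' "$(hadoop classpath)"
    printf '</property>\n'
}

# Usage: classpath_property   # then paste the output into etc/hadoop/yarn-site.xml
```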
## After restarting YARN, the job runs successfully
[admin@hadoop104 hadoop-3.1.1]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar wordcount /user/qianxkun/mapreduce/wordcount/input /user/qianxkun/mapreduce/wordcount/output
# View the result
[admin@hadoop104 hadoop-3.1.1]$ bin/hdfs dfs -cat /user/qianxkun/mapreduce/wordcount/output/*
hadoop 2
mapreduce 1
qianxkun 2
yarn 1
III. Configuring, Starting, and Viewing the History Server
# Configure mapred-site.xml
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop104:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop104:19888</value>
</property>
# Locate the history server startup script
[admin@hadoop104 hadoop-3.1.1]$ ls sbin/ |grep mr
mr-jobhistory-daemon.sh
# Start the history server
[admin@hadoop104 hadoop-3.1.1]$ sbin/mr-jobhistory-daemon.sh start historyserver
# Check that the history server is running
[admin@hadoop104 hadoop-3.1.1]$ jps
19442 SecondaryNameNode
19800 NodeManager
19257 DataNode
19146 NameNode
19692 ResourceManager
20142 Jps
18959 JobHistoryServer
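In Hadoop 3.x the mr-jobhistory-daemon.sh script is marked deprecated and hands over to mapred --daemon. A sketch that uses the newer command and then checks jps follows; the function name start_history_server is hypothetical, and it assumes mapred and jps are on PATH. The daemon may need a moment before jps lists it.

```shell
# Hypothetical helper: start the history server with the Hadoop 3 command and
# report whether jps lists it afterwards.
start_history_server() {
    mapred --daemon start historyserver || return 1
    if jps | grep -q 'JobHistoryServer'; then
        echo "JobHistoryServer is running"
    else
        echo "JobHistoryServer not found in jps output" >&2
        return 1
    fi
}

# Usage: start_history_server
```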
# View the job history
http://192.168.119.104:19888/jobhistory or
http://hadoop104:19888/jobhistory
IV. Log Aggregation
# Configure yarn-site.xml
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Retain aggregated logs for 7 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
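The value 604800 is simply 7 days expressed in seconds. A one-line sketch of the arithmetic, in case a different retention period is wanted; the function name days_to_seconds is hypothetical.

```shell
# Hypothetical helper: convert a number of days to seconds.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 7
```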
# Start HDFS, YARN, and the history server
[admin@hadoop104 hadoop-3.1.1]$ sbin/start-dfs.sh
[admin@hadoop104 hadoop-3.1.1]$ sbin/start-yarn.sh
[admin@hadoop104 hadoop-3.1.1]$ sbin/mr-jobhistory-daemon.sh start historyserver
# Delete the output directory that already exists on HDFS
[admin@hadoop104 hadoop-3.1.1]$ bin/hdfs dfs -rm -R /user/qianxkun/mapreduce/wordcount/output
# Run the wordcount program
[admin@hadoop104 hadoop-3.1.1]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar wordcount /user/qianxkun/mapreduce/wordcount/input /user/qianxkun/mapreduce/wordcount/output
2019-04-04 18:23:59,294 INFO client.RMProxy: Connecting to ResourceManager at hadoop104/192.168.119.104:8032
2019-04-04 18:24:00,525 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/admin/.staging/job_1554373388828_0001
# View the logs
http://192.168.119.104:19888/jobhistory or
http://hadoop104:19888/jobhistory
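With aggregation enabled, the logs can also be read from the command line with yarn logs -applicationId. That command takes an application ID, which shares its numeric part with the job ID printed in the job output above. A sketch of the conversion follows; the function name job_to_app_id is hypothetical.

```shell
# Hypothetical helper: derive the YARN application ID from a MapReduce job ID
# by swapping the "job_" prefix for "application_".
job_to_app_id() {
    printf 'application_%s\n' "${1#job_}"
}

job_to_app_id job_1554373388828_0001

# Usage: yarn logs -applicationId "$(job_to_app_id job_1554373388828_0001)"
```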