Hadoop Overview and Deployment

Reference: http://hadoop.apache.org/docs...

1. Hadoop Overview

  • What is Hadoop?

Hadoop is a distributed system infrastructure developed by the Apache Foundation. It addresses the storage and analytical processing of massive data sets.

  • Strengths of Hadoop

    • High reliability: the HDFS storage layer keeps multiple replicas of every block (see the quick check after this list)
    • High scalability: with cluster deployment, nodes can easily be added
    • High efficiency: Hadoop runs tasks in parallel, which speeds up processing
    • High fault tolerance: failed tasks are automatically reassigned
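For a concrete feel of the replication point, here is a quick check you can run once the cluster from section 4 is up (my addition; /test.txt is a placeholder path):

hdfs getconf -confKey dfs.replication          // print the configured replica count (Hadoop defaults to 3)
hdfs fsck /test.txt -files -blocks -locations  // list each block of the file and the DataNodes holding its replicas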

2. Hadoop Components

  • The 2.x and 3.x releases are composed of:

    • MapReduce: computation
    • Yarn: resource scheduling
    • HDFS: data storage
    • Common: auxiliary utilities

Note: in 1.x there is no Yarn; MapReduce handles both computation and resource scheduling.

3. Deployment Plan

  • Three virtual machines
IP              Hostname  OS          Spec                            Roles
192.168.122.10  hadoop10  CentOS 7.5  1 core / 4 GB RAM / 50 GB disk  NameNode, DataNode, NodeManager
192.168.122.11  hadoop11  CentOS 7.5  1 core / 4 GB RAM / 50 GB disk  ResourceManager, DataNode, NodeManager
192.168.122.12  hadoop12  CentOS 7.5  1 core / 4 GB RAM / 50 GB disk  SecondaryNameNode, DataNode, NodeManager
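The hostnames above are assumed to resolve on every machine; one common way (my assumption, not spelled out in the original) is to add entries to /etc/hosts on each of the three VMs:

192.168.122.10 hadoop10
192.168.122.11 hadoop11
192.168.122.12 hadoop12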

4. Cluster Deployment

4.1 System Update and Passwordless SSH Setup

  • Update and upgrade the system
yum install -y epel-release
yum update
  • Configure passwordless SSH login
[v2admin@hadoop10 ~]$ ssh-keygen -t rsa
// ...just press Enter at every prompt to generate the key pair id_rsa and id_rsa.pub
// My user is v2admin; all subsequent operations use this user
[v2admin@hadoop10 ~]$ ssh-copy-id hadoop10
[v2admin@hadoop10 ~]$ ssh-copy-id hadoop11
[v2admin@hadoop10 ~]$ ssh-copy-id hadoop12
// Run the same commands on hadoop11 and hadoop12
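A quick sanity check of the passwordless setup (my addition; any remote command works):

[v2admin@hadoop10 ~]$ ssh hadoop11 hostname   // should print hadoop11 without asking for a password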
  • Upload the JDK and Hadoop packages to /home/v2admin on all three VMs
// My own workstation runs Ubuntu 18.04, so scp works directly.
// On Windows, install lrzsz or upload to the VMs via FTP instead.
scp jdk-8u212-linux-x64.tar.gz hadoop-3.1.3.tar.gz v2admin@192.168.122.10:/home/v2admin
scp jdk-8u212-linux-x64.tar.gz hadoop-3.1.3.tar.gz v2admin@192.168.122.11:/home/v2admin
scp jdk-8u212-linux-x64.tar.gz hadoop-3.1.3.tar.gz v2admin@192.168.122.12:/home/v2admin

4.2 Install the JDK

[v2admin@hadoop10 ~]$ tar zxvf jdk-8u212-linux-x64.tar.gz
[v2admin@hadoop10 ~]$ sudo mv jdk1.8.0_212/ /usr/local/jdk8

4.3 Install Hadoop

[v2admin@hadoop10 ~]$ sudo tar zxvf hadoop-3.1.3.tar.gz -C /opt
[v2admin@hadoop10 ~]$ sudo chown -R v2admin:v2admin /opt/hadoop-3.1.3  // change owner and group to the current user

4.4 Configure JDK and Hadoop Environment Variables

[v2admin@hadoop10 ~]$ sudo vim /etc/profile
// append at the end
......
# set jdk hadoop env
export JAVA_HOME=/usr/local/jdk8
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/opt/hadoop-3.1.3
export PATH=${PATH}:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
....
[v2admin@hadoop10 ~]$ source /etc/profile
[v2admin@hadoop10 ~]$ java -version  // verify the JDK
java version "1.8.0_212"
Java(TM) SE Runtime Environment (build 1.8.0_212-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.212-b10, mixed mode)
[v2admin@hadoop10 ~]$ hadoop version  // verify Hadoop
Hadoop 3.1.3
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r ba631c436b806728f8ec2f54ab1e289526c90579
Compiled by ztang on 2019-09-12T02:47Z
Compiled with protoc 2.5.0
From source with checksum ec785077c385118ac91aadde5ec9799
This command was run using /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-common-3.1.3.jar

4.5 File Distribution Script

The configuration files are identical on all three VMs; without this script, each machine would have to be configured one by one, which is tedious.
The script file is named xrsync.sh.
Grant it execute permission and put it in a bin directory so it can be invoked directly like any other shell command (see the usage example after the script).

#!/bin/bash
if [ $# -lt 1 ]
then
    echo "Missing required arguments"
    exit
fi
# Loop over the cluster hosts
for host in hadoop10 hadoop11 hadoop12
do
    for file in $@
    do
        if [ -e $file ]
        then
            # Resolve the parent directory (following symlinks)
            pdir=$(cd -P $(dirname $file); pwd)
            # Get the bare file name
            filename=$(basename $file)
            ssh $host "mkdir -p $pdir"
            rsync -av $pdir/$filename $host:$pdir
        else
            echo "$file does not exist!"
        fi
    done
done
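To install it as described above (the /bin target mirrors section 4.7; any bin directory on the PATH works equally well):

chmod +x xrsync.sh
sudo cp xrsync.sh /bin/
xrsync.sh /home/v2admin/test.txt   // example: copy test.txt to the same path on all three hosts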

4.6 Cluster Configuration

  • 4.6.1 Set Hadoop's JAVA_HOME
[v2admin@hadoop10 ~]$ cd /opt/hadoop-3.1.3/etc/hadoop
[v2admin@hadoop10 hadoop]$ vim hadoop-env.sh
// set JAVA_HOME explicitly
export JAVA_HOME=/usr/local/jdk8

[v2admin@hadoop10 hadoop]$ xrsync.sh hadoop-env.sh  // sync the updated file to the other two hosts

  • 4.6.2 Core Configuration

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- NameNode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop10:9820</value>
    </property>
    <!-- Hadoop data storage directory -->
    <property>
        <name>hadoop.data.dir</name>
        <value>/opt/hadoop-3.1.3/data</value>
    </property>
    <property>
        <name>hadoop.proxyuser.v2admin.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.v2admin.groups</name>
        <value>*</value>
    </property>
    <!-- Static user for the web UI -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>v2admin</value>
    </property>
</configuration>
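Once this file is distributed, a quick way to confirm the value Hadoop actually sees (a sanity check I added, not part of the original steps):

hdfs getconf -confKey fs.defaultFS   // expect hdfs://hadoop10:9820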
  • 4.6.3 HDFS Configuration

hdfs-site.xml

<configuration>
    <!-- NameNode data directory -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file://${hadoop.data.dir}/name</value>
    </property>
    <!-- DataNode data directory -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file://${hadoop.data.dir}/data</value>
    </property>
    <!-- SecondaryNameNode (2nn) data directory -->
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file://${hadoop.data.dir}/namesecondary</value>
    </property>
    <property>
        <name>dfs.client.datanode-restart.timeout</name>
        <value>30</value>
    </property>
    <!-- NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop10:9870</value>
    </property>
</configuration>
  • 4.6.4 Yarn Configuration

yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop11</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://hadoop10:19888/jobhistory/logs</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
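With yarn.log-aggregation-enable set to true as above, the logs of a finished application can be fetched from the command line on any node (the application ID below is a placeholder):

yarn logs -applicationId application_1600000000000_0001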
  • 4.6.5 MapReduce Configuration

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- History server RPC address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop10:10020</value>
    </property>
    <!-- History server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop10:19888</value>
    </property>
</configuration>
  • 4.6.6 Use the script to distribute the finished configuration files across the cluster (see the commands below)
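For example, with xrsync.sh from section 4.5 on the PATH:

xrsync.sh /opt/hadoop-3.1.3/etc/hadoop/

Also note (a step the original leaves implicit): before the very first start, the NameNode must be formatted once on hadoop10. Run this exactly once; re-running it destroys existing HDFS metadata.

hdfs namenode -format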

4.7 Cluster Startup Script

Starting the cluster requires running start commands on each server. To make starting the cluster and checking its status easier, write a startup script, startMyCluster.sh.

#!/bin/bash
if [ $# -lt 1 ]
then
    echo "Not enough arguments Input !!!"
    exit
fi
case $1 in
# start everything
"start")
    echo "==========start hdfs============="
    ssh hadoop10 /opt/hadoop-3.1.3/sbin/start-dfs.sh
    echo "==========start historyServer============"
    ssh hadoop10 /opt/hadoop-3.1.3/bin/mapred --daemon start historyserver
    echo "==========start yarn============"
    ssh hadoop11 /opt/hadoop-3.1.3/sbin/start-yarn.sh
;;
# stop everything
"stop")
    echo "==========stop hdfs============="
    ssh hadoop10 /opt/hadoop-3.1.3/sbin/stop-dfs.sh
    echo "==========stop yarn============"
    ssh hadoop11 /opt/hadoop-3.1.3/sbin/stop-yarn.sh
    echo "==========stop historyserver===="
    ssh hadoop10 /opt/hadoop-3.1.3/bin/mapred --daemon stop historyserver
;;
# show the Java processes on every host
"jps")
    for i in hadoop10 hadoop11 hadoop12
    do
        echo "==============$i jps================"
        ssh $i /usr/local/jdk8/bin/jps
    done
;;
*)
    echo "Input Args Error!!!"
;;
esac

Likewise, put it under /bin so it can be invoked directly.

4.8 Start the Cluster and Check Status

[v2admin@hadoop10 ~]$ startMyCluster.sh start  // start the cluster
==========start hdfs=============
Starting namenodes on [hadoop10]
Starting datanodes
Starting secondary namenodes [hadoop12]
==========start historyServer============
==========start yarn============
Starting resourcemanager
Starting nodemanagers
[v2admin@hadoop10 ~]$ startMyCluster.sh jps  // check status
==============hadoop10 jps================
1831 NameNode
2504 Jps
2265 JobHistoryServer
1980 DataNode
2382 NodeManager
==============hadoop11 jps================
1635 DataNode
1814 ResourceManager
2297 Jps
1949 NodeManager
==============hadoop12 jps================
1795 NodeManager
1590 DataNode
1927 Jps
1706 SecondaryNameNode
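As an optional smoke test (my addition, not part of the original steps), the bundled example job exercises HDFS, Yarn, and the JobHistoryServer in one go:

hadoop jar /opt/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar pi 2 10

When it finishes, the job should also appear in the history server UI at http://hadoop10:19888.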

4.9 Possible Problems

After installation, startup may fail with NoClassDefFoundError: javax/activation/DataSource.
I never ran into this before, but back then I was using 2.x; this time, installing 3.x, I hit it. The cause is that Yarn's lib directory is missing the relevant jar.
Fix:

cd /opt/hadoop-3.1.3/share/hadoop/yarn/lib
wget https://repo1.maven.org/maven2/javax/activation/activation/1.1.1/activation-1.1.1.jar
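After the jar is in place, restart the cluster so Yarn picks it up, e.g. with the script from section 4.7:

startMyCluster.sh stop
startMyCluster.sh start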