Prepare three machines: hadoop2 (master), hadoop3, and hadoop4.

  1. vi /etc/profile.d/hadoop.sh

    export JAVA_HOME=/usr/local/src/jdk1.8.0_92
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
    export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
    export PATH=$PATH:${JAVA_PATH}
    export HADOOP_HOME=/usr/local/src/hadoop-2.7.7
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    export HDFS_DATANODE_USER=root
    export HDFS_DATANODE_SECURE_USER=root
    export HDFS_SECONDARYNAMENODE_USER=root
    export HDFS_NAMENODE_USER=root
    export YARN_RESOURCEMANAGER_USER=root
    export YARN_NODEMANAGER_USER=root

    At least one of mapred-env.sh, hadoop-env.sh, and yarn-env.sh must set JAVA_HOME.
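    For example, a minimal hadoop-env.sh entry (a sketch, reusing the JDK path from above):

    # hadoop-env.sh: point the Hadoop daemons at the JDK
    export JAVA_HOME=/usr/local/src/jdk1.8.0_92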

  2. core-site.xml: configure the HDFS address and port, and the directory for temporary files.

    See core-site.xml for more options.

    <configuration>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://hadoop2:9091</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/data/docker/hadoop/tmp</value>
        </property>
    </configuration>
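
    To confirm the value is picked up you can query it (a sketch; fs.default.name is the deprecated alias of fs.defaultFS, so either key resolves to the same setting):

    hdfs getconf -confKey fs.defaultFS     # expect hdfs://hadoop2:9091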
  3. hdfs-site.xml: configure HDFS properties, the replication factor, and the data storage paths.

    See hdfs-site.xml for more options.

    dfs.namenode.name.dir and dfs.datanode.data.dir are not given elaborate, multi-directory values here; the configuration shown on the official site is a higher-spec setup aimed at larger clusters.

    **Note: these directories live on each individual machine; do not put them on the volumes-from data_docker shared volume.**

    Do this on all three machines:

    mkdir -p /opt/hadoop/tmp && mkdir -p /opt/hadoop/dfs/data && mkdir -p /opt/hadoop/dfs/name

    <configuration>
        <property>
            <name>dfs.namenode.http-address</name>
            <value>hadoop2:9092</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/opt/hadoop/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/opt/hadoop/dfs/data</value>
        </property>
        <property>
            <name>dfs.namenode.handler.count</name>
            <value>100</value>
        </property>
    </configuration>
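
    Once the configuration files are in place, the effective values can be checked (a sketch):

    hdfs getconf -confKey dfs.replication           # expect 2
    hdfs getconf -confKey dfs.namenode.name.dir     # expect file:/opt/hadoop/dfs/name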
  4. mapred-site.xml: configure MapReduce jobs to run on the YARN framework.

    See mapred-site.xml for more options.

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>hadoop2:9094</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>hadoop2:9095</value>
        </property>
        <property>
            <name>mapreduce.application.classpath</name>
            <value>
                /usr/local/src/hadoop-3.1.2/etc/hadoop,
                /usr/local/src/hadoop-3.1.2/share/hadoop/common/*,
                /usr/local/src/hadoop-3.1.2/share/hadoop/common/lib/*,
                /usr/local/src/hadoop-3.1.2/share/hadoop/hdfs/*,
                /usr/local/src/hadoop-3.1.2/share/hadoop/hdfs/lib/*,
                /usr/local/src/hadoop-3.1.2/share/hadoop/mapreduce/*,
                /usr/local/src/hadoop-3.1.2/share/hadoop/mapreduce/lib/*,
                /usr/local/src/hadoop-3.1.2/share/hadoop/yarn/*,
                /usr/local/src/hadoop-3.1.2/share/hadoop/yarn/lib/*
            </value>
        </property>
    </configuration>
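
    If your installation path differs from the one hard-coded above, the entries can be derived from the installation itself; `hadoop classpath` prints the classpath the launcher scripts compute (a sketch):

    hadoop classpath     # print the computed classpath and adapt mapreduce.application.classpath from it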
  5. yarn-site.xml
    See yarn-site.xml for more options.

    <configuration>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>bdfb9324ff7d</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>hadoop2:9093</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
    </configuration>
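
    Note that the yarn.resourcemanager.hostname value above is a container-generated hostname; with the fixed hostnames used in the docker run commands below, hadoop2 is the value you would normally want. Once start-yarn.sh has been run (see the start step), NodeManager registration can be checked (a sketch):

    yarn node -list     # should list NodeManagers on hadoop2, hadoop3 and hadoop4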
  6. Configure passwordless SSH login

    yum -y install openssh-server openssh-clients
    # generate host keys
    ssh-keygen -q -t rsa -b 2048 -f /etc/ssh/ssh_host_rsa_key -N ''
    ssh-keygen -q -t ecdsa -f /etc/ssh/ssh_host_ecdsa_key -N ''
    ssh-keygen -q -t ed25519 -f /etc/ssh/ssh_host_ed25519_key -N ''
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa     # -P '' avoids the interactive prompt
    # inside ~/.ssh
    cp id_rsa.pub authorized_keys
    cp authorized_keys /data/docker/hadoop/      # copy to the shared volume
    # in the other docker containers:
    # 1. complete steps 1-4 above in order
    # 2. on hadoop3 and hadoop4, do the following
    cp /data/docker/hadoop/authorized_keys ~/.ssh
    cat id_rsa.pub >> authorized_keys
    cp authorized_keys /data/docker/hadoop/authorized_keys   # overwrite the shared copy
    # back on the hadoop2 container
    cp /data/docker/hadoop/authorized_keys authorized_keys   # overwrite, so hadoop2 also has every key
    # test
    # start sshd on hadoop3 and hadoop4: /usr/sbin/sshd
    ssh root@hadoop3
    ssh root@hadoop4
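
    To confirm that passwordless login works from the current host to every node (a sketch):

    for h in hadoop2 hadoop3 hadoop4; do
        ssh -o BatchMode=yes root@$h hostname    # BatchMode fails instead of prompting for a password
    done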
  7. Configure hosts

    172.17.0.9     hadoop2
    172.17.0.10    hadoop3
    172.17.0.11    hadoop4
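
    Name resolution can then be verified from any container (a sketch):

    ping -c 1 hadoop3 && ping -c 1 hadoop4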
  8. Configure workers to define the worker nodes

    vi /usr/local/src/hadoop-3.1.2/etc/hadoop/workers (in the 2.7 line the file is named slaves)

    hadoop2                     # this node can be both namenode and datanode; no need to waste the machine
    hadoop3                     # datanode only
    hadoop4                     # datanode only
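
    The trailing comments above are explanatory only; the safest form of the file is plain hostnames, one per line, which can be written non-interactively (a sketch, assuming the 3.1.2 install path):

    printf 'hadoop2\nhadoop3\nhadoop4\n' > /usr/local/src/hadoop-3.1.2/etc/hadoop/workers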
  9. Stop the docker containers and create an image

    172.17.0.0/24: 256 addresses in total, usable host IPs 172.17.0.1-172.17.0.254, netmask 255.255.255.0

    172.17.0.0/16: 65,536 addresses in total, usable host IPs 172.17.0.1-172.17.255.254, netmask 255.255.0.0

docker commit hadoop2 image_c

docker run --privileged -tdi --volumes-from data_docker --name hadoop2 --hostname hadoop2 \
    --add-host hadoop2:172.17.0.8 --add-host hadoop3:172.17.0.9 --add-host hadoop4:172.17.0.10 \
    --link mysqlcontainer:mysqlcontainer \
    -p 5002:22 -p 8088:8088 -p 9090:9090 -p 9091:9091 -p 9092:9092 -p 9093:9093 -p 9094:9094 \
    -p 9095:9095 -p 9096:9096 -p 9097:9097 -p 9098:9098 -p 9099:9099 \
    centos:hadoop /bin/bash

docker run --privileged -tdi --volumes-from data_docker --name hadoop3 --hostname hadoop3 \
    --add-host hadoop2:172.17.0.8 --add-host hadoop3:172.17.0.9 --add-host hadoop4:172.17.0.10 \
    --link mysqlcontainer:mysqlcontainer -p 5003:22 centos:hadoop /bin/bash

docker run --privileged -tdi --volumes-from data_docker --name hadoop4 --hostname hadoop4 \
    --add-host hadoop2:172.17.0.8 --add-host hadoop3:172.17.0.9 --add-host hadoop4:172.17.0.10 \
    --link mysqlcontainer:mysqlcontainer -p 5004:22 centos:hadoop /bin/bash
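
A quick check that the three containers are running and reachable (a sketch):

docker ps --format '{{.Names}}\t{{.Status}}'    # hadoop2, hadoop3 and hadoop4 should all be Up
docker exec -it hadoop2 /bin/bash               # attach to the master container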
  10. Start the cluster

    The first time only, format the NameNode: hdfs namenode -format

    Near the end of the output you should see: util.ExitUtil: Exiting with status 0

    Running start-all.sh prints "This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh", so start the daemons with start-dfs.sh and start-yarn.sh instead.

# start-dfs.sh
# ----------------------
# jps on the master shows:
5252 DataNode
5126 NameNode
5547 Jps
5423 SecondaryNameNode
# jps on a slave shows:
1131 Jps
1052 DataNode

# start-yarn.sh
# ----------------------
# jps on the master shows:
5890 NodeManager
5252 DataNode
5126 NameNode
6009 Jps
5423 SecondaryNameNode
5615 ResourceManager
# jps on a slave shows:
1177 NodeManager
1052 DataNode
1309 Jps

Access

http://hadoop2:9093

http://hadoop2:9092
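
The same endpoints can be checked from the command line on any machine that can resolve hadoop2 (a sketch):

curl -sI http://hadoop2:9092 | head -n 1    # NameNode web UI (dfs.namenode.http-address)
curl -sI http://hadoop2:9093 | head -n 1    # ResourceManager web UI (yarn.resourcemanager.webapp.address)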

Try out Hadoop

Prepare a test file

cat test.txt
hadoop mapreduce hive
hbase spark storm
sqoop hadoop hive
spark hadoop

# hdfs dfs shows the help
# create a directory in HDFS
hadoop fs -mkdir /input
hadoop fs -ls /
# upload
hadoop fs -put test.txt /input
hadoop fs -ls /input
# run the wordcount example shipped with hadoop
# hadoop-mapreduce-examples-2.7.7.jar contains many small example programs
yarn jar /usr/local/src/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /input/test.txt /output

hadoop fs -ls /output
-rw-r--r--   2 root supergroup          0 2019-06-03 01:28 /output/_SUCCESS
-rw-r--r--   2 root supergroup         60 2019-06-03 01:28 /output/part-r-00000
# view the result
hadoop fs -cat /output/part-r-00000
# list the other built-in example programs
hadoop jar /usr/local/src/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar
# show the usage of grep
hadoop jar /usr/local/src/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar grep
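
Note that the example job will not overwrite an existing output directory; before rerunning wordcount, remove /output first (a sketch):

hadoop fs -rm -r /output    # the job refuses to run if the output directory already exists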

http://hadoop2:9093 shows the job information.

Other Hadoop commands

# check capacity
hadoop fs -df -h
Filesystem              Size   Used  Available  Use%
hdfs://hadoop2:9091  150.1 G  412 K    129.9 G    0%
# check the status of each machine
hdfs dfsadmin -report
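
A few other commands that come in handy at this point (a sketch):

hdfs dfsadmin -safemode get     # check whether the NameNode is in safe mode
hadoop fs -du -h /              # per-directory space usage
hdfs fsck /                     # check block health and replication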
