Hadoop configuration

Planning

The NameNode, SecondaryNameNode, and ResourceManager are placed on different machines.

|                | master | sliver1 | sliver2 |
|----------------|--------|---------|---------|
| HDFS           | NameNode |       | SecondaryNameNode |
| YARN           |        | ResourceManager | |
|                | DataNode | DataNode | DataNode |
|                | NodeManager | NodeManager | NodeManager |
|                | JobHistoryServer (history server) | TimelineServer (log aggregation) | |
| Processes (jps) | 3667 NameNode<br/>5028 JobHistoryServer<br/>3814 DataNode<br/>6327 Jps<br/>4877 NodeManager | 3841 Jps<br/>1523 DataNode<br/>2676 ResourceManager<br/>2797 NodeManager<br/>3181 ApplicationHistoryServer | 1779 SecondaryNameNode<br/>3352 Jps<br/>2344 NodeManager<br/>1674 DataNode |

Configure hostnames

# configure the hostnames
vi /etc/hosts

192.168.133.150 master
192.168.133.151 sliver1
192.168.133.152 sliver2
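
The same mapping has to exist on all three machines so the names resolve cluster-wide. A quick check, assuming the hosts file is already in place everywhere:

# verify the hostnames resolve (run from master; the other nodes are analogous)
ping -c 1 sliver1
ping -c 1 sliver2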

Create a user

The user is created so that the Hadoop daemons have an owner when the cluster starts; it must match the user name specified in the core-site configuration.

Create it on all three machines.

[root@master home]# useradd hcb
[root@master home]# passwd hcb
Changing password for user hcb.  # the password must be at least 8 characters
New password: 
BAD PASSWORD: The password is shorter than 8 characters
Retype new password: 
Sorry, passwords do not match.
New password: 
BAD PASSWORD: The password fails the dictionary check - it is too simplistic/systematic
Retype new password: 
passwd: all authentication tokens updated successfully.

Elevate the user's privileges to root (sudo); do this on all three machines.

# edit the following file
[root@master opt]# vi /etc/sudoers
# find the root entry and add hcb below it
## Allow root to run any commands anywhere
root    ALL=(ALL)       ALL
hcb     ALL=(ALL)       ALL
# save with :wq! (the file is read-only, so the ! after wq forces the write)
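
To confirm the grant took effect, a minimal check (a sketch, not from the original transcript) is to switch to hcb and run a command through sudo:

# switch to hcb and verify sudo works; it should print root
su - hcb
sudo whoami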

JDK configuration

# extract the JDK tarball to /usr/java/

[hcb@sliver1 jdk1.8.0_144]$ pwd
/usr/java/jdk1.8.0_144

# configure the environment variables
vi /etc/profile.d/my_env.sh
# enter the following; adjust the paths to match your own installation
export JAVA_HOME=/usr/java/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
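
The new variables only take effect in a fresh login shell. A quick way to apply and verify them in the current shell, assuming the paths above match your installation:

# reload the profile and confirm both tools are on the PATH
source /etc/profile
java -version
hadoop version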

Passwordless SSH login (optional)

Run the following commands on each of the three machines:

ssh-keygen -t rsa
ssh-copy-id master
ssh-copy-id sliver1
ssh-copy-id sliver2

The generated files are in ~/.ssh:

[hcb@master .ssh]# ll
total 16
-rw-------. 1 hcb hcb 1181 Jan 25 03:39 authorized_keys
-rw-------. 1 hcb hcb 1675 Jan 25 03:24 id_rsa
-rw-r--r--. 1 hcb hcb  393 Jan 25 03:24 id_rsa.pub
-rw-r--r--. 1 hcb hcb  554 Jan 25 03:25 known_hosts

known_hosts      records the public keys of machines this host has accessed via ssh
id_rsa           the generated private key
id_rsa.pub       the generated public key
authorized_keys  stores the public keys authorized for passwordless login to this machine
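
A simple way to confirm the key exchange worked: running a command on another node over ssh should no longer prompt for a password. A minimal test from master:

# should print the remote hostname without a password prompt
ssh sliver1 hostname
ssh sliver2 hostname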

Sync script (optional)

rsync must be installed first.

# install rsync
yum install rsync
# invocation (for example, syncing the script itself, saved at /bin/xsync, to the other machines)
xsync /bin/xsync

The sync script:

#!/bin/bash
# 1. check the argument count
if [ $# -lt 1 ]
then
  echo "Not enough arguments!"
  exit
fi
# 2. loop over every machine in the cluster
for host in master sliver1 sliver2
do
  echo ====================  $host  ====================
  # 3. walk all files/directories and send them one by one
  for file in $@
  do
    # 4. check that the file exists
    if [ -e $file ]
    then
      # 5. get the parent directory
      pdir=$(cd -P $(dirname $file); pwd)
      # 6. get the file name
      fname=$(basename $file)
      ssh $host "mkdir -p $pdir"
      rsync -av $pdir/$fname $host:$pdir
    else
      echo "$file does not exist!"
    fi
  done
done
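
For the xsync invocation shown above to work, the script must be executable and on the PATH. A minimal sketch, assuming the script was saved locally as xsync and is installed to /bin as in the invocation earlier:

# install the script (sudo is needed because /bin is root-owned)
sudo cp xsync /bin/xsync
sudo chmod +x /bin/xsync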

Test

[hcb@master opt]# mkdir test
[hcb@master opt]# vi ./test/test.text
[hcb@master opt]# xsync ./test/test.text 
==================== master ====================
sending incremental file list

sent 48 bytes  received 12 bytes  40.00 bytes/sec
total size is 5  speedup is 0.08
==================== sliver1 ====================
sending incremental file list
test.text

sent 100 bytes  received 35 bytes  90.00 bytes/sec
total size is 5  speedup is 0.04
==================== sliver2 ====================
sending incremental file list
test.text

sent 100 bytes  received 35 bytes  90.00 bytes/sec
total size is 5  speedup is 0.04

# check the other machines
[hcb@sliver1 ~]# cd /opt/
[hcb@sliver1 opt]# ls
module  software  test
[hcb@sliver1 opt]# cat test/test.text 
est

core-site configuration

[root@master .ssh]# cd $HADOOP_HOME/etc/hadoop
[root@master hadoop]# vi core-site.xml 

    <!-- NameNode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>
    </property>
    <!-- Hadoop data directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-3.1.3/data</value>
    </property>
    <!-- hosts from which the hcb proxy user may connect -->
    <property>
        <name>hadoop.proxyuser.hcb.hosts</name>
        <value>*</value>
    </property>
    <!-- groups the hcb proxy user may impersonate -->
    <property>
        <name>hadoop.proxyuser.hcb.groups</name>
        <value>*</value>
    </property>
    <!-- static user for the HDFS web UI -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hcb</value>
    </property>

hdfs-site configuration

vi hdfs-site.xml
<configuration>
    <!-- SecondaryNameNode web address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>sliver2:9868</value>
    </property>
</configuration>


yarn-site configuration changes

[root@master hadoop]# vi yarn-site.xml 
<configuration>
    <!-- shuffle service required by MapReduce -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- ResourceManager host -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>sliver1</value>
    </property>
    <!-- environment variables inherited by containers -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <!-- minimum container allocation: 512 MB -->
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>
    <!-- maximum container allocation: 4 GB -->
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>4096</value>
    </property>
    <!-- total memory a NodeManager may hand out: 4 GB -->
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>
    <!-- disable physical-memory limit checks on containers -->
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <!-- disable virtual-memory limit checks on containers -->
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
</configuration>

mapred-site configuration

vi mapred-site.xml

<!-- run MapReduce jobs on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

Sync to the other two machines

Copy the configuration directory to the other two machines:

scp -r /opt/module/hadoop-3.1.3/etc/hadoop/ hcb@sliver2:/opt/module/hadoop-3.1.3/etc/
scp -r /opt/module/hadoop-3.1.3/etc/hadoop/ hcb@sliver1:/opt/module/hadoop-3.1.3/etc/

Check that the update succeeded.

Configure workers

[hcb@master hadoop]# pwd
/opt/module/hadoop-3.1.3/etc/hadoop
[hcb@master hadoop]# vi workers 
# enter the following (one hostname per line; no extra spaces or blank lines are allowed)
master
sliver1
sliver2

Copy it to the other machines:

[hcb@master hadoop]# scp workers hcb@sliver1:/opt/module/hadoop-3.1.3/etc/hadoop/
workers                                               100%   23     1.7KB/s   00:00    
[hcb@master hadoop]# scp workers hcb@sliver2:/opt/module/hadoop-3.1.3/etc/hadoop/
workers                                               100%   23     2.9KB/s   00:00  

Start the cluster

1. Format the NameNode

# 1. format the NameNode; run this as the hcb user
[hcb@master hadoop]# hdfs namenode -format
WARNING: /opt/module/hadoop-3.1.3/logs does not exist. Creating.
2021-01-25 04:17:19,103 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/192.168.133.150
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 3.1.3
.....
2021-01-25 04:17:27,422 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.133.150
************************************************************/

2. Start HDFS

# start HDFS on master
[hcb@master hadoop-3.1.3]$ sbin/start-dfs.sh 
Starting namenodes on [master]
Starting datanodes
Starting secondary namenodes [sliver2]

3. Start YARN

# start YARN on sliver1
[hcb@sliver1 hadoop-3.1.3]$ sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers
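
At this point the cluster can be sanity-checked from the command line and the browser; the ports below are the Hadoop 3.x defaults:

# each node should show its expected daemons
jps
# NameNode web UI:        http://master:9870
# ResourceManager web UI: http://sliver1:8088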

4. Upload a file

# create a directory
hadoop fs -mkdir -p /user/hcb/input

# upload the wc.input file to HDFS
hadoop fs -put $HADOOP_HOME/wcinput/wc.input /user/hcb/input
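
To confirm the upload, list the directory and print the file back from HDFS:

# verify the file landed in HDFS
hadoop fs -ls /user/hcb/input
hadoop fs -cat /user/hcb/input/wc.input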

5. View the file

Where the uploaded file is stored inside HDFS, and its content:

# where the blocks are stored on disk
[hcb@master subdir0]$ pwd
/opt/module/hadoop-3.1.3/data/dfs/data/current/BP-1099864744-192.168.133.150-1611625651191/current/finalized/subdir0/subdir0

# view the block content
[hcb@master subdir0]$ cat blk_1073741825
hadoop yarn
hadoop mapreduce
ad
a
test
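
Rather than browsing the DataNode directory by hand, hdfs fsck can map a file to its blocks and their locations; a quick sketch:

# show the blocks backing the file and which DataNodes hold them
hdfs fsck /user/hcb/input/wc.input -files -blocks -locations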


Configure the history server

Configure mapred-site

# open mapred-site.xml and add the configuration
[hcb@master hadoop]$ pwd
/opt/module/hadoop-3.1.3/etc/hadoop
[hcb@master hadoop]$ vi mapred-site.xml 

<!-- JobHistory server address -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
</property>

<!-- JobHistory server web UI address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
</property>

Sync to the other servers

[hcb@master hadoop]$ xsync $HADOOP_HOME/etc/hadoop/mapred-site.xml
==================== master ====================
sending incremental file list

sent 63 bytes  received 12 bytes  50.00 bytes/sec
total size is 1,128  speedup is 15.04
==================== sliver1 ====================
sending incremental file list
mapred-site.xml

sent 542 bytes  received 47 bytes  392.67 bytes/sec
total size is 1,128  speedup is 1.92
==================== sliver2 ====================
sending incremental file list
mapred-site.xml

Start the history server

# start it on master
[hcb@master hadoop]$ mapred --daemon start historyserver
# check the processes; a JobHistoryServer has appeared
[hcb@master hadoop]$ jps
3667 NameNode
4645 Jps
3814 DataNode
4089 NodeManager
4588 JobHistoryServer

View the history server

http://master:19888/jobhistory

Enable log aggregation

Configure yarn-site

[hcb@master hadoop]$ pwd
/opt/module/hadoop-3.1.3/etc/hadoop
[hcb@master hadoop]$ vi yarn-site.xml 

Add the following inside the <configuration> block:

<!-- enable log aggregation -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- URL of the log server -->
<property>
    <name>yarn.log.server.url</name>
    <value>http://${yarn.timeline-service.webapp.address}/applicationhistory/logs</value>
</property>
<!-- keep aggregated logs for 7 days (604800 seconds) -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
<!-- enable the timeline service -->
<property>
    <name>yarn.timeline-service.enabled</name>
    <value>true</value>
</property>
<!-- run the timeline service on the ResourceManager host -->
<property>
    <name>yarn.timeline-service.hostname</name>
    <value>${yarn.resourcemanager.hostname}</value>
</property>
<!-- allow cross-origin requests to the timeline service -->
<property>
    <name>yarn.timeline-service.http-cross-origin.enabled</name>
    <value>true</value>
</property>
<!-- publish system metrics to the timeline service -->
<property>
    <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
    <value>true</value>
</property>

Sync to the other servers

[hcb@master hadoop]$ xsync yarn-site.xml 
==================== master ====================
sending incremental file list

sent 62 bytes  received 12 bytes  148.00 bytes/sec
total size is 2,482  speedup is 33.54
==================== sliver1 ====================
sending incremental file list
yarn-site.xml

sent 1,199 bytes  received 53 bytes  2,504.00 bytes/sec
total size is 2,482  speedup is 1.98
==================== sliver2 ====================
sending incremental file list
yarn-site.xml

sent 1,199 bytes  received 53 bytes  834.67 bytes/sec
total size is 2,482  speedup is 1.98

Stop and restart

The NodeManager, ResourceManager, and HistoryServer all need to be stopped.

# stop YARN (ResourceManager and NodeManagers) on sliver1
[hcb@sliver1 hadoop]$ stop-yarn.sh
Stopping nodemanagers
Stopping resourcemanager

# stop the historyserver on master
[hcb@master hadoop]$ mapred --daemon stop historyserver

# restart YARN on sliver1, then start the timelineserver
[hcb@sliver1 hadoop]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
[hcb@sliver1 hadoop]$ yarn --daemon start timelineserver

# start the history server on master
[hcb@master hadoop]$ mapred --daemon start historyserver

Test

[hcb@master wcinput]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /user/hcb/input /user/hcb/output
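
When the job finishes, the result can be read straight from HDFS (part-r-00000 is the default name of the first reducer's output file):

# view the wordcount result
hadoop fs -cat /user/hcb/output/part-r-00000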

View the logs

http://master:19888/jobhistory

Time synchronization

These steps must be performed as root.

1. Stop the ntp service and disable autostart

sudo systemctl stop ntpd
sudo systemctl disable ntpd

2. Modify the ntp config file

# vi /etc/ntp.conf
# Hosts on local network are less restricted.
# uncomment the line below and change the IP to your own subnet
restrict 192.168.133.0 mask 255.255.255.0 nomodify notrap

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
# comment out the following servers
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst

# add the following so master falls back to its local clock
server 127.127.1.0
fudge 127.127.1.0 stratum 10


3. Modify the ntpd config file

vi /etc/sysconfig/ntpd

# add the following so the hardware clock follows the system clock
SYNC_HWCLOCK=yes

4. Restart ntpd

# restart the ntpd service and enable it at boot
[root@master wcinput]# systemctl start ntpd
[root@master wcinput]# systemctl enable ntpd
Created symlink from /etc/systemd/system/multi-user.target.wants/ntpd.service to /usr/lib/systemd/system/ntpd.service.

5. Configure the other machines to sync

[hcb@sliver1 hadoop]$ crontab -e
# enter the following to sync against master every ten minutes
*/10 * * * * /usr/sbin/ntpdate master
# change the clock to test that the sync corrects it
[root@master wcinput]# date -s "2021-01-26 11:31:15"
Tue Jan 26 11:31:15 EST 2021
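
To verify the setup without waiting for cron, ntpdate can be run by hand on a slave node (as root, with the slave's own ntpd stopped) and the clock checked afterwards:

# one-off manual sync against master, then check the time
/usr/sbin/ntpdate master
date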

Check the processes on each machine

[root@master ~]# jps
3667 NameNode
5028 JobHistoryServer
3814 DataNode
6327 Jps
4877 NodeManager
[root@sliver1 ~]# jps
3841 Jps
1523 DataNode
2676 ResourceManager
2797 NodeManager
3181 ApplicationHistoryServer
[root@sliver2 ~]# jps
1779 SecondaryNameNode
3352 Jps
2344 NodeManager
1674 DataNode

Configure environment variables on the development machine

# create a new system variable HADOOP_HOME
HADOOP_HOME:D:\codeEnv\hadoop-3.0.0
# append to the Path variable
%HADOOP_HOME%\bin
