Hadoop Configuration
Planning
The NameNode, SecondaryNameNode, and ResourceManager are placed on different machines.
        master              sliver1             sliver2
HDFS    NameNode            -                   SecondaryNameNode
YARN    -                   ResourceManager     -
        DataNode            DataNode            DataNode
        NodeManager         NodeManager         NodeManager
Other   JobHistoryServer    TimelineServer
        (history server)    (log aggregation)
Configure hostnames
#configure the hostname mappings
vi /etc/hosts
192.168.133.150 master
192.168.133.151 sliver1
192.168.133.152 sliver2
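After editing, it is worth confirming that every cluster hostname actually resolves on each node. A minimal sketch is below; it checks a sample copy of the hosts file, while on a real node you would grep /etc/hosts directly or simply `ping` each name.

```shell
# Sketch: check that every cluster hostname is present in a hosts file.
# /tmp/hosts.sample stands in for /etc/hosts on a real node.
cat > /tmp/hosts.sample <<'EOF'
192.168.133.150 master
192.168.133.151 sliver1
192.168.133.152 sliver2
EOF
for h in master sliver1 sliver2; do
    grep -qw "$h" /tmp/hosts.sample && echo "$h found"
done
```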
Create a user
The user is created so that the Hadoop processes run under it when the cluster starts; the name must match the user specified in the core-site configuration.
Create it on all three machines.
[root@master home]# useradd hcb
[root@master home]# passwd hcb
Changing password for user hcb.  #the password must be at least 8 characters
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
Sorry, passwords do not match.
New password:
BAD PASSWORD: The password fails the dictionary check - it is too simplistic/systematic
Retype new password:
passwd: all authentication tokens updated successfully.
Grant the user root privileges; do this on all three machines
#edit the following file
[root@master opt]# vi /etc/sudoers
#find the root entry and add hcb below it
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
hcb ALL=(ALL) ALL
#save with :wq! — the file is read-only, so the ! after wq forces the write
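To confirm the entry took effect, a quick grep is enough; the sketch below checks a sample copy of the file. On a real system, prefer validating /etc/sudoers with `visudo -c` rather than trusting a hand edit.

```shell
# Sketch: verify the hcb sudoers entry exists.
# /tmp/sudoers.sample stands in for /etc/sudoers here.
cat > /tmp/sudoers.sample <<'EOF'
root    ALL=(ALL)       ALL
hcb     ALL=(ALL)       ALL
EOF
grep -q '^hcb' /tmp/sudoers.sample && echo "hcb entry present"
```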
JDK configuration
#extract the JDK tarball to /usr/java/
[hcb@sliver1 jdk1.8.0_144]$ pwd
/usr/java/jdk1.8.0_144
#configure environment variables
vi /etc/profile.d/my_env.sh
#enter the following, adjusting the paths to your own layout
export JAVA_HOME=/usr/java/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
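Files under /etc/profile.d take effect on the next login; to apply them immediately, source the file. A sketch of checking the result, using a temp copy with the same contents as my_env.sh above:

```shell
# Sketch: source the env file and confirm the variables landed in the shell.
cat > /tmp/my_env.sh <<'EOF'
export JAVA_HOME=/usr/java/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
. /tmp/my_env.sh
echo "JAVA_HOME=$JAVA_HOME"
case ":$PATH:" in
    *":$HADOOP_HOME/sbin:"*) echo "PATH contains Hadoop sbin" ;;
esac
```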
Passwordless SSH login (optional)
Run the following commands on each of the three machines:
ssh-keygen -t rsa
ssh-copy-id master
ssh-copy-id sliver1
ssh-copy-id sliver2
The generated files are in ~/.ssh
[hcb@master .ssh]# ll
total 16
-rw-------. 1 hcb hcb 1181 Jan 25 03:39 authorized_keys
-rw-------. 1 hcb hcb 1675 Jan 25 03:24 id_rsa
-rw-r--r--. 1 hcb hcb 393 Jan 25 03:24 id_rsa.pub
-rw-r--r--. 1 hcb hcb 554 Jan 25 03:25 known_hosts
known_hosts records the public keys of machines this host has connected to over SSH
id_rsa the generated private key
id_rsa.pub the generated public key
authorized_keys stores the public keys authorized for passwordless login to this machine
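A sketch of what ssh-keygen produces, using a throwaway directory instead of ~/.ssh; in essence, `ssh-copy-id` appends the local id_rsa.pub to the remote user's authorized_keys, which the last line imitates locally.

```shell
# Sketch: generate a keypair into a scratch directory (not ~/.ssh).
dir=/tmp/ssh_demo
mkdir -p "$dir" && rm -f "$dir"/id_rsa "$dir"/id_rsa.pub "$dir"/authorized_keys
ssh-keygen -t rsa -N '' -f "$dir/id_rsa" -q
ls -1 "$dir"
# what ssh-copy-id does on the remote side, in essence:
cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
```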
Sync script (optional)
rsync must be installed first.
#install
yum install rsync
#save the script below as /bin/xsync and make it executable; for example, this call distributes the script itself to the other machines:
xsync /bin/xsync
The sync script:
#!/bin/bash
#1. Check the argument count
if [ $# -lt 1 ]
then
    echo "Not Enough Arguments!"
    exit
fi
#2. Iterate over every machine in the cluster
for host in master sliver1 sliver2
do
    echo ==================== $host ====================
    #3. Send every file given on the command line
    for file in "$@"
    do
        #4. Check that the file exists
        if [ -e "$file" ]
        then
            #5. Get the parent directory (resolving symlinks)
            pdir=$(cd -P "$(dirname "$file")"; pwd)
            #6. Get the file name
            fname=$(basename "$file")
            ssh "$host" "mkdir -p $pdir"
            rsync -av "$pdir/$fname" "$host:$pdir"
        else
            echo "$file does not exist!"
        fi
    done
done
Test
[hcb@master opt]# mkdir test
[hcb@master opt]# vi ./test/test.text
[hcb@master opt]# xsync ./test/test.text
==================== master ====================
sending incremental file list
sent 48 bytes received 12 bytes 40.00 bytes/sec
total size is 5 speedup is 0.08
==================== sliver1 ====================
sending incremental file list
test.text
sent 100 bytes received 35 bytes 90.00 bytes/sec
total size is 5 speedup is 0.04
==================== sliver2 ====================
sending incremental file list
test.text
sent 100 bytes received 35 bytes 90.00 bytes/sec
total size is 5 speedup is 0.04
#check the other machines
[hcb@sliver1 ~]# cd /opt/
[hcb@sliver1 opt]# ls
module software test
[hcb@sliver1 opt]# cat test/test.text
est
core-site configuration
[root@master .ssh]# cd $HADOOP_HOME/etc/hadoop
[root@master hadoop]# vi core-site.xml
<!-- NameNode address -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:8020</value>
</property>
<!-- base directory for Hadoop data -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-3.1.3/data</value>
</property>
<!-- allow user hcb to proxy from any host -->
<property>
    <name>hadoop.proxyuser.hcb.hosts</name>
    <value>*</value>
</property>
<!-- allow user hcb to proxy any group -->
<property>
    <name>hadoop.proxyuser.hcb.groups</name>
    <value>*</value>
</property>
<!-- user for the HDFS web UI -->
<property>
    <name>hadoop.http.staticuser.user</name>
    <value>hcb</value>
</property>
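Once the daemons are up, `hdfs getconf -confKey fs.defaultFS` reports the effective value; before that, the file itself can be spot-checked. A grep/sed sketch against a sample fragment of the config:

```shell
# Sketch: pull the fs.defaultFS value out of a core-site fragment.
# /tmp/core-site.xml is a sample copy for illustration.
cat > /tmp/core-site.xml <<'EOF'
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:8020</value>
</property>
EOF
grep -A1 '<name>fs.defaultFS</name>' /tmp/core-site.xml |
    sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
```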
hdfs-site configuration
vi hdfs-site.xml
<configuration>
    <!-- SecondaryNameNode web address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>sliver2:9868</value>
    </property>
</configuration>
yarn-site configuration
[root@master hadoop]# vi yarn-site.xml
<configuration>
    <!-- shuffle service for MapReduce -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- ResourceManager host -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>sliver1</value>
    </property>
    <!-- environment variables inherited by containers -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <!-- container memory limits -->
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>
    <!-- disable physical/virtual memory checks -->
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
</configuration>
mapred-site configuration
vi mapred-site.xml
<!-- run MapReduce on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
Sync to the other two machines
Copy the configuration to the other two machines:
scp -r /opt/module/hadoop-3.1.3/etc/hadoop/ hcb@sliver2:/opt/module/hadoop-3.1.3/etc/
scp -r /opt/module/hadoop-3.1.3/etc/hadoop/ hcb@sliver1:/opt/module/hadoop-3.1.3/etc/
Check that the files were updated.
Configure workers
[hcb@master hadoop]# pwd
/opt/module/hadoop-3.1.3/etc/hadoop
[hcb@master hadoop]# vi workers
#enter the following
master
sliver1
sliver2
Copy it to the other machines
[hcb@master hadoop]# scp workers hcb@sliver1:/opt/module/hadoop-3.1.3/etc/hadoop/
workers 100% 23 1.7KB/s 00:00
[hcb@master hadoop]# scp workers hcb@sliver2:/opt/module/hadoop-3.1.3/etc/hadoop/
workers 100% 23 2.9KB/s 00:00
Start the cluster
1. Format the NameNode
#1. format the NameNode; run this as user hcb
[hcb@master hadoop]# hdfs namenode -format
WARNING: /opt/module/hadoop-3.1.3/logs does not exist. Creating.
2021-01-25 04:17:19,103 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = master/192.168.133.150
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.1.3
.....
2021-01-25 04:17:27,422 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.133.150
************************************************************/
2. Start HDFS
#start HDFS on master
[hcb@master hadoop-3.1.3]$ sbin/start-dfs.sh
Starting namenodes on [master]
Starting datanodes
Starting secondary namenodes [sliver2]
3. Start YARN
#start YARN on sliver1
[hcb@sliver1 hadoop-3.1.3]$ sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers
4. Upload a file
#create a directory
hadoop fs -mkdir -p /user/hcb/input
#upload the file wc.input to HDFS
hadoop fs -put $HADOOP_HOME/wcinput/wc.input /user/hcb/input
5. View the file
Where the uploaded file is stored inside HDFS, and its contents
#storage location on the local disk
[hcb@master subdir0]$ pwd
/opt/module/hadoop-3.1.3/data/dfs/data/current/BP-1099864744-192.168.133.150-1611625651191/current/finalized/subdir0/subdir0
#view the block contents in HDFS
[hcb@master subdir0]$ cat blk_1073741825
hadoop yarn
hadoop mapreduce
ad
a
test
Configure the history server
Configure mapred-site
#open mapred-site.xml for editing
[hcb@master hadoop]$ pwd
/opt/module/hadoop-3.1.3/etc/hadoop
[hcb@master hadoop]$ vi mapred-site.xml
<!-- history server RPC address -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
</property>
<!-- history server web UI address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
</property>
Sync to the other servers
[hcb@master hadoop]$ xsync $HADOOP_HOME/etc/hadoop/mapred-site.xml
==================== master ====================
sending incremental file list
sent 63 bytes received 12 bytes 50.00 bytes/sec
total size is 1,128 speedup is 15.04
==================== sliver1 ====================
sending incremental file list
mapred-site.xml
sent 542 bytes received 47 bytes 392.67 bytes/sec
total size is 1,128 speedup is 1.92
==================== sliver2 ====================
sending incremental file list
mapred-site.xml
Start the history server
#start it on master
[hcb@master hadoop]$ mapred --daemon start historyserver
#check the processes: a JobHistoryServer has appeared
[hcb@master hadoop]$ jps
3667 NameNode
4645 Jps
3814 DataNode
4089 NodeManager
4588 JobHistoryServer
View the history server
http://master:19888/jobhistory
Enable log aggregation
Configure yarn-site
[hcb@master hadoop]$ pwd
/opt/module/hadoop-3.1.3/etc/hadoop
[hcb@master hadoop]$ vi yarn-site.xml
Add the following to the configuration:
<!-- enable log aggregation -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- log server URL -->
<property>
    <name>yarn.log.server.url</name>
    <value>http://${yarn.timeline-service.webapp.address}/applicationhistory/logs</value>
</property>
<!-- keep aggregated logs for 7 days -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
<!-- enable the timeline service -->
<property>
    <name>yarn.timeline-service.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.timeline-service.hostname</name>
    <value>${yarn.resourcemanager.hostname}</value>
</property>
<property>
    <name>yarn.timeline-service.http-cross-origin.enabled</name>
    <value>true</value>
</property>
<!-- publish ResourceManager system metrics to the timeline service -->
<property>
    <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
    <value>true</value>
</property>
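The retention value above is easier to read as an expression; 604800 seconds is exactly seven days:

```shell
# 7 days expressed in seconds, matching yarn.log-aggregation.retain-seconds
echo $((7 * 24 * 60 * 60))   # prints 604800
```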
Sync to the other servers
[hcb@master hadoop]$ xsync yarn-site.xml
==================== master ====================
sending incremental file list
sent 62 bytes received 12 bytes 148.00 bytes/sec
total size is 2,482 speedup is 33.54
==================== sliver1 ====================
sending incremental file list
yarn-site.xml
sent 1,199 bytes received 53 bytes 2,504.00 bytes/sec
total size is 2,482 speedup is 1.98
==================== sliver2 ====================
sending incremental file list
yarn-site.xml
sent 1,199 bytes received 53 bytes 834.67 bytes/sec
total size is 2,482 speedup is 1.98
Stop and restart
The NodeManager, ResourceManager, and HistoryServer must be restarted.
#stop YARN (ResourceManager and NodeManagers) on sliver1
[hcb@sliver1 hadoop]$ stop-yarn.sh
Stopping nodemanagers
Stopping resourcemanager
#stop the history server on master
[hcb@master hadoop]$ mapred --daemon stop historyserver
#restart YARN on sliver1, then start the timeline server
[hcb@sliver1 hadoop]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
[hcb@sliver1 hadoop]$ yarn --daemon start timelineserver
#start the history server on master
[hcb@master hadoop]$ mapred --daemon start historyserver
Test
[hcb@master wcinput]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /user/hcb/input /user/hcb/output
View the logs
http://master:19888/jobhistory
Time synchronization
These steps must be done as root.
1. Stop the ntp service and disable autostart
sudo systemctl stop ntpd
sudo systemctl disable ntpd
2. Edit the ntp configuration file
#vi /etc/ntp.conf
# Hosts on local network are less restricted.
# uncomment the line below and change the IP to your subnet
restrict 192.168.133.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#comment out the following server lines
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
#add the following, using the local clock as the time source
server 127.127.1.0
fudge 127.127.1.0 stratum 10
3. Edit the ntpd config file
vi /etc/sysconfig/ntpd
#add the following so the hardware clock stays in sync with the system clock
SYNC_HWCLOCK=yes
4. Restart ntpd
#restart the ntpd service and enable it at boot
[root@master wcinput]# systemctl start ntpd
[root@master wcinput]# systemctl enable ntpd
Created symlink from /etc/systemd/system/multi-user.target.wants/ntpd.service to /usr/lib/systemd/system/ntpd.service.
5. Configure the other machines to sync from master
[hcb@sliver1 hadoop]$ crontab -e
#enter the following to sync every ten minutes
*/10 * * * * /usr/sbin/ntpdate master
#change the clock to test the sync
[root@master wcinput]# date -s "2021-01-26 11:31:15"
Tue Jan 26 11:31:15 EST 2021
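Before installing the crontab line, a rough format check can catch typos. A sketch: a cron entry must have five schedule fields before the command, and `*/10` in the minute field means "every 10 minutes".

```shell
# Sketch: sanity-check the crontab entry's shape before installing it.
entry='*/10 * * * * /usr/sbin/ntpdate master'
# 5 schedule fields, then the command and its argument = 7 fields total
echo "$entry" | awk '{print NF}'
echo "$entry" | grep -Eq '^([^ ]+ ){5}/' && echo "entry looks valid"
```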
Check processes on each machine
[root@master ~]# jps
3667 NameNode
5028 JobHistoryServer
3814 DataNode
6327 Jps
4877 NodeManager
[root@sliver1 ~]# jps
3841 Jps
1523 DataNode
2676 ResourceManager
2797 NodeManager
3181 ApplicationHistoryServer
[root@sliver2 ~]# jps
1779 SecondaryNameNode
3352 Jps
2344 NodeManager
1674 DataNode
Configure environment variables on the development machine
#create a new system variable HADOOP_HOME
HADOOP_HOME:D:\codeEnv\hadoop-3.0.0
#append to the Path variable
%HADOOP_HOME%\bin