Hadoop 2.7.2 Fully Distributed Cluster Deployment

hadoop   fully distributed  

1. System and Software Environment

master.hdp.imdst.com      NameNode   SecondaryNameNode  
1.slave.hdp.imdst.com     DataNode  
2.slave.hdp.imdst.com     DataNode  
  • In production, run the NameNode and SecondaryNameNode on separate hosts

2. Installation

  • Resolve each hostname to the server's internal IP address (e.g. in /etc/hosts)
  • Create a hadoop user and set its password on all three hosts
useradd hadoop  
echo "123456" |passwd --stdin hadoop  
  • Configure passwordless SSH login on master
su hadoop -c "mkdir -p /home/hadoop/.ssh"  
su hadoop -c "ssh-keygen -t rsa"  #press Enter at each prompt  
su hadoop -c "touch /home/hadoop/.ssh/authorized_keys"  
su hadoop -c "cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys"

scp /home/hadoop/.ssh/id_rsa.pub hadoop@1.slave.hdp.imdst.com:/home/hadoop/.ssh/authorized_keys   #enter the hadoop user's password when prompted; repeat for 2.slave.hdp.imdst.com  
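If passwordless login still prompts for a password after this step, the usual culprit is permissions: sshd silently ignores `authorized_keys` when `~/.ssh` or the file itself is group/world-writable. A minimal sketch (`fix_ssh_perms` is a hypothetical helper, not part of Hadoop or OpenSSH):

```shell
# fix_ssh_perms: hypothetical helper that tightens the permissions
# sshd's StrictModes check requires before it will trust the key file.
fix_ssh_perms() {
    local dir="$1"
    chmod 700 "$dir"                    # .ssh must be private to the user
    chmod 600 "$dir/authorized_keys"    # key file: owner read/write only
    stat -c '%a' "$dir" "$dir/authorized_keys"   # print the resulting modes
}
# On each slave, after the scp above: fix_ssh_perms /home/hadoop/.ssh
```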
  • Install the JDK and configure JAVA_HOME
rpm -Uvh jdk-7u79-linux-x64.rpm  
echo 'export PATH=$PATH:/usr/java/jdk1.7.0_79/bin' >> /etc/profile  #single quotes, so $PATH expands at login rather than being baked in now  
su hadoop -c "echo 'export JAVA_HOME=/usr/java/jdk1.7.0_79' >> /home/hadoop/.bashrc"  
source /etc/profile  
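Before moving on, it is worth confirming that JAVA_HOME actually points at a JDK on each host. A sketch (`check_java_home` is a hypothetical helper):

```shell
# check_java_home: hypothetical sanity check that the given JAVA_HOME
# contains an executable java binary.
check_java_home() {
    local jh="$1"
    if [ -x "$jh/bin/java" ]; then
        echo "OK: $jh"
    else
        echo "MISSING: $jh/bin/java"
        return 1
    fi
}
# check_java_home /usr/java/jdk1.7.0_79
```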
  • Unpack Hadoop for the cluster
mkdir /home/hadoop/src  
tar zxf hadoop-2.7.2.tar.gz -C /home/hadoop/src  
ln -s /home/hadoop/src/hadoop-2.7.2 /home/hadoop/src/hadoop  #so the src/hadoop paths used below resolve  
chown -R hadoop:hadoop /home/hadoop  

3. Edit the configuration files under "$HOME/src/hadoop/etc/hadoop/"

  • core-site.xml
<configuration>  
        <property>  
                <name>fs.defaultFS</name>  
                <value>hdfs://master.hdp.imdst.com:9000</value>  
        </property>  
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/data/hadoop/tmp</value>
        </property>

        <property>
                <name>hadoop.proxyuser.hadoop.groups</name>
                <value>hadoop</value>
                <description>Allow the hadoop superuser to impersonate members of the hadoop group</description>
        </property>

        <property>
                <name>hadoop.proxyuser.hadoop.hosts</name>
                <value>master.hdp.imdst.com,1.slave.hdp.imdst.com,2.slave.hdp.imdst.com</value>
                <description>The hadoop superuser may impersonate users only when connecting from the listed hosts</description>
        </property>

</configuration>  
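A quick sanity check on the finished file saves a confusing failure later, since every daemon reads fs.defaultFS to find the NameNode. A grep-based sketch (`check_defaultfs` is a hypothetical helper; `xmllint`, where available, is the more thorough well-formedness check):

```shell
# check_defaultfs: hypothetical helper that verifies core-site.xml
# carries the expected fs.defaultFS value before it is shipped to slaves.
check_defaultfs() {
    local file="$1" expected="$2"
    if grep -q "<value>$expected</value>" "$file"; then
        echo "fs.defaultFS ok"
    else
        echo "fs.defaultFS missing or wrong"
        return 1
    fi
}
# check_defaultfs /home/hadoop/src/hadoop/etc/hadoop/core-site.xml hdfs://master.hdp.imdst.com:9000
```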
  • hdfs-site.xml
<configuration>  
        <!-- Replication factor; with only two DataNodes it cannot exceed 2 -->
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <!-- SecondaryNameNode (a checkpoint node, not a hot standby) -->
        <property> 
                <name>dfs.namenode.secondary.http-address</name> 
                <value>master.hdp.imdst.com:9001</value> 
        </property> 


        <property>
                <name>dfs.permissions.enabled</name>
                <value>false</value>
        </property>
</configuration>  
  • mapred-site.xml
<configuration>  
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>

        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>master.hdp.imdst.com:10020</value>
        </property>

        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>master.hdp.imdst.com:19888</value>
        </property>

        <property>
                <name>mapred.map.tasks.speculative.execution</name>
                <value>false</value>
        </property>

        <property>
                <name>mapred.reduce.tasks.speculative.execution</name>
                <value>false</value>
        </property>
</configuration>  
  • yarn-site.xml
<configuration>

    <!-- Site specific YARN configuration properties -->
    <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
    </property>
    <property>                                                               
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
            <name>yarn.resourcemanager.address</name>
            <value>master.hdp.imdst.com:8032</value>
    </property>
    <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>master.hdp.imdst.com:8030</value>
    </property>
    <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>master.hdp.imdst.com:8031</value>
    </property>
    <property>
            <name>yarn.resourcemanager.admin.address</name>
            <value>master.hdp.imdst.com:8033</value>
    </property>
    <property>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>master.hdp.imdst.com:8088</value>
    </property>
</configuration>  
  • slaves: list every DataNode hostname, one per line
1.slave.hdp.imdst.com  
2.slave.hdp.imdst.com  
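Hadoop's slaves.sh reads this file line by line, so the format matters. A sketch that generates it (`write_slaves` is a hypothetical helper):

```shell
# write_slaves: hypothetical helper that writes one hostname per line,
# the format Hadoop's slaves.sh expects.
write_slaves() {
    local file="$1"; shift
    printf '%s\n' "$@" > "$file"
}
# write_slaves /home/hadoop/src/hadoop/etc/hadoop/slaves 1.slave.hdp.imdst.com 2.slave.hdp.imdst.com
```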

4. Deploy the configured Hadoop to all three servers

  • Run on master
su - hadoop  
cd src && tar zcf hadoop-2.7.2.tar.gz hadoop-2.7.2  
scp hadoop-2.7.2.tar.gz hadoop@1.slave.hdp.imdst.com:/home/hadoop/src  
scp hadoop-2.7.2.tar.gz hadoop@2.slave.hdp.imdst.com:/home/hadoop/src
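The copied tarball still has to be unpacked on each slave; a symlink keeps the src/hadoop path used throughout this guide valid there too. A sketch (`unpack_hadoop` is a hypothetical helper; on the slaves the src directory is /home/hadoop/src):

```shell
# unpack_hadoop: hypothetical helper that extracts the tarball and
# symlinks src/hadoop -> hadoop-2.7.2 so version-independent paths work.
unpack_hadoop() {
    local src="$1"
    (cd "$src" && tar zxf hadoop-2.7.2.tar.gz && ln -sfn hadoop-2.7.2 hadoop)
}
# Run it on each slave over the passwordless SSH set up earlier, e.g.:
#   ssh hadoop@1.slave.hdp.imdst.com 'cd /home/hadoop/src && tar zxf hadoop-2.7.2.tar.gz && ln -sfn hadoop-2.7.2 hadoop'
```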

5. Format the NameNode

  • Run on master
su - hadoop  
cd src/hadoop  
bin/hdfs namenode -format  
# "successfully formatted" in the output means the format succeeded
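The format command produces a lot of scrollback, so it is easy to miss a failure. A sketch that checks the captured output instead of eyeballing it (`format_ok` is a hypothetical helper):

```shell
# format_ok: hypothetical helper that greps a saved format log for the
# success marker the NameNode prints.
format_ok() {
    grep -q "successfully formatted" "$1"
}
# bin/hdfs namenode -format 2>&1 | tee /tmp/nn-format.log
# format_ok /tmp/nn-format.log && echo "format succeeded"
```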

6. Start and stop the Hadoop cluster

  • Start (note: `su - hadoop && cd …` on one line would run the cd only after the su shell exits, in the wrong shell)
su - hadoop  
cd src/hadoop  
sbin/start-all.sh  #deprecated in 2.x; sbin/start-dfs.sh followed by sbin/start-yarn.sh also works  
  • Stop
su - hadoop  
cd src/hadoop  
sbin/stop-all.sh
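After starting, `jps` on each host should show the expected daemons. A sketch that checks for them (`check_daemons` is a hypothetical helper; the daemon names are the standard Hadoop 2.x ones):

```shell
# check_daemons: hypothetical helper; reads jps output from stdin and
# verifies every named daemon appears as a whole word.
check_daemons() {    # usage: jps | check_daemons NameNode DataNode ...
    local out d
    out=$(cat)
    for d in "$@"; do
        echo "$out" | grep -qw "$d" || { echo "missing: $d"; return 1; }
    done
    echo "all daemons running"
}
# On master:  jps | check_daemons NameNode SecondaryNameNode ResourceManager
# On slaves:  jps | check_daemons DataNode NodeManager
```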