
Installing and Configuring a Pseudo-Distributed Hadoop Environment on Mac OS X El Capitan #10


Some Rambling First

I'd spent the past few weeks reading theory, so I used the long weekend to set up a Hadoop environment. Tutorials are a dime a dozen, but the ones covering a pseudo-distributed setup on Mac mostly don't actually run, and the various blog posts just repost each other. After working through part of the official documentation and combining it with the few blog posts that were genuinely useful, I finally got the environment up. As the saying goes: if you can't even set up the environment, what development is there to talk about? I'm also writing this down so that if I ever break Hadoop I won't have forgotten how to install it. If you're interested, give it a try. For Linux there are plenty of tutorials already; if I find time I'll write a separate post. Read on.

Overall Environment

Mac OS X El Capitan 10.11.4

java version "1.8.0_77"

Hadoop 2.7.2

Xcode 7.3

Homebrew 0.9.5

I. Prerequisites

1. Homebrew

  • Open a Terminal window and paste the following script

    /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
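
  • Optional sanity check (standard Homebrew commands, not part of the original steps): confirm brew is on your PATH and healthy

    brew --version   # prints the installed Homebrew version, e.g. 0.9.5
    brew doctor      # warns about common setup problems; "Your system is ready to brew." when clean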
    

2. Java

  • Download the JDK 8 installer for Mac OS X from the Oracle website: Java SE Downloads

  • Open the downloaded dmg file and double-click the pkg inside to install

  • Open Terminal and run

    java -version
    
  • The output should be

    java version "1.8.0_77"
    Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
    Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
    
  • The JDK home directory is

    /Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home
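
  • Hadoop derives $JAVA_HOME from /usr/libexec/java_home (see the Homebrew caveats below), but if you also want it in your own shell, a minimal sketch for ~/.bash_profile:

    # Resolve the JDK 8 home via the standard macOS helper and export it
    export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"
    echo "$JAVA_HOME"   # should print the directory above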
    

3. Xcode

  • Download it from the App Store

  • PS: the download may be slow, but the official channel is still the safest

II. Configuring SSH

Hadoop uses the SSH protocol for secure remote login, both to manage Hadoop and to share access among Hadoop node users, so passwordless SSH needs to be configured first.

  • Open System Preferences - Sharing - Remote Login - Allow access for - All Users

  • Open Terminal and run, one at a time

    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    cat ~/.ssh/id_dsa.pub >>~/.ssh/authorized_keys
    
  • Once configured, run

    ssh localhost
    
  • If you see something like

    Last login: Mon Apr  4 15:30:53 2016
    
  • or a similar login timestamp, without a password prompt, the configuration is complete
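
  • If ssh localhost still prompts for a password, the usual cause is key-file permissions; a hedged fix using the standard OpenSSH requirements (and note that newer OpenSSH releases disable DSA keys, in which case -t rsa works the same way):

    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys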

III. Installing and Configuring Hadoop

1. Install Hadoop

  • In Terminal, run

    brew install hadoop
    
  • Output like the following means the installation succeeded

    ==> Downloading https://www.apache.org/dyn/closer.cgi?path=hadoop/common/hadoop-
    ==> Best Mirror http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-
    ######################################################################## 100.0%
    ==> Caveats
    In Hadoop's config file:
    /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/hadoop-env.sh,
    /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/mapred-env.sh and
    /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/yarn-env.sh
    $JAVA_HOME has been set to be the output of:
    /usr/libexec/java_home
    ==> Summary
    🍺  /usr/local/Cellar/hadoop/2.7.2: 6,304 files, 309.8M, built in 2 minutes 43 seconds
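
  • Verify the installation; Homebrew links the Hadoop commands onto your PATH

    hadoop version   # first line should read: Hadoop 2.7.2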
    

2. Configure Pseudo-Distributed Hadoop

(1) Edit hadoop-env.sh
  • In Terminal, run

    open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/hadoop-env.sh
    
  • Find the line

    export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

  • and change it to the following (the empty krb5 realm/kdc settings work around the "Unable to load realm info from SCDynamicStore" error Hadoop otherwise throws on OS X)

    export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
    
(2) Edit yarn-env.sh
  • In Terminal, run

    open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/yarn-env.sh
    
  • Add the following line (the realm/KDC values are the commonly circulated workaround for the same SCDynamicStore issue on the YARN side)

    YARN_OPTS="$YARN_OPTS -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
    
(3) Edit core-site.xml
  • In Terminal, run

    open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/core-site.xml
    
  • Add the following inside the <configuration> element (sets the default filesystem to HDFS on localhost:9000)

    <property>  
        <name>fs.defaultFS</name>             
        <value>hdfs://localhost:9000</value>          
    </property>
    
(4) Edit hdfs-site.xml
  • In Terminal, run

    open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/hdfs-site.xml
    
  • Add the following inside the <configuration> element (replication factor 1, since this is a single-node setup)

     <property>
         <name>dfs.replication</name>
         <value>1</value>
    </property>
    
(5) Edit mapred-site.xml
  • In Terminal, run the following in order (Hadoop only ships a template, so copy it to mapred-site.xml first)

    cp /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/mapred-site.xml.template /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/mapred-site.xml
    open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/mapred-site.xml
    
  • Add the following inside the <configuration> element (runs MapReduce jobs on YARN)

     <property>
         <name>mapreduce.framework.name</name>
         <value>yarn</value>
    </property>
    
(6) Edit yarn-site.xml
  • In Terminal, run

    open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/yarn-site.xml
    
  • Add the following inside the <configuration> element (the shuffle auxiliary service that MapReduce needs from each NodeManager)

     <property>
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce_shuffle</value>
     </property>
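
  • Optional: with all four files saved, hdfs getconf can confirm that Hadoop sees the new values without starting anything

    hdfs getconf -confKey fs.defaultFS      # expect hdfs://localhost:9000
    hdfs getconf -confKey dfs.replication   # expect 1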
    

3. Format HDFS

  • In Terminal, run

    rm -rf /tmp/hadoop-$(whoami)   # only needed if Hadoop was installed before; the data dir under /tmp is named after your user
    hadoop namenode -format
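
  • On success, the log should include a line like this (the path varies with your username)

    ... INFO common.Storage: Storage directory /tmp/hadoop-<user>/dfs/name has been successfully formatted.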
    

4. Start the Daemons

  • Change into Hadoop's sbin directory

    cd /usr/local/Cellar/hadoop/2.7.2/sbin

(1) Start HDFS

    ./start-dfs.sh

(2) Start YARN (MapReduce jobs run on it)

    ./start-yarn.sh

(3) Check which daemons are running

    jps
    
  • Expected output (the process IDs will differ)

    6467 Jps
    5991 DataNode
    6343 NodeManager
    6106 SecondaryNameNode
    6251 ResourceManager
    5901 NameNode
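
  • When you are done, the matching stop scripts in the same sbin directory shut everything down

    ./stop-yarn.sh
    ./stop-dfs.sh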
    

5. Run a Bundled MapReduce Example

  • Estimate the value of π with the bundled example (2 map tasks, 5 samples per map)

    hadoop jar /usr/local/Cellar/hadoop/2.7.2/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 5
    
  • Output

Number of Maps  = 2
Samples per Map = 5
16/04/04 16:34:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
16/04/04 16:34:52 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/04/04 16:34:53 INFO input.FileInputFormat: Total input paths to process : 2
16/04/04 16:34:53 INFO mapreduce.JobSubmitter: number of splits:2
16/04/04 16:34:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1459758345965_0002
16/04/04 16:34:53 INFO impl.YarnClientImpl: Submitted application application_1459758345965_0002
16/04/04 16:34:53 INFO mapreduce.Job: The url to track the job: http://mbp.local:8088/proxy/application_1459758345965_0002/
16/04/04 16:34:53 INFO mapreduce.Job: Running job: job_1459758345965_0002
16/04/04 16:34:59 INFO mapreduce.Job: Job job_1459758345965_0002 running in uber mode : false
16/04/04 16:34:59 INFO mapreduce.Job:  map 0% reduce 0%
16/04/04 16:35:06 INFO mapreduce.Job:  map 100% reduce 0%
16/04/04 16:35:12 INFO mapreduce.Job:  map 100% reduce 100%
16/04/04 16:35:12 INFO mapreduce.Job: Job job_1459758345965_0002 completed successfully
16/04/04 16:35:12 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=50
        FILE: Number of bytes written=353319
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=526
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=11
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=7821
        Total time spent by all reduces in occupied slots (ms)=2600
        Total time spent by all map tasks (ms)=7821
        Total time spent by all reduce tasks (ms)=2600
        Total vcore-milliseconds taken by all map tasks=7821
        Total vcore-milliseconds taken by all reduce tasks=2600
        Total megabyte-milliseconds taken by all map tasks=8008704
        Total megabyte-milliseconds taken by all reduce tasks=2662400
    Map-Reduce Framework
        Map input records=2
        Map output records=4
        Map output bytes=36
        Map output materialized bytes=56
        Input split bytes=290
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=56
        Reduce input records=4
        Reduce output records=0
        Spilled Records=8
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=196
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=547356672
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=236
    File Output Format Counters
        Bytes Written=97
Job Finished in 20.021 seconds
Estimated value of Pi is 3.60000000000000000000
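
  • The estimate is rough because only 2 maps × 5 samples were used; larger arguments (e.g. pi 10 100) get much closer to 3.14. As a second smoke test, the same examples jar includes wordcount; a minimal sketch (the file name and HDFS paths here are arbitrary choices, not from the original post):

    # Create a tiny input file, load it into HDFS, and count the words
    echo "hello hadoop hello hdfs" > /tmp/words.txt
    hdfs dfs -mkdir -p input   # relative HDFS paths live under /user/<you>
    hdfs dfs -put /tmp/words.txt input/
    hadoop jar /usr/local/Cellar/hadoop/2.7.2/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount input output
    hdfs dfs -cat output/part-r-00000   # expect: hadoop 1, hdfs 1, hello 2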

6. View the Web UIs
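
  • With the daemons running, the standard Hadoop 2.x web consoles are available in a browser:

    open http://localhost:50070   # NameNode: HDFS health and file browser
    open http://localhost:8088    # ResourceManager: YARN applications (the pi job above shows up here)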

IV. Summary

Following the steps above, the setup is actually quick; it was the initial trial-and-error that was full of pitfalls: network speed, paths, and something crashing every now and then.

With the environment ready, it's back to theory. After discussing with some friends who also work on this, I clearly need to brush up on statistics. If anyone in the department is interested, feel free to give this a try.
