Installing and Configuring a Pseudo-Distributed Hadoop Environment on Mac OS X El Capitan
Preamble
I had spent the past few weeks on theory, so over the holiday break I finally set up a Hadoop environment. Tutorials are everywhere, but most of the ones covering a pseudo-distributed setup on a Mac simply do not work, and the blog posts mostly copy from one another. After working through part of the official documentation and combining it with the few genuinely useful posts, I got the environment running. After all, if you cannot even set up the environment, there is no point talking about development. I am also writing this down so that if I break my Hadoop install I will not forget how to rebuild it. Feel free to follow along. For Linux there are plenty of tutorials already; if I find the time I will write a separate post. Read on.
Overall Environment
Mac OS X El Capitan 10.11.4
java version "1.8.0_77"
Hadoop 2.7.2
Xcode 7.3
Homebrew 0.9.5
I. Prerequisite Setup
1. Homebrew
- Open a Terminal window and paste the following script:
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
2. Java
- Download the JDK 8 installer for Mac OS X from the Oracle website: Java SE Downloads
- Open the downloaded dmg file and double-click the pkg inside to install
- Open Terminal and run:
java -version
- The output should look like:
java version "1.8.0_77"
Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
- The JDK home directory is:
/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home
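As a quick sanity check before moving on, the major version can be parsed out of the `java -version` banner (which Java writes to stderr, not stdout). A minimal sketch; `java_major` is a made-up helper name, not part of any toolchain:

```shell
# Minimal sketch: pull the major version (e.g. "1.8") out of a
# `java -version` banner line. Note that java prints it to stderr.
java_major() {
  printf '%s\n' "$1" | sed -n 's/.*version "\([0-9]*\.[0-9]*\).*/\1/p'
}

# On a real machine you would feed it the live output:
#   java_major "$(java -version 2>&1 | head -n 1)"
java_major 'java version "1.8.0_77"'   # prints 1.8
```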
3. Xcode
- Download it from the App Store
- Note: the download may not be fast, but the official source is safe
II. Configuring SSH
To secure remote login management and sharing among Hadoop node users, Hadoop must be configured to use the SSH protocol
- Open System Preferences > Sharing > Remote Login > Allow access for: All users
- Open Terminal and run the following commands in turn:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
- Once configured, run:
ssh localhost
- If it prints
Last login: Mon Apr 4 15:30:53 2016
- or a similar login-time message, the configuration is complete
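It is also worth confirming that the public key really made it into authorized_keys, otherwise `ssh localhost` will keep prompting for a password. A small sketch with a made-up helper, `key_authorized`, which takes the key directory as an argument so it can be exercised against a throwaway directory instead of the real ~/.ssh:

```shell
# Sketch: check that the public key is present in authorized_keys.
# key_authorized is our own helper name; the directory argument
# stands in for ~/.ssh so the demo below uses a scratch dir.
key_authorized() {
  dir="$1"
  [ -f "$dir/id_dsa.pub" ] || return 1
  grep -qxF "$(cat "$dir/id_dsa.pub")" "$dir/authorized_keys" 2>/dev/null
}

# Demo on a throwaway directory with a fake key:
demo=$(mktemp -d)
echo "ssh-dss FAKEKEYDATA demo@mac" > "$demo/id_dsa.pub"
cat "$demo/id_dsa.pub" >> "$demo/authorized_keys"
key_authorized "$demo" && echo "passwordless SSH looks configured"
```

On a real machine the call would be `key_authorized ~/.ssh`.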
III. Installing and Configuring Hadoop
1. Installing Hadoop
- In Terminal, run:
brew install hadoop
- Installation succeeded if the output looks like:
==> Downloading https://www.apache.org/dyn/closer.cgi?path=hadoop/common/hadoop-
==> Best Mirror http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-
######################################################################## 100.0%
==> Caveats
In Hadoop's config file:
  /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/hadoop-env.sh,
  /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/mapred-env.sh and
  /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/yarn-env.sh
$JAVA_HOME has been set to be the output of:
  /usr/libexec/java_home
==> Summary
🍺  /usr/local/Cellar/hadoop/2.7.2: 6,304 files, 309.8M, built in 2 minutes 43 seconds
2. Configuring Pseudo-Distributed Hadoop
(1) Configure hadoop-env.sh
- In Terminal, run:
open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/hadoop-env.sh
- Find the line
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
- and change it to the following (this works around the "Unable to load realm info from SCDynamicStore" Kerberos warning Hadoop emits on OS X):
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
(2) Configure yarn-env.sh
- In Terminal, run:
open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/yarn-env.sh
- Add the following line:
YARN_OPTS="$YARN_OPTS -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
(3) Configure core-site.xml
- In Terminal, run:
open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/core-site.xml
- Inside the <configuration> element, add:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
(4) Configure hdfs-site.xml
- In Terminal, run:
open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/hdfs-site.xml
- Inside the <configuration> element, add:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
(5) Configure mapred-site.xml
- In Terminal, run the following commands in turn:
cp /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/mapred-site.xml.template /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/mapred-site.xml
open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/mapred-site.xml
- Inside the <configuration> element, add:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
(6) Configure yarn-site.xml
- In Terminal, run:
open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/yarn-site.xml
- Inside the <configuration> element, add:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
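All four site files above receive the same kind of edit: a <property> block inserted inside <configuration>. If you would rather script those edits than paste into `open`, something like the following works. `set_prop` is our own helper, not a Hadoop command, and it assumes `</configuration>` sits alone on its own line:

```shell
# Sketch: insert a <property> block just before the closing
# </configuration> tag of a Hadoop site file. set_prop is a
# hypothetical helper; .bak copies are kept by sed -i.
set_prop() {
  file="$1"; name="$2"; value="$3"
  sed -i.bak "s|</configuration>|  <property>\\
    <name>$name</name>\\
    <value>$value</value>\\
  </property>\\
</configuration>|" "$file"
}

# Demo on a scratch copy instead of the real config file:
f=$(mktemp)
printf '<configuration>\n</configuration>\n' > "$f"
set_prop "$f" fs.defaultFS hdfs://localhost:9000
grep -q 'hdfs://localhost:9000' "$f" && echo "property written"
```

The same helper would cover hdfs-site.xml, mapred-site.xml, and yarn-site.xml by changing the name/value pair.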
3. Formatting HDFS
- In Terminal, run:
rm -rf /tmp/hadoop-tanjiti  # clear this out if Hadoop was installed before; the directory is named after your username
hadoop namenode -format
4. Starting the Daemons
- Change to the sbin directory:
cd /usr/local/Cellar/hadoop/2.7.2/sbin
(1) Start HDFS
./start-dfs.sh
(2) Start YARN (which runs MapReduce)
./start-yarn.sh
(3) Check that everything started
jps
- Expected output:
6467 Jps
5991 DataNode
6343 NodeManager
6106 SecondaryNameNode
6251 ResourceManager
5901 NameNode
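To make this check scriptable, the `jps` listing can be scanned for the five expected daemons. A sketch with a hypothetical `check_daemons` helper; it takes the `jps` output as a text argument so it can be tried without a running cluster:

```shell
# Sketch: scan `jps` output for the five daemons of a
# pseudo-distributed setup and report any that are missing.
check_daemons() {
  out="$1"; missing=""
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    printf '%s\n' "$out" | grep -qw "$d" || missing="$missing $d"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all daemons running"
}

# Against a live cluster you would call: check_daemons "$(jps)"
check_daemons "5901 NameNode
5991 DataNode
6106 SecondaryNameNode
6251 ResourceManager
6343 NodeManager"   # prints "all daemons running"
```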
5. Running a Built-in MapReduce Example
- The following example estimates the value of pi:
hadoop jar /usr/local/Cellar/hadoop/2.7.2/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 5
- The output:
Number of Maps = 2
Samples per Map = 5
16/04/04 16:34:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
16/04/04 16:34:52 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/04/04 16:34:53 INFO input.FileInputFormat: Total input paths to process : 2
16/04/04 16:34:53 INFO mapreduce.JobSubmitter: number of splits:2
16/04/04 16:34:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1459758345965_0002
16/04/04 16:34:53 INFO impl.YarnClientImpl: Submitted application application_1459758345965_0002
16/04/04 16:34:53 INFO mapreduce.Job: The url to track the job: http://mbp.local:8088/proxy/application_1459758345965_0002/
16/04/04 16:34:53 INFO mapreduce.Job: Running job: job_1459758345965_0002
16/04/04 16:34:59 INFO mapreduce.Job: Job job_1459758345965_0002 running in uber mode : false
16/04/04 16:34:59 INFO mapreduce.Job: map 0% reduce 0%
16/04/04 16:35:06 INFO mapreduce.Job: map 100% reduce 0%
16/04/04 16:35:12 INFO mapreduce.Job: map 100% reduce 100%
16/04/04 16:35:12 INFO mapreduce.Job: Job job_1459758345965_0002 completed successfully
16/04/04 16:35:12 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=50
FILE: Number of bytes written=353319
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=526
HDFS: Number of bytes written=215
HDFS: Number of read operations=11
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=7821
Total time spent by all reduces in occupied slots (ms)=2600
Total time spent by all map tasks (ms)=7821
Total time spent by all reduce tasks (ms)=2600
Total vcore-milliseconds taken by all map tasks=7821
Total vcore-milliseconds taken by all reduce tasks=2600
Total megabyte-milliseconds taken by all map tasks=8008704
Total megabyte-milliseconds taken by all reduce tasks=2662400
Map-Reduce Framework
Map input records=2
Map output records=4
Map output bytes=36
Map output materialized bytes=56
Input split bytes=290
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=56
Reduce input records=4
Reduce output records=0
Spilled Records=8
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=196
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=547356672
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=236
File Output Format Counters
Bytes Written=97
Job Finished in 20.021 seconds
Estimated value of Pi is 3.60000000000000000000
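The estimate is coarse because the run used only 2 maps x 5 samples = 10 points. The pi example estimates pi as 4 * (points inside the quarter circle) / (total points), so a result of 3.6 corresponds to 9 of the 10 sample points landing inside; more maps and samples tighten the estimate. The arithmetic:

```shell
# pi ~ 4 * inside / total. With 2 maps x 5 samples and 9 points
# inside the quarter circle, this reproduces the 3.6 seen above.
awk 'BEGIN { inside = 9; total = 2 * 5; printf "%.1f\n", 4 * inside / total }'
# prints 3.6
```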
6. Web UIs
- View cluster status through the web interfaces:
- Cluster status: http://localhost:8088
- HDFS status: http://localhost:50070
- Secondary NameNode: http://localhost:50090
IV. Summary
If you follow the steps above, the setup itself goes quickly; the pitfalls come when you are feeling your way through, where network speed, wrong paths, and the like make things fall over constantly.
With the environment ready, it is back to the theory. After discussing with some friends who also work on this, I still need to brush up on statistics. If anyone in the department is interested, feel free to give it a try.