Thursday, May 19, 2016

hadoop : sort example study


Reference:
http://blog.csdn.net/xw13106209/article/details/6881081
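
The log analyzed below presumably came from the bundled sort example; the notes do not record the invocation, but it likely looked like the following sketch (the /random input path is an assumption, matching the RandomWriter section later in these notes):

```shell
# Hypothetical sort invocation (paths are assumptions); printed here,
# to be run on a node with Hadoop on the PATH:
CMD="yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar sort /random /random-sort"
echo "$CMD"
```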



Log analysis:

16/05/19 14:47:39 INFO mapreduce.Job: Counters: 23
File System Counters
FILE: Number of bytes read=2192404
FILE: Number of bytes written=4203064
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=4860966398
HDFS: Number of bytes written=4860283767
HDFS: Number of read operations=212
HDFS: Number of large read operations=0
HDFS: Number of write operations=80
Map-Reduce Framework
Map input records=102319
Map output records=102319
Input split bytes=848
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=37
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=2618818560
File Input Format Counters
Bytes Read=1077457472
File Output Format Counters
Bytes Written=1077292254
Job ended: Thu May 19 14:47:39 CST 2016
The job took 19 seconds.
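
As a sanity check on the counters above: the File Input Format Bytes Read (1077457472) and Bytes Written (1077292254) both work out to roughly 1 GiB, matching the ~1 GB RandomWriter dataset used as input:

```shell
# Convert the "Bytes Read" counter to GiB:
awk 'BEGIN { printf "%.3f GiB\n", 1077457472 / 1024 / 1024 / 1024 }'
# prints "1.003 GiB"
```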





Sort results:






hadoop : teragen example


Experiment commands (items 16-18 from the example list below):
16.
teragen: Generate data for the terasort

17.
terasort: Run the terasort

18.
teravalidate: Checking results of terasort

1. Reference:

https://discuss.zendesk.com/hc/en-us/articles/200927666-Running-TeraSort-MapReduce-Benchmark


command 1:
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar teragen 100000000 /tera3


16/05/19 15:06:09 INFO mapreduce.Job:  map 35% reduce 0%
16/05/19 15:06:11 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:12 INFO mapreduce.Job:  map 38% reduce 0%
16/05/19 15:06:14 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:15 INFO mapreduce.Job:  map 42% reduce 0%
16/05/19 15:06:17 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:18 INFO mapreduce.Job:  map 45% reduce 0%
16/05/19 15:06:20 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:21 INFO mapreduce.Job:  map 49% reduce 0%
16/05/19 15:06:23 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:24 INFO mapreduce.Job:  map 52% reduce 0%
16/05/19 15:06:26 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:27 INFO mapreduce.Job:  map 56% reduce 0%
16/05/19 15:06:29 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:30 INFO mapreduce.Job:  map 59% reduce 0%
16/05/19 15:06:32 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:33 INFO mapreduce.Job:  map 63% reduce 0%
16/05/19 15:06:35 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:36 INFO mapreduce.Job:  map 66% reduce 0%
16/05/19 15:06:38 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:39 INFO mapreduce.Job:  map 70% reduce 0%
16/05/19 15:06:41 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:42 INFO mapreduce.Job:  map 73% reduce 0%
16/05/19 15:06:44 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:45 INFO mapreduce.Job:  map 77% reduce 0%
16/05/19 15:06:47 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:48 INFO mapreduce.Job:  map 80% reduce 0%
16/05/19 15:06:50 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:51 INFO mapreduce.Job:  map 84% reduce 0%
16/05/19 15:06:53 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:54 INFO mapreduce.Job:  map 87% reduce 0%
16/05/19 15:06:56 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:57 INFO mapreduce.Job:  map 90% reduce 0%
16/05/19 15:06:59 INFO mapred.LocalJobRunner: map > map
16/05/19 15:07:00 INFO mapreduce.Job:  map 94% reduce 0%
16/05/19 15:07:02 INFO mapred.LocalJobRunner: map > map
16/05/19 15:07:03 INFO mapreduce.Job:  map 97% reduce 0%
16/05/19 15:07:04 INFO mapred.LocalJobRunner: map > map
16/05/19 15:07:04 INFO mapred.Task: Task:attempt_local302140298_0001_m_000000_0 is done. And is in the process of committing
16/05/19 15:07:04 INFO mapred.LocalJobRunner: map > map
16/05/19 15:07:04 INFO mapred.Task: Task attempt_local302140298_0001_m_000000_0 is allowed to commit now
16/05/19 15:07:04 INFO output.FileOutputCommitter: Saved output of task 'attempt_local302140298_0001_m_000000_0' to hdfs://localhost:9000/tera3/_temporary/0/task_local302140298_0001_m_000000


Check: about 9.31 GB of data was generated.
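
The 9.31 G figure checks out: TeraGen writes 100-byte rows, so 100,000,000 rows is 10^10 bytes, about 9.31 GiB:

```shell
# 100,000,000 rows x 100 bytes/row, expressed in GiB:
awk 'BEGIN { printf "%.2f GiB\n", 100000000 * 100 / 1024 / 1024 / 1024 }'
# prints "9.31 GiB"
# On the cluster, the actual size can be confirmed with:
#   hdfs dfs -du -s -h /tera3
```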


command 2:
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar terasort /tera3 /tera3-sort

Log analysis:

INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=433799246528
FILE: Number of bytes written=831560527500
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=393214197000
HDFS: Number of bytes written=10000000000
HDFS: Number of read operations=7525
HDFS: Number of large read operations=0
HDFS: Number of write operations=154
Map-Reduce Framework
Map input records=100000000
Map output records=100000000
Map output bytes=10200000000
Map output materialized bytes=10400000450
Input split bytes=7875
Combine input records=0
Combine output records=0
Reduce input groups=100000000
Reduce shuffle bytes=10400000450
Reduce input records=100000000
Reduce output records=100000000
Spilled Records=346976200
Shuffled Maps =75
Failed Shuffles=0
Merged Map outputs=75
GC time elapsed (ms)=12738
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=77072957440
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=10000000000
File Output Format Counters
Bytes Written=10000000000
16/05/19 15:23:16 INFO terasort.TeraSort: done
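
A few of these counters cross-check: Map output bytes (10,200,000,000) divided by Map output records (100,000,000) gives 102 bytes per record, which is the 100-byte TeraGen record plus what appears to be 2 bytes of per-record map-output overhead (the exact framing is an assumption here):

```shell
# Bytes per map output record in the terasort run above:
awk 'BEGIN { printf "%d bytes/record\n", 10200000000 / 100000000 }'
# prints "102 bytes/record"
```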


3. Results



==============
command 3:
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar teravalidate -D mapred.reduce.tasks=8 /tera3-sort /teraValidate


Log analysis:

16/05/19 15:31:09 INFO output.FileOutputCommitter: Saved output of task 'attempt_local112206802_0001_r_000000_0' to hdfs://localhost:9000/teraValidate/_temporary/0/task_local112206802_0001_r_000000
16/05/19 15:31:09 INFO mapred.LocalJobRunner: reduce > reduce
16/05/19 15:31:09 INFO mapred.Task: Task 'attempt_local112206802_0001_r_000000_0' done.
16/05/19 15:31:09 INFO mapred.LocalJobRunner: Finishing task: attempt_local112206802_0001_r_000000_0
16/05/19 15:31:09 INFO mapred.LocalJobRunner: reduce task executor complete.
16/05/19 15:31:10 INFO mapreduce.Job:  map 100% reduce 100%
16/05/19 15:31:10 INFO mapreduce.Job: Job job_local112206802_0001 completed successfully
16/05/19 15:31:10 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=541210
FILE: Number of bytes written=1046767
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=20000000000
HDFS: Number of bytes written=25
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=100000000
Map output records=3
Map output bytes=83
Map output materialized bytes=95
Input split bytes=110
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=95
Reduce input records=3
Reduce output records=1
Spilled Records=6
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=362
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=747634688
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters 
Bytes Read=10000000000
File Output Format Counters 
Bytes Written=25
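
The 25 bytes written are TeraValidate's summary output; when the sort order is correct it is typically a single checksum line. A sketch for reading it back (the job ran with 8 reduce tasks, so there may be several part-r-* files; command printed here, to be run on the cluster):

```shell
# Print the inspection command; run it on a node with Hadoop configured.
CMD="hdfs dfs -cat /teraValidate/part-*"
echo "$CMD"
```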



hadoop : RandomWriter


References:
http://wiki.apache.org/hadoop/RandomWriter
http://blog.csdn.net/xw13106209/article/details/6881001



step 2:
command:
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar randomwriter /random


step 3:
Result: about 1 GB of data was generated:
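
RandomWriter writes 10 GB per node by default, so the 1 GB seen here suggests a non-default configuration. The size can be controlled with the RandomWriter properties; a sketch (the property name is taken from the Hadoop 2.x example source and should be verified against your version):

```shell
# Hypothetical: cap RandomWriter at 1 GB total. Printed here; run on the cluster.
CMD="yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar randomwriter -D mapreduce.randomwriter.totalbytes=1073741824 /random"
echo "$CMD"
```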


Detailed log:

16/05/19 14:43:33 INFO output.FileOutputCommitter: Saved output of task 'attempt_local636254_0001_m_000000_0' to hdfs://localhost:9000/user/hduser/rand/_temporary/0/task_local636254_0001_m_000000




--------------- All example programs ----------------------------------------------------

An example program must be given as the first argument.
Valid program names are:

1.
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.

2.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.

3.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.

4.
dbcount: An example job that count the pageview counts from a database.

5.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.

6.
grep: A map/reduce program that counts the matches of a regex in the input.

7.
join: A job that effects a join over sorted, equally partitioned datasets

8.
multifilewc: A job that counts words from several files.

9.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.

10.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.

11.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.

12.
randomwriter: A map/reduce program that writes 10GB of random data per node.

13.
secondarysort: An example defining a secondary sort to the reduce.

14.     --> done 5.19
sort: A map/reduce program that sorts the data written by the random writer.

15.
sudoku: A sudoku solver.

16.
teragen: Generate data for the terasort

17.
terasort: Run the terasort

18.
teravalidate: Checking results of terasort

19.
wordcount: A map/reduce program that counts the words in the input files.

20.
wordmean: A map/reduce program that counts the average length of the words in the input files.

21.
wordmedian: A map/reduce program that counts the median length of the words in the input files.

22.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.



Wednesday, May 18, 2016

01_build the hadoop source code


Build environment:
http://learngeb-ebook.readbook.tw/install/index.html


1.maven
2.docker

curl

BUILDING APACHE HADOOP FROM SOURCE


https://pravinchavan.wordpress.com/2013/04/14/building-apache-hadoop-from-source/


[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.0.0-alpha1-SNAPSHOT:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hadoop-common
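
The failure above means Maven could not execute protoc. Installing the protobuf compiler and making sure it is on the PATH of the shell that runs Maven fixes it (Hadoop trunk at the time pinned a specific protobuf version; check the tree's BUILDING.txt). A quick check:

```shell
# Is protoc visible? On Debian/Ubuntu it can be installed with:
#   sudo apt-get install protobuf-compiler
command -v protoc >/dev/null 2>&1 && protoc --version || echo "protoc not found"
```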

Success!

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................ SUCCESS [11.189s]
[INFO] Apache Hadoop Build Tools ......................... SUCCESS [0.259s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [0.285s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [1.363s]
[INFO] Apache Hadoop Assemblies .......................... SUCCESS [0.053s]
[INFO] Apache Hadoop Project Dist POM .................... SUCCESS [0.798s]
[INFO] Apache Hadoop Maven Plugins ....................... SUCCESS [1.930s]
[INFO] Apache Hadoop MiniKDC ............................. SUCCESS [1.126s]
[INFO] Apache Hadoop Auth ................................ SUCCESS [1.912s]
[INFO] Apache Hadoop Auth Examples ....................... SUCCESS [1.882s]
[INFO] Apache Hadoop Common .............................. SUCCESS [1:06.348s]
[INFO] Apache Hadoop NFS ................................. SUCCESS [3.134s]
[INFO] Apache Hadoop KMS ................................. SUCCESS [16.271s]
[INFO] Apache Hadoop Common Project ...................... SUCCESS [0.070s]
[INFO] Apache Hadoop HDFS Client ......................... SUCCESS [1:33.562s]
[INFO] Apache Hadoop HDFS ................................ SUCCESS [1:16.122s]
[INFO] Apache Hadoop HDFS Native Client .................. SUCCESS [1.007s]
[INFO] Apache Hadoop HttpFS .............................. SUCCESS [42.760s]
[INFO] Apache Hadoop HDFS BookKeeper Journal ............. SUCCESS [19.795s]
[INFO] Apache Hadoop HDFS-NFS ............................ SUCCESS [3.031s]
[INFO] Apache Hadoop HDFS Project ........................ SUCCESS [0.064s]
[INFO] Apache Hadoop YARN ................................ SUCCESS [0.043s]
[INFO] Apache Hadoop YARN API ............................ SUCCESS [31.200s]
[INFO] Apache Hadoop YARN Common ......................... SUCCESS [35.082s]
[INFO] Apache Hadoop YARN Server ......................... SUCCESS [0.046s]
[INFO] Apache Hadoop YARN Server Common .................. SUCCESS [10.946s]
[INFO] Apache Hadoop YARN NodeManager .................... SUCCESS [9.749s]
[INFO] Apache Hadoop YARN Web Proxy ...................... SUCCESS [1.870s]
[INFO] Apache Hadoop YARN ApplicationHistoryService ...... SUCCESS [7.364s]
[INFO] Apache Hadoop YARN ResourceManager ................ SUCCESS [15.382s]
[INFO] Apache Hadoop YARN Server Tests ................... SUCCESS [2.757s]
[INFO] Apache Hadoop YARN Client ......................... SUCCESS [3.846s]
[INFO] Apache Hadoop YARN SharedCacheManager ............. SUCCESS [1.708s]
[INFO] Apache Hadoop YARN Timeline Plugin Storage ........ SUCCESS [1.536s]
[INFO] Apache Hadoop YARN Applications ................... SUCCESS [0.018s]
[INFO] Apache Hadoop YARN DistributedShell ............... SUCCESS [1.518s]
[INFO] Apache Hadoop YARN Unmanaged Am Launcher .......... SUCCESS [1.106s]
[INFO] Apache Hadoop YARN Site ........................... SUCCESS [0.016s]
[INFO] Apache Hadoop YARN Registry ....................... SUCCESS [2.672s]
[INFO] Apache Hadoop YARN Project ........................ SUCCESS [3.012s]
[INFO] Apache Hadoop MapReduce Client .................... SUCCESS [0.031s]
[INFO] Apache Hadoop MapReduce Core ...................... SUCCESS [16.635s]
[INFO] Apache Hadoop MapReduce Common .................... SUCCESS [10.791s]
[INFO] Apache Hadoop MapReduce Shuffle ................... SUCCESS [1.988s]
[INFO] Apache Hadoop MapReduce App ....................... SUCCESS [5.929s]
[INFO] Apache Hadoop MapReduce HistoryServer ............. SUCCESS [3.182s]
[INFO] Apache Hadoop MapReduce JobClient ................. SUCCESS [5.845s]
[INFO] Apache Hadoop MapReduce HistoryServer Plugins ..... SUCCESS [1.015s]
[INFO] Apache Hadoop MapReduce NativeTask ................ SUCCESS [2.589s]
[INFO] Apache Hadoop MapReduce Examples .................. SUCCESS [3.414s]
[INFO] Apache Hadoop MapReduce ........................... SUCCESS [2.297s]
[INFO] Apache Hadoop MapReduce Streaming ................. SUCCESS [14.670s]
[INFO] Apache Hadoop Distributed Copy .................... SUCCESS [5.451s]
[INFO] Apache Hadoop Archives ............................ SUCCESS [1.438s]
[INFO] Apache Hadoop Archive Logs ........................ SUCCESS [1.368s]
[INFO] Apache Hadoop Rumen ............................... SUCCESS [3.274s]
[INFO] Apache Hadoop Gridmix ............................. SUCCESS [2.620s]
[INFO] Apache Hadoop Data Join ........................... SUCCESS [1.367s]
[INFO] Apache Hadoop Ant Tasks ........................... SUCCESS [1.102s]
[INFO] Apache Hadoop Extras .............................. SUCCESS [1.284s]
[INFO] Apache Hadoop Pipes ............................... SUCCESS [0.011s]
[INFO] Apache Hadoop OpenStack support ................... SUCCESS [2.503s]
[INFO] Apache Hadoop Amazon Web Services support ......... SUCCESS [12.085s]
[INFO] Apache Hadoop Azure support ....................... SUCCESS [6.075s]
[INFO] Apache Hadoop Client .............................. SUCCESS [4.251s]
[INFO] Apache Hadoop Mini-Cluster ........................ SUCCESS [0.129s]
[INFO] Apache Hadoop Scheduler Load Simulator ............ SUCCESS [2.218s]
[INFO] Apache Hadoop Tools Dist .......................... SUCCESS [3.678s]
[INFO] Apache Hadoop Kafka Library support ............... SUCCESS [24.893s]
[INFO] Apache Hadoop Tools ............................... SUCCESS [0.016s]
[INFO] Apache Hadoop Distribution ........................ SUCCESS [19.842s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10:36.200s
[INFO] Finished at: Wed May 18 09:26:34 CST 2016
[INFO] Final Memory: 246M/964M
[INFO] ------------------------------------------------------------------------
[WARNING] The requested profile "doc" could not be activated because it does not exist.

The built tarball is located at:

Hadoop dist tar available at: 00_hadoop/hadoop/hadoop-dist/target/hadoop-3.0.0-alpha1-SNAPSHOT.tar.gz





3. good site:
http://www.inside.com.tw/2015/03/12/big-data-4-hadoop

Tuesday, May 17, 2016

hadoop : using my own word count



http://glj8989332.blogspot.tw/2015/09/windows-hadoop-eclipse-mapreduce-wordcount.html


step 1:

Maven configuration


step 2:
built my word count successfully.


 public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

  /* Initialization */
  Configuration conf = new Configuration();

  /* Create the MapReduce job; its name is KWordCnt */
  @SuppressWarnings("deprecation")
  Job job = new Job(conf, "KWordCnt");

  /* The jar class that launches the job is KWordCnt */
  job.setJarByClass(KWordCnt.class);
  /* The map class for the job is MyMapper */
  job.setMapperClass(MyMapper.class);
  /* The reduce class for the job is MyReducer */
  job.setReducerClass(MyReducer.class);

  /* Output key/value types (assuming the usual wordcount Text/IntWritable) */
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);

  /* HDFS path of the input data */
  //FileInputFormat.addInputPath(job, new Path("/input02"));
  FileInputFormat.addInputPath(job, new Path("/1.csv"));

  /* Output path and job submission, completed here so the example is runnable
     (the output directory name is illustrative) */
  FileOutputFormat.setOutputPath(job, new Path("/wordcount-out"));
  System.exit(job.waitForCompletion(true) ? 0 : 1);
 }


step 3:


















Comparison: running the stock example jar gives the same result (1489928):

yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /1.csv /2




get hadoop source code


step 1: set up ssh for github
git


step 2: git gui

tig:



giggle


step 3: import the hadoop into eclipse


Monday, May 16, 2016

hadoop : example exercises



Computing Pi (the arguments are 16 maps and 1,000,000 samples per map):

yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 16 1000000




wordcount example
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /1.csv /2

Results:


pig command practice


Reference:
http://glj8989332.blogspot.tw/2015/11/hadoop-pig-0150.html


step 1: put the file onto HDFS
hadoop fs -put Employee_Salaries_2014.csv hdfs://localhost:9000/salary/1.csv


failure:

Input(s):
Failed to read data from "hdfs://localhost:9000/salary/1.csv"




Fix: delete the /salary directory and load /1.csv directly:
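
The cleanup would look like this sketch (directory name taken from the failed put above; commands printed here, to be run on the cluster):

```shell
# Remove the bad HDFS path, then re-upload the file to the root as /1.csv.
echo "hadoop fs -rm -r /salary"
echo "hadoop fs -put Employee_Salaries_2014.csv hdfs://localhost:9000/1.csv"
```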

grunt> salarydata = LOAD 'hdfs://localhost:9000/1.csv' USING PigStorage(',') AS ( FullName1:chararray ,  FullName2:chararray ,Gender:chararray , CurrentAnnualSalary : chararray,GrossPayReceived2014 : chararray ,OvertimePay2014:chararray , Department:chararray , DepartmentName:chararray,Division:chararray, AssignmentCategory: chararray , PositionTitle:chararray, UnderfilledJobTitle: chararray,DateFirstHired:chararray);

DUMP salarydata;

Results:


install hadoop in ubuntu



1. link
http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php



2.

 Exception in thread "main" java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority.


hduser@layer1athome:/usr/local$ ls -all /app/hadoop/tmp
total 8
drwxr-xr-x 2 hduser hadoop 4096 May 16 11:49 .
drwxr-xr-x 3 root   root   4096 May 16 11:49 ..
hduser@layer1athome:/usr/local$ 


3. 
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


4. 
start-dfs.sh
16/05/16 15:58:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.

Adding this fixed it:
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop/lib/native"


Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.


Fix:
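
Both this error and the earlier "Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority" usually mean core-site.xml has no fs.defaultFS entry. A minimal sketch, assuming the single-node localhost:9000 setup used elsewhere in these notes (written to a temporary file here; the real file lives at $HADOOP_HOME/etc/hadoop/core-site.xml):

```shell
# Minimal core-site.xml pointing the default FS at the local NameNode.
cat > /tmp/core-site-example.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
grep -c 'hdfs://localhost:9000' /tmp/core-site-example.xml
```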

Result: the NameNode and DataNode processes appeared
manager-layer1athome.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-layer1athome.out
hduser@layer1athome:/$ jps
1804 SecondaryNameNode
2000 ResourceManager
2254 NodeManager
2679 Jps
1603 DataNode
1436 NameNode
hduser@layer1athome:/$ 


Setting up HDFS


http://glj8989332.blogspot.tw/2015/09/hadoop-hdfs-mapreduce-wordcount.html

0.jps



Upload succeeded:


1. Putting data on HDFS

After installing the Hadoop cluster, the next step is to put data on HDFS and run the classic Hadoop example, Wordcount. As its name suggests, it counts the words in text files.

  As the preface of the installation post mentioned, HDFS (Hadoop Distributed File System) is a distributed file system, and any computation done through Hadoop reads its data from HDFS.

  First we move local data onto HDFS, copying the configuration files under hadoop/etc/hadoop/ across. On hadoop01 (a slave also works), run this command:

hadoop dfs -put ~/hadoop-2.7.1/etc/hadoop /input01

  Breaking this command down:
  1. dfs: all HDFS access uses this sub-command; fs also works, and the two behave identically.
  2. -put: copies data from the server's local disk onto HDFS.
  3. src_dir1 src_dir2 ...: the arguments after -put are local paths; more than one directory can be given. This post uses just one: ~/hadoop-2.7.1/etc/hadoop.
  4. des_dir: the last argument is the destination directory on HDFS; here it is /input01. My HDFS originally had no /input01 directory; with -put, Hadoop creates it automatically.
For more HDFS commands, see the File System Shell Guide in the Hadoop documentation; they are very similar to the Linux file-system commands, so anyone familiar with Linux will pick them up quickly.
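
Before turning to the web UI, the upload can also be checked from the shell; a sketch (command printed here, to be run on the cluster; in a default Hadoop 2.x setup the NameNode web UI is at http://localhost:50070):

```shell
# Print the verification command; run it on a node with Hadoop configured.
CMD="hadoop fs -ls /input01"
echo "$CMD"
```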


  After uploading, check via the web interface whether the upload succeeded; the screen looks like this: