Thursday, May 19, 2016

hadoop : sort example study


Reference:
http://blog.csdn.net/xw13106209/article/details/6881081
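
The log analyzed below presumably came from the bundled sort example; the notes do not record the invocation, but it likely looked like the following sketch (the /random input path is an assumption, matching the RandomWriter section later in these notes):

```shell
# Hypothetical sort invocation (paths are assumptions); printed here,
# to be run on a node with Hadoop on the PATH:
CMD="yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar sort /random /random-sort"
echo "$CMD"
```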



Log analysis:

16/05/19 14:47:39 INFO mapreduce.Job: Counters: 23
File System Counters
FILE: Number of bytes read=2192404
FILE: Number of bytes written=4203064
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=4860966398
HDFS: Number of bytes written=4860283767
HDFS: Number of read operations=212
HDFS: Number of large read operations=0
HDFS: Number of write operations=80
Map-Reduce Framework
Map input records=102319
Map output records=102319
Input split bytes=848
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=37
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=2618818560
File Input Format Counters
Bytes Read=1077457472
File Output Format Counters
Bytes Written=1077292254
Job ended: Thu May 19 14:47:39 CST 2016
The job took 19 seconds.
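
As a sanity check on the counters above: the File Input Format Bytes Read (1077457472) and Bytes Written (1077292254) both work out to roughly 1 GiB, matching the ~1 GB RandomWriter dataset used as input:

```shell
# Convert the "Bytes Read" counter to GiB:
awk 'BEGIN { printf "%.3f GiB\n", 1077457472 / 1024 / 1024 / 1024 }'
# prints "1.003 GiB"
```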





Sort results:






hadoop : teragen example


Experiment commands (items 16-18 from the example list below):
16.
teragen: Generate data for the terasort

17.
terasort: Run the terasort

18.
teravalidate: Checking results of terasort

1. Reference:

https://discuss.zendesk.com/hc/en-us/articles/200927666-Running-TeraSort-MapReduce-Benchmark


command 1:
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar teragen 100000000 /tera3


16/05/19 15:06:09 INFO mapreduce.Job:  map 35% reduce 0%
16/05/19 15:06:11 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:12 INFO mapreduce.Job:  map 38% reduce 0%
16/05/19 15:06:14 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:15 INFO mapreduce.Job:  map 42% reduce 0%
16/05/19 15:06:17 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:18 INFO mapreduce.Job:  map 45% reduce 0%
16/05/19 15:06:20 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:21 INFO mapreduce.Job:  map 49% reduce 0%
16/05/19 15:06:23 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:24 INFO mapreduce.Job:  map 52% reduce 0%
16/05/19 15:06:26 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:27 INFO mapreduce.Job:  map 56% reduce 0%
16/05/19 15:06:29 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:30 INFO mapreduce.Job:  map 59% reduce 0%
16/05/19 15:06:32 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:33 INFO mapreduce.Job:  map 63% reduce 0%
16/05/19 15:06:35 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:36 INFO mapreduce.Job:  map 66% reduce 0%
16/05/19 15:06:38 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:39 INFO mapreduce.Job:  map 70% reduce 0%
16/05/19 15:06:41 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:42 INFO mapreduce.Job:  map 73% reduce 0%
16/05/19 15:06:44 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:45 INFO mapreduce.Job:  map 77% reduce 0%
16/05/19 15:06:47 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:48 INFO mapreduce.Job:  map 80% reduce 0%
16/05/19 15:06:50 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:51 INFO mapreduce.Job:  map 84% reduce 0%
16/05/19 15:06:53 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:54 INFO mapreduce.Job:  map 87% reduce 0%
16/05/19 15:06:56 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:57 INFO mapreduce.Job:  map 90% reduce 0%
16/05/19 15:06:59 INFO mapred.LocalJobRunner: map > map
16/05/19 15:07:00 INFO mapreduce.Job:  map 94% reduce 0%
16/05/19 15:07:02 INFO mapred.LocalJobRunner: map > map
16/05/19 15:07:03 INFO mapreduce.Job:  map 97% reduce 0%
16/05/19 15:07:04 INFO mapred.LocalJobRunner: map > map
16/05/19 15:07:04 INFO mapred.Task: Task:attempt_local302140298_0001_m_000000_0 is done. And is in the process of committing
16/05/19 15:07:04 INFO mapred.LocalJobRunner: map > map
16/05/19 15:07:04 INFO mapred.Task: Task attempt_local302140298_0001_m_000000_0 is allowed to commit now
16/05/19 15:07:04 INFO output.FileOutputCommitter: Saved output of task 'attempt_local302140298_0001_m_000000_0' to hdfs://localhost:9000/tera3/_temporary/0/task_local302140298_0001_m_000000


Check: about 9.31 GB of data was generated.
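
The 9.31 G figure checks out: TeraGen writes 100-byte rows, so 100,000,000 rows is 10^10 bytes, about 9.31 GiB:

```shell
# 100,000,000 rows x 100 bytes/row, expressed in GiB:
awk 'BEGIN { printf "%.2f GiB\n", 100000000 * 100 / 1024 / 1024 / 1024 }'
# prints "9.31 GiB"
# On the cluster, the actual size can be confirmed with:
#   hdfs dfs -du -s -h /tera3
```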


command 2:
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar terasort /tera3 /tera3-sort

Log analysis:

INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=433799246528
FILE: Number of bytes written=831560527500
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=393214197000
HDFS: Number of bytes written=10000000000
HDFS: Number of read operations=7525
HDFS: Number of large read operations=0
HDFS: Number of write operations=154
Map-Reduce Framework
Map input records=100000000
Map output records=100000000
Map output bytes=10200000000
Map output materialized bytes=10400000450
Input split bytes=7875
Combine input records=0
Combine output records=0
Reduce input groups=100000000
Reduce shuffle bytes=10400000450
Reduce input records=100000000
Reduce output records=100000000
Spilled Records=346976200
Shuffled Maps =75
Failed Shuffles=0
Merged Map outputs=75
GC time elapsed (ms)=12738
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=77072957440
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=10000000000
File Output Format Counters
Bytes Written=10000000000
16/05/19 15:23:16 INFO terasort.TeraSort: done
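
A few of these counters cross-check: Map output bytes (10,200,000,000) divided by Map output records (100,000,000) gives 102 bytes per record, which is the 100-byte TeraGen record plus what appears to be 2 bytes of per-record map-output overhead (the exact framing is an assumption here):

```shell
# Bytes per map output record in the terasort run above:
awk 'BEGIN { printf "%d bytes/record\n", 10200000000 / 100000000 }'
# prints "102 bytes/record"
```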


3. Results



==============
command 3:
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar teravalidate -D mapred.reduce.tasks=8 /tera3-sort /teraValidate


Log analysis:

16/05/19 15:31:09 INFO output.FileOutputCommitter: Saved output of task 'attempt_local112206802_0001_r_000000_0' to hdfs://localhost:9000/teraValidate/_temporary/0/task_local112206802_0001_r_000000
16/05/19 15:31:09 INFO mapred.LocalJobRunner: reduce > reduce
16/05/19 15:31:09 INFO mapred.Task: Task 'attempt_local112206802_0001_r_000000_0' done.
16/05/19 15:31:09 INFO mapred.LocalJobRunner: Finishing task: attempt_local112206802_0001_r_000000_0
16/05/19 15:31:09 INFO mapred.LocalJobRunner: reduce task executor complete.
16/05/19 15:31:10 INFO mapreduce.Job:  map 100% reduce 100%
16/05/19 15:31:10 INFO mapreduce.Job: Job job_local112206802_0001 completed successfully
16/05/19 15:31:10 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=541210
FILE: Number of bytes written=1046767
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=20000000000
HDFS: Number of bytes written=25
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=100000000
Map output records=3
Map output bytes=83
Map output materialized bytes=95
Input split bytes=110
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=95
Reduce input records=3
Reduce output records=1
Spilled Records=6
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=362
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=747634688
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters 
Bytes Read=10000000000
File Output Format Counters 
Bytes Written=25
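
The 25 bytes written are TeraValidate's summary output; when the sort order is correct it is typically a single checksum line. A sketch for reading it back (the job ran with 8 reduce tasks, so there may be several part-r-* files; command printed here, to be run on the cluster):

```shell
# Print the inspection command; run it on a node with Hadoop configured.
CMD="hdfs dfs -cat /teraValidate/part-*"
echo "$CMD"
```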



hadoop : RandomWriter


References:
http://wiki.apache.org/hadoop/RandomWriter
http://blog.csdn.net/xw13106209/article/details/6881001



step 2:
command:
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar randomwriter /random


step 3:
Result: about 1 GB of data was generated:
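
RandomWriter writes 10 GB per node by default, so the 1 GB seen here suggests a non-default configuration. The size can be controlled with the RandomWriter properties; a sketch (the property name is taken from the Hadoop 2.x example source and should be verified against your version):

```shell
# Hypothetical: cap RandomWriter at 1 GB total. Printed here; run on the cluster.
CMD="yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar randomwriter -D mapreduce.randomwriter.totalbytes=1073741824 /random"
echo "$CMD"
```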


Detailed log:

16/05/19 14:43:33 INFO output.FileOutputCommitter: Saved output of task 'attempt_local636254_0001_m_000000_0' to hdfs://localhost:9000/user/hduser/rand/_temporary/0/task_local636254_0001_m_000000




--------------- All example programs ----------------------------------------------------

An example program must be given as the first argument.
Valid program names are:

1.
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.

2.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.

3.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.

4.
dbcount: An example job that count the pageview counts from a database.

5.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.

6.
grep: A map/reduce program that counts the matches of a regex in the input.

7.
join: A job that effects a join over sorted, equally partitioned datasets

8.
multifilewc: A job that counts words from several files.

9.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.

10.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.

11.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.

12.
randomwriter: A map/reduce program that writes 10GB of random data per node.

13.
secondarysort: An example defining a secondary sort to the reduce.

14.     --> done 5.19
sort: A map/reduce program that sorts the data written by the random writer.

15.
sudoku: A sudoku solver.

16.
teragen: Generate data for the terasort

17.
terasort: Run the terasort

18.
teravalidate: Checking results of terasort

19.
wordcount: A map/reduce program that counts the words in the input files.

20.
wordmean: A map/reduce program that counts the average length of the words in the input files.

21.
wordmedian: A map/reduce program that counts the median length of the words in the input files.

22.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.



Wednesday, May 18, 2016

01_build the hadoop source code


Build environment:
http://learngeb-ebook.readbook.tw/install/index.html


1.maven
2.docker

curl

BUILDING APACHE HADOOP FROM SOURCE


https://pravinchavan.wordpress.com/2013/04/14/building-apache-hadoop-from-source/


[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.0.0-alpha1-SNAPSHOT:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hadoop-common
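
The failure above means Maven could not execute protoc. Installing the protobuf compiler and making sure it is on the PATH of the shell that runs Maven fixes it (Hadoop trunk at the time pinned a specific protobuf version; check the tree's BUILDING.txt). A quick check:

```shell
# Is protoc visible? On Debian/Ubuntu it can be installed with:
#   sudo apt-get install protobuf-compiler
command -v protoc >/dev/null 2>&1 && protoc --version || echo "protoc not found"
```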

Success!

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................ SUCCESS [11.189s]
[INFO] Apache Hadoop Build Tools ......................... SUCCESS [0.259s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [0.285s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [1.363s]
[INFO] Apache Hadoop Assemblies .......................... SUCCESS [0.053s]
[INFO] Apache Hadoop Project Dist POM .................... SUCCESS [0.798s]
[INFO] Apache Hadoop Maven Plugins ....................... SUCCESS [1.930s]
[INFO] Apache Hadoop MiniKDC ............................. SUCCESS [1.126s]
[INFO] Apache Hadoop Auth ................................ SUCCESS [1.912s]
[INFO] Apache Hadoop Auth Examples ....................... SUCCESS [1.882s]
[INFO] Apache Hadoop Common .............................. SUCCESS [1:06.348s]
[INFO] Apache Hadoop NFS ................................. SUCCESS [3.134s]
[INFO] Apache Hadoop KMS ................................. SUCCESS [16.271s]
[INFO] Apache Hadoop Common Project ...................... SUCCESS [0.070s]
[INFO] Apache Hadoop HDFS Client ......................... SUCCESS [1:33.562s]
[INFO] Apache Hadoop HDFS ................................ SUCCESS [1:16.122s]
[INFO] Apache Hadoop HDFS Native Client .................. SUCCESS [1.007s]
[INFO] Apache Hadoop HttpFS .............................. SUCCESS [42.760s]
[INFO] Apache Hadoop HDFS BookKeeper Journal ............. SUCCESS [19.795s]
[INFO] Apache Hadoop HDFS-NFS ............................ SUCCESS [3.031s]
[INFO] Apache Hadoop HDFS Project ........................ SUCCESS [0.064s]
[INFO] Apache Hadoop YARN ................................ SUCCESS [0.043s]
[INFO] Apache Hadoop YARN API ............................ SUCCESS [31.200s]
[INFO] Apache Hadoop YARN Common ......................... SUCCESS [35.082s]
[INFO] Apache Hadoop YARN Server ......................... SUCCESS [0.046s]
[INFO] Apache Hadoop YARN Server Common .................. SUCCESS [10.946s]
[INFO] Apache Hadoop YARN NodeManager .................... SUCCESS [9.749s]
[INFO] Apache Hadoop YARN Web Proxy ...................... SUCCESS [1.870s]
[INFO] Apache Hadoop YARN ApplicationHistoryService ...... SUCCESS [7.364s]
[INFO] Apache Hadoop YARN ResourceManager ................ SUCCESS [15.382s]
[INFO] Apache Hadoop YARN Server Tests ................... SUCCESS [2.757s]
[INFO] Apache Hadoop YARN Client ......................... SUCCESS [3.846s]
[INFO] Apache Hadoop YARN SharedCacheManager ............. SUCCESS [1.708s]
[INFO] Apache Hadoop YARN Timeline Plugin Storage ........ SUCCESS [1.536s]
[INFO] Apache Hadoop YARN Applications ................... SUCCESS [0.018s]
[INFO] Apache Hadoop YARN DistributedShell ............... SUCCESS [1.518s]
[INFO] Apache Hadoop YARN Unmanaged Am Launcher .......... SUCCESS [1.106s]
[INFO] Apache Hadoop YARN Site ........................... SUCCESS [0.016s]
[INFO] Apache Hadoop YARN Registry ....................... SUCCESS [2.672s]
[INFO] Apache Hadoop YARN Project ........................ SUCCESS [3.012s]
[INFO] Apache Hadoop MapReduce Client .................... SUCCESS [0.031s]
[INFO] Apache Hadoop MapReduce Core ...................... SUCCESS [16.635s]
[INFO] Apache Hadoop MapReduce Common .................... SUCCESS [10.791s]
[INFO] Apache Hadoop MapReduce Shuffle ................... SUCCESS [1.988s]
[INFO] Apache Hadoop MapReduce App ....................... SUCCESS [5.929s]
[INFO] Apache Hadoop MapReduce HistoryServer ............. SUCCESS [3.182s]
[INFO] Apache Hadoop MapReduce JobClient ................. SUCCESS [5.845s]
[INFO] Apache Hadoop MapReduce HistoryServer Plugins ..... SUCCESS [1.015s]
[INFO] Apache Hadoop MapReduce NativeTask ................ SUCCESS [2.589s]
[INFO] Apache Hadoop MapReduce Examples .................. SUCCESS [3.414s]
[INFO] Apache Hadoop MapReduce ........................... SUCCESS [2.297s]
[INFO] Apache Hadoop MapReduce Streaming ................. SUCCESS [14.670s]
[INFO] Apache Hadoop Distributed Copy .................... SUCCESS [5.451s]
[INFO] Apache Hadoop Archives ............................ SUCCESS [1.438s]
[INFO] Apache Hadoop Archive Logs ........................ SUCCESS [1.368s]
[INFO] Apache Hadoop Rumen ............................... SUCCESS [3.274s]
[INFO] Apache Hadoop Gridmix ............................. SUCCESS [2.620s]
[INFO] Apache Hadoop Data Join ........................... SUCCESS [1.367s]
[INFO] Apache Hadoop Ant Tasks ........................... SUCCESS [1.102s]
[INFO] Apache Hadoop Extras .............................. SUCCESS [1.284s]
[INFO] Apache Hadoop Pipes ............................... SUCCESS [0.011s]
[INFO] Apache Hadoop OpenStack support ................... SUCCESS [2.503s]
[INFO] Apache Hadoop Amazon Web Services support ......... SUCCESS [12.085s]
[INFO] Apache Hadoop Azure support ....................... SUCCESS [6.075s]
[INFO] Apache Hadoop Client .............................. SUCCESS [4.251s]
[INFO] Apache Hadoop Mini-Cluster ........................ SUCCESS [0.129s]
[INFO] Apache Hadoop Scheduler Load Simulator ............ SUCCESS [2.218s]
[INFO] Apache Hadoop Tools Dist .......................... SUCCESS [3.678s]
[INFO] Apache Hadoop Kafka Library support ............... SUCCESS [24.893s]
[INFO] Apache Hadoop Tools ............................... SUCCESS [0.016s]
[INFO] Apache Hadoop Distribution ........................ SUCCESS [19.842s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10:36.200s
[INFO] Finished at: Wed May 18 09:26:34 CST 2016
[INFO] Final Memory: 246M/964M
[INFO] ------------------------------------------------------------------------
[WARNING] The requested profile "doc" could not be activated because it does not exist.

The built tarball is located at:

Hadoop dist tar available at: 00_hadoop/hadoop/hadoop-dist/target/hadoop-3.0.0-alpha1-SNAPSHOT.tar.gz





3. good site:
http://www.inside.com.tw/2015/03/12/big-data-4-hadoop

Tuesday, May 17, 2016

hadoop : using my own word count



http://glj8989332.blogspot.tw/2015/09/windows-hadoop-eclipse-mapreduce-wordcount.html


step 1:

Maven configuration


step 2:
built my word count successfully.


 public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

  /* Initialization */
  Configuration conf = new Configuration();

  /* Create the MapReduce job; its name is KWordCnt */
  @SuppressWarnings("deprecation")
  Job job = new Job(conf, "KWordCnt");

  /* The jar class that launches the job is KWordCnt */
  job.setJarByClass(KWordCnt.class);
  /* The map class for the job is MyMapper */
  job.setMapperClass(MyMapper.class);
  /* The reduce class for the job is MyReducer */
  job.setReducerClass(MyReducer.class);

  /* Output key/value types (assuming the usual wordcount Text/IntWritable) */
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);

  /* HDFS path of the input data */
  //FileInputFormat.addInputPath(job, new Path("/input02"));
  FileInputFormat.addInputPath(job, new Path("/1.csv"));

  /* Output path and job submission, completed here so the example is runnable
     (the output directory name is illustrative) */
  FileOutputFormat.setOutputPath(job, new Path("/wordcount-out"));
  System.exit(job.waitForCompletion(true) ? 0 : 1);
 }


step 3:


















Comparison: running the stock example jar gives the same result (1489928):

yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /1.csv /2




get hadoop source code


step 1: set up ssh for github
git


step 2: git gui

tig:



giggle


step 3: import the hadoop into eclipse


Monday, May 16, 2016

hadoop : example exercises



Computing Pi (the arguments are 16 maps and 1,000,000 samples per map):

yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 16 1000000




wordcount example
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /1.csv /2

Results:


pig command practice


Reference:
http://glj8989332.blogspot.tw/2015/11/hadoop-pig-0150.html


step 1: put the file onto HDFS
hadoop fs -put Employee_Salaries_2014.csv hdfs://localhost:9000/salary/1.csv


failure:

Input(s):
Failed to read data from "hdfs://localhost:9000/salary/1.csv"




Fix: delete the /salary directory and load /1.csv directly:
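
The cleanup would look like this sketch (directory name taken from the failed put above; commands printed here, to be run on the cluster):

```shell
# Remove the bad HDFS path, then re-upload the file to the root as /1.csv.
echo "hadoop fs -rm -r /salary"
echo "hadoop fs -put Employee_Salaries_2014.csv hdfs://localhost:9000/1.csv"
```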

grunt> salarydata = LOAD 'hdfs://localhost:9000/1.csv' USING PigStorage(',') AS ( FullName1:chararray ,  FullName2:chararray ,Gender:chararray , CurrentAnnualSalary : chararray,GrossPayReceived2014 : chararray ,OvertimePay2014:chararray , Department:chararray , DepartmentName:chararray,Division:chararray, AssignmentCategory: chararray , PositionTitle:chararray, UnderfilledJobTitle: chararray,DateFirstHired:chararray);

DUMP salarydata;

Results:


install hadoop in ubuntu



1. link
http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php



2.

 Exception in thread "main" java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority.


hduser@layer1athome:/usr/local$ ls -all /app/hadoop/tmp
total 8
drwxr-xr-x 2 hduser hadoop 4096 May 16 11:49 .
drwxr-xr-x 3 root   root   4096 May 16 11:49 ..
hduser@layer1athome:/usr/local$ 


3. 
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


4. 
start-dfs.sh
16/05/16 15:58:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.

Adding this fixed it:
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop/lib/native"


Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.


Fix:
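
Both this error and the earlier "Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority" usually mean core-site.xml has no fs.defaultFS entry. A minimal sketch, assuming the single-node localhost:9000 setup used elsewhere in these notes (written to a temporary file here; the real file lives at $HADOOP_HOME/etc/hadoop/core-site.xml):

```shell
# Minimal core-site.xml pointing the default FS at the local NameNode.
cat > /tmp/core-site-example.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
grep -c 'hdfs://localhost:9000' /tmp/core-site-example.xml
```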

Result: the NameNode and DataNode processes appeared
manager-layer1athome.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-layer1athome.out
hduser@layer1athome:/$ jps
1804 SecondaryNameNode
2000 ResourceManager
2254 NodeManager
2679 Jps
1603 DataNode
1436 NameNode
hduser@layer1athome:/$ 


Setting up HDFS


http://glj8989332.blogspot.tw/2015/09/hadoop-hdfs-mapreduce-wordcount.html

0.jps



Upload succeeded:


1. Putting data on HDFS

After installing the Hadoop cluster, the next step is to put data on HDFS and run the classic Hadoop example, Wordcount. As its name suggests, it counts the words in text files.

  As the preface of the installation post mentioned, HDFS (Hadoop Distributed File System) is a distributed file system, and any computation done through Hadoop reads its data from HDFS.

  First we move local data onto HDFS, copying the configuration files under hadoop/etc/hadoop/ across. On hadoop01 (a slave also works), run this command:

hadoop dfs -put ~/hadoop-2.7.1/etc/hadoop /input01

  Breaking this command down:
  1. dfs: all HDFS access uses this sub-command; fs also works, and the two behave identically.
  2. -put: copies data from the server's local disk onto HDFS.
  3. src_dir1 src_dir2 ...: the arguments after -put are local paths; more than one directory can be given. This post uses just one: ~/hadoop-2.7.1/etc/hadoop.
  4. des_dir: the last argument is the destination directory on HDFS; here it is /input01. My HDFS originally had no /input01 directory; with -put, Hadoop creates it automatically.
For more HDFS commands, see the File System Shell Guide in the Hadoop documentation; they are very similar to the Linux file-system commands, so anyone familiar with Linux will pick them up quickly.
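
Before turning to the web UI, the upload can also be checked from the shell; a sketch (command printed here, to be run on the cluster; in a default Hadoop 2.x setup the NameNode web UI is at http://localhost:50070):

```shell
# Print the verification command; run it on a node with Hadoop configured.
CMD="hadoop fs -ls /input01"
echo "$CMD"
```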


  After uploading, check via the web interface whether the upload succeeded; the screen looks like this: