Experiment commands:
16. teragen: Generate data for the terasort
17. terasort: Run the terasort
18. teravalidate: Check the results of the terasort
1. Reference:
https://discuss.zendesk.com/hc/en-us/articles/200927666-Running-TeraSort-MapReduce-Benchmark
Command 1:
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar teragen 100000000 /tera3
16/05/19 15:06:09 INFO mapreduce.Job: map 35% reduce 0%
16/05/19 15:06:11 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:12 INFO mapreduce.Job: map 38% reduce 0%
16/05/19 15:06:14 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:15 INFO mapreduce.Job: map 42% reduce 0%
16/05/19 15:06:17 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:18 INFO mapreduce.Job: map 45% reduce 0%
16/05/19 15:06:20 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:21 INFO mapreduce.Job: map 49% reduce 0%
16/05/19 15:06:23 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:24 INFO mapreduce.Job: map 52% reduce 0%
16/05/19 15:06:26 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:27 INFO mapreduce.Job: map 56% reduce 0%
16/05/19 15:06:29 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:30 INFO mapreduce.Job: map 59% reduce 0%
16/05/19 15:06:32 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:33 INFO mapreduce.Job: map 63% reduce 0%
16/05/19 15:06:35 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:36 INFO mapreduce.Job: map 66% reduce 0%
16/05/19 15:06:38 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:39 INFO mapreduce.Job: map 70% reduce 0%
16/05/19 15:06:41 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:42 INFO mapreduce.Job: map 73% reduce 0%
16/05/19 15:06:44 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:45 INFO mapreduce.Job: map 77% reduce 0%
16/05/19 15:06:47 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:48 INFO mapreduce.Job: map 80% reduce 0%
16/05/19 15:06:50 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:51 INFO mapreduce.Job: map 84% reduce 0%
16/05/19 15:06:53 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:54 INFO mapreduce.Job: map 87% reduce 0%
16/05/19 15:06:56 INFO mapred.LocalJobRunner: map > map
16/05/19 15:06:57 INFO mapreduce.Job: map 90% reduce 0%
16/05/19 15:06:59 INFO mapred.LocalJobRunner: map > map
16/05/19 15:07:00 INFO mapreduce.Job: map 94% reduce 0%
16/05/19 15:07:02 INFO mapred.LocalJobRunner: map > map
16/05/19 15:07:03 INFO mapreduce.Job: map 97% reduce 0%
16/05/19 15:07:04 INFO mapred.LocalJobRunner: map > map
16/05/19 15:07:04 INFO mapred.Task: Task:attempt_local302140298_0001_m_000000_0 is done. And is in the process of committing
16/05/19 15:07:04 INFO mapred.LocalJobRunner: map > map
16/05/19 15:07:04 INFO mapred.Task: Task attempt_local302140298_0001_m_000000_0 is allowed to commit now
16/05/19 15:07:04 INFO output.FileOutputCommitter: Saved output of task 'attempt_local302140298_0001_m_000000_0' to hdfs://localhost:9000/tera3/_temporary/0/task_local302140298_0001_m_000000
Check: 9.31 GB of data was generated in /tera3.
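The 9.31 GB figure can be cross-checked from the row count. TeraGen writes fixed-size 100-byte rows (the terasort counters report 10000000000 bytes for 100000000 records), so the expected size follows directly. A minimal sketch; the function name expected_gib is only illustrative:

```shell
#!/bin/sh
# TeraGen writes fixed-size rows of 100 bytes each.
ROW_BYTES=100

# Print the expected dataset size in GiB for a given row count.
expected_gib() {
    awk -v rows="$1" -v row_bytes="$ROW_BYTES" \
        'BEGIN { printf "%.2f\n", rows * row_bytes / (1024 * 1024 * 1024) }'
}

# Row count passed to teragen in Command 1.
expected_gib 100000000   # prints 9.31
```

So "9.31G" here means 10^10 bytes expressed in binary units (GiB).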
Command 2:
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar terasort /tera3 /tera3-sort
Log analysis:
INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=433799246528
FILE: Number of bytes written=831560527500
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=393214197000
HDFS: Number of bytes written=10000000000
HDFS: Number of read operations=7525
HDFS: Number of large read operations=0
HDFS: Number of write operations=154
Map-Reduce Framework
Map input records=100000000
Map output records=100000000
Map output bytes=10200000000
Map output materialized bytes=10400000450
Input split bytes=7875
Combine input records=0
Combine output records=0
Reduce input groups=100000000
Reduce shuffle bytes=10400000450
Reduce input records=100000000
Reduce output records=100000000
Spilled Records=346976200
Shuffled Maps =75
Failed Shuffles=0
Merged Map outputs=75
GC time elapsed (ms)=12738
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=77072957440
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=10000000000
File Output Format Counters
Bytes Written=10000000000
16/05/19 15:23:16 INFO terasort.TeraSort: done
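One number worth pulling out of the counters above is the ratio of Spilled Records to Map output records. Spilled Records counts records written to local disk on both the map and the reduce side, so a ratio well above 1 suggests records went to disk more than once during sort and merge. A rough sketch, assuming the counter dump is fed on stdin; the function name spill_ratio is made up for this note:

```shell
#!/bin/sh
# Compute "Spilled Records" / "Map output records" from a job counter
# dump read on stdin. Counter lines have the form "Name=value".
spill_ratio() {
    awk -F= '
        /Spilled Records/    { spilled = $2 }
        /Map output records/ { mapout  = $2 }
        END { if (mapout > 0) printf "%.2f\n", spilled / mapout }
    '
}

# Counter values copied from the terasort log above.
spill_ratio <<'EOF'
Map output records=100000000
Spilled Records=346976200
EOF
# prints 3.47
```

In this run each record was spilled about 3.5 times on average, which may be worth keeping in mind when tuning sort buffer settings.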
3. Results
==============
yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar teravalidate -D mapred.reduce.tasks=8 /tera3-sort /teraValidate
Log analysis:
16/05/19 15:31:09 INFO output.FileOutputCommitter: Saved output of task 'attempt_local112206802_0001_r_000000_0' to hdfs://localhost:9000/teraValidate/_temporary/0/task_local112206802_0001_r_000000
16/05/19 15:31:09 INFO mapred.LocalJobRunner: reduce > reduce
16/05/19 15:31:09 INFO mapred.Task: Task 'attempt_local112206802_0001_r_000000_0' done.
16/05/19 15:31:09 INFO mapred.LocalJobRunner: Finishing task: attempt_local112206802_0001_r_000000_0
16/05/19 15:31:09 INFO mapred.LocalJobRunner: reduce task executor complete.
16/05/19 15:31:10 INFO mapreduce.Job: map 100% reduce 100%
16/05/19 15:31:10 INFO mapreduce.Job: Job job_local112206802_0001 completed successfully
16/05/19 15:31:10 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=541210
FILE: Number of bytes written=1046767
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=20000000000
HDFS: Number of bytes written=25
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=100000000
Map output records=3
Map output bytes=83
Map output materialized bytes=95
Input split bytes=110
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=95
Reduce input records=3
Reduce output records=1
Spilled Records=6
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=362
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=747634688
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=10000000000
File Output Format Counters
Bytes Written=25
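The counters alone do not say whether the sort is correct; that is in the report TeraValidate writes to /teraValidate. TeraValidate emits a record keyed "checksum", plus records keyed "error" if it finds rows out of order, so the 25 bytes written here are consistent with a single checksum line. A small sketch for classifying such a report; the function name validate_status and the sample checksum value are placeholders, not taken from this run:

```shell
#!/bin/sh
# Classify a TeraValidate report read on stdin (tab-separated
# key/value lines). Any line keyed "error" means the data is not
# correctly sorted.
validate_status() {
    awk -F'\t' '
        $1 == "error" { bad++ }
        END { print (bad ? "UNSORTED" : "OK") }
    '
}

# Hypothetical clean report; the checksum value is a placeholder.
printf 'checksum\t0123456789abcde\n' | validate_status   # prints OK
```

Against the real output this would be used roughly as `hdfs dfs -cat /teraValidate/part-r-00000 | validate_status`, assuming the reducer output file has the usual part-r-00000 name.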