Hadoop test example: wordcount

  1. Create a test directory

[root@localhost hadoop-1.1.1]# bin/hadoop dfs -mkdir /hadoop/input

  2. Create a test file

[root@localhost test]# vi test.txt
hello hadoop
hello World
Hello Java
Hey man
i am a programmer
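If you prefer not to use an interactive editor, the same file can be created with a heredoc. This is a sketch assuming you are in the same test directory; the quick `wc` checks confirm the line and word counts that the job counters report later (5 input records, 12 words):

```shell
# Create the sample input file non-interactively.
cat > test.txt <<'EOF'
hello hadoop
hello World
Hello Java
Hey man
i am a programmer
EOF

# Sanity check: 5 lines, 12 words.
wc -l < test.txt
wc -w < test.txt
```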

  3. Upload the test file to the test directory

[root@localhost hadoop-1.1.1]# bin/hadoop dfs -put ./test/test.txt /hadoop/input

  4. Run the wordcount program

[root@localhost hadoop-1.1.1]# bin/hadoop jar hadoop-examples-1.1.1.jar wordcount /hadoop/input/* /hadoop/output

    The /hadoop/output directory must not already exist, otherwise the job fails with:

org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /hadoop/output already exists

    Because a Hadoop job is a resource-intensive computation, by default its results cannot be overwritten.
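To re-run the job, the old output directory has to be removed first. A minimal sketch, assuming a running cluster and the Hadoop 1.x shell (where recursive delete is `dfs -rmr`; later versions use `fs -rm -r`):

```shell
# Delete the previous output directory, then re-run the job.
bin/hadoop dfs -rmr /hadoop/output
bin/hadoop jar hadoop-examples-1.1.1.jar wordcount /hadoop/input/* /hadoop/output
```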

    If the job succeeds, output like the following is displayed:

[root@localhost hadoop-1.1.1]# bin/hadoop jar hadoop-examples-1.1.1.jar wordcount /hadoop/input/* /hadoop/output
13/01/17 00:36:06 INFO input.FileInputFormat: Total input paths to process : 1
13/01/17 00:36:06 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/01/17 00:36:06 WARN snappy.LoadSnappy: Snappy native library not loaded
13/01/17 00:36:07 INFO mapred.JobClient: Running job: job_201301162205_0006
13/01/17 00:36:08 INFO mapred.JobClient:  map 0% reduce 0%
13/01/17 00:36:14 INFO mapred.JobClient:  map 100% reduce 0%
13/01/17 00:36:22 INFO mapred.JobClient:  map 100% reduce 33%
13/01/17 00:36:24 INFO mapred.JobClient:  map 100% reduce 100%
13/01/17 00:36:25 INFO mapred.JobClient: Job complete: job_201301162205_0006
13/01/17 00:36:25 INFO mapred.JobClient: Counters: 29
13/01/17 00:36:25 INFO mapred.JobClient:   Job Counters
13/01/17 00:36:25 INFO mapred.JobClient:     Launched reduce tasks=1
13/01/17 00:36:25 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6863
13/01/17 00:36:25 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/01/17 00:36:25 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/01/17 00:36:25 INFO mapred.JobClient:     Launched map tasks=1
13/01/17 00:36:25 INFO mapred.JobClient:     Data-local map tasks=1
13/01/17 00:36:25 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9207
13/01/17 00:36:25 INFO mapred.JobClient:   File Output Format Counters
13/01/17 00:36:25 INFO mapred.JobClient:     Bytes Written=78
13/01/17 00:36:25 INFO mapred.JobClient:   FileSystemCounters
13/01/17 00:36:25 INFO mapred.JobClient:     FILE_BYTES_READ=128
13/01/17 00:36:25 INFO mapred.JobClient:     HDFS_BYTES_READ=170
13/01/17 00:36:25 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=48059
13/01/17 00:36:25 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=78
13/01/17 00:36:25 INFO mapred.JobClient:   File Input Format Counters
13/01/17 00:36:25 INFO mapred.JobClient:     Bytes Read=62
13/01/17 00:36:25 INFO mapred.JobClient:   Map-Reduce Framework
13/01/17 00:36:25 INFO mapred.JobClient:     Map output materialized bytes=128
13/01/17 00:36:25 INFO mapred.JobClient:     Map input records=5
13/01/17 00:36:25 INFO mapred.JobClient:     Reduce shuffle bytes=128
13/01/17 00:36:25 INFO mapred.JobClient:     Spilled Records=22
13/01/17 00:36:25 INFO mapred.JobClient:     Map output bytes=110
13/01/17 00:36:25 INFO mapred.JobClient:     CPU time spent (ms)=1650
13/01/17 00:36:25 INFO mapred.JobClient:     Total committed heap usage (bytes)=176492544
13/01/17 00:36:25 INFO mapred.JobClient:     Combine input records=12
13/01/17 00:36:25 INFO mapred.JobClient:     SPLIT_RAW_BYTES=108
13/01/17 00:36:25 INFO mapred.JobClient:     Reduce input records=11
13/01/17 00:36:25 INFO mapred.JobClient:     Reduce input groups=11
13/01/17 00:36:25 INFO mapred.JobClient:     Combine output records=11
13/01/17 00:36:25 INFO mapred.JobClient:     Physical memory (bytes) snapshot=180088832
13/01/17 00:36:25 INFO mapred.JobClient:     Reduce output records=11
13/01/17 00:36:25 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=756244480
13/01/17 00:36:25 INFO mapred.JobClient:     Map output records=12
[root@localhost hadoop-1.1.1]#

  5. View the results

    The wordcount program counts the occurrences of each word in the target file and writes the result to /hadoop/output/part-r-00000.

[root@localhost hadoop-1.1.1]# bin/hadoop dfs -ls /hadoop/output
Found 3 items
-rw-r--r--   1 root supergroup          0 2013-01-17 00:36 /hadoop/output/_SUCCESS
drwxr-xr-x   - root supergroup          0 2013-01-17 00:36 /hadoop/output/_logs
-rw-r--r--   1 root supergroup         78 2013-01-17 00:36 /hadoop/output/part-r-00000
[root@localhost hadoop-1.1.1]#

[root@localhost hadoop-1.1.1]# bin/hadoop dfs -cat /hadoop/output/part-r-00000
Hello	1
Hey	1
Java	1
World	1
a	1
am	1
hadoop	1
hello	2
i	1
man	1
programmer	1
[root@localhost hadoop-1.1.1]#
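As a sanity check, the same aggregation can be approximated locally with standard Unix tools. This sketch recreates the sample file from step 2 and then splits on whitespace, sorts in byte order (matching Hadoop's case-sensitive key sort, where `Hello` and `hello` are distinct), and counts duplicates:

```shell
# Recreate the sample input (same contents as step 2).
cat > test.txt <<'EOF'
hello hadoop
hello World
Hello Java
Hey man
i am a programmer
EOF

# Emulate wordcount: one word per line, byte-order sort, group and count,
# then reorder columns to word<TAB>count like part-r-00000.
tr -s '[:space:]' '\n' < test.txt | LC_ALL=C sort | uniq -c | awk '{print $2 "\t" $1}'
```

The output should list the same 11 word/count pairs as part-r-00000 above.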
