Building and running a Hadoop project with mvn + Eclipse (a very simple getting-started guide to Hadoop development)

This article explains in detail how to build and run a Hadoop project with mvn + Eclipse in a Windows development environment.

Prerequisites: environment setup on Windows 7

1. Configure the local Hadoop environment. Add the environment variable HADOOP_HOME=E:\doc_api\ebook\hadoop-2.5.2, then append %HADOOP_HOME%\bin to the path environment variable.

2. Add hadoop.dll and winutils.exe to the bin directory. Download hadoop.dll and winutils.exe from https://github.com/srccodes/hadoop-common-2.2.0-bin or from ……, and place them in the ${HADOOP_HOME}\bin directory.
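The two setup steps above can be sanity-checked before opening Eclipse. The following is a minimal sketch, not part of Hadoop: the class name HadoopEnvCheck and its method missingFiles are hypothetical helpers introduced only for illustration. It reads HADOOP_HOME and reports which of the two native files are missing from its bin directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Hadoop API): verifies the two setup steps above.
class HadoopEnvCheck {

    // Native files that step 2 places under %HADOOP_HOME%\bin.
    static final String[] REQUIRED = {"hadoop.dll", "winutils.exe"};

    // Returns the names of the required files that are missing from <hadoopHome>/bin.
    static List<String> missingFiles(Path hadoopHome) {
        List<String> missing = new ArrayList<>();
        Path bin = hadoopHome.resolve("bin");
        for (String name : REQUIRED) {
            if (!Files.isRegularFile(bin.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            System.out.println("HADOOP_HOME is not set");
            return;
        }
        List<String> missing = missingFiles(Paths.get(home));
        System.out.println(missing.isEmpty()
                ? "HADOOP_HOME looks complete: " + home
                : "Missing from the bin directory of " + home + ": " + missing);
    }
}
```

Environment variables are read when a process starts, so Eclipse most likely needs a restart after HADOOP_HOME is changed before it sees the new value.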

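Building the Hadoop project

Below we use the classic WordCount example to build our first Hadoop project.

Adding the dependencies

Add the following dependencies to the pom file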
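As a rough sketch, the dependency section for the four artifacts this step names (hadoop-mapreduce-client-core, hadoop-common, hadoop-hdfs, hadoop-mapreduce-client-common) might look like the fragment below. The groupId org.apache.hadoop and the version 2.5.2 are assumptions: the version is chosen only to match the HADOOP_HOME directory used earlier, so adjust it to your own installation.

```xml
<!-- Sketch only: groupId and version are assumptions; the article names just the artifactIds. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.5.2</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.5.2</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.5.2</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.5.2</version>
  </dependency>
</dependencies>
```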

Dependencies (artifactId): hadoop-mapreduce-client-core, hadoop-common, hadoop-hdfs, hadoop-mapreduce-client-common

Write the WordCount class as follows:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    /**
     * @version 1.0
     * @author tangqian
     */
    public class WordCount extends Configured implements Tool {

        public static void main(String[] args) throws Exception {
            // ToolRunner parses generic Hadoop options, then calls run().
            int result = ToolRunner.run(new Configuration(), new WordCount(), args);
            System.exit(result);
        }

        @Override
        public int run(String[] args) throws Exception {
            Path inputPath, outputPath;
            if (args.length == 2) {
                inputPath = new Path(args[0]);
                outputPath = new Path(args[1]);
            } else {
                System.out.println("usage <input> <output>");
                return 1;
            }

            Configuration conf = getConf();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(WordCountMapper.class);
            job.setReducerClass(WordCountReducer.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, inputPath);
            FileOutputFormat.setOutputPath(job, outputPath);

            // Block until the job finishes; 0 means success.
            return job.waitForCompletion(true) ? 0 : 1;
        }

        // Emits (word, 1) for every whitespace-separated token of each input line.
        public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }

        // Sums the counts emitted for each word.
        public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

            private IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable value : values) {
                    sum += value.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }
    }
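To see what the mapper and reducer compute without starting a job, the same logic can be sketched in plain Java. This is an illustration only, not the Hadoop code path: the class LocalWordCount, its methods, and the sample input are hypothetical. It tokenizes with StringTokenizer as the mapper does, sums per word as the reducer does, and prints one tab-separated word and count per line, which is the layout TextOutputFormat writes by default. A TreeMap is used here to approximate the sorted key order of the reducer output.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java illustration only; it does not use or replace the Hadoop job.
class LocalWordCount {

    // Counts whitespace-separated tokens: the map step emits (word, 1),
    // the reduce step sums the ones per word. Both are folded into merge().
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // keeps keys sorted
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    // Renders one "word<TAB>count" line per key.
    static String format(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input, not the file used in the article.
        System.out.print(format(count("hello hadoop\nhello world")));
    }
}
```

Because the tokenizer splits on whitespace only, punctuation stays attached to words and case is preserved, so "Test" and "test" would be counted separately in both this sketch and the job.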

Then right-click the class and choose Run As -> Run Configurations. On the Arguments tab, enter the input path and the output path, in that order, in Program arguments.

Click Run to run the class; the job's log output appears in the Console. When the job finishes, the result file part-r-00000 under e:/hadoop/result2 contains the following:

is 1test two 1

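Note: because the job runs against a local Hadoop installation in standalone mode, it uses the local file system (the input and output paths are given with a file:// prefix).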
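As a small sketch of how the two program arguments could be assembled, the helper below turns a local Windows path into a file:// string. The class LocalPathArgs and the input directory e:/hadoop/input are hypothetical; only the output directory e:/hadoop/result2 comes from this article. It does plain string handling and does not escape spaces or other special characters.

```java
// Hypothetical helper for building the two program arguments.
class LocalPathArgs {

    // Turns a local path such as e:\hadoop\result2 or e:/hadoop/result2
    // into a file:// string.
    static String toFileUri(String localPath) {
        String normalized = localPath.replace('\\', '/');
        // A drive-letter path needs a leading slash after "file://".
        if (!normalized.startsWith("/")) {
            normalized = "/" + normalized;
        }
        return "file://" + normalized;
    }

    public static void main(String[] args) {
        String input = toFileUri("e:/hadoop/input");      // hypothetical input directory
        String output = toFileUri("e:\\hadoop\\result2"); // output directory named in the article
        // The two values, separated by a space, go into Program arguments.
        System.out.println(input + " " + output);
    }
}
```

Hadoop's FileOutputFormat refuses to write into an output directory that already exists, so delete the old result directory or choose a new one before running the job again.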

Appendix: hadoop-2.5.2 cluster installation guide (for reference)

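How do you edit the hosts file on Windows 7? The hosts file is normally located in C:\Windows\System32\drivers\etc. On Windows 7, if you are not logged in as an administrator, you may not have permission to modify it. In that case, right-click the hosts file -> Properties -> Security -> Edit, select the currently logged-in user, and grant that user the modify permission.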
