Running Hadoop on a Single Linux Machine

The Hadoop-0.19.2 release can be downloaded from Apache. The Linux machine used here runs RHEL 5, the installed Java version is 1.6.0_16, and JAVA_HOME=/usr/java/jdk1.6.0_16.

Walkthrough
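Before touching Hadoop it is worth confirming that the shell really sees this JDK. A minimal check; the expected values are simply the ones from this machine's setup:

echo $JAVA_HOME                  # should print /usr/java/jdk1.6.0_16
"$JAVA_HOME"/bin/java -version   # should report java version "1.6.0_16"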

1. Passwordless SSH login to localhost

Make sure the ssh service is running on the Linux system and that you can log in to the local machine without a password. If you cannot, proceed as follows:

(1) Open a terminal and run:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

(2) SSH into localhost:

$ ssh localhost

On the first login, ssh warns that the authenticity of host 127.0.0.1 cannot be established and asks whether to continue connecting; type yes. A successful passwordless login looks like this:

[root@localhost hadoop-0.19.2]# ssh localhost
Last login: Sun Aug 1 18:35:37 2010 from 192.168.0.104
[root@localhost ~]#
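If this setup has to be repeated, the two commands above can be wrapped in a small idempotent script. This is only a sketch of the same steps; the permission tightening is an extra assumption that matches default sshd behavior, which rejects key files with loose permissions:

#!/bin/sh
# Generate a DSA key pair only if one does not exist yet
[ -f ~/.ssh/id_dsa ] || ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
# Authorize the public key for logins to this machine
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# sshd ignores keys whose files are too widely writable; tighten them
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# Verify: this should run without prompting for a password
ssh localhost 'echo passwordless login OK'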

2. Hadoop-0.19.2 configuration

Download hadoop-0.19.2.tar.gz (about 40 MB) and unpack it to a directory on the Linux system; here that is /root/hadoop-0.19.2. The configuration steps, in order:

(1) Edit conf/hadoop-env.sh

Remove the leading "#" from the JAVA_HOME line and point it at the local JDK, so that after the change the line reads:

export JAVA_HOME=/usr/java/jdk1.6.0_16

(2) Edit conf/hadoop-site.xml

Add three properties between <configuration> and </configuration>. The modified file reads:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
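Both edits can also be applied from the shell. A sketch, assuming the stock commented-out JAVA_HOME line that ships in conf/hadoop-env.sh, and xmllint being installed for the optional XML check:

cd /root/hadoop-0.19.2
# Point hadoop-env.sh at the local JDK without opening an editor
sed -i 's|^# *export JAVA_HOME=.*|export JAVA_HOME=/usr/java/jdk1.6.0_16|' conf/hadoop-env.sh
grep '^export JAVA_HOME' conf/hadoop-env.sh    # confirm the change took effect
# Quick well-formedness check of the edited site file
xmllint --noout conf/hadoop-site.xml && echo hadoop-site.xml OK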

3. Running the wordcount example

The wordcount example ships with the Hadoop distribution; running it is a good way to observe, and begin to understand, how Hadoop executes a MapReduce job. Following the official "Hadoop Quick Start" guide makes this fairly easy; my run went as follows. Change into the Hadoop directory, in my case /root/hadoop-0.19.2.

(1) Format HDFS

Run the HDFS format command:

[root@localhost hadoop-0.19.2]# bin/hadoop namenode -format

The format output looks like this:

10/08/01 19:04:02 INFO namenode.NameNode: STARTUP_MSG:

Re-format filesystem in /tmp/hadoop-root/dfs/name ? (Y or N) y
Format aborted in /tmp/hadoop-root/dfs/name
10/08/01 19:04:05 INFO namenode.NameNode: SHUTDOWN_MSG:

Note that the prompt is case-sensitive: the lowercase y above aborts the re-format, as the "Format aborted" line shows; answer with an uppercase Y to actually re-format an existing filesystem.
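When formatting has to run unattended, for example from a provisioning script, the confirmation can be piped in. A sketch, assuming it is acceptable to wipe whatever sits in /tmp/hadoop-root/dfs/name:

cd /root/hadoop-0.19.2
# Uppercase Y confirms the re-format; anything else (including lowercase y) aborts
echo Y | bin/hadoop namenode -format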

(2) Start the Hadoop daemons

Run:

[root@localhost hadoop-0.19.2]# bin/start-all.sh

The startup output looks like this:

starting namenode, logging to /root/hadoop-0.19.2/bin/../logs/hadoop-root-namenode-localhost.out
localhost: starting datanode, logging to /root/hadoop-0.19.2/bin/../logs/hadoop-root-datanode-localhost.out
localhost: starting secondarynamenode, logging to /root/hadoop-0.19.2/bin/../logs/hadoop-root-secondarynamenode-localhost.out
starting jobtracker, logging to /root/hadoop-0.19.2/bin/../logs/hadoop-root-jobtracker-localhost.out
localhost: starting tasktracker, logging to /root/hadoop-0.19.2/bin/../logs/hadoop-root-tasktracker-localhost.out

(3) Prepare the input data for the wordcount job

First, create a local data directory named input and copy some files into it:

[root@localhost hadoop-0.19.2]# mkdir input
[root@localhost hadoop-0.19.2]# cp CHANGES.txt LICENSE.txt NOTICE.txt README.txt input/

Then upload the local input directory to the HDFS filesystem:

[root@localhost hadoop-0.19.2]# bin/hadoop fs -put input/ input

(4) Launch the wordcount job

Run:

[root@localhost hadoop-0.19.2]# bin/hadoop jar hadoop-0.19.2-examples.jar wordcount input output

The input data directory is input and the output directory is output. The job execution output looks like this:

10/08/01 19:06:15 INFO mapred.FileInputFormat: Total input paths to process : 4
10/08/01 19:06:15 INFO mapred.JobClient: Running job: job_201008011904_0002
10/08/01 19:06:16 INFO mapred.JobClient: map 0% reduce 0%
10/08/01 19:06:22 INFO mapred.JobClient: map 20% reduce 0%
10/08/01 19:06:24 INFO mapred.JobClient: map 40% reduce 0%
10/08/01 19:06:25 INFO mapred.JobClient: map 60% reduce 0%
10/08/01 19:06:27 INFO mapred.JobClient: map 80% reduce 0%
10/08/01 19:06:28 INFO mapred.JobClient: map 100% reduce 0%
10/08/01 19:06:38 INFO mapred.JobClient: map 100% reduce 26%
10/08/01 19:06:40 INFO mapred.JobClient: map 100% reduce 100%
10/08/01 19:06:41 INFO mapred.JobClient: Job complete: job_201008011904_0002
10/08/01 19:06:41 INFO mapred.JobClient: Counters: 16
10/08/01 19:06:41 INFO mapred.JobClient:   File Systems
10/08/01 19:06:41 INFO mapred.JobClient:     HDFS bytes read=301489
10/08/01 19:06:41 INFO mapred.JobClient:     HDFS bytes written=113098
10/08/01 19:06:41 INFO mapred.JobClient:     Local bytes read=174004
10/08/01 19:06:41 INFO mapred.JobClient:     Local bytes written=348172
10/08/01 19:06:41 INFO mapred.JobClient:   Job Counters
10/08/01 19:06:41 INFO mapred.JobClient:     Launched reduce tasks=1
10/08/01 19:06:41 INFO mapred.JobClient:     Launched map tasks=5
10/08/01 19:06:41 INFO mapred.JobClient:     Data-local map tasks=5
10/08/01 19:06:41 INFO mapred.JobClient:   Map-Reduce Framework
10/08/01 19:06:41 INFO mapred.JobClient:     Reduce input groups=8997
10/08/01 19:06:41 INFO mapred.JobClient:     Combine output records=10860
10/08/01 19:06:41 INFO mapred.JobClient:     Map input records=7363
10/08/01 19:06:41 INFO mapred.JobClient:     Reduce output records=8997
10/08/01 19:06:41 INFO mapred.JobClient:     Map output bytes=434077
10/08/01 19:06:41 INFO mapred.JobClient:     Map input bytes=299871
10/08/01 19:06:41 INFO mapred.JobClient:     Combine input records=39193
10/08/01 19:06:41 INFO mapred.JobClient:     Map output records=39193
10/08/01 19:06:41 INFO mapred.JobClient:     Reduce input records=10860

(5) Inspect the job results

Run:

bin/hadoop fs -cat output/*

A portion of the output:

vijayarenu 20
violations. 1
virtual 3
vis-a-vis 1
visible 1
visit 1
volume 1
volume, 1
volumes 2
volumes. 1
w.r.t 2
wait 9
waiting 6
waiting. 1
waits 3
want 1
warning 7
warning, 1
warnings 12
warnings. 3
warranties 1
warranty 1
warranty, 1
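Piping everything through fs -cat gets unwieldy once the result grows; the files can also be copied out of HDFS for local inspection. A sketch; the local directory name is arbitrary, and part-00000 is the usual file name for a single-reducer job:

cd /root/hadoop-0.19.2
# Fetch the whole output directory from HDFS to local disk
bin/hadoop fs -get output ./wordcount-output
# Show the 20 most frequent words (field 2 is the count)
sort -k2 -n -r ./wordcount-output/part-00000 | head -20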
(6) Stop the Hadoop daemons

Run:

[root@localhost hadoop-0.19.2]# bin/stop-all.sh

The output looks like this:

stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode

This stops the five daemons listed above: jobtracker, tasktracker, namenode, datanode and secondarynamenode.

Exception analysis

While working through the steps above you may hit some exceptions; a rough analysis of each follows.

1. "Call to localhost/127.0.0.1:9000 failed on local exception"

(1) Symptom

This may appear when you run:

[root@localhost hadoop-0.19.2]# bin/hadoop jar hadoop-0.19.2-examples.jar wordcount input output

The error output looks like this:

10/08/01 19:50:55 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
10/08/01 19:50:56 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
10/08/01 19:50:57 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
10/08/01 19:50:58 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
10/08/01 19:50:59 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
10/08/01 19:51:00 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
10/08/01 19:51:01 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
10/08/01 19:51:02 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
10/08/01 19:51:03 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
10/08/01 19:51:04 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).
java.lang.RuntimeException: java.io.IOException: Call to localhost/127.0.0.1:9000 failed on local exception: Connection refused
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:323)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:295)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:268)
    at org.apache.hadoop.examples.WordCount.run(WordCount.java:146)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.examples.WordCount.main(WordCount.java:155)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:141)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:61)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: java.io.IOException: Call to localhost/127.0.0.1:9000 failed on local exception: Connection refused
    at org.apache.hadoop.ipc.Client.call(Client.java:699)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at $Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:104)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:74)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1367)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:319)
    ... 21 more
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:299)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:772)
    at org.apache.hadoop.ipc.Client.call(Client.java:685)
    ... 33 more

(2) Analysis

The key line in the output above is:

Retrying connect to server: localhost/127.0.0.1:9000.

Ten attempts to connect to the "server" all failed, which means the communication path to the server is down. We configured the namenode endpoint in hadoop-site.xml:

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

so the most likely cause is that the namenode process was never started, and without it no job can run at all. That is exactly how I produced this exception: I formatted HDFS and then launched the wordcount job directly, without running bin/start-all.sh. The fix is to run bin/start-all.sh first and only then submit the job.

2. "Input path does not exist" exception

(1) Symptom

You create an input directory under the current Hadoop directory, cp some files into it, and run:

[root@localhost hadoop-0.19.2]# bin/hadoop namenode -format
[root@localhost hadoop-0.19.2]# bin/start-all.sh

Believing that input already exists, you launch the wordcount job:

[root@localhost hadoop-0.19.2]# bin/hadoop jar hadoop-0.19.2-examples.jar wordcount input output

and it throws a pile of exceptions:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/root/input
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:190)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:782)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1127)
    at org.apache.hadoop.examples.WordCount.run(WordCount.java:149)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.examples.WordCount.main(WordCount.java:155)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:141)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:61)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

I simulated this situation as follows:

[root@localhost hadoop-0.19.2]# bin/hadoop fs -rmr input
Deleted hdfs://localhost:9000/user/root/input
[root@localhost hadoop-0.19.2]# bin/hadoop fs -rmr output
Deleted hdfs://localhost:9000/user/root/output

(2) Analysis

The local input directory was never uploaded to HDFS, which is why the job reports org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/root/input. Simply run the upload command again:

[root@localhost hadoop-0.19.2]# bin/hadoop fs -put input/ input
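Both exceptions can be caught before a job is submitted with a short pre-flight check. A sketch, assuming jps from the JDK is on the PATH and using the port and paths from the configuration above:

#!/bin/sh
cd /root/hadoop-0.19.2
# 1. All five daemons should show up (exception 1 means NameNode is absent)
jps | egrep 'NameNode|DataNode|SecondaryNameNode|JobTracker|TaskTracker'
# 2. The namenode RPC port from fs.default.name should be listening
netstat -tln | grep ':9000 ' || echo 'namenode not listening on 9000 - run bin/start-all.sh'
# 3. The HDFS input path must exist (exception 2 means it was never uploaded)
bin/hadoop fs -ls input >/dev/null 2>&1 || echo 'HDFS input missing - run: bin/hadoop fs -put input/ input'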
