Common and important Hadoop command-line operations and what they do

About Hadoop

[root@master ~]# hadoop --help
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  fs                    run a generic filesystem user client
  version               print the version
  jar <jar>             run a jar file
  checknative [-a|-h]   check native hadoop and compression libraries availability
  distcp <srcurl> <desturl>   copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest>   create a hadoop archive
  classpath             prints the class path needed to get the Hadoop jar and the required libraries
  daemonlog             get/set the log level for each daemon
 or
  CLASSNAME             run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

View the version

[root@master ~]# hadoop version
Hadoop 2.2.0.2.0.6.0-101
Subversion git@github.com:hortonworks/hadoop.git -r b07b2906c36defd389c8b5bd22bebc1bead8115b
Compiled by jenkins on 2014-01-09T05:18Z
From source with checksum 704f1e463ebc4fb89353011407e965
This command was run using ...-101.jar

Run a jar file

[root@master liguodong]# hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-101.jar pi 10 100
Number of Maps  = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
...
Job Finished in 19.715 seconds
Estimated value of Pi is 3.14800000000000000000

Check availability of the Hadoop native and compression libraries

[root@master liguodong]# hadoop checknative
15/06/03 10:28:07 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
15/06/03 10:28:07 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true
zlib:   true /lib64/libz...
snappy: true /usr/lib64/libsnappy...
lz4:    true revision:...
bzip2:  true

File archiving: Archive

Hadoop is a poor fit for storing small files: every file carries its own metadata, so masses of small files make the NameNode grow and grow. Hadoop Archives (HAR files), introduced in release 0.18.0, exist precisely to ease the NameNode memory pressure caused by large numbers of small files. A HAR file works by building a layered filesystem on top of HDFS. It is created with Hadoop's archive command, which in fact runs a MapReduce job to pack the small files into the HAR. From the client's point of view nothing changes: all the original files remain accessible, through har:// URLs, while on the HDFS side the internal file count drops. Reading a file through a HAR is no more efficient than reading it directly from HDFS, and in practice may be slightly slower, because every access to a file in a HAR requires two reads: one of the index file and one of the file's own data. And although HAR files can be used as input to a MapReduce job, there is no special mechanism that lets the maps treat the files packed inside a HAR as anything other than ordinary HDFS files.

Create an archive: hadoop archive -archiveName xxx.har -p /src /dest
View its contents: hadoop fs -lsr har:///dest/xxx.har
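The two-level read described above can be sketched in a few lines of plain POSIX shell. This is a toy model, not the real HAR on-disk format: it packs small files into a single part file plus an index of offsets, then reads one file back by consulting the index first and seeking second, exactly the two reads that make HAR access slightly slower than a plain read.

```shell
#!/bin/sh
# Toy model of a HAR: pack small files into one part file plus an index,
# then read one file back via the index. The names (_index, part-0) echo
# the real HAR layout, but this is NOT the real on-disk format.
set -e
dir=$(mktemp -d)
printf 'alpha' > "$dir/a.txt"   # 5 bytes
printf 'beta!' > "$dir/b.txt"   # 5 bytes

# "Pack": concatenate the files into part-0, recording name/offset/length.
# (A real HAR also keeps a _masterindex over the index; omitted here.)
offset=0
: > "$dir/part-0"
: > "$dir/_index"
for f in a.txt b.txt; do
  len=$(wc -c < "$dir/$f" | tr -d ' ')
  cat "$dir/$f" >> "$dir/part-0"
  echo "$f $offset $len" >> "$dir/_index"
  offset=$((offset + len))
done

# "Read" b.txt: consult the index first, then seek into part-0.
entry=$(grep '^b.txt ' "$dir/_index")
off=$(echo "$entry" | cut -d' ' -f2)
len=$(echo "$entry" | cut -d' ' -f3)
extracted=$(dd if="$dir/part-0" bs=1 skip="$off" count="$len" 2>/dev/null)
echo "$extracted"
```

Because every original file lives inside a shared part file, an archive like this is effectively immutable once written, which is why HARs suit cold, append-once data.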

[root@master liguodong]# hadoop archive
archive -archiveName NAME -p <parent path> <src>* <dest>

[root@master liguodong]# hadoop fs -lsr /liguodong
drwxrwxrwx - hdfs hdfs 0 2015-05-04 19:40 /liguodong/output
- ... /liguodong/output/_SUCCESS
- ... /liguodong/output/part-r-00000

[root@master liguodong]# hadoop archive -archiveName liguodong.har -p /liguodong output /liguodong/har
[root@master liguodong]# hadoop fs -lsr /liguodong
drwxr-xr-x - root hdfs 0 2015-06-03 11:15 /liguodong/har
drwxr-xr-x - root hdfs 0 2015-06-03 11:15 /liguodong/har/liguodong.har
- ... /liguodong/har/liguodong.har/_SUCCESS
- ... /liguodong/har/liguodong.har/_index
- ... /liguodong/har/liguodong.har/_masterindex
- ... /liguodong/har/liguodong.har/part-0
drwxrwxrwx - hdfs hdfs 0 2015-05-04 19:40 /liguodong/output
- ... /liguodong/output/_SUCCESS
- ... /liguodong/output/part-r-00000

View the contents

[root@master liguodong]# hadoop fs -lsr har:///liguodong/har/liguodong.har
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x - root hdfs 0 2015-05-04 19:40 har:///liguodong/har/liguodong.har/output
- ... har:///liguodong/har/liguodong.har/output/_SUCCESS
- ... har:///liguodong/har/liguodong.har/output/part-r-00000
---------------------------------------------------------------
[root@master liguodong]# hadoop archive -archiveName liguodong2.har -p /liguodong/output /liguodong/har
[root@master liguodong]# hadoop fs -lsr har:///liguodong/har/liguodong2.har
- ... har:///liguodong/har/liguodong2.har/_SUCCESS
- ... har:///liguodong/har/liguodong2.har/part-r-00000

About HDFS

[root@master /]# hdfs --help
Usage: hdfs [--config confdir] COMMAND
where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  oiv                  apply the offline fsimage viewer to an fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                       Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway

Check whether a directory is healthy

[root@master liguodong]# hdfs fsck /liguodong
Connecting to namenode via ...:50070
FSCK started by root (auth:SIMPLE) from ... for path /liguodong at Wed Jun 03 10:43:41 CST 2015
...........Status: HEALTHY
 Total size:    1559 B
 Total dirs:    7
 Total files:   11
 Total symlinks:                0
 Total blocks (validated):      7 (avg. block size 222 B)
...
The filesystem under path '/liguodong' is HEALTHY

A more detailed view

[root@master liguodong]# hdfs fsck /liguodong -files -blocks

What fsck is for: it checks the health of the filesystem, can show which data blocks a given file lives in, can delete a corrupt block, and can locate a missing block.
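The summary section of the fsck report is easy to pick apart in a script. A small sketch, run here against a hard-coded copy of the sample report above so it works without a cluster; in practice you would pipe `hdfs fsck /path` straight into awk.

```shell
#!/bin/sh
# Extract fields from an `hdfs fsck` summary. The report text is the sample
# from the transcript above, hard-coded so the sketch runs anywhere.
report='...........Status: HEALTHY
 Total size:    1559 B
 Total dirs:    7
 Total files:   11
 Total symlinks:                0
 Total blocks (validated):      7 (avg. block size 222 B)'

# -F": *" splits each line at the colon, swallowing the padding spaces.
status=$(printf '%s\n' "$report" | awk -F': *' '/Status:/ {print $2}')
files=$(printf '%s\n' "$report" | awk -F': *' '/Total files/ {print $2}')
echo "status=$status files=$files"
```

A non-HEALTHY status (or a nonzero fsck exit code) is what a monitoring script would alert on.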

balancer: the disk balancing utility
