Errors that may occur when running Nutch 1.2, and how to fix them

Error 1: Caused by Linux's limit on the maximum number of open files

Error message:

java.io.IOException: background merge hit exception: _0:C500->_0 _1:C500->_0 _2:C500->_….. [optimize]
    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2310)
    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2249)
    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2219)
    at org.apache.nutch.indexer.lucene.LuceneWriter.close(LuceneWriter.java:237)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.FileNotFoundException: /var/lib/crawlzilla/nutch-crawler/mapred/local/index/_682243155/_6a.frq (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:76)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:97)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:87)
    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:67)
    at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:129)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:576)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:609)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4239)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3917)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:231)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:288)

Cause and fix: This exception can occur when a Java program running under Unix/Linux performs a large number of file operations. Unix/Linux limits the number of file handles a process may hold open; you can check the current limit with ulimit -n, and the default is 1024. When the program keeps a number of files open that approaches or exceeds 1024 while also reading and writing them frequently, it hits this limit and the exception above is thrown. The fix is to raise the file-handle limit to whatever your workload actually needs. Command: ulimit -n 32768
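Note that ulimit -n only changes the limit for the current shell session and the processes started from it. A minimal sketch of checking the limit and making the change permanent on a typical Linux system (the user name nutch and the <crawler-pid> placeholder below are assumptions; substitute whatever matches your setup):

    # Check the current per-process open-file limit
    ulimit -n

    # Count how many files the crawler process actually has open
    lsof -p <crawler-pid> | wc -l

    # Raise the limit for this session (cannot exceed the hard limit)
    ulimit -n 32768

    # To make it permanent, add lines like these to /etc/security/limits.conf
    # ("nutch" is assumed to be the user that runs the crawl):
    #   nutch  soft  nofile  32768
    #   nutch  hard  nofile  32768
    # Then log out and back in for the new limits to take effect.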

Error 2: Out of disk space

Error message:

Error: java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:260)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:190)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.mapred.IFileOutputStream.write(IFileOutputStream.java:84)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:218)
    at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:157)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2454)

Cause and fix: When the disk runs out of space, Nutch simply waits for space to become free, so you need to free up disk space; in a distributed deployment you can also add another worker node.
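Before adding nodes, it is worth confirming which filesystem is actually full and, if needed, moving Hadoop's local storage to a larger volume. A minimal sketch (the crawlzilla path is taken from the first stack trace and /data/hadoop-tmp is an example value, not a required one; Nutch 1.2 in local mode also reads Hadoop properties from conf/nutch-site.xml):

    # See which filesystem the MapReduce local dir lives on and how full it is
    df -h /var/lib/crawlzilla/nutch-crawler/mapred/local

    # Point Hadoop's temporary/local storage at a volume with free space.
    # mapred.local.dir defaults to ${hadoop.tmp.dir}/mapred/local, so moving
    # hadoop.tmp.dir is usually enough. Add to conf/nutch-site.xml (local mode)
    # or conf/core-site.xml (cluster mode):
    #   <property>
    #     <name>hadoop.tmp.dir</name>
    #     <value>/data/hadoop-tmp</value>
    #   </property>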
