A count bug in hive2solr

I have recently been testing imports from Hive into Solr. There is a related project on GitHub:

https://github.com/chimpler/hive-solr

It works by implementing an InputFormat and an OutputFormat, so that reads and writes go through MapReduce (the mapred API).

The schema of the test table:

show create table table_in_solr1;
CREATE EXTERNAL TABLE table_in_solr1(
  id string COMMENT 'from deserializer',
  cookie_id_s string COMMENT 'from deserializer',
  first_url_s string COMMENT 'from deserializer',
  warehouse_s string COMMENT 'from deserializer')
ROW FORMAT SERDE
  'com.chimpler.hive.solr.SolrSerDe'
STORED BY
  'com.chimpler.hive.solr.SolrStorageHandler'
WITH SERDEPROPERTIES (
  'serialization.format'='1',
  'solr.column.mapping'='id,cookie_id_s,first_url_s,warehouse_s')
LOCATION
  'hdfs://xxxxxx:9000/bip/hive_warehouse/table_in_solr1'
TBLPROPERTIES (
  'solr.url'='http://xxxxxxx:8888/solr/userinfo',
  'transient_lastDdlTime'='1401357066',
  'solr.buffer.input.rows'='10000',
  'solr.buffer.output.rows'='10000')

After importing the data, I ran a count test:

select count(1) from table_in_solr1;
Ended Job = job_1401419652664_0010 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1401419652664_0010_m_000000 (and more) from job job_1401419652664_0010
Task with the most failures(4):
-----
Task ID:
  task_1401419652664_0010_m_000000
URL:
  :8088/taskdetails.jsp?jobid=job_1401419652664_0010&tipid=task_1401419652664_0010_m_000000
-----
Diagnostic Messages for this Task:
Error: java.io.IOException: java.lang.NumberFormatException: For input string: ""
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:243)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:526)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:166)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:407)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:160)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:155)
Caused by: java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:470)
    at java.lang.Integer.parseInt(Integer.java:499)
    at com.chimpler.hive.solr.SolrInputFormat.getReadColumnIDs(SolrInputFormat.java:38)
    at com.chimpler.hive.solr.SolrInputFormat.getRecordReader(SolrInputFormat.java:50)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:240)
    ... 9 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 2  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

By contrast, count on a single column, for example count(id), works fine.

Judging from the explain output, count(column) adds a Select Operator compared with count(1):

Select Operator
  expressions:
        expr: id
        type: string
  outputColumnNames: id
  Group By Operator
    aggregations:
          expr: count(id)
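A plausible reading of the stack trace and the explain output: count(1) does not project any column, so the list of column IDs that Hive hands to the input format (normally via the job property hive.io.file.readcolumn.ids) is an empty string, and calling Integer.parseInt on it throws. The sketch below only illustrates that failure mode and one defensive way to parse such a list. It is not the actual source of SolrInputFormat.getReadColumnIDs; the class ReadColumnIds and the methods parseNaive and parseSafe are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of hive-solr: shows why an empty
// comma-separated column-ID list breaks naive integer parsing.
final class ReadColumnIds {

    // Naive parsing. "".split(",") yields one empty token, so
    // Integer.parseInt("") throws NumberFormatException.
    static int[] parseNaive(String value) {
        String[] tokens = value.split(",");
        int[] ids = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            ids[i] = Integer.parseInt(tokens[i]);
        }
        return ids;
    }

    // Defensive parsing. A null or blank value means "no columns projected"
    // and yields an empty array; blank tokens are skipped.
    static int[] parseSafe(String value) {
        List<Integer> ids = new ArrayList<>();
        if (value != null) {
            for (String token : value.split(",")) {
                String trimmed = token.trim();
                if (!trimmed.isEmpty()) {
                    ids.add(Integer.parseInt(trimmed));
                }
            }
        }
        int[] result = new int[ids.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = ids.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // count(id)-like case: one or more column IDs are present.
        System.out.println(Arrays.toString(parseSafe("0,2")));
        // count(1)-like case (assumed): no column IDs at all.
        System.out.println(parseSafe("").length);
        try {
            parseNaive("");
        } catch (NumberFormatException e) {
            System.out.println("naive parse failed: " + e.getMessage());
        }
    }
}
```

With a guard of this kind, an empty projection becomes an empty array instead of an exception. How the record reader should then behave (for example, fetching only the document key so the rows can still be counted) is a separate design decision that this sketch does not cover.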
