Why an HBase Write Operation Hangs for a Long Time Without Returning

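Please credit the original source when reposting. I revise my posts from time to time, so I strongly recommend reading this article at its original location.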

I have been studying HBase recently and wrote the following demo code:

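```java
import java.io.IOException;
import java.util.Collection;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.TableNotFoundException;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.junit.Test;

// zkSetKey, zkConn, tableName and log (an SLF4J logger) are fields of the
// enclosing test class and are not shown here.
@Test
public void doTest() throws MasterNotRunningException, ZooKeeperConnectionException, IOException {
    Configuration config = HBaseConfiguration.create();
    config.set(zkSetKey, zkConn);
    HBaseAdmin hBaseAdmin = null;
    try {
        hBaseAdmin = new HBaseAdmin(config);

        // Read-only cluster queries: these return quickly
        ClusterStatus clusterStatus = hBaseAdmin.getClusterStatus();
        ServerName master = clusterStatus.getMaster();
        log.info("Master host: {}, port: {}", master.getHostname(), master.getPort());
        Collection<ServerName> servers = clusterStatus.getServers();
        for (ServerName serverName : servers) {
            log.info("RegionServer host: {}, port: {}", serverName.getHostname(), serverName.getPort());
        }

        HTableDescriptor targetTableDesc = null;
        try {
            targetTableDesc = hBaseAdmin.getTableDescriptor(TableName.valueOf(tableName));
            log.info("Table already exists, showing its descriptor");
            log.info("Descriptor: {}", targetTableDesc.toString());
        } catch (TableNotFoundException notFound) {
            log.info("Table does not exist, creating it");
            HTableDescriptor htd = new HTableDescriptor(TableName.valueOf(tableName));
            HColumnDescriptor hcd_info = new HColumnDescriptor("info");
            hcd_info.setMaxVersions(3);
            HColumnDescriptor hcd_data = new HColumnDescriptor("data");
            htd.addFamily(hcd_info);
            htd.addFamily(hcd_data);
            hBaseAdmin.createTable(htd); // the program hangs on this call
            log.info("Table created");
            targetTableDesc = hBaseAdmin.getTableDescriptor(TableName.valueOf(tableName));
            log.info("Descriptor: {}", targetTableDesc.toString());
        }

        TableName[] listTableNames = hBaseAdmin.listTableNames();
        if (listTableNames == null) {
            log.info("No tables");
        } else {
            // "name" rather than "tableName" so the loop variable does not shadow the field
            for (TableName name : listTableNames) {
                log.info("Table name: {}", name.getNameAsString());
            }
        }
    } finally {
        IOUtils.closeQuietly(hBaseAdmin);
        log.info("Done");
    }
}
```

When I ran this code, the program stopped at hBaseAdmin.createTable(htd), which is the table-creation step. It stayed there without throwing any exception. The read operations, such as getting the cluster status or listing all tables, returned quickly. That left me thinking long and hard…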

On CSDN I found a report of a similar situation:

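The explanation there was that HDFS, which HBase depends on, had entered safe mode. The fix was to take it out of safe mode manually by running hdfs dfsadmin -safemode leave. However, when I queried the current safe-mode state with hdfs dfsadmin -safemode get, the output was "Safe mode is OFF". In other words, HDFS was not in safe mode at all.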

A while later I happened to look at the log again and found the following:

```
#1, waiting for some tasks to finish. Expected max=0, tasksSent=9, tasksDone=8, currentTasksDone=8, retries=8 hasError=false, tableName=demo_table
#1, waiting for some tasks to finish. Expected max=0, tasksSent=10, tasksDone=9, currentTasksDone=9, retries=9 hasError=false, tableName=demo_table
#1, table=demo_table, attempt=10/35 failed 1 ops, last exception: java.net.UnknownHostException: unknown host: admin.demo.com on admin.demo.cn,5020,1426211698289, tracking started Fri Mar 13 11:41:19 CST 2015, retrying after 10037 ms, replay 1 ops.
```

The hostnames in the last line, admin.demo.com and admin.demo.cn, caught my attention. My test environment consists of three physical servers, configured as follows:

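| IP address | Hostname in hosts file | Linux system hostname | HBase role | HDFS / ZooKeeper roles |
|---|---|---|---|---|
| 192.168.1.21 | hd-21 | test-21 | Master | NameNode, DataNode, ZooKeeper |
| 192.168.1.22 | hd-22 | test-22 | RegionServer | NameNode, DataNode, ZooKeeper |
| 192.168.1.23 | hd-23 | test-23 | RegionServer | DataNode, ZooKeeper |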

Because this is a test environment, the following hosts entries had also been added on 192.168.1.22:

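```
192.168.1.22 admin.demo.com
192.168.1.22 admin.demo.cn
```

In short, the hostnames configured in the hosts file did not match the machines' real hostnames. On top of that, the extra hosts entries interfered with correct hostname resolution. The log shows the result. The RegionServer is registered as admin.demo.cn, but the client could not resolve admin.demo.com and got an UnknownHostException. The client treats this as a retriable failure, so it does not throw. Instead it keeps retrying with growing pauses (attempt=10/35, retrying after 10037 ms), which is why the call appeared to hang silently.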
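The "attempt=10/35" and "retrying after 10037 ms" in the log suggest why the call looked hung rather than failed: the client was retrying with growing pauses. The small program below is a rough sketch of how long that can last, under stated assumptions:

- The multiplier table is my understanding of HConstants.RETRY_BACKOFF in HBase releases of that era (0.98/1.x).
- The base pause is assumed to be the default hbase.client.pause of 100 ms.
- Random jitter and the time spent in each failed RPC are ignored.
- The class and method names are hypothetical.

```java
/**
 * Rough estimate of how long an HBase client of the 0.98/1.x era can keep
 * retrying before it reports an error. Assumption: the backoff multipliers and
 * the 100 ms base pause mirror HBase defaults; jitter is ignored.
 */
public class RetryBackoffEstimate {
    // Assumed copy of HConstants.RETRY_BACKOFF: multipliers applied to the base pause.
    static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    // Pause before retry number `tries` (0-based); indices past the end reuse the last multiplier.
    public static long pauseMillis(long basePauseMs, int tries) {
        int idx = Math.min(Math.max(tries, 0), RETRY_BACKOFF.length - 1);
        return basePauseMs * RETRY_BACKOFF[idx];
    }

    // Sum of the first `pauses` sleeps, i.e. the time spent just waiting between attempts.
    public static long totalBackoffMillis(long basePauseMs, int pauses) {
        long total = 0;
        for (int i = 0; i < pauses; i++) {
            total += pauseMillis(basePauseMs, i);
        }
        return total;
    }

    public static void main(String[] args) {
        long base = 100;   // assumed hbase.client.pause default, in ms
        int attempts = 35; // the "/35" in the log's attempt counter
        System.out.println("Pause around attempt 10: " + pauseMillis(base, 9) + " ms");
        long total = totalBackoffMillis(base, attempts - 1); // 34 pauses between 35 attempts
        System.out.printf("Sleep time across %d attempts: %d ms (about %.1f min)%n",
                attempts, total, total / 60000.0);
    }
}
```

With these assumptions, the pause around the tenth attempt comes out at 10000 ms. That matches the 10037 ms in the log if the extra 37 ms is jitter. The sleeps alone across 35 attempts add up to roughly eight and a half minutes, before counting any time spent in the failed calls themselves. Lowering hbase.client.retries.number in a test configuration should make this kind of failure surface sooner.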

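Lesson learned: as this error shows, each node's hostname in a Hadoop cluster must match the name configured for it in the hosts file. Keep the cluster's name mappings clean, too. Don't deploy unrelated services on the same machines, or they can cause needless trouble like this.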
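The lesson can be turned into a quick per-node check. The sketch below is a hypothetical helper, not part of HBase or Hadoop. It does two things:

- It parses hosts-file text and reports when the machine's own hostname (as printed by the hostname command) is not mapped at all, as with test-21/22/23 here.
- It flags any IP address that carries more than one name, as 192.168.1.22 did.

The sample content in main is modeled on this article's setup rather than copied from a real file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sanity check for the hosts file on one cluster node. */
public class HostsSanityCheck {

    // Parses hosts-file text into IP -> names (canonical name, then aliases); '#' starts a comment.
    public static Map<String, Set<String>> parse(String hostsText) {
        Map<String, Set<String>> byIp = new LinkedHashMap<>();
        for (String rawLine : hostsText.split("\\R")) {
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            if (line.isEmpty()) continue;
            String[] fields = line.split("\\s+");
            if (fields.length < 2) continue; // an IP with no name is ignored
            Set<String> names = byIp.computeIfAbsent(fields[0], k -> new LinkedHashSet<>());
            for (int i = 1; i < fields.length; i++) names.add(fields[i].toLowerCase());
        }
        return byIp;
    }

    // Lists problems: the system hostname missing from the file, and IPs mapped to several names.
    public static List<String> check(String hostsText, String systemHostname) {
        Map<String, Set<String>> byIp = parse(hostsText);
        String wanted = systemHostname.toLowerCase();
        List<String> problems = new ArrayList<>();
        boolean listed = byIp.values().stream().anyMatch(names -> names.contains(wanted));
        if (!listed) problems.add("hostname " + systemHostname + " is not mapped in the hosts file");
        for (Map.Entry<String, Set<String>> e : byIp.entrySet()) {
            if (e.getValue().size() > 1) problems.add(e.getKey() + " maps to several names " + e.getValue());
        }
        return problems;
    }

    public static void main(String[] args) {
        // Illustrative hosts content modeled on 192.168.1.22 in this article.
        String hosts = String.join("\n",
                "192.168.1.21 hd-21",
                "192.168.1.22 hd-22",
                "192.168.1.23 hd-23",
                "192.168.1.22 admin.demo.com",
                "192.168.1.22 admin.demo.cn");
        check(hosts, "test-22").forEach(System.out::println);
    }
}
```

Run on each node with the contents of /etc/hosts and the output of hostname, it should flag both problems described above. Loopback lines such as 127.0.0.1 often list several aliases on purpose, so treat the output as hints rather than hard errors.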
