Alex's Hadoop Tutorial for Beginners, Lesson 15: Installing and Using Impala

Using impala-shell

Running impala-shell opens Impala's interactive command line. The following example walks through how to use Impala.

A table creation example

Step 1

Create the test directories in HDFS:

hdfs dfs -mkdir -p /user/cloudera/sample_data/tab1 /user/cloudera/sample_data/tab2

Create a local text file tab1.csv:

1,true,123.123,2012-10-24 08:55:00
2,false,1243.5,2012-10-25 13:40:00
3,false,24453.325,2008-08-22 09:33:21.123
4,false,243423.325,2007-05-12 22:32:21.33454
5,true,243.325,1953-04-22 09:11:33

and tab2.csv:

1,true,12789.123
2,false,1243.5
3,false,24453.325
4,false,2423.3254
5,true,243.325
60,false,243565423.325
70,true,243.325
80,false,243423.325
90,true,243.325

Upload the CSV files to HDFS:

$ hdfs dfs -put tab1.csv /user/cloudera/sample_data/tab1
$ hdfs dfs -ls /user/cloudera/sample_data/tab1
Found 1 items
-rw-r--r--   1 cloudera cloudera        192 2013-04-02 20:08 /user/cloudera/sample_data/tab1/tab1.csv
$ hdfs dfs -put tab2.csv /user/cloudera/sample_data/tab2
$ hdfs dfs -ls /user/cloudera/sample_data/tab2
Found 1 items
-rw-r--r--   1 cloudera cloudera        158 2013-04-02 20:09 /user/cloudera/sample_data/tab2/tab2.csv

Step 2

Enter impala-shell and create three tables:

DROP TABLE IF EXISTS tab1;

-- The EXTERNAL clause means the data is located outside the central location
-- for Impala data files and is preserved when the associated Impala table is dropped.
-- We expect the data to already exist in the directory specified by the LOCATION clause.
CREATE EXTERNAL TABLE tab1
(
   id INT,
   col_1 BOOLEAN,
   col_2 DOUBLE,
   col_3 TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/cloudera/sample_data/tab1';

DROP TABLE IF EXISTS tab2;

-- TAB2 is an external table, similar to TAB1.
CREATE EXTERNAL TABLE tab2
(
   id INT,
   col_1 BOOLEAN,
   col_2 DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/cloudera/sample_data/tab2';

DROP TABLE IF EXISTS student;

CREATE TABLE student
(
   id INT,
   name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
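If you want, you can first confirm the three tables were registered; SHOW TABLES and DESCRIBE are standard Impala statements (this quick check is my addition, not part of the original walkthrough):

-- list the tables in the current database and inspect one schema
show tables;
describe tab1;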

Now you can test a query with a SELECT statement:

[xmseapp03:21000] > select * from tab1;
Query: select * from tab1
+----+-------+------------+-------------------------------+
| id | col_1 | col_2      | col_3                         |
+----+-------+------------+-------------------------------+
| 1  | true  | 123.123    | 2012-10-24 08:55:00           |
| 2  | false | 1243.5     | 2012-10-25 13:40:00           |
| 3  | false | 24453.325  | 2008-08-22 09:33:21.123000000 |
| 4  | false | 243423.325 | 2007-05-12 22:32:21.334540000 |
| 5  | true  | 243.325    | 1953-04-22 09:11:33           |
+----+-------+------------+-------------------------------+
Fetched 5 row(s) in 6.91s
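As a slightly more involved check (my own example, not from the original post), a join between the two sample tables on id also works as expected:

-- join the two sample tables on their shared id column
select tab1.id, tab1.col_1, tab2.col_2
from tab1 join tab2 on tab1.id = tab2.id
order by tab1.id;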

Impala can also run a pre-written .sql file.

Step 1

Create a text file customer.dat:

1|AAAAAAAABAAAAAAA|980124|7135|32946|2452238|2452208|Mr.|Javier|Lewis|Y|9|12|1936|CHILE||Javier.Lewis@VFAxlnZEvOx.org|2452508|
2|AAAAAAAACAAAAAAA|819667|1461|31655|2452318|2452288|Dr.|Amy|Moses|Y|9|4|1966|TOGO||Amy.Moses@Ovk9KjHH.com|2452318|
3|AAAAAAAADAAAAAAA|1473522|6247|48572|2449130|2449100|Miss|Latisha|Hamilton|N|18|9|1979|NIUE||Latisha.Hamilton@V.com|2452313|
4|AAAAAAAAEAAAAAAA|1703214|3986|39558|2450030|2450000|Dr.|Michael|White|N|7|6|1983|MEXICO||Michael.White@i.org|2452361|
5|AAAAAAAAFAAAAAAA|953372|4470|36368|2449438|2449408|Sir|Robert|Moran|N|8|5|1956|FIJI||Robert.Moran@Hh.edu|2452469|

Then upload it to HDFS:

hdfs dfs -mkdir -p /user/hive/tpcds/customer
hdfs dfs -put customer.dat /user/hive/tpcds/customer/

Then write an SQL script named customer_setup.sql:

--
-- store_sales fact table and surrounding dimension tables only
--
create database tpcds;
use tpcds;

drop table if exists customer;
create external table customer
(
    c_customer_sk             int,
    c_customer_id             string,
    c_current_cdemo_sk        int,
    c_current_hdemo_sk        int,
    c_current_addr_sk         int,
    c_first_shipto_date_sk    int,
    c_first_sales_date_sk     int,
    c_salutation              string,
    c_first_name              string,
    c_last_name               string,
    c_preferred_cust_flag     string,
    c_birth_day               int,
    c_birth_month             int,
    c_birth_year              int,
    c_birth_country           string,
    c_login                   string,
    c_email_address           string,
    c_last_review_date        string
)
row format delimited fields terminated by '|'
location '/user/hive/tpcds/customer';

Step 2

Then execute it like this:

impala-shell -i localhost -f customer_setup.sql
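To check that the script worked (my own quick check, assuming only the five-row customer.dat above sits in the table's directory), open impala-shell again and query the new table:

use tpcds;
-- should report 5 for the sample customer.dat above
select count(*) from customer;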

Anything Hive can do, Impala can also do, so instead let me cover what Impala can do that Hive cannot.

Single-row inserts into an external partitioned table

Hive does not support single-row inserts of the form INSERT INTO ... VALUES (the exact reason is unclear to me), but Impala does! Impala can even insert data into an external partitioned table!
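As a quick illustration of the plain single-row case (a minimal sketch of my own, reusing the student table created earlier in the default database; the row values are made up):

-- single-row insert, which works in Impala but not in Hive
insert into student values (1, 'alex');
select * from student;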

Step 1

Create the test data in HDFS:

$ hdfs dfs -mkdir -p /user/impala/data/logs/year=2015/month=01/day=01/host=host1
$ hdfs dfs -mkdir -p /user/impala/data/logs/year=2015/month=02/day=22/host=host2

Then upload a text file a.txt:

1,jack
2,michael

and b.txt:

3,sara
4,john

hdfs dfs -put /root/a.txt /user/impala/data/logs/year=2015/month=01/day=01/host=host1
hdfs dfs -put /root/b.txt /user/impala/data/logs/year=2015/month=02/day=22/host=host2

Step 2

Create the external partitioned table:

create external table logs (id int, name string)
partitioned by (year string, month string, day string, host string)
row format delimited fields terminated by ','
location '/user/impala/data/logs';

Step 3
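What follows is a minimal sketch of this step, assuming the partition directories created in Step 1: first register the two partitions with ALTER TABLE ... ADD PARTITION so Impala sees the uploaded files, then use the single-row insert that Hive lacks to write straight into one of them. The inserted row is a made-up value for illustration.

-- register the partitions that match the directories created in Step 1
alter table logs add partition (year='2015', month='01', day='01', host='host1');
alter table logs add partition (year='2015', month='02', day='22', host='host2');

-- single-row insert directly into a partition (a made-up row, not part of a.txt/b.txt)
insert into logs partition (year='2015', month='02', day='22', host='host2') values (5, 'alex');

-- the rows from a.txt, b.txt, and the inserted row should all show up
select * from logs where year = '2015';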


