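Chapter 4: HiveQL Data Definition

1: Create a database:
create database financials;
create database if not exists financials;

2: List databases:
show databases;
Filter the list with a regular expression (here, databases whose names start with h):
show databases like 'h.*';

3: Create a database in a non-default location:
create database financials location '/my/preferred/directory';

4: Add a description to a database:
create database financials comment 'Holds all financial tables';

5: Show a database's description:
describe database financials;

6: Attach key-value properties to a database with DBPROPERTIES, then view them with DESCRIBE DATABASE EXTENDED:
create database financials with dbproperties ('creator' = 'Mark Moneybags', 'date' = '2012-12-12');
describe database extended financials;

7: There is no command that shows which database you are currently using, but you can simply run USE again:
use financials;
Alternatively, set a property so that the CLI prompt shows the current database (set it back to false to turn this off):
set hive.cli.print.current.db=true;
set hive.cli.print.current.db=false;

8: Drop a database:
drop database if exists financials;
Hive does not allow dropping a database that still contains tables. Adding the CASCADE keyword makes Hive drop the tables in the database first:
drop database if exists financials cascade;

9: Alter a database by setting DBPROPERTIES key-value pairs:
alter database financials set dbproperties ('edited-by' = 'Joe Dba');

10: Create a table:
create table if not exists employees (
  name string comment 'Employee name',
  salary float comment 'Employee salary',
  subordinates array<string> comment 'Names of subordinates',
  deductions map<string, float>,
  address struct<street:string, city:string, state:string, zip:int>)
comment 'Description of the table'
location '/user/hive/warehouse/mydb.db/employees'
tblproperties ('creator' = 'me', 'created_at' = '2012-12-12');
TBLPROPERTIES is mainly used to attach extra documentation to a table as key-value pairs.

11: List a table's TBLPROPERTIES:
show tblproperties employees;

12: Copy a table's schema (the data is not copied):
create table if not exists mydb.employees2 like mydb.employees;

13: Select a database, then list its tables:
use mydb;
show tables;
show tables in mydb;

14: Show detailed information about a table:
describe extended mydb.employees;
Using FORMATTED instead of EXTENDED produces more readable output:
describe formatted mydb.employees;

15: Managed (internal) tables: dropping the table also deletes its data. Dropping an external table removes only its metadata, not the data.
Create an external table that can read all comma-delimited data files under /data/stocks:
create external table if not exists stocks (
  exchange string,
  symbol string,
  ymd string,
  price_open float,
  price_high float,
  price_low float,
  price_close float,
  volume int,
  price_adj_close float)
row format delimited fields terminated by ','
location '/data/stocks';

16: Check whether a table is managed or external:
describe extended tablename;
The output contains tableType:MANAGED_TABLE for a managed table or tableType:EXTERNAL_TABLE for an external table.
Copy a table's definition without copying its data (employees3 is the new table, employees2 the source table):
create external table if not exists mydb.employees3 like mydb.employees2 location '/data/stocks';

17: Create a partitioned table:
create table employees (
  name string,
  salary float,
  subordinates array<string>,
  deductions map<string, float>,
  address struct<street:string, city:string, state:string, zip:int>)
partitioned by (country string, state string);
The partition columns country and state can be queried like ordinary columns and act somewhat like an index: a query that filters on partition columns reads only the matching partitions, which improves query performance.

18: set hive.mapred.mode=strict;
In strict mode, a query on a partitioned table whose WHERE clause has no partition filter is rejected and never submitted. To allow such queries, set the mode to nonstrict:
set hive.mapred.mode=nonstrict;

19: List all partitions of a table:
show partitions employees;

20: Check whether partitions exist for a specific partition key value:
show partitions employees partition(country='US');
describe extended employees also shows the partition keys.
The most common way to manage large production datasets is with external partitioned tables.

21: In a managed table, you can create a partition by loading data into it:
load data local inpath '/home/hive/California-employees' into table employees partition(country='US', state='CA');
Hive creates the directory for this partition: …/employees/country=US/state=CA

22: Create an external partitioned table:
create external table if not exists log_messages (
  hms int,
  severity string,
  server string,
  process_id int,
  message string)
partitioned by (year int, month int, day int)
row format delimited fields terminated by '\t';

ORDER BY, SORT BY, DISTRIBUTE BY and CLUSTER BY

1: ORDER BY performs a total (global) sort of the input, so all rows pass through a single reducer. In strict mode, ORDER BY requires a LIMIT clause.

2: SORT BY guarantees that the file produced by each reducer is sorted; the sorted files can then be merged in a second merge-sort pass to obtain a total order. Characteristics of SORT BY:
1) SORT BY is largely unaffected by whether hive.mapred.mode is strict or nonstrict, but on a partitioned table a partition filter is still required.
2) Within a single reducer, rows are sorted by the specified columns.
3) You can set the number of reducers, e.g. set mapred.reduce.tasks=5; merge-sorting the reducers' outputs then yields the complete sorted result.
Result: in strict mode, SORT BY runs normally without a LIMIT; SORT BY is only slightly affected by hive.mapred.mode=strict.

3: DISTRIBUTE BY
DISTRIBUTE BY controls how map output is divided among the reducers. Rows are distributed according to the columns listed after DISTRIBUTE BY and the number of reducers, using a hash by default. DISTRIBUTE BY also accepts expressions: for example, DISTRIBUTE BY length(col) sends rows to reducers according to the length of a string column, so they end up in different output files. length is a built-in function; other built-in functions or user-defined functions (UDFs) can be used as well.

4: CLUSTER BY
Besides doing everything DISTRIBUTE BY does, CLUSTER BY also sorts by the same columns, so CLUSTER BY = DISTRIBUTE BY + SORT BY on the same columns. CLUSTER BY sorts in ascending order only.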
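To make the difference between the four clauses concrete, here is a small Python simulation (not Hive code) of how rows reach reducers and how each reducer sorts them. It rests on some assumptions: the sample rows and helper names (distribute_by, sort_by, cluster_by, order_by, merge_sorted) are hypothetical, and zlib.crc32 stands in for Hive's real hash partitioner, so actual reducer assignments in Hive will differ.

```python
import heapq
import zlib
from operator import itemgetter
from typing import Callable, Iterable, List, Tuple

Row = Tuple[str, int]            # hypothetical (symbol, volume) rows
Key = Callable[[Row], object]


def stable_hash(value: object) -> int:
    # Stand-in for Hive's hash partitioner; crc32 is deterministic across runs,
    # unlike Python's built-in hash() for strings.
    return zlib.crc32(str(value).encode("utf-8"))


def distribute_by(rows: Iterable[Row], key: Key, num_reducers: int) -> List[List[Row]]:
    # DISTRIBUTE BY: rows with the same key value always go to the same reducer.
    buckets: List[List[Row]] = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[stable_hash(key(row)) % num_reducers].append(row)
    return buckets


def sort_by(buckets: List[List[Row]], key: Key) -> List[List[Row]]:
    # SORT BY: each reducer sorts only its own rows; there is no global order.
    return [sorted(bucket, key=key) for bucket in buckets]


def cluster_by(rows: Iterable[Row], key: Key, num_reducers: int) -> List[List[Row]]:
    # CLUSTER BY = DISTRIBUTE BY + SORT BY on the same key.
    return sort_by(distribute_by(rows, key, num_reducers), key)


def order_by(rows: Iterable[Row], key: Key) -> List[Row]:
    # ORDER BY: a single reducer produces one globally sorted output.
    return sorted(rows, key=key)


def merge_sorted(outputs: List[List[Row]], key: Key) -> List[Row]:
    # Second-pass merge of the per-reducer sorted files into one total order.
    return list(heapq.merge(*outputs, key=key))


rows: List[Row] = [("AAPL", 300), ("IBM", 100), ("AAPL", 200), ("GOOG", 50), ("IBM", 400)]
symbol, volume = itemgetter(0), itemgetter(1)

for i, part in enumerate(cluster_by(rows, symbol, num_reducers=2)):
    print(f"reducer {i}: {part}")

# Merging SORT BY outputs reproduces the ORDER BY result (prints True).
print(merge_sorted(sort_by(distribute_by(rows, volume, 3), volume), volume) == order_by(rows, volume))
```

Passing a computed key such as `lambda r: len(r[0])` to distribute_by mimics DISTRIBUTE BY length(symbol): rows are routed by the string length rather than the value itself.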

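The partition layout described in items 17 to 21 (each partition is a nested key=value directory, and a partition filter lets Hive skip the other directories) can be sketched in Python. This is a simulation with hypothetical helper names (partition_path, prune) and a hypothetical warehouse path; it ignores the escaping Hive applies to special characters in partition values.

```python
from typing import Dict, List

Spec = Dict[str, str]  # partition key -> value, in PARTITIONED BY order


def partition_path(table_dir: str, spec: Spec) -> str:
    # Partitions are laid out as nested key=value directories under the table directory.
    # (Escaping of special characters in values is omitted in this sketch.)
    return "/".join([table_dir.rstrip("/")] + [f"{k}={v}" for k, v in spec.items()])


def prune(partitions: List[Spec], **filters: str) -> List[Spec]:
    # Partition pruning: keep only partitions matching every equality filter,
    # e.g. WHERE country = 'US'; the other directories are never read.
    return [p for p in partitions if all(p.get(k) == v for k, v in filters.items())]


table_dir = "/user/hive/warehouse/employees"  # hypothetical warehouse location
partitions: List[Spec] = [
    {"country": "US", "state": "CA"},
    {"country": "US", "state": "IL"},
    {"country": "CA", "state": "BC"},
]

for spec in prune(partitions, country="US"):
    print(partition_path(table_dir, spec))
# /user/hive/warehouse/employees/country=US/state=CA
# /user/hive/warehouse/employees/country=US/state=IL
```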