Using CASE and IF in Hive: a region statistics job (Hive conditional functions CASE and IF)

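Preface: my own lessons from designing HiveQL. 1. When a query gets complicated, process it in steps: split the complex logic into several simple sub-steps. 2. But whatever can be combined should be combined; for example, several CONCAT calls at the same level can go into a single SELECT.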

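In other words, fields that are processed in parallel at the same level belong in one HiveQL statement. Fields whose processing depends on earlier steps (checks, filling in values, calculations) can be handled in separate steps: prepare each field on its own first, then apply the corresponding simple logic step by step.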

1. Scenario: processing the region data (country, province, city) in the logs

select city_id, province_id, country_id
from wizad_mdm_cleaned_hdfs
… province_id = …
… city_id, province_id, country_id

2. Empty values turned up in the log data: 3817317516481761 (all empty) 77

3. Defining the filtering logic

… city_id = '' then CONCAT('region_','1','_',province_id)
else CONCAT('region_','1','_',province_id,'_',city_id)
… CONCAT('region_','1','_',parent_region_id,'_',city_id)
… city_id != '' then CONCAT('region_',country_id,'_',parent_region_id,'_',city_id)

4. HiveQL implementation

… test_lmj_mdm_tmp1;
… SELECT
  guid,
  (CASE country_id
     WHEN '1' THEN
       (CASE WHEN province_id = '' THEN
          IF(city_id = '', '', CONCAT('region_','1','_',parent_region_id,'_',city_id))
        ELSE
          IF(city_id = '', CONCAT('region_','1','_',province_id), CONCAT('region_','1','_',province_id,'_',city_id))
        END)
     ELSE
       (CASE WHEN province_id = '' THEN
          IF(city_id = '', CONCAT('region_',country_id), CONCAT('region_',country_id,'_',parent_region_id,'_',city_id))
        ELSE
          IF(city_id = '', CONCAT('region_',country_id,'_',province_id), CONCAT('region_',country_id,'_',province_id,'_',city_id))
        END)
   END) AS region,
  (CASE connection_type WHEN '2' THEN CONCAT('carrier_','wifi') ELSE CONCAT('carrier_',c.element_id) END) AS carrier,
  … AS imp_pv,
  … AS clk_pv
FROM wizad_mdm_cleaned_hdfs a
LEFT OUTER JOIN wizad_mdm_dev_lmj_ad_campaign_industry_brand b
  ON (a.wizad_ad_id = b.ad_id)
LEFT OUTER JOIN (SELECT * FROM wizad_mdm_dev_lmj_mapping_table_analytics WHERE TYPE = '7') c
  ON (a.adn_id = c.ad_network_id AND a.carrier_id = c.mapping_id)
LEFT OUTER JOIN wizad_mdm_dev_lmj_app_category_analytics d
  ON (a.app_category_id = d.adn_category)
LEFT OUTER JOIN (SELECT region_template_id, parent_region_id FROM wizad_mdm_dev_lmj_region_template) e
  ON (a.city_id = e.region_template_id)
GROUP BY
  guid,
  (CASE country_id
     WHEN '1' THEN
       (CASE WHEN province_id = '' THEN
          IF(city_id = '', '', CONCAT('region_','1','_',parent_region_id,'_',city_id))
        ELSE
          IF(city_id = '', CONCAT('region_','1','_',province_id), CONCAT('region_','1','_',province_id,'_',city_id))
        END)
     ELSE
       (CASE WHEN province_id = '' THEN
          IF(city_id = '', CONCAT('region_',country_id), CONCAT('region_',country_id,'_',parent_region_id,'_',city_id))
        ELSE
          IF(city_id = '', CONCAT('region_',country_id,'_',province_id), CONCAT('region_',country_id,'_',province_id,'_',city_id))
        END)
   END),
  (CASE connection_type WHEN '2' THEN CONCAT('carrier_','wifi') ELSE CONCAT('carrier_',c.element_id) END);

5. HiveQL statement analysis
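The nested CASE/IF logic of section 4 can be checked outside Hive. Below is a minimal Python sketch of the region expression, built on a few assumptions:

- Python's `None` stands in for a Hive NULL.
- The outer branch fires for country_id `'1'`, as in the query above.
- `concat` and `region` are hypothetical helper names for illustration, not Hive APIs.

```python
from typing import Optional


def concat(*parts: Optional[str]) -> Optional[str]:
    # Mimics Hive CONCAT: if any argument is NULL (None), the result is NULL.
    if any(p is None for p in parts):
        return None
    return "".join(parts)


def region(country_id: Optional[str], province_id: Optional[str],
           city_id: Optional[str], parent_region_id: Optional[str]) -> Optional[str]:
    # In Hive, NULL = '' evaluates to NULL, which CASE WHEN and IF treat like false.
    # In Python, None == "" is simply False, so plain == picks the same branch.
    if country_id == "1":                       # CASE country_id WHEN '1'
        if province_id == "":                   # CASE WHEN province_id = ''
            if city_id == "":                   # IF(city_id = '', '', ...)
                return ""
            return concat("region_", "1", "_", parent_region_id, "_", city_id)
        if city_id == "":
            return concat("region_", "1", "_", province_id)
        return concat("region_", "1", "_", province_id, "_", city_id)
    # ELSE branch: any other (or NULL) country_id
    if province_id == "":
        if city_id == "":
            return concat("region_", country_id)
        return concat("region_", country_id, "_", parent_region_id, "_", city_id)
    if city_id == "":
        return concat("region_", country_id, "_", province_id)
    return concat("region_", country_id, "_", province_id, "_", city_id)


print(region("1", "", "1101", "11"))   # region_1_11_1101
print(region("1", "", "1101", None))   # None
```

Modeling NULL explicitly exposes one consequence of the original query. When the join on table e finds no template row, parent_region_id is NULL, so the whole region key becomes NULL rather than an empty string.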

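The example above uses CASE and IF; their syntax is summarized in section 7 (CONDITIONAL FUNCTIONS IN HIVE) at the end. Notes:
1. A special form of CASE: CASE can be written without a subject expression, with the condition placed after WHEN instead, e.g. case when a=1 then true else false end;
2. A transformed field in the SELECT list must also appear, as the same expression, in GROUP BY. The CASE expression for carrier is an example; without it in GROUP BY the query fails. GROUP BY does not accept an alias (AS), although SELECT does. The same goes for using substr to extract the hour from the time field: the expression must appear in both SELECT and GROUP BY.
3. In a LEFT OUTER JOIN, joining a subquery that keeps only the needed rows and columns (as with tables c and e) reduces memory use during the join.
4. Whether IF is available depends on your Hive version.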
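Note 2 is easier to see with the grouping spelled out. Rows are grouped by the computed carrier value, not by the raw connection_type column, which is why Hive needs the same CASE expression repeated in GROUP BY.

Below is a minimal Python sketch. `carrier_key` and `sum_pv_by_carrier` are hypothetical names, and the element_id values in the usage line are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def carrier_key(connection_type: Optional[str], element_id: Optional[str]) -> Optional[str]:
    # CASE connection_type WHEN '2' THEN CONCAT('carrier_','wifi')
    #                      ELSE CONCAT('carrier_', c.element_id) END
    if connection_type == "2":
        return "carrier_wifi"
    # CONCAT with a NULL element_id (no match in join c) yields NULL.
    return None if element_id is None else "carrier_" + element_id


def sum_pv_by_carrier(rows: Iterable[Tuple[Optional[str], Optional[str], int]]) -> Dict[Optional[str], int]:
    # Group on the computed key, as GROUP BY (CASE connection_type ... END) does.
    totals: Dict[Optional[str], int] = defaultdict(int)
    for connection_type, element_id, imp_pv in rows:
        totals[carrier_key(connection_type, element_id)] += imp_pv
    return dict(totals)


print(sum_pv_by_carrier([("2", None, 3), ("1", "46000", 5), ("3", "46000", 2)]))
# {'carrier_wifi': 3, 'carrier_46000': 7}
```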

6. Refactoring the HiveQL design

Beginners like me tend to pile complex logic into one monstrous statement. An experienced developer facing overly complex logic splits it into steps instead. A senior SQL colleague refactored the example above in two steps:
1) First, fill in a reasonable value for each field separately (fill in what can be filled, set the rest to empty).
2) Then, when building the region key, simply filter out records with invalid values.

6.1 Step 1 statement

… test_lmj_mdm_tmp;
… SELECT
  guid,
  CONCAT('adn_',adn_id) AS adn,
  CONCAT('time_',substr(createtime,12,2)) AS hour,
  CONCAT('os_',os_id) AS os,
  CASE WHEN (country_id = 'NULL' or country_id is null) … province_id is null) … city_id is null) … (country_id = …) … province_id or city_id <> … city_id) … country_id END AS country_id,
  CASE WHEN (province_id = 'NULL' or province_id is null) AND e.parent_region_id <> '' THEN e.parent_region_id ELSE province_id END AS province_id,
  city_id,
  CONCAT('campaign_',b.campaign_id) AS campaign,
  CONCAT('interest_',b.industry_id) AS interest,
  CONCAT('brand_',b.brand_id) AS brand,
  (CASE connection_type WHEN '2' THEN CONCAT('carrier_','wifi') ELSE CONCAT('carrier_',c.element_id) END) AS carrier,
  CONCAT('appcategory_',d.wizad_category) AS appcategory,
  uid,
  … AS imp_pv,
  … AS clk_pv
FROM ${clean_log_table} a
LEFT OUTER JOIN wizad_mdm_dev_lmj_ad_campaign_industry_brand b
  ON (a.wizad_ad_id = b.ad_id)
LEFT OUTER JOIN (SELECT * FROM wizad_mdm_dev_lmj_mapping_table_analytics WHERE TYPE = '7') c
  ON (a.adn_id = c.ad_network_id AND a.carrier_id = c.mapping_id)
LEFT OUTER JOIN wizad_mdm_dev_lmj_app_category_analytics d
  ON (a.app_category_id = d.adn_category)
LEFT OUTER JOIN (SELECT region_template_id, parent_region_id FROM wizad_mdm_dev_lmj_region_template) e
  ON (a.city_id = e.region_template_id)
GROUP BY
  guid,
  CONCAT('adn_',adn_id),
  CONCAT('time_',substr(createtime,12,2)),
  CONCAT('os_',os_id),
  CASE WHEN (country_id = 'NULL' or country_id is null) … province_id is null)) … (country_id = …) … province_id or city_id <> … city_id) … country_id END,
  CASE WHEN (province_id = 'NULL' or province_id is null) AND e.parent_region_id <> '' THEN e.parent_region_id ELSE province_id END,
  city_id,
  CONCAT('campaign_',b.campaign_id),
  CONCAT('interest_',b.industry_id),
  CONCAT('brand_',b.brand_id),
  (CASE connection_type WHEN '2' THEN CONCAT('carrier_','wifi') ELSE CONCAT('carrier_',c.element_id) END),
  CONCAT('appcategory_',d.wizad_category),
  UID;

6.2 Step 2 statement

… SELECT
  guid,
  CONCAT(… city_id <> '' THEN concat(…) …) AS fixeddim,
  UID,
  SUM(imp_pv) AS pv
FROM test_lmj_mdm_tmp
WHERE country_id <> … province_id <> … province_id …
GROUP BY
  guid,
  CONCAT(… city_id … THEN concat(…) …),
  UID;
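The two-step idea (first fill values, then filter and build the key) can be sketched in Python under several assumptions:

- Step 1 shows only the province_id fill rule that survives intact in 6.1.
- Step 2 is an assumed reading of the garbled 6.2 fragment, not a confirmed copy. It drops records with an empty country_id or province_id and appends the city suffix only when city_id is non-empty.
- The guid/UID grouping is left out.
- `fill_province`, `build_region` and `two_step` are hypothetical names.

```python
from typing import Dict, Iterable, Optional, Tuple

# (country_id, province_id, city_id, parent_region_id, imp_pv)
Row = Tuple[Optional[str], Optional[str], Optional[str], Optional[str], int]


def fill_province(province_id: Optional[str], parent_region_id: Optional[str]) -> Optional[str]:
    # Step 1 (6.1): a missing province ('NULL' string or real NULL) is replaced by
    # the template's parent region, but only when that parent is non-empty.
    missing = province_id is None or province_id == "NULL"
    if missing and parent_region_id is not None and parent_region_id != "":
        return parent_region_id
    return province_id


def build_region(country_id: Optional[str], province_id: Optional[str],
                 city_id: Optional[str]) -> Optional[str]:
    # Step 2 (assumed reading of 6.2): drop rows with no usable country or province,
    # then append the city only when it is present.
    if not country_id or not province_id or province_id == "NULL":
        return None
    key = "region_" + country_id + "_" + province_id
    if city_id:
        key += "_" + city_id
    return key


def two_step(rows: Iterable[Row]) -> Dict[str, int]:
    totals: Dict[str, int] = {}
    for country_id, province_id, city_id, parent_region_id, imp_pv in rows:
        province_id = fill_province(province_id, parent_region_id)   # step 1
        key = build_region(country_id, province_id, city_id)         # step 2
        if key is None:
            continue                                                 # invalid record filtered out
        totals[key] = totals.get(key, 0) + imp_pv
    return totals
```

For example, `two_step([("1", None, "1101", "11", 2), ("1", "11", "", None, 3), ("", "11", "1101", None, 7)])` returns `{'region_1_11_1101': 2, 'region_1_11': 3}`. The third row is dropped because its country_id is empty. Each function handles one concern, which mirrors the colleague's point: the key-building step no longer needs the nested CASE/IF branches.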

The following is quoted from the web.

7. CONDITIONAL FUNCTIONS IN HIVE

Hive supports three types of conditional functions. These functions are listed below:

IF( Test Condition, True Value, False Value )
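The NULL handling of IF matters for the region query, because fields such as city_id can be NULL. The semantics can be sketched in Python, assuming `None` stands for NULL; `sql_eq` and `hive_if` are hypothetical helper names. In Hive, only a TRUE test returns the true value. A FALSE or NULL test (for example city_id = '' when city_id is NULL) returns the false value.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def sql_eq(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    # SQL comparison: if either side is NULL (None), the result is NULL, not False.
    if a is None or b is None:
        return None
    return a == b


def hive_if(test: Optional[bool], value_true: T, value_false_or_null: T) -> T:
    # IF(test, t, f): only a TRUE test returns t; FALSE and NULL both return f.
    return value_true if test is True else value_false_or_null


print(hive_if(sql_eq("", ""), "empty", "not empty"))    # empty
print(hive_if(sql_eq(None, ""), "empty", "not empty"))  # not empty
```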


