[Hive] Writing MapReduce output into a Hive partitioned table

Business requirement:

Write the data produced each day into a Hive partitioned table, using the date as the partition key.

Business analysis:

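Writing data into a Hive table with MapReduce really just means writing files into the table's directory on HDFS. The catch is that the data has to land in the current day's partition, so the problem becomes: how do we create the table's partition for the current day in advance?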
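A minimal shell sketch of the directory mapping this relies on, assuming Hive's default layout in which each partition lives in a ds=<value> subdirectory of the table location. Files written there only become visible to queries once the partition is registered in the metastore, which is why the partition must be created first. The table location is taken from the --job2OutputPath argument in step 3, the yyyy-MM-dd date format is an assumption (the original does not show how $date is set), and the helper name partition_dir is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: compute the HDFS directory that backs one partition of a
# Hive table, assuming Hive's default layout <table location>/ds=<value>.
# The helper name partition_dir is hypothetical.

partition_dir() {
  local table_location="${1%/}"   # table location, trailing slash removed
  local ds="$2"                   # partition value (assumed yyyy-MM-dd)
  printf '%s/ds=%s\n' "$table_location" "$ds"
}

# Example: today's partition of the table created in step 1 (location taken
# from the --job2OutputPath argument in step 3; the date format is assumed).
date=$(date +%Y-%m-%d)
partition_dir /user/hive/pms/test_rcmd_valid_path "$date"
```

The actual location of a table can be checked with DESCRIBE FORMATTED in Hive before relying on this layout.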

Solution:

1. Create the Hive table

    # First create the partitioned table rcmd_valid_path
    hive -e "set mapred.job.queue.name=pms;
    drop table if exists pms.test_rcmd_valid_path;
    create table if not exists pms.test_rcmd_valid_path
    (
        track_id string,
        track_time string,
        session_id string,
        gu_id string,
        end_user_id string,
        page_category_id bigint,
        algorithm_id int,
        is_add_cart int,
        rcmd_product_id bigint,
        product_id bigint,
        path_id string,
        path_type string,
        path_length int,
        path_list string,
        order_code string,
        groupon_id bigint
    )
    partitioned by (ds string)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n';"

2. Create the table's partition for the day $date (it is created only if it does not exist yet). The statement selects the partition's own rows back into it, so existing data is kept, and an empty partition is registered when none exists.

    # Create the $date partition directory of the production table rcmd_valid_path
    hive -e "set mapred.job.queue.name=pms;
    insert overwrite table pms.test_rcmd_valid_path partition(ds='$date')
    select track_id, track_time, session_id, gu_id, end_user_id,
           page_category_id, algorithm_id, is_add_cart, rcmd_product_id,
           product_id, path_id, path_type, path_length, path_list,
           order_code, groupon_id
    from pms.test_rcmd_valid_path
    where ds = '$date';"

3. The job can then write straight into the partition directory (note the --job2OutputPath argument)

    hadoop jar lib/bigdata-datamining-1.1-user-trace-jar-with-dependencies.jar com.yhd.datamining.data.usertrack.offline.job.mapred.TrackPathJob \
        --similarBrandPath /user/pms/recsys/algorithm/schedule/warehouse/relation/brand/$yesterday \
        --similarCategoryPath /user/pms/recsys/algorithm/schedule/warehouse/relation/category/$yesterday \
        --mcSiteCategoryPath /user/hive/warehouse/mc_site_category \
        --extractPreprocess /user/hive/warehouse/test_extract_preprocess \
        --engineMatchRule /user/pms/recsys/algorithm/schedule/warehouse/mix/artificial/product/$yesterday \
        --artificialMatchRule /user/pms/recsys/algorithm/schedule/warehouse/ruleengine/artificial/product/$yesterday \
        --category /user/hive/warehouse/category \
        --keywordCategoryTopN 3 \
        --termCategory /user/hive/pms/temp_term_category \
        --extractGrouponInfo /user/hive/pms/extract_groupon_info \
        --extractProductSerial /user/hive/pms/product_serial_id \
        --job1OutputPath /user/pms/workspace/ouyangyewei/testUsertrack/job1Output \
        --job2OutputPath /user/hive/pms/test_rcmd_valid_path/ds=$date
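The commands above use $date and $yesterday without showing where they come from. The sketch below shows one way to derive both and, as an alternative the author does not use, generates the more direct ALTER TABLE ... ADD IF NOT EXISTS PARTITION statement for registering the partition. It assumes GNU coreutils date, yyyy-MM-dd partition values, and a Hive version that accepts database-qualified names in ALTER TABLE; the helper names prev_day and add_partition_sql are hypothetical. It is a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch under assumptions: GNU coreutils date, yyyy-MM-dd partition values,
# and Hive accepting database-qualified names in ALTER TABLE.
# prev_day and add_partition_sql are hypothetical helper names.

# Previous calendar day of a yyyy-MM-dd date (GNU date relative syntax, UTC).
prev_day() {
  date -u -d "$1 1 day ago" +%Y-%m-%d
}

# HiveQL that registers a partition only if it is missing; an alternative to
# the INSERT OVERWRITE statement in step 2, not the author's method.
add_partition_sql() {
  local table="$1" ds="$2"
  printf "set mapred.job.queue.name=pms;alter table %s add if not exists partition (ds='%s');\n" \
    "$table" "$ds"
}

# Use an existing $date if the caller set one, otherwise today's local date.
date=${date:-$(date +%Y-%m-%d)}
yesterday=$(prev_day "$date")

# Dry run: print the commands instead of executing them.
printf '%s\n' "hive -e \"$(add_partition_sql pms.test_rcmd_valid_path "$date")\""
printf '%s\n' "--similarBrandPath /user/pms/recsys/algorithm/schedule/warehouse/relation/brand/$yesterday"
printf '%s\n' "--job2OutputPath /user/hive/pms/test_rcmd_valid_path/ds=$date"
```

To run it for real, pass the generated statement to hive -e and then launch the job from step 3. Keep in mind that Hadoop's FileOutputFormat refuses to start when the output directory already exists, so the job has to delete or otherwise handle the partition directory created beforehand; the original does not show how TrackPathJob deals with this.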

