Analyzing Logs of a Small Website on the Hadoop Platform

0. Upload the log files to the Linux machine and use Flume to collect them into HDFS. Run:

    /home/cloud/flume/bin/flume-ng agent -n a4 -c conf -f /home/cloud/flume/conf/a4.conf -Dflume.root.logger=DEBUG,console

1. Create the Hive table:

    create external table bbslog (ip string, logtime string, url string) partitioned by (logdate string) row format delimited fields terminated by '\t' location '/cleaned';

2. Create the shell script and make it executable:

    touch daily.sh
    chmod +x daily.sh

daily.sh:

    CURRENT=`date +%Y%m%d`

    # Clean the data and save it under /cleaned, in a directory named after the current date
    /home/cloud/hadoop/bin/hadoop jar /home/cloud/cleaner.jar /flume/$CURRENT /cleaned/$CURRENT

    # Alter the Hive table: add a partition for the current date
    /home/cloud/hive/bin/hive -e "alter table bbslog add partition (logdate=$CURRENT) location '/cleaned/$CURRENT'"

    # Analyze with Hive; the queries depend on the business requirements

    # Count page views (PV) and store them in the daily pv table
    /home/cloud/hive/bin/hive -e "create table pv_$CURRENT row format delimited fields terminated by '\t' as select count(*) from bbslog where logdate=$CURRENT;"

    # Find potential VIP users: IPs with more than 20 hits
    /home/cloud/hive/bin/hive -e "create table vip_$CURRENT row format delimited fields terminated by '\t' as select $CURRENT,ip,count(*) as hits from bbslog where logdate=$CURRENT group by ip having hits > 20 order by hits desc"

    # Count unique visitors (UV)
    /home/cloud/hive/bin/hive -e "create table uv_$CURRENT row format delimited fields terminated by '\t' as select count(distinct ip) from bbslog where logdate=$CURRENT"

    # Count the day's registrations
    /home/cloud/hive/bin/hive -e "create table reg_$CURRENT row format delimited fields terminated by '\t' as select count(*) from bbslog where logdate=$CURRENT AND instr(url,'member.php?mod=register')>0"

    # Export the data from the Hive table into MySQL
    /home/cloud/sqoop/bin/sqoop export --connect jdbc:mysql://cloud3:3306/jchubby --username root --password JChubby123 --export-dir "/user/hive/warehouse/vip_$CURRENT" --table vip --fields-terminated-by '\t'

Note that the original listing used typographic quotes and dashes (‘ ’ and –), which the shell, Hive and Sqoop do not accept; they are replaced here with plain ' and --. The partition location is also written as the absolute path /cleaned/$CURRENT so that it matches the table location, and the stray space in "vip _$CURRENT" is removed.
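The post does not show what cleaner.jar does internally, only that its output must match the three tab-separated columns of bbslog (ip, logtime, url). As a rough local illustration of that output shape, here is a minimal sketch that assumes the raw logs are in Apache common log format. Both the format and the helper name clean_log are assumptions, not taken from the original job.

```shell
#!/bin/sh
# Hypothetical helper (not the author's cleaner.jar): convert access-log lines,
# assumed to be in Apache common log format, into ip<TAB>logtime<TAB>url,
# the column layout that the bbslog table expects.
clean_log() {
    awk '{
        # With whitespace splitting: $1 is the client IP, $4 is the timestamp
        # with a leading "[", and $7 is the requested URL path.
        if (NF < 7) next            # skip lines that are too short to parse
        t = substr($4, 2)           # drop the leading "[" from the timestamp
        printf "%s\t%s\t%s\n", $1, t, $7
    }'
}

# Example run on a made-up log line
printf '%s\n' '192.168.1.10 - - [30/May/2013:17:38:20 +0800] "GET /member.php?mod=register HTTP/1.1" 200 1127' | clean_log
```

Piping a small sample of the raw log through a helper like this is a quick way to check that the column order and delimiter agree with the Hive table definition before running the full MapReduce job.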

