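Purpose: a text file needs to be split, according to the timestamps it contains, into separate files named by date. The script shows how many lines remain to be processed. It can resume after an interruption (if you quit with Ctrl+C midway, the next run may rewrite a small amount of data that was already written). The script can be run from any path, for example ../../split.sh or /tmp/split.sh, and split.data, which records the progress, always lives in the same directory as the script. The techniques used are fairly involved; judge for yourself how impressive it is. Because awk calls an external command and closes its handles on every line, performance is not very high.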
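The "run from any path" behaviour described above can be sketched as follows. This is only a minimal illustration, not the script's exact code: script_dir and progress_file are hypothetical names, and the sketch assumes bash.

```shell
#!/bin/bash
# Hypothetical helper: print the absolute directory that contains the given script path.
script_dir() {
    # cd into the directory part of the path inside a subshell and print it;
    # this works for relative (../../split.sh) and absolute (/tmp/split.sh) paths alike.
    ( cd -- "$(dirname -- "$1")" && pwd )
}

# Keep the progress file next to the script, wherever it is run from.
progress_file="$(script_dir "${BASH_SOURCE[0]}")/split.data"
echo "$progress_file"
```

Running the path through cd and pwd avoids having to tell relative and absolute invocations apart by hand.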
Sample data:
1418103343455 175.25.244.178 new - javascript%3Avoid(0) allgame_a s851.sg2.ledu.com
1418103343575 175.25.244.178 new - javascript%3Avoid(0) allgame_a s851.sg2.ledu.com
1418103343751 175.25.244.178 new - javascript%3Avoid(0) allgame_a s851.sg2.ledu.com
1418103149566 119.176.238.210 new - javascript%3Avoid(0) timely s887.sg2.ledu.com
1418103163525 119.176.238.210 new - javascript%3Avoid(0) close_notice+fr+png6 s887.sg2.ledu.com
1418103182940 218.57.139.22 new - javascript%3Avoid(0) pack_age_a s15.ttgj.ledu.com
1418103186528 218.57.139.22 new - javascript%3Avoid(0) pack_age_a s15.ttgj.ledu.com
1418103171841 180.104.22.40 new - javascript%3Avoid(0) pack_age_a s889.sg2.ledu.com
1418103196648 218.57.139.22 new - %23 put_cross+fl s15.ttgj.ledu.com
141810320543a 218.57.139.22 new - javascript%3Avoid(0) pack_age_a s9.ttgj.ledu.com
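To make the mapping from a record to its output file concrete, here is a small sketch of how the first field could be turned into a date-based file name. ms_to_logname is a hypothetical helper, the sketch assumes GNU date (for the -d @SECONDS syntax), and it pins the time zone to UTC only so that the result is reproducible; the script itself uses the machine's local time zone.

```shell
#!/bin/bash
# Hypothetical helper: map a 13-digit millisecond timestamp to a date-named log file.
ms_to_logname() {
    local ms=$1
    # Accept exactly 13 digits; anything else (e.g. 141810320543a) is rejected.
    [[ $ms =~ ^[0-9]{13}$ ]] || return 1
    # The first 10 digits are the seconds since the epoch.
    local secs=${ms:0:10}
    # GNU date syntax; TZ=UTC is an assumption made for reproducibility.
    TZ=UTC date -d "@$secs" +%Y-%m-%d.log
}

ms_to_logname 1418103343455     # prints 2014-12-09.log
ms_to_logname 141810320543a || echo "skipped: not a 13-digit timestamp"
```

The last sample record has a malformed first field, so a check of this kind leaves it out of every output file.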
Script (split.sh):
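#!/bin/bash
file=/data/tracepng_logs/access_tracepng.log
dir=/data/apache/_flume/crossbar/click
totalLines=$(wc -l "$file" | cut -d " " -f1)
# Resolve the path of the script so that split.data always sits beside it.
if [[ "$0" == /* ]]; then
    fileData=$0
else
    fileData=$(pwd)/$0
fi
fileData=$(dirname "$fileData")
fileData=$fileData/split.data
# First run: no progress has been recorded yet, so start from line 1.
[ -f "$fileData" ] || echo 1 > "$fileData"
# Cancelling midway can leave the last line only partially written (this happens without close).
lastLine=($(tail -n 2 "$fileData"))
# Ignore the effect of a trailing \n (should not occur once close is used).
if [ "" = "${lastLine[1]}" ]; then
    unset 'lastLine[1]'
fi
# Take the larger of the last two recorded values.
if [ ${#lastLine[@]} -eq 1 ] || [ "${lastLine[0]}" -gt "${lastLine[1]}" ]; then
    lastLine=${lastLine[0]}
else
    lastLine=${lastLine[1]}
fi
echo "$lastLine" > "$fileData"
remainLines=$(echo "$totalLines-$lastLine" | bc)
tail -n +"$lastLine" "$file" | awk --re-interval '{
    printf "\r%s remains", ("'"$remainLines"'" - NR);
    if ($1 ~ /[0-9]{13}/) {
        # The first 10 digits of the 13-digit millisecond timestamp are the epoch seconds.
        time = substr($1, 1, 10);
        c = "date +%Y-%m-%d -d @" time;
        c | getline f;
        # Build the target file name only after f holds the date of the current line.
        file = "'"$dir"'" "/" f ".log"
        print $0 >> file
        # Without close, the file is very likely to be written incompletely.
        close(file)
        # Without close, awk may fail with: cmd. line:6: (FILENAME=- FNR=113924) fatal: cannot open pipe "date +%Y-%m-%d -d @1418097392" (too many open files)
        close(c)
    };
    # NR % 1 is always 0, so progress is saved after every line; use a larger modulus to save less often.
    if (NR % 1 == 0) {
        print "'"$lastLine"'" + NR > "'"$fileData"'"
        # Without close, $fileData is appended to line by line and is very likely to be written incompletely.
        close("'"$fileData"'")
    }
}'
echo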
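Since the cost comes mainly from starting one date process per line, one possible variation is to let awk format the date itself. The sketch below is not the original author's method: it assumes GNU awk (gawk), whose built-in strftime() is used in place of the external date command, and split_by_date and its out_dir argument are hypothetical names. It reads records from standard input, leaves out the progress tracking, and pins the time zone to UTC only to keep the example reproducible.

```shell
#!/bin/bash
# Sketch: split stdin into per-day files with gawk strftime(), without calling date.
# split_by_date and out_dir are hypothetical names; gawk is assumed to be installed.
split_by_date() {
    local out_dir=$1
    TZ=UTC gawk --re-interval -v dir="$out_dir" '
        $1 ~ /^[0-9]{13}$/ {
            # First 10 digits are epoch seconds; strftime formats them in-process.
            out = dir "/" strftime("%Y-%m-%d", substr($1, 1, 10)) ".log"
            # Close the previous output file only when the date changes.
            if (out != prev) {
                if (prev != "") close(prev)
                prev = out
            }
            print >> out
        }'
}
```

It could be used as `split_by_date /some/output/dir < access.log`, where both paths are placeholders. Closing a file only when the date changes keeps at most one output file open, which sidesteps the "too many open files" error while avoiding a close on every line; the trade-off is that an interruption may lose a little more buffered output than the close-per-line approach.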
While running:
[root@210.14.138.94 ~/xiepeng]# ./split.sh
141435 remains
Result:
2001-11-25.log 2008-01-25.log 2011-11-24.log 2014-03-21.log 2014-11-17.log
2002-01-01.log 2008-02-25.log 2011-11-25.log 2014-08-25.log 2014-11-18.log
2002-01-02.log 2009-01-01.log 2012-05-30.log 2014-09-03.log 2014-11-19.log
2002-01-03.log 2009-02-03.log 2012-08-29.log 2014-09-10.log 2014-11-20.log
2002-01-04.log 2009-08-10.log 2013-02-14.log 2014-09-24.log 2014-11-21.log
2003-01-01.log 2009-09-16.log 2013-03-10.log 2014-10-01.log 2014-11-22.log
2003-01-02.log 2010-01-01.log 2013-03-12.log 2014-10-05.log 2014-11-23.log
2004-01-01.log 2010-01-19.log 2013-08-19.log 2014-10-18.log 2014-11-24.log
2004-11-23.log 2010-06-04.log 2013-08-20.log 2014-10-20.log 2014-11-25.log
2006-01-01.log 2010-09-05.log 2013-08-22.log 2014-10-25.log 2014-11-26.log
2006-07-23.log 2010-10-20.log 2013-08-24.log 2014-10-26.log 2014-11-27.log
2006-08-31.log 2011-01-01.log 2013-11-19.log 2014-11-01.log 2014-12-05.log
2006-11-25.log 2011-01-03.log 2013-11-20.log 2014-11-03.log 2014-12-20.log
2007-01-30.log 2011-01-05.log 2013-11-21.log 2014-11-04.log 2014-12-25.log
2007-02-01.log 2011-03-09.log 2013-11-22.log 2014-11-07.log 2015-01-24.log
2007-02-02.log 2011-04-04.log 2013-11-23.log 2014-11-08.log 2022-08-27.log
2007-02-08.log 2011-04-25.log 2013-11-24.log 2014-11-10.log
2007-09-11.log 2011-07-03.log 2013-11-25.log 2014-11-13.log
2008-01-10.log 2011-11-19.log 2014-01-05.log 2014-11-15.log
2008-01-14.log 2011-11-23.log 2014-01-25.log 2014-11-16.log
Original article: shell按文本内容分割文件的脚本 (a shell script that splits a file by its text content). Thanks to the original author for sharing.