A shell script for splitting a file by its text content

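Purpose: a text file needs to be split, according to the timestamps it contains, into separate files named by date. The script reports how many lines remain and can resume after an interruption (if you quit with Ctrl+C partway through, the next run may rewrite a small amount of data that was already written). It can be run from any path, for example ../../split.sh or /tmp/split.sh; split.data, which records the progress, always lives in the same directory as the script. The script uses some fairly involved techniques; judge for yourself how clever it is. Because awk calls an external command and closes its file handles on every record, performance is not very high.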
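The rule that split.data always sits next to the script, whatever directory it is launched from, could be sketched roughly as follows. The function name progress_file_for is a hypothetical helper, not part of the original script, and the sketch assumes the script's directory exists and can be entered.

```shell
#!/bin/bash
# progress_file_for is a hypothetical helper (not in the original script).
# Given the path a script was invoked with ($0), print the absolute path of
# a split.data file located in that script's directory.
progress_file_for() {
    local script_path=$1 script_dir
    # The command substitution runs in a subshell, so this cd turns a relative
    # directory into an absolute one without moving the caller.
    script_dir=$(cd -- "$(dirname -- "$script_path")" && pwd) || return 1
    printf '%s/split.data\n' "$script_dir"
}

# Typical use inside a script:
#   fileData=$(progress_file_for "$0")
```

Compared with testing whether $0 starts with a slash, this form also normalizes paths such as ../../split.sh, at the cost of one extra subshell.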

Sample data:

1418103343455   175.25.244.178  new     -       javascript%3Avoid(0)    allgame_a       s851.sg2.ledu.com
1418103343575   175.25.244.178  new     -       javascript%3Avoid(0)    allgame_a       s851.sg2.ledu.com
1418103343751   175.25.244.178  new     -       javascript%3Avoid(0)    allgame_a       s851.sg2.ledu.com
1418103149566   119.176.238.210 new     -       javascript%3Avoid(0)    timely  s887.sg2.ledu.com
1418103163525   119.176.238.210 new     -       javascript%3Avoid(0)    close_notice+fr+png6    s887.sg2.ledu.com
1418103182940   218.57.139.22   new     -       javascript%3Avoid(0)    pack_age_a      s15.ttgj.ledu.com
1418103186528   218.57.139.22   new     -       javascript%3Avoid(0)    pack_age_a      s15.ttgj.ledu.com
1418103171841   180.104.22.40   new     -       javascript%3Avoid(0)    pack_age_a      s889.sg2.ledu.com
1418103196648   218.57.139.22   new     -       %23     put_cross+fl    s15.ttgj.ledu.com
141810320543a   218.57.139.22   new     -       javascript%3Avoid(0)    pack_age_a      s9.ttgj.ledu.com
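To make the mapping from a record to its output file concrete, here is a small sketch of how the first field could be turned into a day. The first field is a 13-digit millisecond timestamp, so its first 10 digits are Unix seconds. The helper name day_for_record is hypothetical, the check is anchored to exactly 13 digits (slightly stricter than the unanchored pattern in split.sh), and the `date -d @...` syntax assumes GNU date.

```shell
#!/bin/bash
# day_for_record is a hypothetical helper (not in the original script).
# It takes the first field of a record and prints the YYYY-MM-DD day the
# record belongs to, or fails if the field is not exactly 13 digits.
day_for_record() {
    local stamp=$1
    [[ $stamp =~ ^[0-9]{13}$ ]] || return 1
    # Drop the last three digits: milliseconds to whole seconds.
    local seconds=${stamp:0:10}
    # GNU date syntax, as used in split.sh; the result depends on TZ.
    date +%Y-%m-%d -d "@$seconds"
}

# First and last records of the sample above, evaluated in UTC:
TZ=UTC day_for_record 1418103343455     # prints 2014-12-09
TZ=UTC day_for_record 141810320543a || echo "skipped: not a 13-digit timestamp"
```

The last sample record fails the check because its first field ends in a letter, so it would not be written to any dated file.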

Script (split.sh):

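#!/bin/bash
file=/data/tracepng_logs/access_tracepng.log
dir=/data/apache/_flume/crossbar/click
totalLines=$(wc -l < "$file")

# Find the directory of the script itself, so that split.data always sits next to it
if [[ "$0" =~ ^/ ]]; then
        fileData=$0
else
        fileData=$(pwd)/$0
fi
fileData=$(dirname "$fileData")
fileData=$fileData/split.data

# First run: no progress has been recorded yet, so start from line 1
[ -s "$fileData" ] || echo 1 > "$fileData"

# Cancelling midway may leave the last line incompletely written (this is what happens without close)
lastLine=($(tail -n 2 "$fileData"))
# Remove the effect of a trailing \n (should not happen once close is used)
if [ "" = "${lastLine[1]}" ]; then
        unset 'lastLine[1]'
fi
if [ ${#lastLine[@]} -eq 1 ] || [ "${lastLine[0]}" -gt "${lastLine[1]}" ]; then
        lastLine=${lastLine[0]}
else
        lastLine=${lastLine[1]}
fi
echo "$lastLine" > "$fileData"

# Lines lastLine..totalLines (inclusive) are still to be processed
remainLines=$((totalLines - lastLine + 1))

tail -n +"$lastLine" "$file" | awk --re-interval -v remain="$remainLines" -v dir="$dir" -v start="$lastLine" -v progress="$fileData" '{
        printf "\r%s remains", (remain - NR);
        if ($1 ~ /[0-9]{13}/) {
                # First 10 digits are the Unix time in seconds (awk string positions start at 1)
                time = substr($1, 1, 10);
                c = "date +%Y-%m-%d -d @" time;
                c | getline f;
                # Build the file name only after getline has set f
                file = dir "/" f ".log"
                print $0 >> file
                # Without close, the file is quite likely to be written incompletely
                close(file)
                # Without close, awk may fail with: cmd. line:6: (FILENAME=- FNR=113924) fatal: cannot open pipe "date +%Y-%m-%d -d @1418097392" (Too many open files)
                close(c)
        };
        # Record progress every N records; N is 1 here, so after every record
        if (NR % 1 == 0) {
                print start + NR > progress
                # Without close, every value is appended to $fileData as a new line and is quite likely to be incomplete
                close(progress)
        }
}'
echo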
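Since most of the running time goes into starting `date` once per record, one possible way to reduce that cost is to remember the answer for each second value already seen. The following is only a sketch under assumptions: split_by_day is a hypothetical function name, it reads records on stdin, it leaves out the progress tracking of split.sh, and it relies on GNU date just as the script does.

```shell
#!/bin/bash
# split_by_day is a hypothetical function (not in the original script).
# It reads records on stdin and appends each record whose first field is a
# 13-digit timestamp to <out_dir>/<YYYY-MM-DD>.log.
split_by_day() {
    local out_dir=$1
    awk -v dir="$out_dir" '
    $1 ~ /^[0-9][0-9]*$/ && length($1) == 13 {
        secs = substr($1, 1, 10)            # milliseconds to seconds
        if (!(secs in day)) {               # run date only once per second value
            cmd = "date +%Y-%m-%d -d @" secs
            cmd | getline day[secs]
            close(cmd)
        }
        out = dir "/" day[secs] ".log"
        if (out != prev) {                  # keep a single output file open
            if (prev != "") close(prev)
            prev = out
        }
        print >> out
    }'
}

# Example (paths taken from split.sh):
#   split_by_day /data/apache/_flume/crossbar/click < /data/tracepng_logs/access_tracepng.log
```

The output file is closed only when the day changes, so a run interrupted with Ctrl+C could lose the last buffered lines; that is the trade-off against the close-after-every-record approach used in split.sh.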

While running:

[root@210.14.138.94 ~/xiepeng]# ./split.sh
141435 remains

Result:

2001-11-25.log  2008-01-25.log  2011-11-24.log  2014-03-21.log  2014-11-17.log
2002-01-01.log  2008-02-25.log  2011-11-25.log  2014-08-25.log  2014-11-18.log
2002-01-02.log  2009-01-01.log  2012-05-30.log  2014-09-03.log  2014-11-19.log
2002-01-03.log  2009-02-03.log  2012-08-29.log  2014-09-10.log  2014-11-20.log
2002-01-04.log  2009-08-10.log  2013-02-14.log  2014-09-24.log  2014-11-21.log
2003-01-01.log  2009-09-16.log  2013-03-10.log  2014-10-01.log  2014-11-22.log
2003-01-02.log  2010-01-01.log  2013-03-12.log  2014-10-05.log  2014-11-23.log
2004-01-01.log  2010-01-19.log  2013-08-19.log  2014-10-18.log  2014-11-24.log
2004-11-23.log  2010-06-04.log  2013-08-20.log  2014-10-20.log  2014-11-25.log
2006-01-01.log  2010-09-05.log  2013-08-22.log  2014-10-25.log  2014-11-26.log
2006-07-23.log  2010-10-20.log  2013-08-24.log  2014-10-26.log  2014-11-27.log
2006-08-31.log  2011-01-01.log  2013-11-19.log  2014-11-01.log  2014-12-05.log
2006-11-25.log  2011-01-03.log  2013-11-20.log  2014-11-03.log  2014-12-20.log
2007-01-30.log  2011-01-05.log  2013-11-21.log  2014-11-04.log  2014-12-25.log
2007-02-01.log  2011-03-09.log  2013-11-22.log  2014-11-07.log  2015-01-24.log
2007-02-02.log  2011-04-04.log  2013-11-23.log  2014-11-08.log  2022-08-27.log
2007-02-08.log  2011-04-25.log  2013-11-24.log  2014-11-10.log
2007-09-11.log  2011-07-03.log  2013-11-25.log  2014-11-13.log
2008-01-10.log  2011-11-19.log  2014-01-05.log  2014-11-15.log
2008-01-14.log  2011-11-23.log  2014-01-25.log  2014-11-16.log
