linux awk简要笔记 - IDC云服务

linux awk简明笔记
awk是一个简单强大的文本过滤报告工具。

简介

其基本结构为 pattern { action }

awk是一个面向行的编程工具。pattern定义了怎么匹配一个行，action定义了匹配成功的动作。默认为所有的行都匹配，BEGIN 和 END 关键字，一个在所有行处理之前执行，一个在所有行处理之后执行。

awk把每一行看作一些field，$0代表整行，$1代表第一个field，等等。

和很多脚本语言不一样的是，$number不代表一个变量，而是一个field。
也不对””之中的符号进行特别解释。

换行不能乱用，可以
{ action1 }
{ action2 }

awk中有3种常量，字符串，数字，正则表达式。
数字和字符串的转换：
字符串转数字，贪婪算法转该字符串的顶头子字符串，如果不行为0。
数字转字符串，CONVFMT控制。整数不受CONVFMT控制。

awk的底层类型有两种，数字，字符串，一个变量可以既是数字，又是字符串，底层保有2个值。原因嘛，效率，精度。

运算符

比较的时候，使用的是最后赋值的类型，而不是最后使用的类型。
对于不确定类型（如传入参数，域）等，能数字比较的数字比较，不能的话变为字符串比较。

和c一脉相传的东西（很相似）

运算符
二元 + – * / % <space> <space>为字符串连接操作符。
一元 +(正号) -(负号)
自增运算符 ++ —
+= -= *= /= %=
=(赋值运算符) ()可以明确运算符的优先级。

逻辑符
== != > >= < <=

&& || !

检查一个字符串是否匹配
!~ /正则表达式/ 不匹配正则表达式
~ /正则表达式/ 匹配正则表达式

命令

if ( conditional ) statement [ else statement ]
while ( conditional ) statement
for ( expression ; conditional ; expression ) statement
for ( variable in array ) statement
break
continue
{ [ statement ] ...}
variable=expression
print [ expression-list ] [ > expression ]
printf format [ , expression-list ] [ > expression ]
next  这个挺好用的，跳出当前处理的行。
exit   这个和一般的exit不同，跳过所有的行，但是会执行END。

内置常量

FS – The Input Field Separator Variable

用-F可以指定分隔符。FS是build in的分隔符变量。

经典例子
行为 One Two:Three:4 Five

{
	print $2
	FS=":"
	print $2
}

输出 Two:Three:4 Five两次。
防止side-effect，FS的设置在read line之后有效，在一次read line之中变量不会改变，可以消除很多bug。

OFS – The Output Field Separator Variable
NF – The Number of Fields Variable
NR – The Number of Records Variable
RS – The Record Separator Variable
ORS – The Output Record Separator Variable
FILENAME – The Current Filename Variable

Associative Arrays
Instead of using a number to find an entry in an array, use anything you want.
关联数组的key全部都是string。

直接引用数组中不存在的key会创建一个pair。
用 key in array 可以测试该数组是否有一个key的pair并且不创建pair。

delete：删除元素。

BEGIN {
	username[""]=0;
}
{
	username[$3]++;
}
END {
	for (i in username) {
		if (i != "") {
			print username[i], i;
		}
	}
}

Multi-dimensional Arrays
a[1,2] = y; 这个不行。
a[1 “,” 2] = y; 记住，<space>是字符串连接。

Example of using AWK’s Associative Arrays
1 记住初始化一个值。
2 好的变量命名。
3 确保输入正确。
4 给每一个field起一个好名字。
5 一遍遍历，得到所有东西。

一个好例子，值得好好研究，用户，组，文件所占空间大小检测程序。

#!/bin/sh

find . -type f -print | xargs /usr/bin/ls -islg |

awk '

BEGIN {

# initialize all arrays used in for loop

	u_count[""]=0;

	g_count[""]=0;

	ug_count[""]=0;

	all_count[""]=0;

}

{

# validate your input

        if (NF != 11) {

#	ignore

	} else {

# assign field names

		inode=$1;

		size=$2;

		linkcount=$4;

		user=$5;

		group=$6; 
# should I count this file?
		doit=0;

		if (linkcount == 1) {

# only one copy - count it

			doit++;

		} else {

# a hard link - only count first one

			seen[inode]++;

			if (seen[inode] == 1) {

				doit++;

			}

		}

# if doit is true, then count the file

		if (doit ) {
# total up counts in one pass

# use description array names

# use array index that unifies the arrays
# first the counts for the number of files
			u_count[user " *"]++;

			g_count["* " group]++;

			ug_count[user " " group]++;

			all_count["* *"]++;
# then the total disk space used
			u_size[user " *"]+=size;

			g_size["* " group]+=size;

			ug_size[user " " group]+=size;

			all_size["* *"]+=size;

		}

        }

}

END {

# output in

相关文章：

你感兴趣的文章：

标签云：