Pattern and Matcher Explained (String Matching)

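I: Motivation

(1) For string manipulation in Java, the first things that come to mind are the String and StringBuilder classes. String provides replace(), replaceAll(), split(), and matches(); StringBuilder has replace() but none of the regex-based methods. The regex-based String methods are thin wrappers around java.util.regex. For example, public String[] split(String regex, int limit) in String boils down to Pattern.compile(regex).split(this, limit) (recent JDKs first try a fast path for simple single-character delimiters). The source is: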

return Pattern.compile(regex).split(this, limit);

(2) public String replaceAll(String regex, String replacement) in String works the same way. The source is:

public String replaceAll(String regex, String replacement) { return Pattern.compile(regex).matcher(this).replaceAll(replacement); }

(3) public boolean matches(String regex) in String works the same way too. The source is:

public boolean matches(String regex) { return Pattern.matches(regex, this); }
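Since these String methods delegate to Pattern, the two forms can be used interchangeably. The following is a minimal sketch to show that; the class name StringRegexDelegation, its helper methods, and the sample inputs are made up for illustration and do not come from the JDK source. Compiling the pattern once and reusing it avoids recompiling the regex on every call.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class StringRegexDelegation {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Convenience form: the regex is handled inside String.split on each call.
    public static String[] splitViaString(String input) {
        return input.split("\\s*,\\s*");
    }

    // Explicit form: reuse the precompiled Pattern.
    public static String[] splitViaPattern(String input) {
        return COMMA.split(input);
    }

    public static void main(String[] args) {
        String csv = "a , b,c";
        System.out.println(Arrays.toString(splitViaString(csv)));  // [a, b, c]
        System.out.println(Arrays.toString(splitViaPattern(csv))); // [a, b, c]

        // replaceAll: String form and Pattern/Matcher form give the same result.
        System.out.println("a1b22c".replaceAll("\\d+", "#"));                          // a#b#c
        System.out.println(Pattern.compile("\\d+").matcher("a1b22c").replaceAll("#")); // a#b#c

        // matches: both forms require the whole input to match.
        System.out.println("2223".matches("\\d+"));         // true
        System.out.println(Pattern.matches("\\d+", "2223")); // true
    }
}
```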

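(4) Compared with Java, C++ is less convenient here: std::string has no split() or trim() member functions, but both can be built from other std::string members such as find_first_of() and find_first_not_of().

(5) Java regular expressions are implemented by the Pattern and Matcher classes in the java.util.regex package. (It helps to keep the Java API documentation open while reading and to look up each method as it is introduced.)

II: Details

(1) The Pattern class represents a compiled regular expression, in other words a matching pattern. Its constructor is private, so it cannot be instantiated directly; instead, the static factory method Pattern.compile(String regex) creates one. This is where the Matcher class comes in: Pattern.matcher(CharSequence input) returns a Matcher object. The Matcher constructor is not public either, so an instance can only be obtained through Pattern.matcher(CharSequence input).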
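As a minimal sketch of the factory-method flow described above: the class name PatternFactoryDemo and the helper digitsMatcher are hypothetical names chosen for this example, while Pattern.compile, Pattern.pattern, Pattern.matcher, and Matcher.reset are standard java.util.regex calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternFactoryDemo {
    // Hypothetical helper: builds a Matcher for runs of digits over the given input.
    public static Matcher digitsMatcher(CharSequence input) {
        Pattern p = Pattern.compile("\\d+"); // the only way to obtain a Pattern
        return p.matcher(input);             // the only way to obtain a Matcher
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d+");
        System.out.println(p.pattern()); // prints the regex source: \d+

        Matcher m = p.matcher("aaa2223bb");
        System.out.println(m.find());    // true
        System.out.println(m.group());   // 2223

        // A Matcher can be pointed at new input instead of creating another one.
        m.reset("no digits here");
        System.out.println(m.find());    // false
    }
}
```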

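(2) find() scans the input for the next match, which may be at any position; lookingAt() only matches at the beginning of the input, and matches() must match the entire input. After a successful matches(), lookingAt(), or find(), three methods give more detail: Matcher.start(), Matcher.end(), and Matcher.group(). start() returns the index of the first character of the matched substring. end() returns the index just past the last matched character (not the index of the last character itself). group() returns the matched substring. If the match attempt failed, all three throw IllegalStateException.

Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("aaa2223bb");
m.find();   // matches 2223
m.start();  // returns 3
m.end();    // returns 7, the index just past 2223
m.group();  // returns 2223

Matcher m2 = p.matcher("2223bb");
m2.lookingAt(); // matches 2223
m2.start();     // returns 0; lookingAt() only matches at the beginning, so start() is always 0 after it
m2.end();       // returns 4
m2.group();     // returns 2223

Matcher m3 = p.matcher("2223"); // with p.matcher("2223bb"), matches() returns false and the calls below throw IllegalStateException
m3.matches(); // matches the entire input
m3.start();   // returns 0
m3.end();     // returns 4, the input length, because matches() must consume the whole input
m3.group();   // returns 2223

Now for how regular-expression groups are used in Java. start(), end(), and group() each have an overload dedicated to groups: start(int i), end(int i), and group(int i). The Matcher class also has groupCount(), which returns the number of capturing groups in the pattern.

III: Examples

(1) A simple exercise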
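The group overloads start(int), end(int), group(int), and groupCount() mentioned above can be sketched as follows. The class name GroupDemo, the helper splitPhone, and the sample input are assumptions made for this illustration. Group 0 always refers to the whole match and is not counted by groupCount().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Two capturing groups: a 4-digit prefix and a 6-digit number.
    private static final Pattern PHONE = Pattern.compile("(\\d{4})-(\\d{6})");

    // Hypothetical helper: returns {prefix, number}, or null when nothing matches.
    public static String[] splitPhone(String text) {
        Matcher m = PHONE.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        Matcher m = PHONE.matcher("tel: 0939-100391");
        if (m.find()) {
            System.out.println(m.groupCount()); // 2
            // Index 0 is the whole match; 1..groupCount() are the capturing groups.
            for (int i = 0; i <= m.groupCount(); i++) {
                System.out.println(i + ": " + m.group(i)
                        + " [" + m.start(i) + ", " + m.end(i) + ")");
            }
        }
    }
}
```

Calling group(int) with an index larger than groupCount() throws IndexOutOfBoundsException, so the loop bound above matters.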

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {
    public static void main(String[] args) {
        // Sample data is kept as in the original; 的手机号码 means "'s phone number".
        String phones1 = "MKY 的手机号码：0939-100391"
                + "XL 的手机号码：0939-666888aaaa"
                + "LJ 的手机号码：0952-600391"
                + "XQZ 的手机号码：0939-550391";
        String regex = ".*0939-\\d{6}";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(phones1);
        while (matcher.find()) {
            System.out.println(matcher.group() + "&&&&&");
            System.out.println("start: " + matcher.start());
            System.out.println("end: " + matcher.end());
        }
        // Only one result is returned, and it is the longest one rather than one piece
        // per number: the input has no line breaks and .* is greedy.

        String phones2 = "LJ 的手机号码：0952-600391\r\n"
                + "XQZ 的手机号码：0939-550391";
        // Reuse the pattern. By default . does not match line terminators,
        // so a match cannot span the two lines.
        matcher = pattern.matcher(phones2);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }

        // Another pattern
        String text = "abcdebcadxbc";
        regex = ".bc";
        Pattern pattern2 = Pattern.compile(regex);
        matcher = pattern2.matcher(text);
        while (matcher.find()) {
            System.out.println(matcher.group() + "****");
            System.out.println("start: " + matcher.start());
            System.out.println("end: " + matcher.end());
        }
        // This one returns several results.
        System.out.println("*************");

        // The next two are very important.
        // Strip HTML tags with a reluctant quantifier.
        pattern = Pattern.compile("<.+?>", Pattern.DOTALL);
        matcher = pattern.matcher("<a href=\"index.html\">主页</a>");
        String string = matcher.replaceAll("");
        System.out.println(string);

        // Extract the href value with a capturing group.
        pattern = Pattern.compile("href=\"(.+?)\"");
        matcher = pattern.matcher("<a href=\"index.html\">主页</a>");
        if (matcher.find()) {
            System.out.println(matcher.group(1));
        }
    }
}

(2) Java regular expressions (this part is reposted)

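The experiments below illustrate the matching rules of regular expressions. In the code, p(...) stands for a print helper such as System.out.println. These are the greedy forms; appending ? to a quantifier makes it reluctant.

.         any character
a?        a, once or not at all
a*        a, zero or more times
a+        a, one or more times
a{n}      a, exactly n times
a{n,}     a, at least n times
a{n,m}    a, at least n but not more than m times

// First look at . * + ?
p("a".matches("."));      // true
p("aa".matches("aa"));    // true
p("aaaa".matches("a*"));  // true
p("aaaa".matches("a+"));  // true
p("".matches("a*"));      // true
p("aaaa".matches("a?"));  // false
p("".matches("a?"));      // true
p("a".matches("a?"));     // true
p("1232435463685899".matches("\\d{3,100}")); // true
p("192.168.0.aaa".matches("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}")); // false
p("192".matches("[0-2][0-9][0-9]")); // true

Character classes:

[abc]            a, b, or c (simple class)
[^abc]           any character except a, b, or c (negation)
[a-zA-Z]         a through z or A through Z, inclusive (range)
[a-d[m-p]]       a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]     d, e, or f (intersection)
[a-z&&[^bc]]     a through z, except b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    a through z, but not m through p: [a-lq-z] (subtraction)

// Ranges
p("a".matches("[abc]"));         // true
p("a".matches("[^abc]"));        // false
p("A".matches("[a-zA-Z]"));      // true
p("A".matches("[a-z]|[A-Z]"));   // true
p("A".matches("[a-z[A-Z]]"));    // true
p("R".matches("[A-Z&&[RFG]]"));  // true

Predefined character classes:

\d    a digit: [0-9]
\D    a non-digit: [^0-9]
\s    a whitespace character: [ \t\n\x0B\f\r]
\S    a non-whitespace character: [^\s]
\w    a word character: [a-zA-Z_0-9]
\W    a non-word character: [^\w]

// Getting to know \s \w \d \
p("\n\r\t".matches("\\s{4}"));  // false: the input has only three whitespace characters
p(" ".matches("\\S"));          // false
p("a_8 ".matches("\\w{3}"));    // false: the trailing space is left unmatched
p("abc888&^%".matches("[a-z]{1,3}\\d+[&^#%]+")); // true
p("\\".matches("\\\\"));        // true

Boundary matchers:

^     the beginning of a line
$     the end of a line
\b    a word boundary
\B    a non-word boundary
\A    the beginning of the input
\G    the end of the previous match
\Z    the end of the input but for the final terminator, if any
\z    the end of the input

// Boundary matching
p("hello sir".matches("^h.*"));                 // true
p("hello sir".matches(".*ir$"));                // true
p("hello sir".matches("^h[a-z]{1,3}o\\b.*"));   // true
p("hellosir".matches("^h[a-z]{1,3}o\\b.*"));    // false
// Blank line: zero or more whitespace characters other than newline, followed by a newline
p(" \n".matches("^[\\s&&[^\\n]]*\\n$"));        // true

Key points: <(.)+?> means that there is at least one character between < and >. The parentheses form a capturing group, so the content inside them can be extracted separately after a match. The ? requests the shortest match: for the input <asdf>>>, the pattern with ? matches <asdf>, while without ? it matches the whole <asdf>>>. str.replaceAll("<(.)+?>", "") therefore replaces every <...> span that has at least one character between the brackets with the empty string; for example, asdf<1234>jkl<> becomes asdfjkl<>. Likewise, str_line.replaceAll("&(.)+?;", " ") replaces each shortest span that starts with &, contains at least one character, and ends with ; with a single space.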
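The difference between the greedy and the reluctant (shortest-match) forms discussed above can be checked with a short sketch. The class name GreedyVsReluctant and the helpers firstMatch and stripTags are hypothetical names introduced for this example; the inputs are the ones used in the text, plus one made-up entity string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Hypothetical helper: returns the first substring matched by regex, or null.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Hypothetical helper: removes every <...> span with at least one character inside.
    public static String stripTags(String input) {
        return input.replaceAll("<(.)+?>", "");
    }

    public static void main(String[] args) {
        // Greedy: .+ grabs as much as possible, then backtracks to the last '>'.
        System.out.println(firstMatch("<.+>", "<asdf>>>"));   // <asdf>>>
        // Reluctant: .+? grows one character at a time and stops at the first '>'.
        System.out.println(firstMatch("<.+?>", "<asdf>>>"));  // <asdf>

        // The empty <> is left alone because (.)+? needs at least one character.
        System.out.println(stripTags("asdf<1234>jkl<>"));     // asdfjkl<>

        // Each &...; span is replaced with a single space.
        System.out.println("a&nbsp;b&lt;c".replaceAll("&(.)+?;", " ")); // a b c
    }
}
```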
