java实现字符串匹配问题之求两个字符串的最大公共子串

转载请注明出处:

最近在项目工作中有一个关于文本对比的需求,经过这段时间的学习,总结了这篇博客内容:求两个字符串的最大公共子串。

算法思想:基于图计算两字符串的公共子串。具体算法思想参照下图:

输入字符串S1:achmacmh 输入字符串S2:macham

1)第a步,是将字符串s1,s2分别按字节拆分,构成一个二维数组;

2)二维数组中的值如b所示,比如第一行第一列的值表示字符串s2和s1的第一个字节是否相等,若相等就是1,否则就是0,最终产生b所示的二维数组;

3)分别求二维数组中斜线上的公共因子(斜线为元素a右下角值,即a[i][j]的下一个元素是a[i+1][j+1];公共因子为1所在的位置构成的字符串);

4)对所有公共因子排序,返回最大的公共因子的值。

具体的实现代码如下所示:

package cn.lulei.compare;import java.util.ArrayList;import java.util.Collections;import java.util.Comparator;import java.util.List;public class StringCompare {private int a;private int b;public String getMaxLengthCommonString(String s1, String s2) {if (s1 == null || s2 == null) {return null;}a = s1.length();//s1长度做行b = s2.length();//s2长度做列if(a== 0 || b == 0) {return "";}//设置匹配矩阵boolean [][] array = new boolean[a][b];for (int i = 0; i < a; i++) {char c1 = s1.charAt(i);for (int j = 0; j < b; j++) {char c2 = s2.charAt(j);if (c1 == c2) {array[i][j] = true;} else {array[i][j] = false;}}}//求所有公因子字符串,保存信息为相对第二个字符串的起始位置和长度List<ChildString> childStrings = new ArrayList<ChildString>();for (int i = 0; i < a; i++) {getMaxSort(i, 0, array, childStrings);}for (int i = 1; i < b; i++) {getMaxSort(0, i, array, childStrings);}//排序sort(childStrings);if (childStrings.size() < 1) {return "";}//返回最大公因子字符串int max = childStrings.get(0).maxLength;StringBuffer sb = new StringBuffer();for (ChildString s: childStrings) {if (max != s.maxLength) {break;}sb.append(s2.substring(s.maxStart, s.maxStart + s.maxLength));sb.append("\n");}return sb.toString();}//排序,倒叙private void sort(List<ChildString> list) {Collections.sort(list, new Comparator<ChildString>(){public int compare(ChildString o1, ChildString o2) {return o2.maxLength – o1.maxLength;}});}//求一条斜线上的公因子字符串private void getMaxSort(int i, int j, boolean [][] array, List<ChildString> sortBean) {int length = 0;int start = j;for (; i < a && j < b; i++,j++) {if (array[i][j]) {length++;} else {sortBean.add(new ChildString(length, start));length = 0;start = j + 1;}if (i == a-1 || j == b-1) {sortBean.add(new ChildString(length, start));}}}//公因子类class ChildString {int maxLength;int maxStart;ChildString(int maxLength, int maxStart){this.maxLength = maxLength;this.maxStart = maxStart;}}/** * @param args */public static void main(String[] args) {// TODO Auto-generated method stubSystem.out.println(new StringCompare().getMaxLengthCommonString("achmacmh", "macham"));}}

程序最终执行结果是:

对于两个文件的比对个人认为可以参照这种算法思想(自己现在并为实现),在日后的博客中将会写到。

上述实现过程中,用数组保存了所有的公共子串信息,然后排序取最大的子串,这种做法如果只是求最大子串的话,算法就不是很合理,,因此做了如下修改,List只保存当前计算中最大的子串,具体实现如下:

/** *@Description: 字符串比较*/ package com.lulei.test;import java.util.ArrayList;import java.util.List;public class StringCompare {private int a;private int b;private int maxLength = -1;public String getMaxLengthCommonString(String s1, String s2) {if (s1 == null || s2 == null) {return null;}a = s1.length();//s1长度做行b = s2.length();//s2长度做列if(a== 0 || b == 0) {return "";}//设置匹配矩阵boolean [][] array = new boolean[a][b];for (int i = 0; i < a; i++) {char c1 = s1.charAt(i);for (int j = 0; j < b; j++) {char c2 = s2.charAt(j);if (c1 == c2) {array[i][j] = true;} else {array[i][j] = false;}}}//求所有公因子字符串,保存信息为相对第二个字符串的起始位置和长度List<ChildString> childStrings = new ArrayList<ChildString>();for (int i = 0; i < a; i++) {getMaxSort(i, 0, array, childStrings);}for (int i = 1; i < b; i++) {getMaxSort(0, i, array, childStrings);}StringBuffer sb = new StringBuffer();for (ChildString s: childStrings) {sb.append(s2.substring(s.maxStart, s.maxStart + s.maxLength));sb.append("\n");}return sb.toString();}//求一条斜线上的公因子字符串private void getMaxSort(int i, int j, boolean [][] array, List<ChildString> sortBean) {int length = 0;int start = j;for (; i < a && j < b; i++,j++) {if (array[i][j]) {length++;} else {//直接add,保存所有子串,下面的判断,只保存当前最大的子串//sortBean.add(new ChildString(length, start));if (length == maxLength) {sortBean.add(new ChildString(length, start));} else if (length > maxLength) {sortBean.clear();maxLength = length;sortBean.add(new ChildString(length, start));}length = 0;start = j + 1;}if (i == a-1 || j == b-1) {//直接add,保存所有子串,下面的判断,只保存当前最大的子串//sortBean.add(new ChildString(length, start));if (length == maxLength) {sortBean.add(new ChildString(length, start));} else if (length > maxLength) {sortBean.clear();maxLength = length;sortBean.add(new ChildString(length, start));}}}}//公因子类class ChildString {int maxLength;int maxStart;ChildString(int maxLength, int maxStart){this.maxLength = maxLength;this.maxStart = maxStart;}}/** * @param args */public static void main(String[] args) {// TODO Auto-generated method stubSystem.out.println(new StringCompare().getMaxLengthCommonString("abcdef", "defabc"));}}

然后拍一些美得想哭的照片,留给老年的自己。

java实现字符串匹配问题之求两个字符串的最大公共子串

相关文章:

你感兴趣的文章:

标签云: