[LeetCode] Repeated DNA Sequences

Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",Return:["AAAAACCCCC", "CCCCCAAAAA"].

解题思路:

1、用map存储已经扫描过的子串,并对之计数。时间复杂度为O(n)。代码如下:

class Solution {public:vector<string> findRepeatedDnaSequences(string s) {map<string, int> count;vector<string> result;int len = s.length();for(int i=0; i<len-10; i++){string str = s.substr(i, 10);map<string, int>::iterator it=count.find(str);if(it!=count.end()){count[str]=1;}else{if(it->second==1){result.push_back(str);}count[str]++;}}return result;}};但是会报内存溢出错误。

2、为AGCT分别编码,共有4种,故只需两位便能编码。共十个字符,只需20位就能表示任何一种组合。int类型32位,因此可以用一个int类型来存储10个字符串。每检查一个字符之后,需要将最高位置为0。代码如下:

class Solution {public:vector<string> findRepeatedDnaSequences(string s) {const int subStrLen = 10;const int mask = 0x3ffff;map<int, int> count;map<char, int> cCode;cCode['A']=0;cCode['C']=1;cCode['G']=2;cCode['T']=3;vector<string> result;int len = s.length();int code=0;if(len>subStrLen){string str = s.substr(0, subStrLen);for(int i=0; i<subStrLen; i++){code <<= 2;code |= cCode[str[i]];}count[code] = 1;}for(int i=subStrLen; i<len; i++){code &= mask; //清空最高位code <<= 2;code |= cCode[s[i]];count[code]++;if(count[code]==2){result.push_back(s.substr(i-subStrLen + 1, subStrLen));}}return result;}};在leetCode中,我跑上面的代码需要269ms,看了很多其他人的时间,都比我快很多。不知各位有更好的方法没有。

,在乎的是看风景的心情,旅行不会因为美丽的风景终止。

[LeetCode] Repeated DNA Sequences

相关文章:

你感兴趣的文章:

标签云: