All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return:
["AAAAACCCCC", "CCCCCAAAAA"].
Solution1:Hashset<string> 查重
思路:
Time Complexity: O(N) Space Complexity: O(10N)
Solution2:先encode 再Hashset<int> 查重
Time Complexity: O(N) Space Complexity: O(N)
Solution1 Code:
public List<String> findRepeatedDnaSequences(String s) {
Set seen = new HashSet(), repeated = new HashSet();
for (int i = 0; i + 9 < s.length(); i++) {
String ten = s.substring(i, i + 10);
if (!seen.add(ten))
repeated.add(ten);
}
return new ArrayList(repeated);
}
Solution2 Code:
class Solution {
private char[] encode_map = new char[26];
public List<String> findRepeatedDnaSequences(String s) {
// init
Set<Integer> seen = new HashSet<Integer>();
Set<String> repeated = new HashSet<String>();
//encode_map['A' - 'A'] = 0;
encode_map['C' - 'A'] = 1;
encode_map['G' - 'A'] = 2;
encode_map['T' - 'A'] = 3;
// sliding window
for (int i = 0; i + 9 < s.length(); i++) {
String ten = s.substring(i, i + 10);
int code = encode(ten);
if (!seen.add(code))
repeated.add(ten);
}
// result
return new ArrayList(repeated);
}
private int encode(String s) {
int code = 0;
for(int j = 0; j < 10; j++) {
code <<= 2;
code |= encode_map[s.charAt(j) - 'A'];
}
return code;
}
}