A new method for finding approximate repetitions in DNA sequences
WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management
Frequent patterns mining in multiple biological sequences
Computers in Biology and Medicine
This paper proposes a new concept of repetition, the Largest Pattern Repetition (LPR), together with the concept of a pattern unit. A lightweight index structure, the Succeeding Unit Array (SUA), is designed on top of pattern units. The SUA reduces space consumption and addresses the space bottleneck in the search for repetitions. Using the SUA, all the atomic patterns that constitute the LPRs can be detected, and the LPRs can then be identified by connecting identical patterns. Theoretical analysis and experimental results show that both the space and time complexity of the algorithms are O(n).
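The abstract does not define the SUA precisely, so the following is only a rough sketch of one plausible reading, not the paper's algorithm. It assumes that a pattern unit is a fixed-length substring of length k, and that the array stores, for each unit, the start of the next identical unit to its right. The names build_succeeding_array and extend_repeat, and the parameter k, are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple


def build_succeeding_array(seq: str, k: int) -> List[int]:
    """For each start i of a length-k unit, store the start of the next
    identical unit to the right, or -1 if there is none.

    Assumption: a "pattern unit" is a fixed-length k-mer; the paper may
    define it differently.
    """
    n_units = len(seq) - k + 1
    if k <= 0 or n_units <= 0:
        return []
    nxt = [-1] * n_units
    last_seen: Dict[str, int] = {}
    # Scan right to left so last_seen always holds the nearest
    # occurrence to the right of the current position.
    for i in range(n_units - 1, -1, -1):
        unit = seq[i:i + k]
        nxt[i] = last_seen.get(unit, -1)
        last_seen[unit] = i
    return nxt


def extend_repeat(seq: str, k: int, nxt: List[int],
                  i: int) -> Optional[Tuple[int, int, int]]:
    """Extend the unit at i and its successor to the right while the
    characters agree; return (first_start, second_start, length)."""
    j = nxt[i]
    if j == -1:
        return None
    length = k
    while j + length < len(seq) and seq[i + length] == seq[j + length]:
        length += 1
    return (i, j, length)


if __name__ == "__main__":
    dna = "ACGTACGTTT"
    sua = build_succeeding_array(dna, 3)
    print(sua)
    print(extend_repeat(dna, 3, sua, 0))
```

The array holds one integer per unit, so its size is linear in the sequence length. For a fixed k the construction does a constant amount of work per position; this sketch makes no claim to match the O(n) bounds reported in the paper.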