We study the problem of mining poly-regions in DNA. A poly-region is a bursty DNA area, i.e., an area in which a DNA pattern occurs with elevated frequency. We introduce a general formulation that covers a range of meaningful types of poly-regions and develop three efficient detection methods. The first is entropy-based and applies recursive segmentation. The second uses a set of sliding windows that summarize each sequence segment with several statistics. The third employs a technique based on majority vote. We evaluate the proposed algorithms on DNA sequences from four different organisms in terms of recall and runtime.
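The abstract does not spell out how the detectors work. The sketch below is only a rough illustration of the poly-region idea. It flags fixed-width windows in which a pattern occurs at a density at or above a threshold, then merges overlapping or adjacent flagged windows into regions. The function name `find_bursty_regions` and the parameters `window` and `threshold` are hypothetical. This is a deliberately simplified single-statistic sliding window, not the authors' method, which summarizes each segment with several statistics.

```python
from itertools import accumulate


def find_bursty_regions(seq: str, pattern: str, window: int, threshold: float):
    """Hypothetical sketch: return half-open (start, end) regions of `seq` where
    `pattern` occurs (overlapping occurrences allowed) with density >= threshold."""
    m, n = len(pattern), len(seq)
    if window < m:
        raise ValueError("window must be at least as long as the pattern")
    if n < window:
        return []
    # hits[i] == 1 if an occurrence of the pattern starts at position i
    hits = [1 if seq[i:i + m] == pattern else 0 for i in range(n - m + 1)]
    # prefix sums make each window count an O(1) lookup
    prefix = list(accumulate(hits, initial=0))
    # number of start positions whose occurrence fits entirely inside a window
    slots = window - m + 1
    regions = []
    for s in range(n - window + 1):
        count = prefix[s + slots] - prefix[s]
        if count / slots >= threshold:
            end = s + window
            # merge with the previous region if the windows overlap or touch
            if regions and s <= regions[-1][1]:
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((s, end))
    return regions


# Usage: a poly-A stretch embedded between non-A flanks
print(find_bursty_regions("CGTCGT" + "A" * 10 + "CGTCGT", "A", 5, 1.0))
```

Lowering `threshold` tolerates a few mismatches, so region boundaries can extend slightly into the flanks. The actual detectors described above (entropy-based recursive segmentation, multi-statistic windows, majority vote) would replace this single density test.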