We address the problem of detecting consensus motifs that occur, with subtle variations, across multiple sequences. These motifs are usually functional domains in DNA sequences, such as transcription factor binding sites or other regulatory sites. In its full generality the problem is considered difficult, and various benchmark data sets serve as the litmus test for computational methods. We present a method centered on unsupervised combinatorial pattern discovery, with parameters chosen through a careful statistical analysis of consensus motifs. The method works well on the benchmark data and is general enough to extend to the case where variation in the consensus motif includes indels as well as mutations. We also present some results on the detection of transcription factor binding sites in human DNA sequences.
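To make the problem statement concrete, the following is a minimal brute-force sketch of consensus motif detection under substitutions only. It is not the combinatorial pattern discovery method of the paper. The function names (`hamming`, `occurs_within`, `consensus_motifs`), the motif length `l`, the mismatch bound `d`, and the toy sequences are all assumptions made for illustration, and indels are not handled.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some window of seq matches motif with at most d mismatches."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def consensus_motifs(seqs, l, d):
    """All DNA strings of length l that occur in every sequence with at
    most d mismatches. Enumerates all 4**l candidates, so this is only
    practical for small l."""
    candidates = ("".join(c) for c in product("ACGT", repeat=l))
    return [m for m in candidates
            if all(occurs_within(m, s, d) for s in seqs)]


# Hypothetical toy input: each sequence holds a variant of ACGT
# that differs from it in at most one position.
seqs = ["TTACGTAA", "GGACCTCC", "CAAGGTTC"]
motifs = consensus_motifs(seqs, l=4, d=1)
```

Exhaustive enumeration grows as 4^l and becomes impractical for longer motifs, which is one reason the general problem is regarded as hard.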