Motivation: Finding common patterns, or motifs, in the promoter regions of co-expressed genes is an important problem in bioinformatics. A motif is commonly represented by a probability matrix or PSSM (position-specific scoring matrix). However, even for a motif of length six or seven, no existing algorithm can guarantee finding the exact optimal matrix among the infinitely many possible matrices.
Results: This paper introduces the first algorithm, called EOMM, for finding the exact optimal matrix-represented motif, or simply the optimal motif. Using branch-and-bound search that recursively partitions the solution space, EOMM can find the optimal motif of length up to eight or nine, and a longer motif to within any desired accuracy, with the trade-off that a smaller error bound requires a longer running time. Experiments show that for some real and simulated data sets, EOMM finds the motif despite very weak signals when existing software, such as MEME and MITRA-PSSM, fails to do so.
Availability: Contact: cmleung2@cs.hku.hk
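To make the matrix representation mentioned in the abstract concrete, the following is a minimal Python sketch of how a probability matrix can be turned into log-odds scores and used to locate the best-matching site in a sequence. It is not the EOMM algorithm and does not reproduce its branch-and-bound search. The function names, the uniform background distribution, and the toy matrix are all assumptions made for illustration, and the sketch assumes every matrix entry is strictly positive (for example, after adding pseudocounts).

```python
import math

BASES = "ACGT"


def log_odds_matrix(prob_matrix, background=None):
    """Convert per-position base probabilities into log2-odds scores.

    prob_matrix: list of dicts, one per motif position, mapping base -> probability.
    background: dict mapping base -> background probability (assumed uniform if omitted).
    All probabilities are assumed to be strictly positive.
    """
    background = background or {b: 0.25 for b in BASES}
    return [
        {b: math.log2(col[b] / background[b]) for b in BASES}
        for col in prob_matrix
    ]


def score_window(pssm, window):
    """Sum the per-position scores of a window with the same length as the matrix."""
    return sum(col[base] for col, base in zip(pssm, window))


def best_site(pssm, sequence):
    """Return (start position, score) of the highest-scoring window in the sequence."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    scored = (
        (score_window(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )
    best_score, best_pos = max(scored, key=lambda pair: pair[0])
    return best_pos, best_score


# Hypothetical length-3 matrix that favours the pattern "TAG".
toy_matrix = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

pssm = log_odds_matrix(toy_matrix)
position, score = best_site(pssm, "CCGTAGCC")
print(position, round(score, 3))
```

Because each matrix entry is a real-valued probability, the space of candidate matrices is continuous, which is why the abstract speaks of an infinite number of possible matrices. Scoring a fixed matrix, as above, is the easy part; searching that space for the optimal matrix is the problem the paper addresses.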