Motivation: Transcription regulatory protein factors often bind DNA as homo-dimers or hetero-dimers. They therefore recognize structured DNA motifs: inverted repeats, direct repeats or spaced motif pairs. However, these motifs are often difficult to identify owing to their high divergence. Incorporating the motif structure explicitly into the motif recognition algorithm improves recognition efficiency for highly divergent motifs, as well as the estimation of motif geometric parameters.

Results: We present a modification of the Gibbs sampling motif extraction algorithm, SeSiMCMC (Sequence Similarities by Markov Chain Monte Carlo), which finds structured motifs of these types, as well as non-structured motifs, in a set of unaligned DNA sequences. It employs improved estimators of motif and spacer lengths. The probability that a sequence does not contain any motif is accounted for in a rigorous Bayesian manner. We have applied the algorithm to a set of upstream regions of genes from two Escherichia coli regulons involved in respiration. We have demonstrated that accounting for a symmetric motif structure allows the algorithm to identify weak motifs more accurately. In the examples studied, ArcA binding sites were shown to have the structure of a direct spaced repeat, whereas NarP binding sites exhibited a palindromic structure.

Availability: The WWW interface of the program and its FreeBSD (4.0) and Windows 32 console executables are available at http://bioinform.genetika.ru/SeSiMCMC

Contact: favorov@sensi.org

Supplementary information: Supplementary material is available at http://bioinform.genetika.ru/SeSiMCMC
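To make the idea of building motif structure into the model more concrete, the following Python sketch shows one common way a symmetry constraint can be imposed on a position count matrix: counts from positions that the structure declares equivalent are pooled, so that each tied column is estimated from twice as many observations. This is an illustration only and is not taken from SeSiMCMC; the function names `count_matrix`, `tie_palindrome` and `tie_direct_repeat`, and the `half` and `spacer` parameters, are hypothetical.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def count_matrix(sites):
    """Return per-position base counts for equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    return [
        {base: sum(site[i] == base for site in sites) for base in BASES}
        for i in range(length)
    ]


def tie_palindrome(counts):
    """Pool column i with the complemented mirror column (assumed palindrome).

    Hypothetical helper: position i on one strand is treated as the same
    observation as position L-1-i read on the opposite strand.
    """
    length = len(counts)
    return [
        {
            base: counts[i][base] + counts[length - 1 - i][COMPLEMENT[base]]
            for base in BASES
        }
        for i in range(length)
    ]


def tie_direct_repeat(counts, half, spacer):
    """Pool matching columns of two half-sites separated by a spacer.

    Hypothetical helper: the motif is assumed to be laid out as
    half-site, spacer, half-site. Spacer columns are left unchanged.
    """
    if 2 * half + spacer != len(counts):
        raise ValueError("half and spacer lengths do not match the matrix")
    tied = [dict(column) for column in counts]
    for i in range(half):
        j = half + spacer + i
        pooled = {base: counts[i][base] + counts[j][base] for base in BASES}
        tied[i] = pooled
        tied[j] = dict(pooled)
    return tied


# Toy data, not taken from the paper.
palindromic = tie_palindrome(count_matrix(["GAATTC", "GCATGC"]))
repeated = tie_direct_repeat(count_matrix(["ACGTTACG"]), half=3, spacer=2)
```

In a Gibbs sampler, tied counts of this kind would replace the unconstrained column counts when scoring candidate sites, which is one way weak but symmetric signals can reinforce each other. How SeSiMCMC itself implements the constraint is not described in the abstract.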