Finding similar regions in many sequences
Journal of Computer and System Sciences - STOC 1999
Spelling Approximate Repeated or Common Motifs Using a Suffix Tree
LATIN '98 Proceedings of the Third Latin American Symposium on Theoretical Informatics
Identification of weak motifs in multiple biological sequences using genetic algorithm
Proceedings of the 8th annual conference on Genetic and evolutionary computation
Modeling evolutionary fitness for DNA motif discovery
Proceedings of the 11th Annual conference on Genetic and evolutionary computation
A Field Guide to Genetic Programming
A Field Guide to Genetic Programming
Hi-index | 0.00 |
Motif discovery is an important Bioinformatics problem for deciphering gene regulation. Numerous sequence-based approaches have been proposed employing human specialist motif models (evaluation functions), but performance is so unsatisfactory on benchmarks that the underlying information seems to have already been exploited and have doomed. However, we have found that even a simple modified representation still achieves considerably high performance on a challenging benchmark, implying potential for sequence-based motif discovery. Thus we raise the problem of learning motif evaluation functions. We employ Genetic programming (GP) which has the potential to evolve human competitive models. We take advantage of the terminal set containing specialist-model-like components and have tried three fitness functions. Results exhibit both great challenges and potentials. No models learnt can perform universally well on the challenging benchmark, where one reason may be the data appropriateness for sequence-based motif discovery. However, when applied on different widely-tested datasets, the same models achieve comparable performance to existing approaches based on specialist models. The study calls for further novel GP to learn different levels of effective evaluation models from strict to loose ones on exploiting sequence information for motif discovery, namely quantitative functions, cardinal rankings, and learning feasibility classifications.