Challenges rising from learning motif evaluation functions using genetic programming

Authors:
Leung-Yau Lo; Tak-Ming Chan; Kin-Hong Lee; Kwong-Sak Leung
Affiliations:
The Chinese University of Hong Kong, Hong Kong (all four authors)
Venue:
Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation (GECCO '10)
Year:
2010

Abstract

Motif discovery is an important bioinformatics problem for deciphering gene regulation. Numerous sequence-based approaches have been proposed that rely on motif models (evaluation functions) designed by human specialists, but their performance on benchmarks is so unsatisfactory that the information in the sequences seems to have been fully exploited already, and such approaches appear doomed. However, we have found that even a simple modified representation still achieves high performance on a challenging benchmark, which suggests that sequence-based motif discovery retains potential. We therefore raise the problem of learning motif evaluation functions. We employ genetic programming (GP), which has the potential to evolve human-competitive models, take advantage of a terminal set containing components that resemble those of specialist models, and try three fitness functions. The results reveal both great challenges and great potential. None of the learned models performs well across the entire challenging benchmark; one possible reason is that the data may not be appropriate for sequence-based motif discovery. However, when applied to other widely tested datasets, the same models achieve performance comparable to existing approaches based on specialist models. The study calls for novel GP approaches that learn effective evaluation models at different levels, from strict to loose, for exploiting sequence information in motif discovery, namely quantitative functions, cardinal rankings, and learning feasibility classifications.
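As a hedged illustration of the idea of evolving a motif evaluation function from specialist-model-like components, the sketch below represents a candidate evaluation function as a GP expression tree whose terminals are classic motif statistics computed from a set of aligned candidate binding sites. Everything here is an assumption made for illustration, not the paper's setup: the terminal names (`ic`, `llr`, `n_sites`), the function set with protected division, the pseudocount of 0.5, and the uniform background frequency of 0.25 are all hypothetical choices. Sites are assumed to be equal-length uppercase `ACGT` strings.

```python
import math
from typing import Callable, Dict, List

BASES = "ACGT"
BACKGROUND = 0.25  # assumption: uniform background frequency for every base


def frequency_matrix(sites: List[str], pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Column-wise base frequencies of equal-length candidate sites (with pseudocounts)."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for j in range(width):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[j]] += 1
        total = sum(counts.values())
        matrix.append({b: c / total for b, c in counts.items()})
    return matrix


def information_content(sites: List[str], pseudocount: float = 0.5) -> float:
    """Relative entropy (bits) of the motif columns against the background."""
    return sum(
        p * math.log2(p / BACKGROUND)
        for column in frequency_matrix(sites, pseudocount)
        for p in column.values()
        if p > 0  # zero-probability terms contribute nothing
    )


def mean_log_odds(sites: List[str], pseudocount: float = 0.5) -> float:
    """Average log-odds score of the sites under their own frequency matrix."""
    matrix = frequency_matrix(sites, pseudocount)
    scores = [sum(math.log2(matrix[j][b] / BACKGROUND) for j, b in enumerate(s)) for s in sites]
    return sum(scores) / len(scores)


# Hypothetical terminal set: "specialist-model-like" statistics of a candidate motif.
TERMINALS: Dict[str, Callable[[List[str], float], float]] = {
    "ic": information_content,
    "llr": mean_log_odds,
    "n_sites": lambda sites, pseudocount: float(len(sites)),
}


def protected_div(a: float, b: float) -> float:
    """Common GP convention: return 1.0 instead of dividing by (near) zero."""
    return a / b if abs(b) > 1e-9 else 1.0


# Hypothetical function set of binary arithmetic operators.
FUNCTIONS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": protected_div,
}


def evaluate(tree, sites: List[str], pseudocount: float = 0.5) -> float:
    """Score a candidate motif with an expression tree.

    A tree is a numeric constant, a terminal name, or a tuple (op, left, right).
    """
    if isinstance(tree, (int, float)):
        return float(tree)  # ephemeral random constant
    if isinstance(tree, str):
        return TERMINALS[tree](sites, pseudocount)
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, sites, pseudocount), evaluate(right, sites, pseudocount))


if __name__ == "__main__":
    conserved = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTCA"]
    degenerate = ["TGACTCA", "AGCTTGA", "CCATGCA", "GTACAGT"]
    # One hand-written example tree: ic * n_sites / (n_sites + 1)
    tree = ("mul", "ic", ("div", "n_sites", ("add", "n_sites", 1.0)))
    for name, sites in [("conserved", conserved), ("degenerate", degenerate)]:
        print(name, round(evaluate(tree, sites), 3))
```

In an actual GP run, trees like `tree` would be generated at random and varied by crossover and mutation, and a fitness function would measure how well the scores a tree assigns distinguish true binding sites from background on training data. The abstract does not spell out the paper's three fitness functions, so none is implemented here.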