A comparative study of content statistics of coding regions in an evolutionary computation framework for gene prediction

Authors:
Javier Pérez-Rodríguez;Alexis G. Arroyo-Peña;Nicolás García-Pedrajas
Affiliations:
Department of Computing and Numerical Analysis, University of Córdoba, Spain;Department of Computing and Numerical Analysis, University of Córdoba, Spain;Department of Computing and Numerical Analysis, University of Córdoba, Spain
Venue:
IEA/AIE'12 Proceedings of the 25th international conference on Industrial Engineering and Other Applications of Applied Intelligent Systems: advanced research in applied artificial intelligence
Year:
2012

Abstract

Determining which parts of a DNA sequence are coding is an unsolved and relevant problem in bioinformatics. This problem, called gene prediction or gene finding, consists of locating the most likely gene structure in a genomic sequence. Under certain restrictions, gene structure prediction can be treated as a search problem. Evolutionary computation approaches can be used to address it, although their performance depends on the discriminative power of the statistical measures used to extract useful features from the sequence. In this study, we test six content statistics to determine which of them are more relevant in an evolutionary search for coding and non-coding regions of human DNA. We conduct this comparative study on human chromosomes 3, 19 and 21.
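
To make the notion of a content statistic concrete, the following is a minimal sketch of one commonly used measure, a codon-usage log-likelihood ratio. The abstract does not name the six statistics that were compared, so this choice is an assumption made purely for illustration, and the names `codon_frequencies` and `coding_score` are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter
from itertools import product

# All 64 possible codons over the DNA alphabet.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_frequencies(sequences, pseudocount=1.0):
    """Estimate in-frame codon frequencies from training sequences.

    Additive smoothing (pseudocount) keeps every codon probability
    non-zero. This is an illustrative choice, not the paper's method.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Read non-overlapping triplets starting at position 0.
        for i in range(0, len(seq) - 2, 3):
            counts[seq[i:i + 3]] += 1
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def coding_score(window, coding_freq, noncoding_freq):
    """Mean log-likelihood ratio per codon for a sequence window.

    Positive values mean the window looks more like the coding
    training set; negative values mean it looks more like the
    non-coding one. Triplets with ambiguous bases are skipped.
    """
    window = window.upper()
    score = 0.0
    n = 0
    for i in range(0, len(window) - 2, 3):
        codon = window[i:i + 3]
        if codon in coding_freq:
            score += math.log(coding_freq[codon] / noncoding_freq[codon])
            n += 1
    return score / n if n else 0.0


if __name__ == "__main__":
    # Toy training data, invented only for this example.
    coding = codon_frequencies(["ATGGCCGCCGCC"])
    noncoding = codon_frequencies(["ATATATATATAT"])
    print(coding_score("GCCGCC", coding, noncoding))
    print(coding_score("ATATAT", coding, noncoding))
```

In an evolutionary search of the kind described above, a score like this could plausibly contribute to the fitness of a candidate segmentation into coding and non-coding regions. How the statistics were actually combined in the study is not stated in the abstract.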