Enhancing coding potential prediction for short sequences using complementary sequence features and feature selection

Authors:
Yvan Saeys;Yves Van De Peer
Affiliations:
Department of Plant Systems Biology, Ghent University, Flanders Interuniversity Institute for Biotechnology, Ghent, Belgium;Department of Plant Systems Biology, Ghent University, Flanders Interuniversity Institute for Biotechnology, Ghent, Belgium
Venue:
KDECB'06 Proceedings of the 1st international conference on Knowledge discovery and emergent complexity in bioinformatics
Year:
2006

Abstract

The identification of coding potential in DNA sequences is of major importance in bioinformatics, where it is often used to assist expert systems that automatically try to recognize genes in genomes. For longer sequences, identifying coding potential tends to be easier because of a better signal-to-noise ratio, whereas for very short sequences the problem becomes considerably harder. In this paper, we present new methods that specifically aim at better prediction of coding potential in short sequences. To this end, we combine different, complementary sequence features with a feature selection strategy. Results comparing the new classifiers to state-of-the-art models show that our new approach significantly outperforms the existing methods when applied to short sequences.
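As a rough illustration of the idea of combining complementary sequence features with feature selection, the following minimal sketch may help. The abstract does not name the features or the selection strategy, so everything here is an assumption made for illustration: in-frame codon frequencies and dinucleotide frequencies stand in for the "complementary" features, and a simple filter ranking (absolute difference of class means) stands in for the selection step. The function names `kmer_frequencies`, `feature_vector` and `rank_features` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_frequencies(seq, k, step=1):
    """Relative frequency of every k-mer over ACGT, sampled every `step` bases."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(0, len(seq) - k + 1, step))
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    # k-mers containing other symbols (e.g. N) are ignored in the total.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def feature_vector(seq):
    """Concatenate two assumed feature families into one named vector."""
    features = {}
    # Assumed feature family 1: codon usage in reading frame 0.
    for codon, freq in kmer_frequencies(seq, 3, step=3).items():
        features["codon_" + codon] = freq
    # Assumed feature family 2: frame-independent dinucleotide composition.
    for dinuc, freq in kmer_frequencies(seq, 2).items():
        features["dinuc_" + dinuc] = freq
    return features


def rank_features(coding, noncoding):
    """Illustrative filter ranking: absolute difference of class means.

    This is a stand-in, not the selection strategy used in the paper.
    """
    pos = [feature_vector(s) for s in coding]
    neg = [feature_vector(s) for s in noncoding]
    scores = {}
    for name in pos[0]:
        mean_pos = sum(v[name] for v in pos) / len(pos)
        mean_neg = sum(v[name] for v in neg) / len(neg)
        scores[name] = abs(mean_pos - mean_neg)
    return sorted(scores, key=scores.get, reverse=True)
```

For a short sequence, each feature family rests on only a handful of k-mer observations, which is one way to picture why the signal-to-noise ratio drops. The top-ranked features from `rank_features` could then be passed to any classifier.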