Enhancing coding potential prediction for short sequences using complementary sequence features and feature selection

  • Authors:
  • Yvan Saeys; Yves Van de Peer

  • Affiliations:
  • Department of Plant Systems Biology, Ghent University, Flanders Interuniversity Institute for Biotechnology, Ghent, Belgium (both authors)

  • Venue:
  • KDECB'06 Proceedings of the 1st international conference on Knowledge discovery and emergent complexity in bioinformatics
  • Year:
  • 2006

Abstract

Identifying coding potential in DNA sequences is of major importance in bioinformatics, where it is often used to assist expert systems that automatically try to recognize genes in genomes. For longer sequences, identifying coding potential tends to be easier because of a better signal-to-noise ratio, whereas for very short sequences the task becomes considerably harder. In this paper, we present new methods that specifically aim at better prediction of coding potential in short sequences. To this end, we combine complementary sequence features with a feature selection strategy. Results comparing the new classifiers to state-of-the-art models show that our approach significantly outperforms the existing methods when applied to short sequences.
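The following is a minimal sketch of what "combining complementary sequence features with feature selection" can look like. It is an illustration under stated assumptions, not the authors' method. The abstract does not say which features or which selection strategy the paper uses. Here, two feature sets are assumed: position-specific nucleotide frequencies (12 features) and in-frame codon usage (64 features). They are concatenated into one vector, and a simple Fisher-score filter ranks the features. All function names (`combined_features`, `fisher_scores`, `select_top_k`) are hypothetical.

```python
from itertools import product
from statistics import mean, pvariance

NUCS = "ACGT"
CODONS = ["".join(c) for c in product(NUCS, repeat=3)]  # 64 codons, AAA first


def position_nucleotide_freqs(seq):
    """Assumed feature set 1: nucleotide frequencies at codon positions 1, 2, 3 (frame 0)."""
    counts = [[0] * 4 for _ in range(3)]
    for i, base in enumerate(seq):
        if base in NUCS:  # ignore ambiguous bases such as N
            counts[i % 3][NUCS.index(base)] += 1
    feats = []
    for pos_counts in counts:
        total = sum(pos_counts) or 1
        feats.extend(c / total for c in pos_counts)
    return feats  # 12 values


def codon_usage(seq):
    """Assumed feature set 2: relative frequency of each in-frame codon (frame 0)."""
    counts = dict.fromkeys(CODONS, 0)
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    total = sum(counts.values()) or 1
    return [counts[c] / total for c in CODONS]  # 64 values


def combined_features(seq):
    """Concatenate the two (hypothetical) complementary feature sets into one vector."""
    seq = seq.upper()
    return position_nucleotide_freqs(seq) + codon_usage(seq)


def fisher_scores(X, y):
    """Filter criterion: (mean_pos - mean_neg)^2 / (var_pos + var_neg) per feature."""
    scores = []
    for j in range(len(X[0])):
        pos = [x[j] for x, label in zip(X, y) if label == 1]
        neg = [x[j] for x, label in zip(X, y) if label == 0]
        num = (mean(pos) - mean(neg)) ** 2
        den = pvariance(pos) + pvariance(neg) + 1e-9  # small constant avoids division by zero
        scores.append(num / den)
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features, in ascending index order."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])
```

For usage, build `X = [combined_features(s) for s in seqs]` with labels `y` (1 = coding, 0 = non-coding), call `select_top_k(X, y, k)`, and train any classifier on only the selected columns. A filter criterion is used here because it is simple and independent of the classifier. Wrapper or embedded selection strategies are equally valid alternatives, and the sketch makes no claim about which one the paper adopts.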