Computational identification of protein-coding sequences by comparative analysis

  • Authors:
  • Arnaud Fontaine; Helene Touzet

  • Affiliations:
  • LIFL ‐ UMR CNRS 8022 University Lille 1, INRIA Lille Nord Europe, France

  • Venue:
  • International Journal of Data Mining and Bioinformatics
  • Year:
  • 2009

Abstract

Gene prediction is an essential step in understanding the genome of a species once it has been sequenced. Comparative genomics is a promising direction in current gene-finding research. In this paper, we present a novel approach to identifying evolutionarily conserved protein-coding sequences in genomes. The method exploits the substitution pattern specific to coding sequences together with the consistency of reading frames. It has been implemented in a software tool called PROTEA. Large-scale experiments show good results. PROTEA is intended as a useful complement to existing tools based on homology search or on statistical properties of the sequences.
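
The idea of a coding-specific substitution pattern can be illustrated with a small sketch. It is not PROTEA's algorithm, which the abstract does not detail. It only shows the general principle that, in the true reading frame of a conserved coding region, differences between two aligned sequences tend to be synonymous (the encoded amino acid is unchanged). The function names `frame_scores` and `best_frame`, the gap-free pairwise input, and the simple "synonymous minus non-synonymous" score are all assumptions made for this example.

```python
# Illustrative sketch only: NOT the PROTEA method.
# Assumes two gap-free, equal-length aligned DNA sequences (uppercase A/C/G/T).

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def frame_scores(seq_a, seq_b):
    """Return {frame: (synonymous, non_synonymous)} counts of differing codons."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    scores = {}
    for frame in range(3):
        syn = nonsyn = 0
        for i in range(frame, len(seq_a) - 2, 3):
            codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
            if codon_a == codon_b:
                continue  # identical codons carry no substitution signal
            if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
                continue  # skip codons with ambiguous characters
            if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
                syn += 1
            else:
                nonsyn += 1
        scores[frame] = (syn, nonsyn)
    return scores


def best_frame(seq_a, seq_b):
    """Pick the frame whose substitutions look most 'coding-like' (hypothetical score)."""
    scores = frame_scores(seq_a, seq_b)
    return max(scores, key=lambda f: scores[f][0] - scores[f][1])


if __name__ == "__main__":
    a = "GCTGGTCTTCCTACT"  # Ala Gly Leu Pro Thr in frame 0
    b = "GCCGGACTCCCAACA"  # same peptide, third codon positions changed
    print(frame_scores(a, b))
    print(best_frame(a, b))
```

In this toy pair every difference falls on a third codon position of frame 0, so only that frame sees purely synonymous changes; the other two frames read the same differences as amino-acid replacements. A realistic tool would also need to handle alignment gaps, more than two sequences, and a statistical model rather than a raw count.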