Comparative gene prediction based on gene structure conservation

  • Authors:
  • Shu Ju Hsieh; Chun Yuan Lin; Ning Han Liu; Chuan Yi Tang

  • Affiliations:
  • Department of Computer Science; Institute of Molecular and Cellular Biology, National Tsing-Hua University, Hsinchu, Taiwan, ROC; Department of Computer Science; Department of Computer Science

  • Venue:
  • PRIB'06 Proceedings of the 2006 international conference on Pattern Recognition in Bioinformatics
  • Year:
  • 2006

Abstract

Identifying protein-coding genes is one of the most important tasks in analyzing newly sequenced genomes. As the number of experimentally verified gene annotations grows, it becomes feasible to identify genes in a newly sequenced genome by comparing it with genes annotated in phylogenetically close organisms. Here, we propose a program, GeneAlign, which predicts the genes on one sequence by measuring the similarity between that sequence and related genes annotated on another genome. The program applies CORAL, a heuristic linear-time alignment tool, to determine whether the regions flanked by candidate signals are similar to the annotated exons. By exploiting both the conservation of gene structure and the sequence homology between protein-coding regions, the approach increases prediction accuracy. GeneAlign was tested on the Projector data set of 449 human-mouse homologous sequence pairs. At the gene level, the sensitivity and specificity of GeneAlign are 80%; at the exon level, both exceed 96%.
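The gene-level and exon-level figures above are sensitivity and specificity scores. The sketch below shows how such scores are commonly computed in gene-prediction benchmarks. It assumes, rather than confirms, that GeneAlign was evaluated this way. Under this convention, sensitivity is the number of correct predictions divided by the number of annotated items, and specificity is the number of correct predictions divided by the number of predicted items. An exon is counted as correct only if both of its boundaries match an annotated exon exactly. A gene is counted as correct only if its full exon structure matches. The function names and the (start, end) coordinate representation are hypothetical and chosen for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: an exon is an inclusive (start, end) pair,
# and a gene is the sorted tuple of its exons.
Exon = Tuple[int, int]
Gene = Tuple[Exon, ...]


def sensitivity_specificity(predicted: Set, annotated: Set) -> Tuple[float, float]:
    """Sn = TP / |annotated|, Sp = TP / |predicted|, with exact-match TPs."""
    true_positives = len(predicted & annotated)
    sn = true_positives / len(annotated) if annotated else 0.0
    sp = true_positives / len(predicted) if predicted else 0.0
    return sn, sp


def exon_level(predicted: Iterable[Exon], annotated: Iterable[Exon]) -> Tuple[float, float]:
    # An exon is correct only if both boundaries match an annotated exon.
    return sensitivity_specificity(set(predicted), set(annotated))


def gene_level(predicted: Iterable[Gene], annotated: Iterable[Gene]) -> Tuple[float, float]:
    # A gene is correct only if every exon matches and no exon is missing or
    # extra; sorting makes the comparison independent of exon order.
    pred = {tuple(sorted(g)) for g in predicted}
    ann = {tuple(sorted(g)) for g in annotated}
    return sensitivity_specificity(pred, ann)


if __name__ == "__main__":
    # Made-up toy coordinates, not data from the paper.
    annotated_genes = [((100, 250), (400, 520), (700, 880)), ((1000, 1100),)]
    predicted_genes = [((100, 250), (400, 530), (700, 880)), ((1000, 1100),)]

    annotated_exons = [e for g in annotated_genes for e in g]
    predicted_exons = [e for g in predicted_genes for e in g]

    print(exon_level(predicted_exons, annotated_exons))  # (0.75, 0.75)
    print(gene_level(predicted_genes, annotated_genes))  # (0.5, 0.5)
```

In this toy example, the second exon of the first gene has a shifted acceptor boundary. That single error costs one exon at the exon level (3 of 4 correct) and the whole gene at the gene level (1 of 2 correct). This shows why gene-level scores are typically lower than exon-level scores.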