A new approach for gene prediction using comparative sequence analysis

  • Authors:
  • Rong Chen;Hesham H. Ali

  • Affiliations:
  • University of Nebraska at Omaha, Omaha, NE;University of Nebraska at Omaha, Omaha, NE

  • Venue:
  • Proceedings of the 2005 ACM symposium on Applied computing
  • Year:
  • 2005

Abstract

The availability of large fragments of genomic DNA makes it possible to apply comparative genomics to the identification of protein-coding regions. In this work, a comparative analysis is conducted on homologous genomic sequences from organisms at different evolutionary distances, and conservation of non-coding regions is found between closely related organisms. In contrast, more distantly related organisms show much less intron similarity, but also less conservation of exon structure. The study examines how evolutionary distance affects the performance of the proposed gene-finding program, which is based on cross-species sequence comparison. Based on the findings from the comparative study and on training with data sets, we propose a model in which coding sequences are identified by comparing sequences from multiple species, both closely related and appropriately distant. The reliability of the proposed method is evaluated in terms of sensitivity and specificity, and the results are compared with those obtained by other popular gene prediction programs. Provided that sequences from other species at appropriate evolutionary distances are available, this approach could be applied to newly sequenced organisms for which no species-dependent statistical models exist.
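
The abstract does not say how sensitivity and specificity are computed. As a minimal sketch, the Python below assumes the nucleotide-level convention common in gene-prediction benchmarks, where sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP), that is, the fraction of predicted coding bases that are truly coding. The function name `nucleotide_accuracy` and the per-base `C`/`N` label strings are hypothetical, chosen for illustration only, and are not taken from the paper.

```python
def nucleotide_accuracy(actual, predicted):
    """Return (sensitivity, specificity) for per-base coding labels.

    Both arguments are equal-length strings in which 'C' marks a coding
    base and 'N' a non-coding base (an illustrative encoding, not the
    paper's). Specificity here follows the gene-prediction convention
    TP / (TP + FP), which is an assumption about the paper's usage.
    """
    if len(actual) != len(predicted):
        raise ValueError("annotation and prediction must have equal length")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == "C" and p == "C")  # coding, predicted coding
    fn = sum(1 for a, p in pairs if a == "C" and p == "N")  # coding, missed
    fp = sum(1 for a, p in pairs if a == "N" and p == "C")  # non-coding, predicted coding

    # Guard against empty denominators (no coding bases annotated or predicted).
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    annotated = "NNCCCCNNNN"   # true exon covers positions 2-5
    predicted = "NCCCCNNNNN"   # predicted exon is shifted one base to the left
    print(nucleotide_accuracy(annotated, predicted))
```

In this toy case the predicted exon overlaps three of the four annotated coding bases and adds one spurious base, so both measures come out at 0.75. Exon-level measures, which count only exactly matching exon boundaries, would be stricter.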