Orphan gene finding: an exon assembly approach

Authors:
Philippe Blayo; Pierre Rouzé; Marie-France Sagot
Affiliations:
Institut Gaspard-Monge, Université de Marne-la-Vallée, 5, bd Descartes, Champs-sur-Marne, 77454-Marne-la-Vallée cedex 2, France; Laboratoire Associé de l'INRA, University of Ghent, Ledeganckstraat 35, 9000-Gent, Belgium; Inria Rhône-Alpes, Laboratoire de Biométrie et Biologie Évolutive, Université Claude Bernard (Lyon I), 43, Bd du 11 Novembre 1918, F-69622-Villeurbanne cedex, France
Venue:
Theoretical Computer Science
Year:
2003

Abstract

This paper introduces an algorithm for finding eukaryotic genes. It specifically addresses the problem of orphan genes, that is, genes that cannot be connected to any known gene family on the basis of homology alone, and to which traditional gene-finding methods therefore cannot be applied. To the best of our knowledge, this is also the first algorithm that attempts to compare, in an exact way, two DNA sequences that contain both coding (i.e. exonic) and non-coding (i.e. intronic and, possibly, intergenic) parts. The comparison follows an algorithmic model of a gene that is as close as possible to the biological one (this paper considers the "one ORF, one gene" problem only). A gene is seen as a set of exons that are pieces of an assembly and are not independent of one another. The algorithm is efficient: although the constants are higher than for usual sequence comparison, its time complexity is proportional to the product of the sequence lengths, while its space complexity is linear in the length of the shorter sequence.
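
To make the complexity claim concrete, the sketch below shows how an ordinary sequence comparison reaches time proportional to the product of the two lengths while storing only one row of the dynamic programming table, sized by the shorter sequence. It is not the exon assembly algorithm of the paper: it computes only the length of a longest common subsequence, and the function name `lcs_length` is a hypothetical choice made for this illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b.

    Illustration only (not the paper's gene model): time is
    O(len(a) * len(b)), space is O(min(len(a), len(b))).
    """
    # Put the shorter sequence along the row so the buffer is minimal.
    if len(b) > len(a):
        a, b = b, a

    # prev[j] holds the answer for the previous prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]  # column 0: empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common subsequence found diagonally.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one character from either sequence.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr  # only the latest row is kept
    return prev[-1]


if __name__ == "__main__":
    print(lcs_length("ACCGGT", "ACGT"))
```

Each cell of the table is visited once, which gives the product term in the running time, and only the previous and current rows are ever held in memory. A comparison that also has to model exons, introns and their assembly would need more state per cell, which is consistent with the higher constants mentioned in the abstract.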