Introduction to algorithms
Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology
Dynamic Programming
In this paper, we describe the basis of EUGÈNE, a gene finder for eukaryotic organisms applied to Arabidopsis thaliana. What distinguishes EUGÈNE from existing gene-finding software is that it has been designed to combine the output of several information sources, including the output of other software or user-supplied information. To achieve this, a weighted directed acyclic graph (DAG) is built in such a way that a shortest feasible path in this graph represents the most likely gene structure of the underlying DNA sequence. The usual linear-time Bellman shortest-path algorithm for DAGs is replaced by a shortest-path algorithm with constraints, where the constraints express minimum lengths for introns or intergenic regions. Because of the specific form of these constraints, the resulting algorithm remains linear in both time and space. EUGÈNE's effectiveness has been assessed on Araset, a recent dataset of Arabidopsis thaliana sequences used to evaluate several existing gene-finding programs. Despite its simplicity, EUGÈNE gives results that compare very favourably with those of existing software. We try to analyse the reasons for these results.
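
As an illustration of the unconstrained baseline mentioned above, the following is a minimal sketch of a linear-time shortest-path computation on a weighted DAG. It is not EUGÈNE's implementation and it omits the minimum-length constraints. The function name `dag_shortest_path`, the edge-list representation, and the assumption that nodes are numbered in topological order (as positions along a sequence naturally would be) are all choices made for this example.

```python
from math import inf


def dag_shortest_path(num_nodes, edges, source, target):
    """Shortest path in a weighted DAG, in O(nodes + edges) time.

    Assumption for this sketch: nodes are numbered 0..num_nodes-1 in
    topological order, so every edge (u, v, weight) satisfies u < v.
    Returns (cost, path); cost is inf and path is [] if target is unreachable.
    """
    adjacency = [[] for _ in range(num_nodes)]
    for u, v, weight in edges:
        if not u < v:
            raise ValueError("edges must go from a lower to a higher index")
        adjacency[u].append((v, weight))

    dist = [inf] * num_nodes
    pred = [None] * num_nodes
    dist[source] = 0.0

    # Visit nodes in topological order and relax each outgoing edge once.
    for u in range(num_nodes):
        if dist[u] == inf:
            continue  # not reachable from the source
        for v, weight in adjacency[u]:
            if dist[u] + weight < dist[v]:
                dist[v] = dist[u] + weight
                pred[v] = u

    if dist[target] == inf:
        return inf, []

    # Walk the predecessor links back from the target to recover the path.
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = pred[node]
    path.reverse()
    return dist[target], path


# Toy graph with hypothetical weights, unrelated to any real sequence.
toy_edges = [(0, 1, 2.0), (0, 2, 5.0), (1, 2, 1.0), (1, 3, 6.0), (2, 3, 2.0)]
print(dag_shortest_path(4, toy_edges, 0, 3))  # (5.0, [0, 1, 2, 3])
```

Each node and edge is processed once, which is what gives the linear running time. A constrained variant, such as the one the abstract describes, would additionally have to reject paths whose intron or intergenic segments are too short.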