Engineering a software tool for gene structure prediction in higher organisms

  • Authors:
  • Gordon Gremme; Volker Brendel; Michael E. Sparks; Stefan Kurtz

  • Affiliations:
  • Zentrum für Bioinformatik, Universität Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany; Department of Statistics, Iowa State University, Ames, IA 50011-3260, USA and Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011-3260, USA; Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011-3260, USA; Zentrum für Bioinformatik, Universität Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany

  • Venue:
  • Information and Software Technology
  • Year:
  • 2005

Abstract

The research area now commonly called 'bioinformatics' has brought together biologists, computer scientists, statisticians, and scientists from many other fields of expertise to work on computational solutions to biological problems. A large number of algorithms and software packages are freely available for many specific tasks, such as sequence alignment, molecular phylogeny reconstruction, or protein structure determination. Rapidly changing needs and demands on data handling capacity challenge application providers to keep pace consistently. In practice, this has led to many incremental advances and much rewriting of code, which present the user community with confusing options and a large overhead from non-standardized implementations that must be integrated into existing workflows. This situation gives much scope for contributions by software engineers. In this article, we describe an example of engineering a software tool for a specific bioinformatics task known as spliced alignment. The work was motivated by disabling limitations in an original, ad hoc, and yet widely popular implementation by one of the authors. The present collaboration has led to a robust, highly versatile, and extensible tool (named GenomeThreader) that not only overcomes the limitations of the earlier implementation but also greatly reduces space and time requirements.