A software system for gene sequence database construction based on fast approximate string matching

  • Authors:
  • Zheng Liu;James Borneman;Tao Jiang

  • Affiliations:
  • Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA; Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA; Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA

  • Venue:
  • International Journal of Bioinformatics Research and Applications
  • Year:
  • 2005

Abstract

We propose a web-based software system for sequence acquisition and database construction. An example application is the construction of a ribosomal RNA gene (rDNA) sequence database to facilitate the study of microbial communities. A fast and accurate approximate string matching algorithm fetches from GenBank the rDNA sequences that are sandwiched between two given primers. A homology search algorithm based on the Basic Local Alignment Search Tool (BLAST) then extracts rDNA sequences that do not contain the primers. This two-step process yields an rDNA sequence database for a specific taxonomic group. The string matching step takes into account the distance between the occurrences of the two primers, mismatches, and primer degeneracy. In the homology search, a chaining algorithm is combined with BLAST to obtain global alignments from local alignments. The system can be used in many biological applications.
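The primer-based step described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify: it is a naive scan that assumes mismatches are substitutions only (no insertions or deletions), that degeneracy follows the standard IUPAC nucleotide codes, and that the reverse primer is already written in the orientation in which it appears on the searched strand. The names `primer_hits`, `amplicons`, `max_mismatches`, `min_len`, and `max_len` are hypothetical.

```python
# IUPAC nucleotide codes: each (possibly degenerate) primer symbol
# stands for the set of bases it is allowed to match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_hits(seq, primer, max_mismatches):
    """Return start positions where `primer` matches `seq` with at most
    `max_mismatches` substitutions. Both strings are assumed uppercase;
    a sequence symbol outside A/C/G/T simply counts as a mismatch."""
    hits = []
    m = len(primer)
    for i in range(len(seq) - m + 1):
        mismatches = 0
        for p, s in zip(primer, seq[i:i + m]):
            if s not in IUPAC[p]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches, abandon this window
        else:
            hits.append(i)  # inner loop finished without exceeding the limit
    return hits


def amplicons(seq, fwd, rev, max_mismatches=1, min_len=0, max_len=2000):
    """Return (start, end) spans of `seq` that begin with a forward-primer
    hit and end with a reverse-primer hit, where the number of bases
    between the two primers lies in [min_len, max_len]."""
    spans = []
    fwd_hits = primer_hits(seq, fwd, max_mismatches)
    rev_hits = primer_hits(seq, rev, max_mismatches)
    for f in fwd_hits:
        for r in rev_hits:
            inner = r - (f + len(fwd))  # bases strictly between the primers
            if min_len <= inner <= max_len:
                spans.append((f, r + len(rev)))
    return spans


if __name__ == "__main__":
    # Toy sequence: forward primer site, 5 bases, reverse primer site.
    seq = "TTTACGTACGGGGGTTGCAACC"
    for start, end in amplicons(seq, "ACGTRC", "TTGCAT", max_mismatches=1):
        print(start, end, seq[start:end])
```

In this toy input the degenerate symbol `R` in the forward primer matches the `A` in the sequence, and the reverse primer matches with one substitution, so the single reported span is `(3, 20)`. The scan costs time proportional to sequence length times primer length, which is fine for illustration; a search over a database the size of GenBank would need a faster matching method.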