We propose a web-based software system for sequence acquisition and database construction. An example application is the construction of a ribosomal RNA gene (rDNA) sequence database to facilitate the study of microbial communities. A fast and accurate approximate string matching algorithm is implemented to fetch from GenBank the rDNA sequences sandwiched by two given primers. A homology search algorithm based on the Basic Local Alignment Search Tool (BLAST) is then used to extract rDNA sequences that do not contain the primers. This two-step process yields an rDNA sequence database for a specific taxonomic group. When performing string matching, we consider the distance between the occurrences of the two given primers, mismatches, and degeneracy. In the homology search, a chaining algorithm is combined with BLAST to obtain global alignments from local alignments. This system can be used in many biological applications.
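
The primer-based fetching step can be illustrated with a minimal sketch. It is not the fast algorithm the abstract refers to: it is a naive sliding-window scan that counts mismatches (Hamming distance), treats degenerate primer symbols using the standard IUPAC nucleotide codes, and keeps only primer pairs whose separation falls in a given range. The function names `primer_hits` and `extract_between_primers`, the parameter names, and the assumption that both primers are given in the orientation of the searched strand (upper-case letters) are all hypothetical choices made for illustration.

```python
# IUPAC nucleotide codes: each (possibly degenerate) symbol stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_hits(seq, primer, max_mismatches):
    """Return start positions where `primer` matches `seq` with at most
    `max_mismatches` mismatching positions (naive O(n*m) scan)."""
    hits = []
    for start in range(len(seq) - len(primer) + 1):
        mismatches = 0
        for offset, symbol in enumerate(primer):
            # A sequence base matches if it belongs to the set the primer symbol encodes.
            if seq[start + offset] not in IUPAC[symbol]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            # The inner loop finished without exceeding the mismatch budget.
            hits.append(start)
    return hits


def extract_between_primers(seq, forward, reverse,
                            max_mismatches=1, min_gap=0, max_gap=2000):
    """Return (inner_start, inner_end, inner_sequence) for every pair of
    forward/reverse primer hits whose separation lies in [min_gap, max_gap].

    Assumption: `reverse` is written in the orientation of `seq`
    (no reverse-complementing is done here).
    """
    forward_hits = primer_hits(seq, forward, max_mismatches)
    reverse_hits = primer_hits(seq, reverse, max_mismatches)
    results = []
    for f in forward_hits:
        inner_start = f + len(forward)  # first base after the forward primer
        for r in reverse_hits:
            gap = r - inner_start  # number of bases between the two primers
            if min_gap <= gap <= max_gap:
                results.append((inner_start, r, seq[inner_start:r]))
    return results


if __name__ == "__main__":
    # Toy sequence: the forward primer uses the degenerate symbol Y (C or T),
    # and the reverse primer occurs with one mismatch.
    toy = "TTACGTAGGGCCCTTTGATCAA"
    print(extract_between_primers(toy, "ACGYA", "GATGA",
                                  max_mismatches=1, min_gap=5, max_gap=20))
```

The distance bounds discard primer pairs that are too close together or too far apart to enclose a plausible target region. A production system would replace the naive scan with a faster approximate matching method.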