Homology search in genomic databases is a fundamental task in biological knowledge discovery. As databases grow exponentially in size and are accessed ever more often, efficient retrieval becomes increasingly important in bioinformatics. Because biological data vary, similar sequences must not only stay within some error tolerance but also exceed some seriate coverage level. In this paper, we propose a seriate coverage filtration approach that extracts homologies from databases efficiently. Our approach performs lossless filtration and can be used as a preprocessing step for existing search heuristics. The method converts a user's requested error and seriate coverage levels into thresholds of interest. Accordingly, we transform homology discovery into a variation of the longest increasing subsequence problem and design an efficient algorithm for it. In performance tests, our approach shows attractive filtration quality.
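For orientation, the classical longest increasing subsequence (LIS) problem that the abstract refers to can be sketched as follows. This is the standard textbook O(n log n) method, not the variation designed in the paper, whose details the abstract does not give. The function name and the sample input are hypothetical, chosen only for illustration; one plausible reading is that the values stand for database positions of matches listed in query order, so that an increasing run corresponds to a collinear chain of matches.

```python
from bisect import bisect_left


def longest_increasing_subsequence(values):
    """Return one strictly increasing subsequence of maximal length.

    Textbook O(n log n) method (hypothetical helper, not the paper's variant).
    """
    tail_vals = []  # tail_vals[k]: smallest tail value of an increasing run of length k + 1
    tail_idx = []   # index in `values` of each tail in tail_vals
    prev = [-1] * len(values)  # back-pointers used to rebuild the subsequence

    for i, v in enumerate(values):
        k = bisect_left(tail_vals, v)  # first run whose tail is >= v
        if k == len(tail_vals):
            tail_vals.append(v)        # v extends the longest run found so far
            tail_idx.append(i)
        else:
            tail_vals[k] = v           # v is a smaller tail for runs of length k + 1
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else -1

    # Walk the back-pointers from the tail of the longest run.
    result = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        result.append(values[i])
        i = prev[i]
    return result[::-1]


# Hypothetical input: database positions of matches, listed in query order.
positions = [3, 1, 4, 1, 5, 9, 2, 6]
chain = longest_increasing_subsequence(positions)
print(len(chain), chain)
```

A filter built on this idea would compare the length of such a chain against a threshold derived from the requested error and coverage levels; how the paper defines that threshold is not stated in the abstract.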