Growing interest in genomic research has resulted in the creation of huge biological sequence databases. In this paper, we present a hash-based pier model for efficient homology search in large DNA sequence databases. In our model, only certain segments of the database, called 'piers', need to be accessed during a search, as opposed to other approaches, which require a full scan of the biological sequence database. To further improve search efficiency, the piers are stored in a specially designed hash table that helps avoid expensive alignment operations. The hash table is small enough to reside in main memory, so the search steps require no disk I/O. We show theoretically and empirically that the proposed approach can efficiently detect biological sequences that are similar to a query sequence with very high sensitivity.
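The abstract does not specify how piers are selected or how the hash table is laid out, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that a pier is a fixed-length segment of length `q` sampled every `step` positions of each database sequence, and that the hash table maps each pier string to the positions where it occurs. A query is then answered by hashing each of its `q`-length windows, touching only the stored piers rather than scanning the whole database. The names `build_pier_index` and `candidate_hits` are hypothetical.

```python
from collections import defaultdict


def build_pier_index(db, q, step):
    """Map each sampled segment to its (seq_id, offset) positions.

    ASSUMPTION: a 'pier' is taken here to be the length-q segment starting
    at every multiple of `step`; the original model may define piers
    differently.
    """
    index = defaultdict(list)
    for seq_id, seq in enumerate(db):
        for start in range(0, len(seq) - q + 1, step):
            index[seq[start:start + q]].append((seq_id, start))
    return index


def candidate_hits(index, query, q):
    """Count pier hits per (seq_id, diagonal), where diagonal = db_offset - query_offset.

    Every length-q window of the query is looked up, so only the piers
    stored in the in-memory table are touched; the database itself is
    never scanned.
    """
    hits = defaultdict(int)
    for qpos in range(len(query) - q + 1):
        # dict.get does not insert missing keys, so the index is left unchanged.
        for seq_id, dpos in index.get(query[qpos:qpos + q], ()):
            hits[(seq_id, dpos - qpos)] += 1
    return dict(hits)


db = ["ACGTACGTTTGACCA", "GGGGCCCCAAAATTTT"]
index = build_pier_index(db, q=4, step=4)
print(candidate_hits(index, "CGTTTGA", q=4))  # {(0, 5): 1}
```

Grouping hits by diagonal is one plausible way a hash table can "help avoid expensive alignment operations": a later stage would only need to align the query around diagonals that received hits. In this sketch, any exact match of length at least `step + q - 1` between the query and a database sequence is guaranteed to contain at least one complete pier, because the first sampled offset inside the match begins at most `step - 1` positions after the match starts.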