Genomic sequence databases are widely used by molecular biologists for homology searching. Amino acid and nucleotide databases are growing exponentially in size, and mean sequence lengths are also increasing. When searching such databases, it is desirable to use heuristics that restrict computationally intensive local alignment to selected sequences and that reduce the cost of the alignments that are attempted. We present an index-based approach both for selecting sequences that display broad similarity to a query and for fast local alignment. We show experimentally that the indexed approach yields significant savings in computationally intensive local alignments and that index-based searching is as accurate as existing exhaustive search schemes.
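
The abstract gives no implementation details, so the following is only a minimal sketch of the general two-phase idea, under stated assumptions: an inverted index over fixed-length overlapping substrings ("intervals") is used to select candidate sequences, and local alignment is then run on those candidates alone. The interval length, the count-based ranking, the Smith-Waterman scoring values, and every function name here are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict


def intervals(seq, n):
    """Yield every overlapping substring of length n (assumed interval scheme)."""
    return (seq[i:i + n] for i in range(len(seq) - n + 1))


def build_index(db, n=3):
    """Map each interval to the set of sequence ids that contain it."""
    index = defaultdict(set)
    for seq_id, seq in db.items():
        for gram in intervals(seq, n):
            index[gram].add(seq_id)
    return index


def coarse_rank(index, query, n=3, top=2):
    """Rank sequences by the number of distinct query intervals they share."""
    counts = defaultdict(int)
    for gram in set(intervals(query, n)):
        for seq_id in index.get(gram, ()):
            counts[seq_id] += 1
    # Highest count first; ties broken by id so the order is deterministic.
    return sorted(counts, key=lambda s: (-counts[s], s))[:top]


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with linear gap cost (assumed values)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            score = max(
                0,
                prev[j - 1] + (match if ca == cb else mismatch),
                prev[j] + gap,
                cur[j - 1] + gap,
            )
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def search(db, index, query, n=3, top=2):
    """Align the query only against the candidates chosen by the coarse phase."""
    candidates = coarse_rank(index, query, n, top)
    scored = [(local_alignment_score(query, db[s]), s) for s in candidates]
    return sorted(scored, key=lambda t: (-t[0], t[1]))
```

In this sketch the expensive dynamic-programming step runs on at most `top` sequences rather than on the whole collection, which is the kind of saving the abstract describes. A real system would also need index compression and a more careful ranking function.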