Indexing and Retrieval for Genomic Databases

Authors: H. E. Williams; Justin Zobel
Venue: IEEE Transactions on Knowledge and Data Engineering
Year: 2002

Abstract

Genomic sequence databases are widely used by molecular biologists for homology searching. Amino acid and nucleotide databases are growing exponentially in size, and mean sequence lengths are also increasing. When searching such databases, it is desirable to use heuristics that restrict computationally intensive local alignment to selected sequences and that reduce the cost of the alignments that are attempted. We present an index-based approach both for selecting sequences that display broad similarity to a query and for fast local alignment. We show experimentally that the indexed approach yields significant savings in computationally intensive local alignments and that index-based searching is as accurate as existing exhaustive search schemes.
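
The two-stage idea in the abstract (use an index to select promising sequences, then run local alignment only on those) can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed-length substring (k-mer) inverted index, the ranking by count of shared k-mers, the Smith-Waterman scoring parameters, and every function name below are assumptions chosen for illustration.

```python
from collections import defaultdict


def build_index(sequences, k=3):
    """Inverted index: each length-k substring maps to the ids of sequences containing it."""
    index = defaultdict(set)
    for seq_id, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(seq_id)
    return index


def coarse_rank(index, query, k=3, top=2):
    """Coarse stage: rank sequences by how many distinct query k-mers they share."""
    counts = defaultdict(int)
    kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    for kmer in kmers:
        for seq_id in index.get(kmer, ()):
            counts[seq_id] += 1
    # Highest count first; ties broken by sequence id for determinism.
    ranked = sorted(counts, key=lambda s: (-counts[s], s))
    return ranked[:top]


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Fine stage: best Smith-Waterman local alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(sequences, index, query, k=3, top=2):
    """Align the query only against the candidates chosen by the coarse stage."""
    candidates = coarse_rank(index, query, k, top)
    scored = [(local_alignment_score(query, sequences[s]), s) for s in candidates]
    return sorted(scored, key=lambda t: (-t[0], t[1]))


if __name__ == "__main__":
    db = ["ACGTACGT", "TTTTTTTT", "ACGTTTGA", "GGGGCCCC"]
    idx = build_index(db, k=3)
    for score, seq_id in search(db, idx, "ACGTAC"):
        print(seq_id, db[seq_id], score)
```

In this toy database only two of the four sequences share any k-mer with the query, so the quadratic-time alignment runs twice rather than four times. The saving comes from the index lookup replacing alignment for sequences with no shared substrings; whether such filtering preserves accuracy depends on the choice of k and the ranking function, which the sketch does not attempt to tune.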