The in-silico revolution has changed how biologists characterise DNA and protein sequences. As a first step in exploring the structure and function of an unknown sequence, biologists search large genomic databases for similar sequences. This process of genomic information retrieval has enabled significant advances in biology, including progress in critical areas such as cancer research. In this paper, we present a background to genomic information retrieval by describing the problems, collections, and techniques that biologists encounter when searching large collections. In particular, we identify the problems inherent in the popular search techniques and discuss how index-based approaches may be applied to solve them. We conclude with a challenge: information retrieval specialists must continue to make significant contributions if molecular biology research is to advance further.