Genomic information retrieval

Authors:
Hugh E. Williams
Affiliations:
School of Computer Science and Information Technology, RMIT University, GPO Box 2476V, Melbourne
Venue:
ADC '03 Proceedings of the 14th Australasian database conference - Volume 17
Year:
2003

Abstract

The in-silico revolution has changed how biologists characterise DNA and protein sequences. As a first step to exploring the structure and function of an unknown sequence, biologists search large genomic databases for similar sequences. This process of genomic information retrieval has allowed significant advances in biology and led to advancements in critical areas such as cancer research. In this paper, we present a background to genomic information retrieval by describing the problems, collections, and techniques used by biologists for searching large collections. In particular, we identify the problems inherent in the popular search techniques, and discuss how index-based approaches may be applied to solve these problems. We conclude by offering the challenge that information retrieval specialists must continue to make significant contributions to allow further advances in molecular biology research.