Indexing genomic sequence libraries

Authors:
Kevin C. O'Kane; Matthew J. Lockner
Affiliations:
Department of Computer Science, The University of Northern Iowa, Cedar Falls, IA; Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA
Venue:
Information Processing and Management: an International Journal
Year:
2005

Abstract

This paper describes an extensible, open-source (GPL) data repository and retrieval system that supports fast, efficient, keyword-based retrieval of genomic sequences from multiple libraries. Retrieved sequences are post-processed by FASTA, Smith-Waterman, and other analysis software. The application is implemented for Linux and written in Mumps, C, and C++; its supporting components include Berkeley DB, the Perl Compatible Regular Expressions (PCRE) library, GLADE, and tools such as FASTA, Smith-Waterman, and modules from EMBOSS. The package can quickly index data sets of up to 256 terabytes using a B-tree-based multi-dimensional data model. An example is presented that indexes the text of the full NCBI GenBank library.
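
The idea of keyword retrieval over a sorted, multi-dimensional key space can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `SortedKeyIndex`, the helper `index_record`, the two-part key layout `(term, accession)`, and the accession identifiers and descriptions are all hypothetical, and a sorted in-memory list searched with `bisect` stands in for the B-tree.

```python
import bisect


class SortedKeyIndex:
    """Toy stand-in for a B-tree: (term, accession) keys kept in sorted order.

    Assumption: the two-part key layout is illustrative only and is not
    taken from the paper.
    """

    def __init__(self):
        self._keys = []  # sorted list of (term, accession) tuples

    def add(self, term, accession):
        """Insert a (term, accession) key, skipping exact duplicates."""
        key = (term.lower(), accession)
        pos = bisect.bisect_left(self._keys, key)
        if pos == len(self._keys) or self._keys[pos] != key:
            self._keys.insert(pos, key)

    def lookup(self, term):
        """Return all accessions stored under a term, in sorted order."""
        term = term.lower()
        # (term,) sorts before every (term, accession) key, so this finds
        # the first key for the term, if any exists.
        start = bisect.bisect_left(self._keys, (term,))
        result = []
        for key_term, accession in self._keys[start:]:
            if key_term != term:
                break
            result.append(accession)
        return result


def index_record(index, accession, description):
    """Index each whitespace-separated word of a record's description."""
    for word in description.split():
        word = word.strip(".,;:()")
        if word:
            index.add(word, accession)


# Hypothetical records; the identifiers are made up for illustration.
index = SortedKeyIndex()
index_record(index, "ACC0001", "Homo sapiens hemoglobin beta mRNA")
index_record(index, "ACC0002", "Mus musculus hemoglobin alpha gene")
print(index.lookup("hemoglobin"))  # ['ACC0001', 'ACC0002']
```

Because the keys are kept in sorted order, all entries for one term are adjacent, so a lookup is a single ordered search followed by a short scan. A disk-based B-tree offers the same access pattern for collections too large to hold in memory.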