A high-throughput distributed DNA sequence analysis and database system

Authors:
J. T. Inman; H. Raul Flores; G. D. May; J. W. Weller; C. J. Bell
Affiliations:
National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, New Mexico; Netvoice Technologies Corporation, 3201 West Royal Lane, Suite 160, Irving, Texas; Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore, Oklahoma; Virginia Bioinformatics Institute, 1750 Kraft Drive, Suite 1400, Virginia Polytechnic Institute, Blacksburg, Virginia; National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, New Mexico
Venue:
IBM Systems Journal - Deep computing for the life sciences
Year:
2001

Abstract

The National Center for Genome Resources (NCGR) has developed a high-throughput DNA (deoxyribonucleic acid) sequence analysis pipeline, which allows researchers at remote sites to submit biological sequence information for rapid analysis, the results of which can be queried through a Web interface. Behind the browser interface is a relational database used to manage both the raw data and the results of the different analyses performed, and a server, which performs those analyses. The system allows multiple contributors to submit data and also allows the data to be marked as "private" or as available to the general public. The CPU-intensive part of the processing is done on a 40-processor domain of a Sun Enterprise 10000 computer, which is represented by a distributed system of software objects, implemented in CORBA™ (Common Object Request Broker Architecture™). In this paper we discuss the architecture of the pipeline, the database support, types of DNA sequence analysis used, the distributed analysis system, and the capabilities of the Web interface. As a case study, we present data from an ongoing collaborative project in which expressed sequence tags (ESTs) from Medicago truncatula are being processed. M. truncatula is a plant that is used as a research model for crops in the legume family, an economically important group of food and forage plants.