Similar_Join: extending DBMS with a bio-specific operator

Authors:
Jake Yue Chen; John V. Carlis
Affiliations:
Myriad Proteomics, Inc., Salt Lake City, UT; University of Minnesota, Minneapolis, MN
Venue:
Proceedings of the 2003 ACM symposium on Applied computing
Year:
2003

Abstract

Existing sequence comparison software applications lack adequate automation, abstraction, performance, and flexibility. Users need a new way of studying and applying sequence comparisons in the post-genomics era. We invented and developed a new bio-specific Database Management System (DBMS) operator, Similar_Join, which abstracts the labor-intensive batch sequence similarity search task into a syntactically concise database operation. We implemented the Similar_Join operator as part of a relational operator package. This implementation enabled us to write simple PL/SQL scripts within the DBMS that perform routine sequence similarity searches conveniently, for example, a "batch BLAST" that compares 7,000 human genes against 500,000 human Expressed Sequence Tags (ESTs) in a few hours. We also implemented a simple version of Similar_Join as a database operator in the extended data cartridge of the Oracle 8i object-relational DBMS. We demonstrated that, when fully integrated into SQL language extensions, this operator could enable biology users to pose complex biological queries that were previously impossible inside the DBMS.
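
The idea of a similarity join can be illustrated with a minimal Python sketch. This is not the authors' PL/SQL or Oracle data cartridge implementation: the function names (`similar_join`, `ratio_similarity`), the threshold value, and the sample rows are hypothetical, and the scorer is the standard library's `difflib.SequenceMatcher` ratio, used here only as a stand-in for a real alignment tool such as BLAST.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterator, List, Tuple

Row = Tuple[str, str]  # (identifier, sequence); a hypothetical row layout


def ratio_similarity(a: str, b: str) -> float:
    """Stand-in scorer (assumption): difflib ratio in [0, 1], not a BLAST score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def similar_join(
    left: List[Row],
    right: List[Row],
    threshold: float,
    score: Callable[[str, str], float] = ratio_similarity,
) -> Iterator[Tuple[str, str, float]]:
    """Nested-loop join that keeps pairs whose similarity meets the threshold.

    An equality join keeps a pair when the join keys are equal; this sketch
    keeps a pair when score(left_seq, right_seq) >= threshold instead.
    """
    for left_id, left_seq in left:
        for right_id, right_seq in right:
            s = score(left_seq, right_seq)
            if s >= threshold:
                yield left_id, right_id, s


# Toy data (invented for illustration only).
genes = [("g1", "ACGTACGT"), ("g2", "TTTTGGGG")]
ests = [("e1", "ACGTACGT"), ("e2", "CCCCCCCC")]

matches = list(similar_join(genes, ests, threshold=0.8))
for gene_id, est_id, s in matches:
    print(gene_id, est_id, round(s, 2))
```

The nested loop scores every pair, so its cost grows with the product of the two table sizes; a production operator would delegate scoring to an indexed search tool rather than compare all pairs directly.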