Similar_Join: extending DBMS with a bio-specific operator

  • Authors: Jake Yue Chen; John V. Carlis

  • Affiliations: Myriad Proteomics, Inc., Salt Lake City, UT; University of Minnesota, Minneapolis, MN

  • Venue: Proceedings of the 2003 ACM Symposium on Applied Computing
  • Year: 2003

Abstract

Existing sequence comparison software applications lack adequate automation, abstraction, performance, and flexibility. Users need a new way of studying and applying sequence comparisons in the post-genomics era. We invented and developed a new bio-specific Database Management System (DBMS) operator, Similar_Join, which abstracts the labor-intensive task of batch sequence similarity search into a syntactically concise database operation. We implemented the Similar_Join operator as part of a relational operator package. This implementation enabled us to write simple PL/SQL scripts within the DBMS that carry out routine sequence similarity searches conveniently, for example, a "batch BLAST" that compares 7,000 human genes against 500,000 human Expressed Sequence Tags (ESTs) in a few hours. We also implemented a simple version of Similar_Join as a database operator in the extended data cartridge of the Oracle 8i object-relational DBMS. We demonstrated that, when fully integrated into SQL extensions, this operator could enable biology users to pose complex biological queries that were previously impossible inside the DBMS.
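
The idea of a join whose match condition is sequence similarity rather than equality can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the abstract describes as PL/SQL scripts and an Oracle 8i data cartridge built around BLAST-style searches. The function names (`kmers`, `toy_similarity`, `similar_join`), the k-mer Jaccard score standing in for a BLAST score, the threshold, and the sample sequences are all assumptions made for illustration.

```python
from itertools import product


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets (assumed stand-in for a BLAST score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def similar_join(left, right, threshold=0.5, k=3):
    """Join two relations of (id, sequence) rows on sequence similarity.

    Every left row is compared with every right row (a nested-loop join);
    pairs scoring at or above the threshold are returned with their score.
    """
    result = []
    for (lid, lseq), (rid, rseq) in product(left, right):
        score = toy_similarity(lseq, rseq, k)
        if score >= threshold:
            result.append((lid, rid, score))
    return result


# Hypothetical sample relations: two "genes" and two "ESTs".
genes = [("g1", "ACGTACGT"), ("g2", "TTTTGGGG")]
ests = [("e1", "ACGTACGA"), ("e2", "CCCCCCCC")]

pairs = similar_join(genes, ests, threshold=0.5)
print(pairs)
```

The sketch compares all pairs exhaustively, which would not scale to the 7,000 by 500,000 comparison mentioned in the abstract; a practical operator would presumably delegate scoring to an indexed search tool such as BLAST.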