PiQA: an algebra for querying protein data sets

Authors:
Sandeep Tata;Jignesh M. Patel
Affiliations:
University of Michigan;University of Michigan
Venue:
SSDBM '03 Proceedings of the 15th International Conference on Scientific and Statistical Database Management
Year:
2003

Citing 0
Cited 4

Structure-based querying of proteins using wavelets
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management

Periscope/SQ: interactive exploration of biological sequence databases
VLDB '07 Proceedings of the 33rd international conference on Very large data bases

Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA
SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management

Relational operators for prioritizing candidate biomarkers in high-throughput differential expression data
Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology

Abstract

Life science researchers frequently need to query large protein data sets in a variety of ways. Protein data sets have a rich structure that includes each protein's primary structure, described as a sequence of amino acids, and its secondary structure, described as a sequence of folding patterns. Both structures are important: the amino acid sequence is often used to find homologous proteins, and the secondary structure can provide important hints about protein function. While there are tools for querying each of these structures independently, there are no tools for declarative querying on both structures. Even the tools that allow querying on one of these structures are not based on any formal algebra, and as a result require complex rewriting of the tool's programming logic when the query evaluation plan changes. This paper introduces PiQA, a Protein Query Algebra, which provides a rich set of algebraic operations on both the primary and secondary structure of proteins. Using PiQA, one can pose complex queries involving both the primary and the secondary structure of proteins. In addition, simple existing tools that query only on the primary structure, such as BLAST, can also be expressed in this algebra. PiQA is an important first step in developing an algebra that can form the basis of a declarative language for querying protein data sets.