Life science researchers frequently need to query large protein data sets in a variety of ways. Proteins have a rich structure that includes the primary structure, described as a sequence of amino acids, and the secondary structure, described as a sequence of folding patterns. Both structures are important: the amino acid sequence is often used to find homologous proteins, and the secondary structure can provide important hints about protein function. While there are tools for querying each of these structures independently, there are no tools for declarative querying on both. Even the tools that allow querying on one of these structures are not based on any formal algebra, and as a result require complex rewriting of the tool's programming logic when the query evaluation plan changes. This paper introduces PiQA, a Protein Query Algebra, which provides a rich set of algebraic operations on both the primary and secondary structure of proteins. Using PiQA, one can pose complex queries involving both the primary and the secondary structure of proteins. In addition, simple existing tools that query only the primary structure, such as BLAST, can also be expressed in this algebra. PiQA is an important first step in developing an algebra that can form the basis of a declarative language for querying protein data sets.
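To make the idea of a query that spans both structures concrete, the following is a minimal Python sketch. It does not use PiQA's operators or syntax, which the abstract does not specify. The `Protein` record, the `H`/`E`/`L` per-residue labels, the function names, and the sample sequences are all assumptions made for illustration. The query keeps proteins in which an exact amino acid motif lies entirely inside a helix segment of at least a given length.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    # Hypothetical record; field names are illustrative, not from PiQA.
    name: str
    primary: str    # amino acid sequence, one letter per residue
    secondary: str  # per-residue label: H (helix), E (strand), L (loop)


def segments(secondary, label, min_len=1):
    """Return half-open (start, end) spans of runs of `label` with length >= min_len."""
    pattern = re.compile(f"{re.escape(label)}{{{min_len},}}")
    return [m.span() for m in pattern.finditer(secondary)]


def motif_hits(primary, motif):
    """Return half-open (start, end) spans of exact motif occurrences, overlaps included."""
    hits = []
    start = primary.find(motif)
    while start != -1:
        hits.append((start, start + len(motif)))
        start = primary.find(motif, start + 1)
    return hits


def select(proteins, motif, label, min_len):
    """Names of proteins where some motif occurrence lies fully inside a qualifying segment."""
    result = []
    for p in proteins:
        segs = segments(p.secondary, label, min_len)
        hits = motif_hits(p.primary, motif)
        if any(s <= a and b <= e for (a, b) in hits for (s, e) in segs):
            result.append(p.name)
    return result


# Made-up sample data, not real proteins.
proteins = [
    Protein("p1", "MKVLAAGLKE", "LLHHHHHHLL"),
    Protein("p2", "MKVLAAGLKE", "LLEEEELLLL"),
    Protein("p3", "GGSGGSGGSG", "HHHHHHHHHH"),
]
matches = select(proteins, "LAAG", "H", 4)
print(matches)
```

In this sketch the selection logic is hard-coded in `select`, which is the kind of tool-specific programming the abstract argues a formal algebra would avoid: with algebraic operators, the same query could be stated declaratively and its evaluation plan changed without rewriting the tool.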