Despite decades of progress in database research, scientists in the life sciences still struggle with inefficient and awkward tools for querying biological data sets. This paper addresses one specific problem: searching large protein data sets based on secondary structure. We define an intuitive query language for expressing queries on secondary structure and develop several algorithms for evaluating these queries. We implement these algorithms both in Periscope, a native system that we have built, and in a commercial ORDBMS, and show that the choice of algorithm can have a significant impact on query performance. As part of the Periscope implementation, we have also developed a framework for optimizing these queries and for accurately estimating the costs of the various query evaluation plans. Our performance studies show that the proposed techniques are very efficient in Periscope and can give scientists interactive secondary structure querying even on large protein data sets.