Continuous queries over append-only databases
SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data
Query evaluation techniques for large databases
ACM Computing Surveys (CSUR)
Tapes hold data, too: challenges of tuples on tertiary store
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Why decision support fails and how to fix it
ACM SIGMOD Record
Improved query performance with variant indexes
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Simultaneous optimization and evaluation of multiple dimensional queries
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
The PanQ tool and EMF SQL for complex data management
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
StorHouse metanoia - new applications for database, storage & data warehousing
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Continuous queries over data streams
ACM SIGMOD Record
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total
ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
Groupwise Processing of Relational Queries
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Including Group-By in Query Optimization
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Eager Aggregation and Lazy Aggregation
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Query Processing in Tertiary Memory Databases
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Querying Multiple Features of Groups in Relational Databases
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Reordering Query Execution in Tertiary Memory Databases
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
The Design and Implementation of a Sequence Database System
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Evaluation of Ad Hoc OLAP: In-Place Computation
SSDBM '99 Proceedings of the 11th International Conference on Scientific and Statistical Database Management
Relational Joins for Data on Tertiary Storage
ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering
Ad Hoc OLAP: Expression and Evaluation
ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Using grouping variables to express complex decision support queries
Data & Knowledge Engineering
Two-phase data warehouse optimized for data mining
BIRTE'06 Proceedings of the 1st international conference on Business intelligence for the real-time enterprises
Hi-index | 0.00 |
Data warehouses collect masses of operational data, allowing analysts to extract information by issuing decision support queries on the otherwise discarded data. In many application areas (e.g. telecommunications), the warehoused data sets are multiple terabytes in size. Parts of these data sets are stored on very large disk arrays, while the remainder is stored on tape-based tertiary storage (which is one to two orders of magnitude less expensive than on-line storage). However, the inherently sequential nature of access to tape-based tertiary storage makes the efficient access to tape-resident data difficult to accomplish through conventional databases.In this paper, we present a way to make access to a massive tape-resident data warehouse easy and efficient. Ad hoc decision support queries usually involve large scale and complex aggregation over the detail data. These queries are difficult to express in SQL, and frequently require self-joins on the detail data (which are prohibitively expensive on the disk-resident data and infeasible to compute on tape-resident data), or unnecessary multiple passes through the detail data. An extension to SQL, the extended multi feature SQL (EMF SQL) expresses complex aggregation computations in a clear manner without using self-joins. The detail data in a data warehouse usually represents a record of past activities, and therefore is temporal. We show that complex queries involving sequences can be easily expressed in EMF SQL. An EMF SQL query can be optimized to minimize the number of passes through the detail data required to evaluate the query, in many cases to only one pass. We describe an efficient query evaluation algorithm along with a query optimization algorithm that minimizes the number of passes through the detail data, and which minimizes the amount of main memory required to evaluate the query. These algorithms are useful not only in the context of tape-resident data warehouses but also in data stream systems which require similar processing techniques.