Decision support queries on a tape-resident data warehouse

Authors:
Damianos Chatziantoniou; Theodore Johnson
Affiliations:
Department of Management Science and Technology, Athens University of Economics and Business, Evelpidon 47 A, Lefkados, Athens 11362, Greece; AT&T Research Labs, 180 Park Ave., P.O. Box 971, Florham Park, NJ
Venue:
Information Systems
Year:
2005

Abstract

Data warehouses collect masses of operational data, allowing analysts to extract information by issuing decision support queries on the otherwise discarded data. In many application areas (e.g. telecommunications), the warehoused data sets are multiple terabytes in size. Parts of these data sets are stored on very large disk arrays, while the remainder is stored on tape-based tertiary storage (which is one to two orders of magnitude less expensive than on-line storage). However, the inherently sequential nature of access to tape-based tertiary storage makes efficient access to tape-resident data difficult to accomplish through conventional databases.

In this paper, we present a way to make access to a massive tape-resident data warehouse easy and efficient. Ad hoc decision support queries usually involve large-scale, complex aggregation over the detail data. These queries are difficult to express in SQL, and frequently require either self-joins on the detail data (which are prohibitively expensive on disk-resident data and infeasible to compute on tape-resident data) or unnecessary multiple passes through the detail data. An extension to SQL, extended multi-feature SQL (EMF SQL), expresses complex aggregation computations clearly without using self-joins. The detail data in a data warehouse usually represents a record of past activities, and is therefore temporal. We show that complex queries involving sequences can be easily expressed in EMF SQL. An EMF SQL query can be optimized to minimize the number of passes through the detail data required to evaluate it, in many cases to a single pass. We describe an efficient query evaluation algorithm along with a query optimization algorithm that minimizes both the number of passes through the detail data and the amount of main memory required to evaluate the query.
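The pass-minimization idea can be illustrated with a small sketch. It rests on our own simplifying assumption rather than on the paper's optimization algorithm: a grouping variable whose defining condition references an aggregate of another grouping variable can only be evaluated in a later scan, while independent grouping variables can share a scan. Under that assumption, layering the dependency graph yields the fewest scans. The function `assign_passes` and its dictionary encoding are hypothetical.

```python
from typing import Dict, Set


def assign_passes(depends_on: Dict[str, Set[str]]) -> Dict[str, int]:
    """Assign each grouping variable to the earliest scan that can evaluate it.

    depends_on maps a grouping variable to the grouping variables whose
    aggregates appear in its defining condition (hypothetical encoding).
    Independent variables go in pass 1; a dependent variable goes one pass
    after the latest of its dependencies. Cycles are rejected.
    """
    passes: Dict[str, int] = {}
    visiting: Set[str] = set()

    def visit(var: str) -> int:
        if var in passes:
            return passes[var]
        if var in visiting:
            raise ValueError(f"cyclic dependency involving {var!r}")
        visiting.add(var)
        deps = depends_on.get(var, set())
        passes[var] = 1 + max((visit(d) for d in deps), default=0)
        visiting.discard(var)
        return passes[var]

    for var in depends_on:
        visit(var)
    return passes


# Hypothetical sequence-style query over call records, per customer:
#   X: the customer's international calls
#   Y: the customer's calls placed after min(X.time)  -> depends on X
#   Z: the customer's local calls
plan = assign_passes({"X": set(), "Y": {"X"}, "Z": set()})
print(plan)                 # {'X': 1, 'Y': 2, 'Z': 1}
print(max(plan.values()))   # 2 scans of the detail data
```

Here `X` and `Z` share the first scan and only `Y` needs a second one, so three grouping variables cost two scans. The sketch does not model main-memory minimization or any scan needed to enumerate the groups themselves.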
The evaluation and optimization algorithms are useful not only in the context of tape-resident data warehouses but also in data stream systems, which require similar processing techniques.
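As an illustration of why a single-pass plan suits both tape scans and data streams, the following sketch evaluates two independent grouping variables per customer in one sequential read of the detail records, with no self-join. The `Call` record layout, the query (count of international calls and total duration of local calls per customer), and the function names are hypothetical; a generator stands in for a tape or stream that can be read only once.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class Call(NamedTuple):
    """Hypothetical detail record in a call-detail warehouse."""
    customer: str
    kind: str      # "intl" or "local" (assumed values)
    duration: int  # seconds


def one_pass_features(records: Iterable[Call]) -> Dict[str, Tuple[int, int]]:
    """Per customer: (number of intl calls, total duration of local calls).

    Both grouping variables are defined only in terms of the group's own
    customer value and the current record, so a single sequential scan
    suffices and no self-join is needed.
    """
    table: Dict[str, List[int]] = {}  # one in-memory entry per group
    for r in records:  # each record is read exactly once
        acc = table.setdefault(r.customer, [0, 0])
        if r.kind == "intl":    # grouping variable X: this customer's intl calls
            acc[0] += 1
        if r.kind == "local":   # grouping variable Y: this customer's local calls
            acc[1] += r.duration
    return {cust: (acc[0], acc[1]) for cust, acc in table.items()}


def call_stream() -> Iterator[Call]:
    # A generator can be consumed only once, like a tape scan or a data stream.
    yield Call("alice", "intl", 120)
    yield Call("bob", "local", 30)
    yield Call("alice", "local", 45)
    yield Call("alice", "intl", 60)
    yield Call("bob", "local", 15)


print(one_pass_features(call_stream()))
# {'alice': (2, 45), 'bob': (0, 45)}
```

Because each grouping variable's condition refers only to the current group and the current record, one table entry per group suffices, so main memory grows with the number of groups rather than with the number of detail records.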