Answering ad hoc aggregate queries from data streams using prefix aggregate trees

Authors:
Moonjung Cho;Jian Pei;Ke Wang
Affiliations:
State University of New York at Buffalo, Department of Computer Science and Engineering, 14260, Buffalo, NY, USA;Simon Fraser University, 8888 University Drive, School of Computing Science, V5A 1S6, Burnaby, BC, Canada;Simon Fraser University, 8888 University Drive, School of Computing Science, V5A 1S6, Burnaby, BC, Canada
Venue:
Knowledge and Information Systems
Year:
2007

Citing 39
Cited 2

Maintaining views incrementally

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Answering queries using views (extended abstract)

PODS '95 Proceedings of the fourteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Research problems in data warehousing

CIKM '95 Proceedings of the fourth international conference on Information and knowledge management
Implementing data cubes efficiently

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
An overview of data warehousing and OLAP technology

ACM SIGMOD Record
Cubetree: organization of and bulk incremental updates on the data cube

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Maintenance of data cubes and summary tables in a warehouse

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
An array-based algorithm for simultaneous multidimensional aggregates

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
On-line warehouse view maintenance

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Quasi-cubes: exploiting approximations in multidimensional databases

ACM SIGMOD Record
Data cube approximation and histograms via wavelets

Proceedings of the seventh international conference on Information and knowledge management
Rewriting aggregate queries using views

PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Bottom-up computation of sparse and Iceberg CUBE

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Compressed data cubes for OLAP aggregate query approximation on continuous dimensions

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Efficient computation of Iceberg cubes with complex measures

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
On computing correlated aggregates over continual data streams

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Making views self-maintainable for data warehousing

DIS '96 Proceedings of the fourth international conference on on Parallel and distributed information systems
Models and issues in data stream systems

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Processing complex aggregate queries over data streams

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Dwarf: shrinking the PetaCube

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Continuous queries over data streams

ACM SIGMOD Record
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total

ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
Fast Computation of Sparse Datacubes

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Temporal Queries in OLAP

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Answering Queries with Aggregation Using Views

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
A simple algorithm for finding frequent elements in streams and bags

ACM Transactions on Database Systems (TODS)
What's hot and what's not: tracking most frequent items dynamically

Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Optimizing Selections over Datacubes

SSDBM '00 Proceedings of the 12th International Conference on Scientific and Statistical Database Management
QC-trees: an efficient summary structure for semantic OLAP

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Condensed Cube: An Efficient Approach to Reducing Data Cube Size

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Finding recent frequent itemsets adaptively over online data streams

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Approximate counts and quantiles over sliding windows

PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Multi-dimensional regression analysis of time-series data streams

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Approximate frequency counts over data streams

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Quotient cube: how to summarize the semantics of a data cube

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
A regression-based temporal pattern mining scheme for data streams

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Finding hierarchical heavy hitters in data streams

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
False positive or false negative: mining frequent itemsets from high speed transactional data streams

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

Answering linear optimization queries with an approximate stream index

Knowledge and Information Systems
Efficient maintenance of basic statistical functions in data warehouses

Decision Support Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

In some business applications such as trading management in financial institutions, it is required to accurately answer ad hoc aggregate queries over data streams. Materializing and incrementally maintaining a full data cube or even its compression or approximation over a data stream is often computationally prohibitive. On the other hand, although previous studies proposed approximate methods for continuous aggregate queries, they cannot provide accurate answers. In this paper, we develop a novel prefix aggregate tree (PAT) structure for online warehousing data streams and answering ad hoc aggregate queries. Often, a data stream can be partitioned into the historical segment, which is stored in a traditional data warehouse, and the transient segment, which can be stored in a PAT to answer ad hoc aggregate queries. The size of a PAT is linear in the size of the transient segment, and only one scan of the data stream is needed to create and incrementally maintain a PAT. Although the query answering using PAT costs more than the case of a fully materialized data cube, the query answering time is still kept linear in the size of the transient segment. Our extensive experimental results on both synthetic and real data sets illustrate the efficiency and the scalability of our design.