Continuous sampling for online aggregation over multiple queries

Authors:
Sai Wu;Beng Chin Ooi;Kian-Lee Tan
Affiliations:
National University of Singapore, Singapore, Singapore;National University of Singapore, Singapore, Singapore;National University of Singapore, Singapore, Singapore
Venue:
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Year:
2010

Abstract

In this paper, we propose an online aggregation system called COSMOS (Continuous Sampling for Multiple queries in an Online aggregation System) to process multiple aggregate queries efficiently. In COSMOS, a dataset is first scrambled so that a sequential scan of the dataset yields a stream of random samples for all queries. Moreover, COSMOS organizes queries into a dissemination graph to exploit the dependencies across queries. In this way, aggregates of queries closer to the root (the source of the data flow) can potentially be used to compute the aggregates of descendant (dependent) queries. COSMOS applies a statistical approach to combine answers from ancestor nodes to generate the online aggregates for a node. COSMOS also offers a partitioning strategy to further salvage intermediate answers. We have implemented COSMOS and conducted an extensive experimental study in PostgreSQL. Our results on the TPC-H benchmark show the efficiency and effectiveness of COSMOS.
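
The scrambling idea in the abstract can be illustrated with a minimal Python sketch. This is not the COSMOS implementation: the function names `scramble` and `online_avg`, the in-memory list standing in for a table, the synthetic data, and the normal-approximation confidence interval (which treats the scanned prefix as independent draws and ignores any finite-population correction) are all assumptions made for illustration. It only shows why a sequential scan over a randomly permuted dataset behaves like a growing random sample, so a running aggregate can be reported with a shrinking error bound.

```python
import math
import random


def scramble(rows, seed=0):
    """Return a randomly permuted copy of rows.

    Hypothetical stand-in for the scrambling step: after the shuffle,
    any prefix of the list is a uniform random sample of the dataset.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def online_avg(rows, predicate, value, z=1.96, report_every=1000):
    """Scan rows in order and yield (rows_scanned, estimate, half_width).

    The estimate is the running AVG(value) over rows matching predicate.
    half_width is an approximate confidence bound (z * standard error),
    computed with Welford's single-pass mean/variance update.
    """
    n = 0        # matching rows seen so far
    mean = 0.0   # running mean of matching values
    m2 = 0.0     # running sum of squared deviations from the mean
    for scanned, row in enumerate(rows, start=1):
        if predicate(row):
            n += 1
            x = value(row)
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
        if scanned % report_every == 0 and n > 1:
            std_err = math.sqrt(m2 / (n - 1) / n)
            yield scanned, mean, z * std_err


# Synthetic data, assumed purely for illustration.
data = [{"price": float(i % 100), "region": i % 4} for i in range(20000)]
table = scramble(data)

estimates = list(
    online_avg(table, lambda r: r["region"] == 1, lambda r: r["price"])
)
for scanned, estimate, half_width in estimates[:3]:
    print(f"after {scanned} rows: {estimate:.2f} +/- {half_width:.2f}")
```

Because every query reads the same scrambled scan, several predicates could share one pass over the data. The sketch leaves out the dissemination graph and the partitioning strategy entirely.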