In this paper, we propose COSMOS (Continuous Sampling for Multiple queries in an Online aggregation System), an online aggregation system that processes multiple aggregate queries efficiently. COSMOS first scrambles the dataset so that a sequential scan yields a stream of random samples for all queries. It then organizes the queries into a dissemination graph to exploit the dependencies among them: the aggregates of queries closer to the root (the source of the data flow) can potentially be used to compute the aggregates of their descendant (dependent) queries. To generate the online aggregates for a node, COSMOS uses a statistical approach to combine the answers from its ancestor nodes. COSMOS also offers a partitioning strategy to further salvage intermediate answers. We have implemented COSMOS in PostgreSQL and conducted an extensive experimental study. Our results on the TPC-H benchmark show the efficiency and effectiveness of COSMOS.
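The scrambling idea can be illustrated with a minimal sketch. This is not the COSMOS implementation: the names `scramble`, `RunningAvg`, and `online_aggregate` are hypothetical, the data is assumed to fit in memory, and the confidence interval is a plain large-sample (CLT) interval rather than whatever estimator the system actually uses. The sketch shows only that one sequential pass over a randomly permuted dataset can feed progressive estimates to several aggregate queries at once. It does not model the dissemination graph or the partitioning strategy.

```python
import math
import random


def scramble(rows, seed=0):
    """Return a randomly permuted copy, so any scan prefix is a uniform random sample."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


class RunningAvg:
    """Running AVG over the rows that satisfy a predicate (Welford's update)."""

    def __init__(self, predicate, value):
        self.predicate = predicate  # row -> bool, the query's selection condition
        self.value = value          # row -> float, the aggregated expression
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations from the mean

    def consume(self, row):
        if not self.predicate(row):
            return
        x = self.value(row)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def estimate(self, z=1.96):
        """Return (mean, half_width) of an approximate 95% interval."""
        if self.n < 2:
            return self.mean, float("inf")
        variance = self.m2 / (self.n - 1)
        return self.mean, z * math.sqrt(variance / self.n)


def online_aggregate(rows, queries, report_every):
    """Feed a single scan to every query and yield progressive estimates."""
    for i, row in enumerate(rows, start=1):
        for q in queries.values():
            q.consume(row)
        if i % report_every == 0 or i == len(rows):
            yield i, {name: q.estimate() for name, q in queries.items()}


if __name__ == "__main__":
    table = [{"region": i % 4, "price": float(i)} for i in range(1000)]
    queries = {
        "avg_all": RunningAvg(lambda r: True, lambda r: r["price"]),
        "avg_region0": RunningAvg(lambda r: r["region"] == 0, lambda r: r["price"]),
    }
    for scanned, answers in online_aggregate(scramble(table, seed=1), queries, 250):
        print(scanned, answers)
```

Because the interval omits a finite-population correction, its half-width does not shrink to zero at the end of the scan, even though the running mean then equals the exact answer.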