Distributed online aggregations

Authors:
Sai Wu;Shouxu Jiang;Beng Chin Ooi;Kian-Lee Tan
Affiliations:
National University of Singapore, Singapore;Harbin Institute of Technology, Harbin, China;National University of Singapore, Singapore;National University of Singapore, Singapore
Venue:
Proceedings of the VLDB Endowment
Year:
2009



Abstract

In many decision-making applications, users typically issue aggregate queries. Because these queries are computationally expensive to evaluate, online aggregation was developed to return approximate answers (with their respective confidence intervals) quickly and to refine those answers continuously. In this paper, we extend the online aggregation technique to a distributed context where sites are maintained in a distributed hash table (DHT) network. Our Distributed Online Aggregation (DoA) scheme iteratively and progressively produces approximate aggregate answers as follows: in each iteration, a small set of random samples is retrieved from the data sites and distributed to the processing sites; at each processing site, a local aggregate is computed based on the allocated samples; at a coordinator site, these local aggregates are combined into a global aggregate. DoA adaptively grows the number of processing nodes as the sample size increases. To further reduce the sampling overhead, the samples are retained as a precomputed synopsis over the network and reused when processing future queries. We also study how these synopses can be maintained incrementally. We have conducted extensive experiments on PlanetLab. The results show that our DoA scheme significantly reduces the initial waiting time and progressively provides high-quality approximate answers with running confidence intervals.
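
The iteration described in the abstract (sample, aggregate locally, combine at a coordinator, report a running confidence interval) can be illustrated with a minimal single-process sketch. This is not the DoA implementation from the paper: the names `LocalAggregate`, `estimate_avg`, and `run_iterations` are hypothetical, the "sites" are plain in-memory objects rather than DHT nodes, sampling is uniform with replacement from a local list, the number of processing sites is fixed rather than adaptive, and the interval is a textbook normal approximation for AVG rather than the paper's estimator.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class LocalAggregate:
    """Partial statistics a (hypothetical) processing site keeps for its samples."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, value):
        # Fold one sampled value into the local aggregate.
        self.count += 1
        self.total += value
        self.total_sq += value * value

    def merge(self, other):
        # Coordinator-side combination: the three statistics are simply summed.
        self.count += other.count
        self.total += other.total
        self.total_sq += other.total_sq


def estimate_avg(agg, z=1.96):
    """Return (mean, half-width) using a normal approximation (assumed, z=1.96)."""
    mean = agg.total / agg.count
    if agg.count < 2:
        return mean, math.inf
    variance = max(0.0, (agg.total_sq - agg.count * mean * mean) / (agg.count - 1))
    return mean, z * math.sqrt(variance / agg.count)


def run_iterations(data, batch_size, num_sites, iterations, seed=0):
    """Simulate the iterations; return the running (mean, half-width) estimates."""
    rng = random.Random(seed)
    global_agg = LocalAggregate()
    history = []
    for _ in range(iterations):
        # 1. Draw a small batch of uniform random samples (with replacement).
        batch = [rng.choice(data) for _ in range(batch_size)]
        # 2. Distribute the samples round-robin; each site aggregates locally.
        sites = [LocalAggregate() for _ in range(num_sites)]
        for i, value in enumerate(batch):
            sites[i % num_sites].add(value)
        # 3. The coordinator merges the local aggregates into the global one.
        for site in sites:
            global_agg.merge(site)
        history.append(estimate_avg(global_agg))
    return history


if __name__ == "__main__":
    values = list(range(1, 1001))
    for step, (mean, half) in enumerate(run_iterations(values, 50, 4, 5), start=1):
        print(f"iteration {step}: AVG ~ {mean:.1f} +/- {half:.1f}")
```

Because each site reports only a count, a sum, and a sum of squares, the coordinator's work per iteration is independent of the batch size, and the interval narrows as more samples accumulate across iterations.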