In many decision-making applications, users typically issue aggregate queries. To evaluate these computationally expensive queries, online aggregation has been developed to provide approximate answers (with their respective confidence intervals) quickly, and to continuously refine the answers. In this paper, we extend the online aggregation technique to a distributed context where sites are maintained in a DHT (Distributed Hash Table) network. Our Distributed Online Aggregation (DoA) scheme iteratively and progressively produces approximate aggregate answers as follows: in each iteration, a small set of random samples is retrieved from the data sites and distributed to the processing sites; at each processing site, a local aggregate is computed based on the allocated samples; at a coordinator site, these local aggregates are combined into a global aggregate. DoA adaptively grows the number of processing nodes as the sample size increases. To further reduce the sampling overhead, the samples are retained as a precomputed synopsis over the network to be used for processing future queries. We also study how these synopses can be maintained incrementally. We have conducted extensive experiments on PlanetLab. The results show that our DoA scheme significantly reduces the initial waiting time and progressively provides high-quality approximate answers with running confidence intervals.
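The iteration described above (sample, aggregate locally, combine at a coordinator) can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the DoA implementation: the abstract does not specify the estimator, so the example assumes a SUM query, uniform sampling with replacement, and a normal-approximation (CLT) confidence interval. All names (`LocalAggregate`, `local_aggregate`, `combine`, `estimate_sum`, `doa_iterations`, `samples_per_site`) are hypothetical, and the rule for growing the number of processing sites is an arbitrary placeholder.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class LocalAggregate:
    """Sufficient statistics a processing site would send to the coordinator."""
    count: int
    total: float
    total_sq: float


def local_aggregate(samples):
    # Computed independently at each (simulated) processing site.
    return LocalAggregate(len(samples), sum(samples), sum(x * x for x in samples))


def combine(parts):
    # Coordinator step: the statistics are additive, so merging is a plain sum.
    return LocalAggregate(
        sum(p.count for p in parts),
        sum(p.total for p in parts),
        sum(p.total_sq for p in parts),
    )


def estimate_sum(agg, population_size, z=1.96):
    """Estimate SUM over the population, with an assumed CLT-based half-width."""
    n = agg.count
    mean = agg.total / n
    # Unbiased sample variance, clamped at zero against rounding error.
    var = max(agg.total_sq / n - mean * mean, 0.0) * n / (n - 1) if n > 1 else 0.0
    half_width = z * population_size * math.sqrt(var / n)
    return population_size * mean, half_width


def doa_iterations(data, batch_size, iterations, samples_per_site=100, seed=0):
    """Yield (num_sites, estimate, half_width) after each sampling iteration."""
    rng = random.Random(seed)
    running = LocalAggregate(0, 0.0, 0.0)
    for _ in range(iterations):
        # Assumption: uniform sampling with replacement from the data sites.
        batch = [rng.choice(data) for _ in range(batch_size)]
        # Placeholder growth rule: one more site per `samples_per_site` samples.
        num_sites = max(1, (running.count + batch_size) // samples_per_site)
        partitions = [batch[i::num_sites] for i in range(num_sites)]
        running = combine([running] + [local_aggregate(p) for p in partitions])
        estimate, half_width = estimate_sum(running, len(data))
        yield num_sites, estimate, half_width


if __name__ == "__main__":
    data = [float(i) for i in range(1000)]
    for sites, est, hw in doa_iterations(data, batch_size=100, iterations=5):
        print(f"sites={sites} estimate={est:.0f} +/- {hw:.0f}")
```

Because each site reports only a count, a sum, and a sum of squares, the coordinator can merge results without seeing the raw samples, and the interval narrows roughly with the square root of the accumulated sample size.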