Top-k vectorial aggregation queries in a distributed environment

Authors:
Guy Sagy;Izchak Sharfman;Daniel Keren;Assaf Schuster
Affiliations:
CS Faculty, Technion, Technion City 32000, Haifa, Israel;CS Faculty, Technion, Technion City 32000, Haifa, Israel;CS Department, Haifa University, Haifa 31905, Israel;CS Faculty, Technion, Technion City 32000, Haifa, Israel
Venue:
Journal of Parallel and Distributed Computing
Year:
2011

Abstract

Given a large set of objects in a distributed database, the goal of a top-k query is to determine the top-k scoring objects and return them to the user. Efficient top-k ranking over distributed databases has been the focus of recent research, with most current algorithms operating on the assumption that each node holds a single attribute, or a small subset of the attributes, of each object. However, in many important setups each node might instead hold a full d-dimensional vector of numerical attributes for each object. Examples include website activity in distributed servers, sales statistics for a retail chain, or share price information in different stock markets. For these setups, we define a novel ranking problem, top-k vectorial aggregation queries, where each object's score is determined by first aggregating the attribute vectors held for it and then applying the scoring function over the aggregated vector. Our communication-efficient algorithm uses a blend of geometric and skyline-related machinery, some of which is newly developed, as well as an algorithmic framework for defining generic local constraints. Previous algorithms have reduced data sharing by defining local thresholds for each attribute, but such tailored solutions might perform poorly. Experimental results on real-world data demonstrate that our algorithm maintains low latency, with a communication cost up to four orders of magnitude lower than that of existing solutions.
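
To make the query definition concrete, the following is a minimal sketch of what a top-k vectorial aggregation query computes. It is not the communication-efficient algorithm described above: it is a naive centralized baseline that ships every vector to one place. Several things here are assumptions made for illustration only: the aggregation is taken to be a component-wise sum, and the names `top_k_vectorial`, `nodes`, and `ratio` are hypothetical.

```python
import heapq

def top_k_vectorial(nodes, score, k):
    """Naive centralized baseline (illustrative only, not the paper's algorithm).

    nodes: list of dicts, one per node, mapping object id -> attribute vector.
    score: function applied to the aggregated vector of each object.
    k:     number of objects to return.
    """
    totals = {}
    for local in nodes:
        for obj, vec in local.items():
            if obj in totals:
                # Assumption: aggregation is a component-wise sum.
                totals[obj] = [a + b for a, b in zip(totals[obj], vec)]
            else:
                totals[obj] = list(vec)
    # The score is applied to the aggregated vector, not to local vectors.
    return heapq.nlargest(k, totals, key=lambda obj: score(totals[obj]))

# Hypothetical data: two nodes, each holding 2-dimensional vectors.
nodes = [
    {"a": (1.0, 4.0), "b": (3.0, 1.0)},
    {"a": (2.0, 1.0), "b": (1.0, 1.0), "c": (5.0, 5.0)},
]

def ratio(v):
    # A non-linear scoring function: local scores cannot simply be summed.
    return v[0] / v[1]

result = top_k_vectorial(nodes, ratio, 2)
```

With a non-linear scoring function such as `ratio`, an object's global score cannot be derived from its per-node scores alone, which is why the score must be applied after aggregation. The baseline transmits every local vector, which is the cost a communication-efficient algorithm aims to avoid.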