Efficient top-K query calculation in distributed networks

  • Authors:
  • Pei Cao; Zhe Wang

  • Affiliations:
  • Stanford University, Stanford, CA; Princeton University, Princeton, NJ

  • Venue:
  • Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing
  • Year:
  • 2004

Abstract

This paper presents a new algorithm for answering top-k queries (e.g., "find the k objects with the highest aggregate values") in a distributed network. Existing algorithms such as the Threshold Algorithm [10] consume an excessive amount of bandwidth when the number of nodes, m, is high. We propose a new algorithm called "Three-Phase Uniform Threshold" (TPUT). TPUT reduces network bandwidth consumption by pruning away ineligible objects, and terminates in three round-trips regardless of data input. The paper presents two sets of results about TPUT. First, trace-driven simulations show that, depending on the size of the network, TPUT reduces network traffic by one to two orders of magnitude compared to existing algorithms. Second, TPUT is proven to be instance-optimal on common data series. In particular, analysis shows that by using a pruning parameter α, TPUT reduces bandwidth consumption from O(m*m) to O(m*√m) for data series following a Zipf distribution.
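The abstract does not spell out the three phases, so the following is only a minimal single-process sketch of how a three-round-trip, uniform-threshold scheme of this kind is commonly described. It makes several assumptions that are not stated above: the aggregate is a sum, all local values are non-negative, each node is modeled as a plain dictionary rather than a networked peer, and the pruning parameter α simply scales the phase-2 threshold. The names `tput_top_k` and `kth_largest` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def kth_largest(values, k):
    """Return the k-th largest value, or 0.0 if fewer than k values exist."""
    ordered = sorted(values, reverse=True)
    return ordered[k - 1] if len(ordered) >= k else 0.0


def tput_top_k(nodes, k, alpha=1.0):
    """Sketch of a three-phase uniform-threshold top-k query.

    nodes: list of dicts mapping object id -> non-negative local value
           (assumption: the aggregate is the sum over all nodes).
    alpha: assumed scaling factor for the phase-2 threshold, 0 < alpha <= 1.
    """
    m = len(nodes)
    partial = defaultdict(float)  # object -> sum of values reported so far
    seen = defaultdict(set)       # object -> indices of nodes that reported it

    # Phase 1: every node reports its local top-k objects.
    for i, local in enumerate(nodes):
        top = sorted(local.items(), key=lambda kv: -kv[1])[:k]
        for obj, val in top:
            partial[obj] += val
            seen[obj].add(i)
    tau1 = kth_largest(partial.values(), k)  # lower bound on k-th true sum
    threshold = alpha * tau1 / m             # same threshold for every node

    # Phase 2: every node reports all objects at or above the threshold.
    for i, local in enumerate(nodes):
        for obj, val in local.items():
            if val >= threshold and i not in seen[obj]:
                partial[obj] += val
                seen[obj].add(i)
    tau2 = kth_largest(partial.values(), k)  # refined lower bound

    # Prune: an unreported value is below the threshold, so each missing
    # node can add at most `threshold` to an object's partial sum.
    candidates = [
        obj for obj, total in partial.items()
        if total + threshold * (m - len(seen[obj])) >= tau2
    ]

    # Phase 3: fetch exact values for the surviving candidates only.
    exact = {obj: sum(local.get(obj, 0) for local in nodes)
             for obj in candidates}
    return sorted(exact.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

In this sketch, a smaller `alpha` lowers the threshold, so nodes send more entries in phase 2 but fewer candidates survive into phase 3. The bandwidth trade-off analyzed in the paper is not modeled here, since nothing is actually transmitted.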