Correlation-Aware Object Placement for Multi-Object Operations

  • Authors:
  • Ming Zhong;Kai Shen;Joel Seiferas

  • Venue:
  • ICDCS '08: Proceedings of the 2008 28th International Conference on Distributed Computing Systems
  • Year:
  • 2008

Abstract

A multi-object operation incurs communication or synchronization overhead when the requested objects are distributed over different nodes. Object pair correlations (the probability that a pair of objects is requested together in an operation) are often highly skewed yet stable over time in real-world distributed applications. Thus, placing strongly correlated objects on the same node (subject to node space constraints) tends to reduce communication overhead for multi-object operations. This paper studies the optimization of correlation-aware data placement. First, we formalize a restricted form of the problem as a variant of the classic Quadratic Assignment Problem and show that it is NP-hard. Based on a linear programming relaxation, we then propose a polynomial-time approximation algorithm that finds an object placement whose communication overhead is at most twice that of the optimal placement. We further show that the computation cost can be reduced by limiting the optimization scope to a relatively small number of the most important objects. We quantitatively evaluate our approach on keyword index placement for full-text search engines using real traces of 3.7 million web pages and 6.8 million search queries. Compared to correlation-oblivious random object placement, our approach achieves a 37–86% reduction in communication overhead across a range of optimization scopes and system sizes. Compared to a correlation-aware greedy approach, the reduction is 30–78%.
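
To make the placement objective concrete, here is a minimal Python sketch. It is not the paper's LP-based approximation algorithm, and it is not necessarily the greedy baseline the authors evaluated. It rests on several illustrative assumptions: pair correlations are given as co-request counts (proportional to the probabilities in the abstract), every object occupies one unit of space, each node holds at most `capacity` objects, and the overhead of a placement is the total weight of correlated pairs split across nodes. The function names and the toy data are hypothetical.

```python
def communication_overhead(placement, correlation):
    """Total weight of object pairs whose two objects sit on different nodes."""
    return sum(
        weight
        for (a, b), weight in correlation.items()
        if placement[a] != placement[b]
    )


def greedy_placement(objects, correlation, num_nodes, capacity):
    """Illustrative heuristic: co-locate the most strongly correlated pairs first.

    Assumes unit-size objects and a per-node limit of `capacity` objects.
    """
    if num_nodes * capacity < len(objects):
        raise ValueError("total node capacity is smaller than the object count")

    free = [capacity] * num_nodes  # remaining slots on each node
    placement = {}

    def put(obj, node):
        placement[obj] = node
        free[node] -= 1

    def emptiest_node():
        return max(range(num_nodes), key=lambda n: free[n])

    # Visit pairs from the strongest correlation to the weakest.
    ranked = sorted(correlation.items(), key=lambda kv: kv[1], reverse=True)
    for (a, b), _weight in ranked:
        a_placed, b_placed = a in placement, b in placement
        if not a_placed and not b_placed:
            node = emptiest_node()
            if free[node] >= 2:  # only co-locate if both objects fit
                put(a, node)
                put(b, node)
        elif a_placed != b_placed:
            placed, other = (a, b) if a_placed else (b, a)
            node = placement[placed]
            if free[node] >= 1:  # join the partner's node if there is room
                put(other, node)

    # Objects that could not join a partner go wherever space remains.
    for obj in objects:
        if obj not in placement:
            put(obj, emptiest_node())
    return placement


# Toy workload (hypothetical): co-request counts for four objects.
objects = ["a", "b", "c", "d"]
co_requests = {("a", "b"): 50, ("c", "d"): 30, ("a", "c"): 10, ("b", "d"): 10}

greedy = greedy_placement(objects, co_requests, num_nodes=2, capacity=2)
striped = {"a": 0, "b": 1, "c": 0, "d": 1}  # a correlation-oblivious layout

print(communication_overhead(greedy, co_requests))   # 20
print(communication_overhead(striped, co_requests))  # 80
```

On this toy input, the heuristic keeps the two heaviest pairs together and splits only the two weak pairs, whereas the striped layout splits both heavy pairs. The space constraint is what makes the general problem hard: once nodes fill up, locally good choices can force strongly correlated pairs apart.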