Robust distributed indexing for locality-skewed workloads

  • Authors:
  • Mu-Woong Lee;Seung-won Hwang

  • Affiliations:
  • POSTECH, Pohang, South Korea;POSTECH, Pohang, South Korea

  • Venue:
  • Proceedings of the 21st ACM international conference on Information and knowledge management
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Multidimensional indexing is crucial for enabling a fast search over large-scale data. Owing to the unprecedented scale of data, extending such indexing technology has recently gained attention in distributed environments. The goal of existing efforts in distributed indexing has been the localization of queries to data residing at a small number of nodes (i.e., locality-preserving indexing) to minimize communication cost. However, considering that workloads often correlate with data locality, such indexing often generates hotspots. Location-based queries are typically skewed to disaster areas during certain periods of time, e.g., during Hurricane Irene, search traffic increased by more than 2000%. To alleviate such hotspots, we propose workload-balancing as an optimization goal. A cost model analytically supporting the need for load balancing is first developed, then a distributed index that evenly distributes the workload is presented. Our empirical study suggests that hotspots degrading search performance can be effectively alleviated. Specifically, when deployed to Amazon EC2, our proposed scheme showed maximum speed-up of 127.7%. Even in hostile settings where workload is not at all correlated with the search criteria, the proposed scheme's performance is comparable to existing approaches optimized for such settings.