DIMO: distributed index for matching multimedia objects using MapReduce

  • Authors:
  • Ahmed Abdelsadek;Mohamed Hefeeda

  • Affiliations:
  • Simon Fraser University, Surrey, BC, Canada;Qatar Computing Research Institute, Qatar Foundation, Doha, Qatar

  • Venue:
  • Proceedings of the 5th ACM Multimedia Systems Conference
  • Year:
  • 2014

Abstract

This paper presents the design and evaluation of DIMO, a distributed system for matching high-dimensional multimedia objects. DIMO provides multimedia applications with the basic function of computing the K nearest neighbors on large-scale datasets. It also allows multimedia applications to define application-specific functions to further process the computed nearest neighbors. DIMO presents a novel method for partitioning, searching, and storing high-dimensional datasets on distributed infrastructures that support the MapReduce programming model. We have implemented DIMO and extensively evaluated it on Amazon clusters with the number of machines ranging from 8 to 128. We have experimented with datasets of up to 160 million data points extracted from images, where each point has 128 dimensions. Our experimental results show that DIMO: (i) achieves high precision when compared against the ground-truth nearest neighbors, (ii) can elastically utilize varying amounts of computing resources, (iii) does not impose high network overheads, (iv) does not require large main memory even for processing large datasets, and (v) balances the load across the computing machines used. In addition, DIMO outperforms the closest system in the literature by a large margin (up to 20%) in terms of the achieved average precision of the computed nearest neighbors. Furthermore, DIMO requires at least three orders of magnitude less storage than the other system, and it is more computationally efficient.
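To make the basic function concrete, the following is a minimal sketch of K-nearest-neighbor search expressed in a map/reduce style, together with the precision measure used to compare results against ground truth. It is not DIMO's partitioning or indexing method, which the abstract does not detail: it assumes a brute-force scan within each partition, squared Euclidean distance, and an in-process simulation of the shuffle step. All function names (`map_partition`, `reduce_candidates`, `knn_mapreduce`, `precision`) are hypothetical.

```python
import heapq
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_partition(partition, queries, k):
    """Map step (assumed brute force): for one data partition, emit the
    k closest local candidates for every query as (query_id, (dist, point_id))."""
    for qid, q in queries.items():
        scored = ((squared_distance(q, vec), pid) for pid, vec in partition)
        for candidate in heapq.nsmallest(k, scored):
            yield qid, candidate


def reduce_candidates(candidates, k):
    """Reduce step: merge the candidates from all partitions for one query
    and keep the k closest overall."""
    return heapq.nsmallest(k, candidates)


def knn_mapreduce(partitions, queries, k):
    """Simulate the shuffle in memory: group map output by query id, then reduce."""
    grouped = defaultdict(list)
    for partition in partitions:
        for qid, candidate in map_partition(partition, queries, k):
            grouped[qid].append(candidate)
    return {
        qid: [pid for _, pid in reduce_candidates(cands, k)]
        for qid, cands in grouped.items()
    }


def precision(found, truth):
    """Fraction of the ground-truth neighbors that appear in the result."""
    return len(set(found) & set(truth)) / len(truth)


# Toy 2-dimensional data split over two partitions: (point_id, vector).
partitions = [
    [(0, (0.0, 0.0)), (1, (5.0, 5.0))],
    [(2, (1.0, 1.0)), (3, (9.0, 9.0))],
]
queries = {"q": (0.9, 0.9)}
result = knn_mapreduce(partitions, queries, k=2)
```

Because each mapper emits only its k best candidates per query, the data sent to the reducers grows with the number of partitions and queries rather than with the dataset size. An exhaustive scan like this returns exact neighbors; an approximate index trades some of that precision for speed, which is why precision against ground truth is the natural quality measure.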