DIMO: distributed index for matching multimedia objects using MapReduce

Authors:
Ahmed Abdelsadek; Mohamed Hefeeda
Affiliations:
Simon Fraser University, Surrey, BC, Canada; Qatar Computing Research Institute, Qatar Foundation, Doha, Qatar
Venue:
Proceedings of the 5th ACM Multimedia Systems Conference
Year:
2014

Abstract

This paper presents the design and evaluation of DIMO, a distributed system for matching high-dimensional multimedia objects. DIMO provides multimedia applications with the basic function of computing the K nearest neighbors on large-scale datasets. It also allows multimedia applications to define application-specific functions to further process the computed nearest neighbors. DIMO presents a novel method for partitioning, searching, and storing high-dimensional datasets on distributed infrastructures that support the MapReduce programming model. We have implemented DIMO and extensively evaluated it on Amazon clusters with the number of machines ranging from 8 to 128. We have experimented with large datasets of up to 160 million data points extracted from images, where each point has 128 dimensions. Our experimental results show that DIMO: (i) achieves high precision when compared against the ground-truth nearest neighbors, (ii) can elastically utilize varying amounts of computing resources, (iii) does not impose high network overheads, (iv) does not require large main memory even for processing large datasets, and (v) balances the load across the computing machines used. In addition, DIMO outperforms the closest system in the literature by a large margin (up to 20%) in terms of the average precision of the computed nearest neighbors. Furthermore, DIMO requires at least three orders of magnitude less storage than that system, and it is more computationally efficient.
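
To make the basic function concrete, the following is a minimal single-process sketch of a K-nearest-neighbor search written as map and reduce steps, together with the precision measure used to compare results against ground truth. It is not DIMO's partitioning or indexing method, which the abstract does not detail. The round-robin partitioning, the exhaustive distance computation, and all function names (`map_phase`, `reduce_local`, `reduce_global`, `knn_mapreduce`, `precision`) are assumptions made for illustration only.

```python
import heapq
from collections import defaultdict


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(points, queries, num_partitions):
    """Map: emit ((query_id, partition_id), (distance, point_id)) records.

    Assumption: points are assigned to partitions round-robin. A real
    system would use an index so each query visits only a few partitions.
    """
    for pid, p in enumerate(points):
        part = pid % num_partitions
        for qid, q in enumerate(queries):
            yield (qid, part), (dist2(p, q), pid)


def reduce_local(records, k):
    """Reduce 1: keep the k closest candidates per (query, partition)."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    for (qid, _part), values in groups.items():
        yield qid, heapq.nsmallest(k, values)


def reduce_global(local_results, k):
    """Reduce 2: merge per-partition candidates into the final k per query."""
    merged = defaultdict(list)
    for qid, candidates in local_results:
        merged[qid].extend(candidates)
    return {
        qid: [pid for _d, pid in heapq.nsmallest(k, cands)]
        for qid, cands in merged.items()
    }


def knn_mapreduce(points, queries, k, num_partitions=4):
    """Run the two-stage pipeline; returns {query_id: [point_id, ...]}."""
    records = map_phase(points, queries, num_partitions)
    return reduce_global(reduce_local(records, k), k)


def precision(found, truth):
    """Fraction of the ground-truth neighbors present in the found set."""
    return len(set(found) & set(truth)) / len(truth)
```

Because this sketch compares every query with every point, its result is exact and its precision against brute-force ground truth is 1. An approximate scheme that searches only some partitions trades part of that precision for lower computation and network cost, which is the trade-off the evaluation above reports on.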