MRO-MPI: MapReduce overlapping using MPI and an optimized data exchange policy

Authors:
Hisham Mohamed; Stéphane Marchand-Maillet
Venue:
Parallel Computing
Year:
2013

Abstract

MapReduce is a programming model proposed to simplify large-scale data processing. The message passing interface (MPI) standard, by contrast, is widely used for algorithmic parallelization because it provides an efficient communication infrastructure. In the original implementation of MapReduce, the reduce function can start only after the map function has terminated, so a slow map phase delays the entire job. In this paper, we propose MapReduce overlapping using MPI (MRO-MPI), an adapted structure of the MapReduce programming model for fast data-intensive processing. Our implementation runs the map and reduce functions concurrently, exchanging partial intermediate data between them in a pipelined fashion using MPI, while preserving the usability and simplicity of MapReduce. Experimental results on three applications (WordCount, Distributed Inverted Indexing, and Distributed Approximate Similarity Search) show good speedup compared with earlier MapReduce implementations such as Hadoop and with the available MPI-MapReduce implementations.
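
The overlapping idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation and does not use MPI: as an assumption made purely for illustration, two Python threads and a `queue.Queue` stand in for the map and reduce processes and for the messages exchanged between them. All names (`mapper`, `reducer`, `overlapped_wordcount`, `chunk_size`) are hypothetical.

```python
import queue
import threading
from collections import Counter

# Sentinel marking the end of the map output stream (illustrative choice).
_DONE = object()


def mapper(lines, out_q, chunk_size=2):
    """Count words chunk by chunk and ship each partial result immediately."""
    for start in range(0, len(lines), chunk_size):
        partial = Counter()
        for line in lines[start:start + chunk_size]:
            partial.update(line.split())
        # The reducer can consume this partial result while mapping continues.
        out_q.put(partial)
    out_q.put(_DONE)


def reducer(in_q, totals):
    """Merge partial counts as they arrive instead of waiting for all of them."""
    while True:
        partial = in_q.get()
        if partial is _DONE:
            break
        totals.update(partial)


def overlapped_wordcount(lines, chunk_size=2):
    """Run the map and reduce stages concurrently, connected by a queue."""
    q = queue.Queue()  # stand-in for the message channel between stages
    totals = Counter()
    map_thread = threading.Thread(target=mapper, args=(lines, q, chunk_size))
    reduce_thread = threading.Thread(target=reducer, args=(q, totals))
    reduce_thread.start()
    map_thread.start()
    map_thread.join()
    reduce_thread.join()
    return dict(totals)
```

Calling `overlapped_wordcount(["a b a", "b c", "a"])` merges the partial counts into one dictionary of word frequencies. In an MPI setting the queue would presumably be replaced by point-to-point messages between ranks, but how the paper structures that exchange is not specified in the abstract.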