MapReduce is a programming model proposed to simplify large-scale data processing. In contrast, the Message Passing Interface (MPI) standard is widely used for algorithmic parallelization because it provides an efficient communication infrastructure. In the original implementation of MapReduce, the reduce function can start processing only after the map function has terminated, so a slow map function delays the whole job. In this paper, we propose MapReduce overlapping using MPI, an adaptation of the MapReduce programming model for fast data-intensive processing. Our implementation runs the map and reduce functions concurrently, exchanging partial intermediate data between them in a pipelined fashion using MPI, while maintaining the usability and simplicity of MapReduce. Experimental results on three applications (WordCount, Distributed Inverted Indexing, and Distributed Approximate Similarity Search) show a good speedup over earlier MapReduce implementations such as Hadoop and over the available MPI-MapReduce implementations.
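The overlapping idea can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes Python threads and a bounded `queue.Queue` as a stand-in for MPI message passing, and the names `mapper`, `reducer`, and `overlapped_wordcount` are hypothetical. It shows only the pipelining pattern, in which the reducer merges partial WordCount results while the mappers are still producing them.

```python
import queue
import threading
from collections import Counter

SENTINEL = None  # hypothetical end-of-stream marker sent by each mapper


def mapper(chunks, out_q):
    """Map each chunk and forward its partial counts immediately."""
    for chunk in chunks:
        partial = Counter(chunk.split())
        out_q.put(partial)  # stand-in for an MPI send of partial intermediate data
    out_q.put(SENTINEL)     # tell the reducer this mapper is done


def reducer(in_q, n_mappers, result):
    """Merge partial counts as they arrive, without waiting for all mappers."""
    finished = 0
    while finished < n_mappers:
        item = in_q.get()   # stand-in for an MPI receive
        if item is SENTINEL:
            finished += 1
        else:
            result.update(item)  # Counter.update adds counts


def overlapped_wordcount(partitions):
    """Run one mapper thread per partition and a single concurrent reducer."""
    q = queue.Queue(maxsize=4)  # bounded buffer between map and reduce stages
    result = Counter()
    red = threading.Thread(target=reducer, args=(q, len(partitions), result))
    red.start()
    maps = [threading.Thread(target=mapper, args=(p, q)) for p in partitions]
    for t in maps:
        t.start()
    for t in maps:
        t.join()
    red.join()
    return dict(result)


if __name__ == "__main__":
    print(overlapped_wordcount([["a b a", "c"], ["b a"]]))
```

In an MPI setting, the queue operations would presumably correspond to point-to-point sends and receives between map and reduce processes; that mapping is an assumption of this sketch rather than a detail stated in the abstract.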