Distributed data management using MapReduce

  • Authors:
  • Feng Li;Beng Chin Ooi;M. Tamer Özsu;Sai Wu

  • Affiliations:
  • National University of Singapore, Singapore;National University of Singapore, Singapore;University of Waterloo, Canada;Zhejiang University, China

  • Venue:
  • ACM Computing Surveys (CSUR)
  • Year:
  • 2014

Abstract

MapReduce is a framework for processing and managing large-scale datasets in a distributed cluster. It has been used for applications such as generating search indexes, document clustering, access log analysis, and various other forms of data analytics. MapReduce adopts a flexible computation model with a simple interface consisting of map and reduce functions, whose implementations can be customized by application developers. Since its introduction, a substantial amount of research effort has been directed toward making it more usable and efficient for supporting database-centric operations. In this article, we aim to provide a comprehensive review of a wide range of proposals and systems that focus fundamentally on the support of distributed data management and processing using the MapReduce framework.
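
To make the map and reduce interface concrete, the following is a minimal single-process sketch of a word count in Python. It is illustrative only and is not taken from the article: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the in-memory grouping step merely stands in for the shuffle that a real distributed runtime would perform across machines.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """User-defined map: emit a (word, 1) pair for every word in a document."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """User-defined reduce: sum all counts collected for one word."""
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical driver: apply map, group values by key, then apply reduce.

    Assumption: everything runs in one process and fits in memory; a real
    framework would partition the records and shuffle keys across a cluster.
    """
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    # Reduce each key's list of values, iterating keys in sorted order.
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


docs = [("d1", "map reduce map"), ("d2", "reduce shuffle")]
counts = run_mapreduce(docs, map_fn, reduce_fn)
```

Only `map_fn` and `reduce_fn` are specific to the application; the driver stays the same for other jobs, which is the kind of customization the abstract refers to.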