Over the last two decades, the continuous increase in computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications that process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations, which many follow-up research efforts have tackled since its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining momentum in both the research and industrial communities. We also cover a set of systems that provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework but target different purposes and application scenarios. Finally, we discuss some future research directions for implementing the next generation of MapReduce-like solutions.