The MapReduce framework is increasingly being used to analyze large volumes of data. One important type of data analysis done with MapReduce is log processing, in which a click-stream or an event log is filtered, aggregated, or mined for patterns. As part of this analysis, the log often needs to be joined with reference data such as information about users. Although join algorithms have been studied extensively in parallel and distributed DBMSs, joins are cumbersome in the MapReduce framework, and MapReduce programmers often resort to simple but inefficient algorithms to perform them. In this paper, we describe crucial implementation details of a number of well-known join strategies in MapReduce, and present a comprehensive experimental comparison of these join techniques on a 100-node Hadoop cluster. Our results provide insights that are unique to the MapReduce platform and offer guidance on when to use a particular join algorithm on this platform.
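To make the log-to-reference-data join concrete, here is a minimal sketch of a reduce-side (repartition) join, one commonly used simple strategy. The abstract does not name the strategies it compares, so the choice of this one is an assumption. The code simulates the map, shuffle, and reduce steps in plain Python rather than calling Hadoop, and the function names, record layout, and sample data are all hypothetical.

```python
from collections import defaultdict


def map_phase(log_records, user_records):
    """Tag each record with its source and emit (join_key, tagged_value)."""
    for user_id, event in log_records:
        yield user_id, ("L", event)   # "L" marks a log record (assumed tag)
    for user_id, name in user_records:
        yield user_id, ("R", name)    # "R" marks a reference record (assumed tag)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each key, pair every log event with every matching reference row."""
    for key, values in groups.items():
        refs = [v for tag, v in values if tag == "R"]
        events = [v for tag, v in values if tag == "L"]
        for event in events:
            for ref in refs:
                yield key, event, ref


def repartition_join(log_records, user_records):
    """Inner join of log and reference data on the first field of each record."""
    pairs = map_phase(log_records, user_records)
    return sorted(reduce_phase(shuffle(pairs)))


# Hypothetical sample data: (user_id, event) and (user_id, name).
log = [(1, "click"), (2, "view"), (1, "purchase"), (3, "click")]
users = [(1, "alice"), (2, "bob")]
joined = repartition_join(log, users)
```

In this sketch every record from both inputs passes through the shuffle, including the log record for user 3 that has no match. That cost is one reason a simple join of this kind can be inefficient when the log is much larger than the reference data.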