Tree-based techniques for query evaluation
Tree-based techniques for query evaluation
The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
An adaptive query execution system for data integration
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Authoritative sources in a hyperlinked environment
Journal of the ACM (JACM)
Eddies: continuously adaptive query processing
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Continuously adaptive continuous queries over streams
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Practical Skew Handling in Parallel Joins
VLDB '92 Proceedings of the 18th International Conference on Very Large Data Bases
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
StreaMon: an adaptive engine for stream query processing
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Adaptive Caching for Continuous Queries
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Query optimization over web services
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Map-reduce-merge: simplified relational data processing on large clusters
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Maximizing the output rate of multi-way join queries over streaming information sources
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Bigtable: A Distributed Storage System for Structured Data
ACM Transactions on Computer Systems (TOCS)
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Clustera: an integrated computation and data management system
Proceedings of the VLDB Endowment
PNUTS: Yahoo!'s hosted data serving platform
Proceedings of the VLDB Endowment
Optimal splitters for database partitioning with size bounds
Proceedings of the 12th International Conference on Database Theory
Flow Algorithms for Parallel Query Optimization
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Towards scalable RDF graph analytics on MapReduce
Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud
Manimal: relational optimization for data-intensive programs
Procceedings of the 13th International Workshop on the Web and Databases
Hadoop++: making a yellow elephant run like a cheetah (without it even noticing)
Proceedings of the VLDB Endowment
Map-reduce extensions and recursive queries
Proceedings of the 14th International Conference on Extending Database Technology
Automatic optimization for MapReduce programs
Proceedings of the VLDB Endowment
Parallel evaluation of conjunctive queries
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Processing theta-joins using MapReduce
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Llama: leveraging columnar storage for scalable join processing in the MapReduce framework
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
An intermediate algebra for optimizing RDF graph pattern matching on MapReduce
ESWC'11 Proceedings of the 8th extended semantic web conference on The semanic web: research and applications - Volume Part II
Query optimization for massively parallel data processing
Proceedings of the 2nd ACM Symposium on Cloud Computing
Building wavelet histograms on large data in MapReduce
Proceedings of the VLDB Endowment
Parallel data processing with MapReduce: a survey
ACM SIGMOD Record
Datalog-Based program analysis with BES and RWL
Datalog'10 Proceedings of the First international conference on Datalog Reloaded
Cluster computing, recursion and datalog
Datalog'10 Proceedings of the First international conference on Datalog Reloaded
SWIM '12 Proceedings of the 4th International Workshop on Semantic Web Information Management
Clydesdale: structured data processing on MapReduce
Proceedings of the 15th International Conference on Extending Database Technology
Proceedings of the 15th International Conference on Database Theory
Performance guarantees for distributed reachability queries
Proceedings of the VLDB Endowment
Towards efficient join processing over large RDF graph using mapreduce
SSDBM'12 Proceedings of the 24th international conference on Scientific and Statistical Database Management
Efficient big data processing in Hadoop MapReduce
Proceedings of the VLDB Endowment
Join processing using Bloom filter in MapReduce
Proceedings of the 2012 ACM Research in Applied Computation Symposium
Optimizing and Tuning MapReduce Jobs to Improve the Large-Scale Data Analysis Process
International Journal of Intelligent Systems
SemanMR: big data processing framework based on semantics
Proceedings of the Fourth Asia-Pacific Symposium on Internetware
Eagle-eyed elephant: split-oriented indexing in Hadoop
Proceedings of the 16th International Conference on Extending Database Technology
Processing multi-way spatial joins on map-reduce
Proceedings of the 16th International Conference on Extending Database Technology
Communication steps for parallel query processing
Proceedings of the 32nd symposium on Principles of database systems
Toward intersection filter-based optimization for joins in MapReduce
Proceedings of the 2nd International Workshop on Cloud Intelligence
Distributed data management using MapReduce
ACM Computing Surveys (CSUR)
MRPacker: an SQL to mapreduce optimizer
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
The family of mapreduce and large-scale data processing systems
ACM Computing Surveys (CSUR)
Big data begets big database theory
BNCOD'13 Proceedings of the 29th British National conference on Big Data
BNCOD'13 Proceedings of the 29th British National conference on Big Data
Computing the stratified semantics of logic programs over big data through mass parallelization
RuleML'13 Proceedings of the 7th international conference on Theory, Practice, and Applications of Rules on the Web
Making queries tractable on big data with preprocessing: through the eyes of complexity theory
Proceedings of the VLDB Endowment
ComMapReduce: An improvement of MapReduce with lightweight communication mechanisms
Data & Knowledge Engineering
Hi-index | 0.00 |
Implementations of map-reduce are being used to perform many operations on very large data. We examine strategies for joining several relations in the map-reduce environment. Our new approach begins by identifying the "map-key," the set of attributes that identify the Reduce process to which a Map process must send a particular tuple. Each attribute of the map-key gets a "share," which is the number of buckets into which its values are hashed, to form a component of the identifier of a Reduce process. Relations have their tuples replicated in limited fashion, the degree of replication depending on the shares for those map-key attributes that are missing from their schema. We study the problem of optimizing the shares, given a fixed number of Reduce processes. An algorithm for detecting and fixing problems where an attribute is "mistakenly" included in the map-key is given. Then, we consider two important special cases: chain joins and star joins. In each case we are able to determine the map-key and determine the shares that yield the least replication. While the method we propose is not always superior to the conventional way of using map-reduce to implement joins, there are some important cases involving large-scale data where our method wins, including: (1) analytic queries in which a very large fact table is joined with smaller dimension tables, and (2) queries involving paths through graphs with high out-degree, such as the Web or a social network.