Map-Reduce is a programming model that enables easy development of scalable parallel applications that process vast amounts of data on large clusters of commodity machines. Through a simple interface with two functions, map and reduce, this model facilitates parallel implementation of many real-world tasks, such as data processing jobs for search engines and machine learning. However, this model does not directly support processing multiple related heterogeneous datasets. Although processing relational data is a common need, this limitation makes Map-Reduce awkward or inefficient when applied to relational operations such as joins. We extend Map-Reduce into a new model called Map-Reduce-Merge. It adds to Map-Reduce a Merge phase that can efficiently merge data already partitioned and sorted (or hashed) by the map and reduce modules. We also demonstrate that this new model can express relational algebra operators and implement several join algorithms.
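The Merge phase can be illustrated with a minimal in-memory sketch. This is not the authors' implementation: the helper names `map_reduce` and `merge_join`, the single-process execution, and the employee and department records are all assumptions made for illustration. Two datasets are processed by separate map and reduce passes that emit key-sorted output, and a merge step then joins the two outputs on their shared key in a single linear scan.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one in-memory map/reduce pass (hypothetical helper).

    Returns a list of (key, reduced_value) pairs sorted by key,
    standing in for the sorted, partitioned output of a real cluster job.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [(key, reducer(key, groups[key])) for key in sorted(groups)]


def merge_join(left, right):
    """Sort-merge equi-join of two key-sorted reduce outputs.

    Assumes each key appears at most once per input, which holds here
    because each reducer emits exactly one pair per key.
    """
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, left_value = left[i]
        right_key, right_value = right[j]
        if left_key < right_key:
            i += 1  # key only on the left side so far: skip it
        elif left_key > right_key:
            j += 1  # key only on the right side so far: skip it
        else:
            joined.append((left_key, left_value, right_value))
            i += 1
            j += 1
    return joined


# Hypothetical input data: (name, department, bonus) and (department, rate).
employees = [("alice", "eng", 100), ("bob", "eng", 50), ("carol", "ops", 70)]
departments = [("eng", 0.10), ("ops", 0.20), ("hr", 0.05)]

# First lineage: total bonus per department.
bonus_by_dept = map_reduce(
    employees,
    mapper=lambda e: [(e[1], e[2])],
    reducer=lambda key, values: sum(values),
)

# Second lineage: one rate per department.
rate_by_dept = map_reduce(
    departments,
    mapper=lambda d: [(d[0], d[1])],
    reducer=lambda key, values: values[0],
)

# Merge step: join the two reduced, key-sorted outputs.
joined = merge_join(bonus_by_dept, rate_by_dept)
print(joined)
```

Because both reduce outputs are already sorted by department, the merge step never re-sorts or re-shuffles either dataset. Departments that appear in only one input, such as `hr` here, are dropped, which gives inner-join semantics.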