Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Programming parallel algorithms
Communications of the ACM
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Separating key management from file system security
Proceedings of the seventeenth ACM symposium on Operating systems principles
Interpreting the data: Parallel analysis with Sawzall
Scientific Programming - Dynamic Grids and Worldwide Computing
Map-reduce-merge: simplified relational data processing on large clusters
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Thousand core chips: a technology perspective
Proceedings of the 44th annual Design Automation Conference
Evaluating MapReduce for Multi-core and Multiprocessor Systems
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Merge: a programming model for heterogeneous multi-core systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Mars: a MapReduce framework on graphics processors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
ESCIENCE '08 Proceedings of the 2008 Fourth IEEE International Conference on eScience
MapReduce for Data Intensive Scientific Analyses
ESCIENCE '08 Proceedings of the 2008 Fourth IEEE International Conference on eScience
Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system
IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
Hive: a warehousing solution over a map-reduce framework
Proceedings of the VLDB Endowment
FPMR: MapReduce framework on FPGA
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
Exploiting TLS Parallelism at Multiple Loop-Nest Levels
ICPADS '09 Proceedings of the 2009 15th International Conference on Parallel and Distributed Systems
Towards automatic optimization of MapReduce programs
Proceedings of the 1st ACM symposium on Cloud computing
Twister: a runtime for iterative MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Multi-GPU volume rendering using MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
An OpenCL framework for heterogeneous multicores with local memory
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
MapCG: writing parallel program portable between CPU and GPU
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
DryadInc: reusing work in large-scale computations
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Improving MapReduce performance in heterogeneous environments
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Corey: an operating system for many cores
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Spark: cluster computing with working sets
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
Piccolo: building fast, distributed programs with partitioned tables
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
A case for scaling applications to many-core with OS clustering
Proceedings of the sixth conference on Computer systems
Scarlett: coping with skewed content popularity in mapreduce clusters
Proceedings of the sixth conference on Computer systems
Phoenix++: modular MapReduce for shared-memory systems
Proceedings of the second international workshop on MapReduce and its applications
In-situ MapReduce for log processing
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
YSmart: Yet Another SQL-to-MapReduce Translator
ICDCS '11 Proceedings of the 2011 31st International Conference on Distributed Computing Systems
Incoop: MapReduce for incremental computations
Proceedings of the 2nd ACM Symposium on Cloud Computing
A Hierarchical Approach to Maximizing MapReduce Efficiency
PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Tarazu: optimizing MapReduce on heterogeneous clusters
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Hi-index | 0.00 |
The prevalence of chip multiprocessors opens opportunities of running data-parallel applications originally in clusters on a single machine with many cores. MapReduce, a simple and elegant programming model to program large-scale clusters, has recently been shown a promising alternative to harness the multicore platform. The differences such as memory hierarchy and communication patterns between clusters and multicore platforms raise new challenges to design and implement an efficient MapReduce system on multicore. This article argues that it is more efficient for MapReduce to iteratively process small chunks of data in turn than processing a large chunk of data at a time on shared memory multicore platforms. Based on the argument, we extend the general MapReduce programming model with a “tiling strategy”, called Tiled-MapReduce (TMR). TMR partitions a large MapReduce job into a number of small subjobs and iteratively processes one subjob at a time with efficient use of resources; TMR finally merges the results of all subjobs for output. Based on Tiled-MapReduce, we design and implement several optimizing techniques targeting multicore, including the reuse of the input buffer among subjobs, a NUCA/NUMA-aware scheduler, and pipelining a subjob’s reduce phase with the successive subjob’s map phase, to optimize the memory, cache, and CPU resources accordingly. Further, we demonstrate that Tiled-MapReduce supports fine-grained fault tolerance and enables several usage scenarios such as online and incremental computing on multicore machines. Performance evaluation with our prototype system called Ostrich on a 48-core machine shows that Ostrich saves up to 87.6% memory, causes less cache misses, and makes more efficient use of CPU cores, resulting in a speedup ranging from 1.86x to 3.07x over Phoenix. Ostrich also efficiently supports fine-grained fault tolerance, online, and incremental computing with small performance penalty.