Dynamic runtimes can simplify parallel programming by automatically managing concurrency and locality without further burdening the programmer. Nevertheless, implementing such runtime systems for large-scale, shared-memory systems can be challenging. This work optimizes Phoenix, a MapReduce runtime for shared-memory multi-cores and multiprocessors, on a quad-chip, 32-core, 256-thread UltraSPARC T2+ system with NUMA characteristics. We show how a multi-layered approach that comprises optimizations to the algorithm, the implementation, and the OS interaction leads to significant speedup improvements with 256 threads (average of 2.5× higher speedup, maximum of 19×). We also identify the roadblocks that limit the scalability of parallel runtimes on shared-memory systems, which are inherently tied to the OS scalability on large-scale systems.
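To make the programming model concrete, the following is a minimal sketch of the map → local-combine → merge pattern that a shared-memory MapReduce runtime like Phoenix manages automatically. This is not Phoenix's actual C API; it is an illustrative word-count in standard C++ threads, where each worker combines counts locally (reducing shared-state contention, in the spirit of the tiling and merge optimizations cited above) before a final sequential merge.

```cpp
// Hedged sketch: shared-memory MapReduce-style word count.
// Illustrative only -- not the Phoenix API.
#include <map>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

std::map<std::string, int> word_count(const std::vector<std::string>& chunks) {
    // One private partial result per worker: the "local combine" step.
    std::vector<std::map<std::string, int>> partials(chunks.size());
    std::vector<std::thread> workers;
    for (size_t i = 0; i < chunks.size(); ++i) {
        workers.emplace_back([&partials, &chunks, i] {
            // Map phase: tokenize this chunk and count into worker-local
            // storage, so no synchronization is needed during the map.
            std::istringstream in(chunks[i]);
            std::string w;
            while (in >> w) ++partials[i][w];
        });
    }
    for (auto& t : workers) t.join();

    // Merge/reduce phase: fold the per-worker partials into one result.
    std::map<std::string, int> result;
    for (const auto& p : partials)
        for (const auto& [w, c] : p) result[w] += c;
    return result;
}
```

A real runtime would additionally handle chunking of the input, NUMA-aware placement of the partial buffers, and parallel (e.g. hierarchical) merging, which are exactly the layers the paper tunes.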