Enabling scalability and performance in a large scale CMP environment

Authors:
Bratin Saha;Ali-Reza Adl-Tabatabai;Anwar Ghuloum;Mohan Rajagopalan;Richard L. Hudson;Leaf Petersen;Vijay Menon;Brian Murphy;Tatiana Shpeisman;Eric Sprangle;Anwar Rohillah;Doug Carmean;Jesse Fang
Affiliations:
Intel Corporation;Intel Corporation;Intel Corporation;Intel Corporation;Intel Corporation;Intel Corporation;Intel Corporation;Intel Corporation;Intel Corporation;Intel Corporation;Intel Corporation;Intel Corporation;Intel Corporation
Venue:
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Year:
2007

Abstract

Hardware trends suggest that large-scale CMP architectures, with tens to hundreds of processing cores on a single piece of silicon, are imminent within the next decade. While existing CMP machines have traditionally been handled in the same way as SMPs, this magnitude of parallelism introduces several fundamental challenges at the architectural level, which in turn translate to novel challenges in the design of the software stack for these platforms. This paper presents the "Many Core Run Time" (McRT), a software prototype of an integrated language runtime designed to explore configurations of the software stack that enable performance and scalability on large-scale CMP platforms. It describes the architecture of McRT and discusses our experiences with the system, including an experimental evaluation that led to several interesting, non-intuitive findings and provided key insights into the structure of the system stack at this scale. A key contribution of this paper is to demonstrate how McRT enables near-linear improvements in performance and scalability for desktop workloads such as the popular XviD encoder and a set of RMS (recognition, mining, and synthesis) applications. Another key contribution is the use of McRT to explore non-traditional system configurations, such as a light-weight executive in which McRT runs on "bare metal" and replaces the traditional OS. Such configurations are becoming an increasingly attractive way to leverage heterogeneous computing units, as seen in today's CPU-GPU configurations.