Scalability-based manycore partitioning

Authors:
Hiroshi Sasaki;Teruo Tanimoto;Koji Inoue;Hiroshi Nakamura
Affiliations:
Kyushu University, Fukuoka, Japan;Fujitsu Laboratories Limited, Kawasaki, Japan;Kyushu University, Fukuoka, Japan;The University of Tokyo, Tokyo, Japan
Venue:
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Year:
2012

Citing 25
Cited 4

Symbiotic jobscheduling for a simultaneous multithreaded processor

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs

IEEE Micro
Comparing Gang Scheduling with Dynamic Space Sharing on Symmetric Multiprocessors Using Automatic Self-Allocating Threads (ASAT)

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Exploring the Design Space of Future CMPs

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture

Proceedings of the 30th annual international symposium on Computer architecture
Performance-Driven Processor Allocation

IEEE Transactions on Parallel and Distributed Systems
User-guided symbiotic space-sharing of real workloads

Proceedings of the 20th annual international conference on Supercomputing
Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler

PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Intel® threading building blocks

Journal of Computing Sciences in Colleges
PAM: a novel performance/power aware meta-scheduler for multi-core systems

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
System-Level Performance Metrics for Multiprogram Workloads

IEEE Micro
Using OS Observations to Improve Performance in Multicore Systems

IEEE Micro
Validity of the single processor approach to achieving large scale computing capabilities

AFIPS '67 (Spring) Proceedings of the April 18-20, 1967, spring joint computer conference
Addressing shared resource contention in multicore processors via scheduling

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
An approach to resource-aware co-scheduling for CMPs

Proceedings of the 24th ACM International Conference on Supercomputing
Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications

Proceedings of the 37th annual international symposium on Computer architecture
AKULA: a toolset for experimenting and developing thread placement algorithms on multicore systems

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Contention-Aware Scheduling on Multicore Systems

ACM Transactions on Computer Systems (TOCS)
Memory system performance in a NUMA multicore multiprocessor

Proceedings of the 4th Annual International Conference on Systems and Storage
Memory management in NUMA multicore systems: trapped between cache contention and interconnect overhead

Proceedings of the international symposium on Memory management
A case for NUMA-aware contention management on multicore systems

USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
FACT: a framework for adaptive contention-aware thread migrations

Proceedings of the 8th ACM International Conference on Computing Frontiers
Benchmarking modern multiprocessors

Benchmarking modern multiprocessors
Pack & Cap: adaptive DVFS and thread packing under power caps

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture

Holistic run-time parallelism management for time and energy efficiency

Proceedings of the 27th international ACM conference on International conference on supercomputing
The autonomic operating system research project: achievements and future directions

Proceedings of the 50th Annual Design Automation Conference
Coordinated power-performance optimization in manycores

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
A performance-aware quality of service-driven scheduler for multicore processors

ACM SIGBED Review - Special Issue on the 3rd Embedded Operating System Workshop (EWiLi 2013)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Multicore processors have been popular for years, and the industry is gradually shifting towards the era of manycore processors. Single-thread performance of microprocessors is not growing at a historical rate, but the existence of a number of active processes in the computer system and the continuing development of multi-threaded applications benefit from the growing core counts to sustain system throughput. This trend brings us a situation where a number of parallel applications simultaneously being executed on a single system. Since multi-threaded applications try to maximize its throughput by utilizing the whole system, each of them usually create equal or larger number of threads compared to underlying logical core counts. This introduces much greater number of threads to be co-scheduled in the entire system. However, each program has different characteristics (or scalability) and contends for shared resources, which are the CPU cores and memory hierarchies, with each other. Therefore, it is clear that OS thread scheduling will play a major role in achieving high system performance under such conditions. We develop a sophisticated scheduler that (1) dynamically predicts the scalability of programs via the use of hardware performance monitoring units, (2) decides the optimal number of cores to be allocated for each program, and (3) allocates the cores to programs while maximizing the system utilization to achieve fair and maximum performance. The evaluation results on a 48-core AMD Opteron system show improvements over the Linux scheduler for a variety of multiprogramming workloads.