Reinventing scheduling for multicore systems

Authors:
Silas Boyd-Wickizer; Robert Morris; M. Frans Kaashoek
Affiliations:
MIT; MIT; MIT
Venue:
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
Year:
2009

Abstract

High performance on multicore processors requires that schedulers be reinvented. Traditional schedulers focus on keeping execution units busy by assigning each core a thread to run. Schedulers ought to focus, however, on high utilization of on-chip memory, rather than of execution cores, to reduce the impact of expensive DRAM and remote cache accesses. A challenge in achieving good use of on-chip memory is that the memory is split up among the cores in the form of many small caches. This paper argues for a form of scheduling that assigns each object and its operations to a specific core, moving a thread among the cores as it uses different objects.
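
The idea in the abstract's last sentence can be illustrated with a toy model. The sketch below is not the authors' implementation: the names `Core`, `ObjectScheduler`, `assign`, and `run` are hypothetical, and the placement policy (pin each object to the core with the most free cache) is an assumption chosen only to keep the example small. It shows each object being given a home core, and a thread moving to that core whenever it operates on the object.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    core_id: int
    capacity: int                      # bytes of on-chip cache assumed available
    used: int = 0
    objects: set = field(default_factory=set)


class ObjectScheduler:
    """Toy model: each object is pinned to one core; threads migrate to it."""

    def __init__(self, num_cores, cache_bytes):
        self.cores = [Core(i, cache_bytes) for i in range(num_cores)]
        self.home = {}                 # object name -> id of its home core

    def assign(self, obj, size):
        """Pin obj to the core with the most free cache (assumed policy)."""
        core = max(self.cores, key=lambda c: c.capacity - c.used)
        if core.capacity - core.used < size:
            return None                # does not fit on chip; no home core
        core.used += size
        core.objects.add(obj)
        self.home[obj] = core.core_id
        return core.core_id

    def run(self, start_core, accesses):
        """Replay one thread's object accesses.

        Returns the core used for each access and the number of migrations.
        """
        current, trace, migrations = start_core, [], 0
        for obj in accesses:
            target = self.home.get(obj, current)   # unassigned: stay put
            if target != current:
                migrations += 1
                current = target
            trace.append(current)
        return trace, migrations


sched = ObjectScheduler(num_cores=2, cache_bytes=100)
sched.assign("a", 60)
sched.assign("b", 60)
trace, migrations = sched.run(start_core=0, accesses=["a", "b", "b", "a"])
print(trace, migrations)
```

In this model the cost of a migration is traded against the cost of fetching the object from DRAM or a remote cache; a real scheduler would have to weigh the two, which the sketch does not attempt.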