Improving server performance on multi-cores via selective off-loading of OS functionality

Authors:
David Nellans; Kshitij Sudan; Erik Brunvand; Rajeev Balasubramonian
Affiliations:
School of Computing, University of Utah (all authors)
Venue:
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Year:
2010

Abstract

Modern and future server-class processors will incorporate many cores. Some studies suggest that it may be worthwhile to dedicate some of these cores to specific tasks such as operating system execution. OS off-loading has two main benefits: improved performance due to better cache utilization, and improved power efficiency due to smarter use of heterogeneous cores. However, OS off-loading is a complex process that requires balancing the overhead of off-loading against its potential benefit, which is unknown at the time the off-loading decision is made. In prior work, OS off-loading has been implemented by first profiling system call behavior and then manually instrumenting some OS routines (out of hundreds) to support off-loading. We propose a hardware-based mechanism that helps automate the off-load decision-making process and provides high-quality dynamic decisions via performance feedback. Our mechanism dynamically estimates the off-load requirements of the application and relies on a run-length predictor for the upcoming OS system call invocation. The resulting hardware-based off-loading policy yields a throughput improvement of up to 18% over a baseline without off-loading, 13% over a static software-based policy, and 23% over a dynamic software-based policy.
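
The decision loop described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the abstract does not specify the predictor organization or the feedback rule, so the last-value predictor indexed by system call number, the run length measured in instructions, the hill-climbing threshold adjustment, and all names (`RunLengthPredictor`, `OffloadPolicy`, `threshold`, `step`) are hypothetical choices made for illustration.

```python
class RunLengthPredictor:
    """Assumed last-value predictor: a system call is predicted to run
    for as many instructions as its most recent invocation did."""

    def __init__(self, default_length=0):
        self.default_length = default_length
        self.last_length = {}  # syscall id -> last observed run length

    def predict(self, syscall_id):
        return self.last_length.get(syscall_id, self.default_length)

    def update(self, syscall_id, observed_length):
        self.last_length[syscall_id] = observed_length


class OffloadPolicy:
    """Off-load a call when its predicted run length meets a threshold,
    and tune that threshold from throughput feedback (assumed hill climbing)."""

    def __init__(self, threshold=1000, step=100, min_threshold=100):
        self.predictor = RunLengthPredictor()
        self.threshold = threshold
        self.step = step
        self.min_threshold = min_threshold
        self.direction = -1          # start by lowering the threshold (off-load more)
        self.prev_throughput = None

    def should_offload(self, syscall_id):
        # Short calls cannot amortize the migration overhead, so they stay local.
        return self.predictor.predict(syscall_id) >= self.threshold

    def record(self, syscall_id, observed_length):
        # Train the predictor once the call's real run length is known.
        self.predictor.update(syscall_id, observed_length)

    def feedback(self, throughput):
        # Reverse direction if the last threshold change reduced throughput.
        if self.prev_throughput is not None and throughput < self.prev_throughput:
            self.direction = -self.direction
        self.prev_throughput = throughput
        self.threshold = max(self.min_threshold,
                             self.threshold + self.direction * self.step)


policy = OffloadPolicy(threshold=1000, step=100)
policy.record(3, 5000)   # syscall 3 ran long last time
policy.record(7, 200)    # syscall 7 ran short last time
print(policy.should_offload(3), policy.should_offload(7))
```

In this model the threshold plays the role of the application's estimated off-load requirement: a lower threshold sends more calls to the OS core, and the feedback step keeps or reverses the last adjustment depending on whether measured throughput improved.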