When slower is faster: on heterogeneous multicores for reliable systems

Authors:
Tomas Hruby;Herbert Bos;Andrew S. Tanenbaum
Affiliations:
The Network Institute, VU University Amsterdam;The Network Institute, VU University Amsterdam;The Network Institute, VU University Amsterdam
Venue:
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Year:
2013

Abstract

Breaking up the OS into many small components is attractive from a dependability point of view. If one of the components crashes or needs an update, we can replace it on the fly without taking down the system. The question is how to achieve this without sacrificing performance and without wasting resources unnecessarily. In this paper, we show that heterogeneous multicore architectures allow us to run OS code efficiently by executing each of the OS components on the most suitable core. Thus, components that require high single-thread performance run on (expensive) high-performance cores, while components that are less performance-critical run on wimpy cores. Moreover, as current trends suggest that there will be no shortage of cores, we can give each component its own dedicated core when performance is of the essence, and consolidate multiple functions on a single core (saving power and resources) when performance is less critical for these components. Using frequency scaling to emulate different x86 cores, we evaluate our design on the most demanding subsystem of our operating system: the network stack. We show that less is sometimes more and that we can deliver better throughput with slower and, likely, less power-hungry cores. For instance, we support network processing at close to 10 Gbps (the maximum speed of our NIC), while using an average of just 60% of the core speeds. Moreover, even if we scale all the cores of the network stack down to as little as 200 MHz, we still achieve 1.8 Gbps, which may be enough for many applications.