FlexSC: flexible system call scheduling with exception-less system calls

Authors:
Livio Soares;Michael Stumm
Affiliations:
University of Toronto;University of Toronto
Venue:
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Year:
2010

Abstract

For the past 30+ years, system calls have been the de facto interface used by applications to request services from the operating system kernel. System calls have almost universally been implemented as a synchronous mechanism, where a special processor instruction is used to yield userspace execution to the kernel. In the first part of this paper, we evaluate the performance impact of traditional synchronous system calls on system-intensive workloads. We show that synchronous system calls significantly degrade performance, primarily because of pipeline flushing and pollution of key processor structures (e.g., the TLB and the data and instruction caches). We propose a new mechanism for applications to request services from the operating system kernel: exception-less system calls. They improve processor efficiency by enabling flexibility in the scheduling of operating system work, which in turn can lead to significantly increased temporal and spatial locality of execution in both user and kernel space, thus reducing pollution effects on processor structures. Exception-less system calls are particularly effective on multicore processors. They primarily target highly threaded server applications, such as Web servers and database servers. We present FlexSC, an implementation of exception-less system calls in the Linux kernel, and an accompanying user-mode thread package (FlexSC-Threads), binary compatible with POSIX threads, that translates legacy synchronous system calls into exception-less ones transparently to applications. We show how FlexSC improves performance of Apache by up to 116%, MySQL by up to 40%, and BIND by up to 105% while requiring no modifications to the applications.
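
The idea of decoupling a system call request from its execution can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not FlexSC's actual interface: the `SyscallPage` class, the `Entry` layout, the three status values, and the `add` and `strlen` handlers are all hypothetical names chosen for illustration, and everything runs in a single Python thread rather than in a kernel. The application side writes requests into a shared table without trapping, and a separate pass later executes every pending request in one batch, which is the scheduling flexibility the abstract describes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    """Hypothetical entry states; the abstract does not specify any."""
    FREE = 0        # slot may be reused by the application
    SUBMITTED = 1   # request posted, waiting for the kernel side
    DONE = 2        # result is available to the application


@dataclass
class Entry:
    status: Status = Status.FREE
    name: str = ""
    args: tuple = ()
    result: object = None


class SyscallPage:
    """Toy stand-in for a table shared between user and kernel code."""

    def __init__(self, size: int, handlers: Dict[str, Callable]):
        self.entries: List[Entry] = [Entry() for _ in range(size)]
        self.handlers = handlers  # simulated kernel services (assumption)

    def submit(self, name: str, *args) -> int:
        """User side: post a request without any mode switch."""
        for index, entry in enumerate(self.entries):
            if entry.status is Status.FREE:
                entry.name, entry.args, entry.result = name, args, None
                entry.status = Status.SUBMITTED
                return index
        raise RuntimeError("syscall page is full")

    def kernel_pass(self) -> int:
        """Kernel side: execute every pending request in one batch."""
        processed = 0
        for entry in self.entries:
            if entry.status is Status.SUBMITTED:
                entry.result = self.handlers[entry.name](*entry.args)
                entry.status = Status.DONE
                processed += 1
        return processed

    def collect(self, index: int):
        """User side: read a finished result and free the slot."""
        entry = self.entries[index]
        if entry.status is not Status.DONE:
            raise RuntimeError("request has not completed yet")
        result = entry.result
        entry.status = Status.FREE
        return result


# Two requests are posted back to back, then served together.
page = SyscallPage(4, {"add": lambda a, b: a + b, "strlen": len})
first = page.submit("add", 2, 3)
second = page.submit("strlen", "flexsc")
processed = page.kernel_pass()
results = [page.collect(first), page.collect(second)]
print(processed, results)
```

In this toy model the batching is what matters: both requests are posted before any kernel-side work runs, so the simulated kernel handles them together instead of interrupting the caller once per call. A real implementation would additionally need memory that is genuinely shared with the kernel, synchronization between cores, and a way to block a thread whose result is not ready, none of which is modeled here.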