Cache performance of operating system and multiprogramming workloads
ACM Transactions on Computer Systems (TOCS)
The effect of context switches on cache performance
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Scheduler activations: effective kernel support for the user-level management of parallelism
ACM Transactions on Computer Systems (TOCS)
The impact of operating system structure on memory system performance
SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles
Lazy receiver processing (LRP): a network subsystem architecture for server systems
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
Soft timers: efficient microsecond software timer support for network processing
ACM Transactions on Computer Systems (TOCS)
An analysis of operating system behavior on a simultaneous multithreaded architecture
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
SEDA: an architecture for well-conditioned, scalable internet services
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Understanding and improving operating system effects in control flow prediction
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Using Cohort-Scheduling to Enhance Server Performance
ATEC '02 Proceedings of the General Track of the annual conference on USENIX Annual Technical Conference
Xen and the art of virtualization
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Capriccio: scalable threads for internet services
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Computation spreading: employing hardware migration to specialize CMP cores on-the-fly
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Lazy asynchronous I/O for event-driven servers
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Why events are a bad idea (for high-concurrency servers)
HOTOS'03 Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9
Cassyopia: compiler assisted system optimization
HOTOS'03 Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9
Comparing the performance of web server architectures
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Factored operating systems (fos): the case for a scalable operating system for multicores
ACM SIGOPS Operating Systems Review
OS execution on multi-cores: is out-sourcing worthwhile?
ACM SIGOPS Operating Systems Review
Corey: an operating system for many cores
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Mind the gap: reconnecting architecture and OS research
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
The case for VOS: the vector operating system
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Multicore OSes: looking forward from 1991, er, 2011
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
SplitX: split guest/hypervisor execution on multi-core
WIOV'11 Proceedings of the 3rd conference on I/O virtualization
Exception-less system calls for event-driven servers
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
ICCSA'11 Proceedings of the 2011 international conference on Computational science and Its applications - Volume Part V
Improving per-node efficiency in the datacenter with new OS abstractions
Proceedings of the 2nd ACM Symposium on Cloud Computing
Improving network connection locality on multicore systems
Proceedings of the 7th ACM european conference on Computer Systems
Evaluating Dynamics and Bottlenecks of Memory Collaboration in Cluster Systems
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
A file I/O system for many-core based clusters
Proceedings of the 2nd International Workshop on Runtime and Operating Systems for Supercomputers
Optimizing latency and throughput for spawning processes on massively multicore processors
Proceedings of the 2nd International Workshop on Runtime and Operating Systems for Supercomputers
The power of batching in the Click modular router
Proceedings of the Asia-Pacific Workshop on Systems
Methodologies for generating HTTP streaming video workloads to evaluate web server performance
Proceedings of the 5th Annual International Systems and Storage Conference
The power of batching in the click modular router
APSys'12 Proceedings of the Third ACM SIGOPS Asia-Pacific conference on Systems
Who watches the watchmen? - protecting operating system reliability mechanisms
HotDep'12 Proceedings of the Eighth USENIX conference on Hot Topics in System Dependability
MegaPipe: a new programming interface for scalable network I/O
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
NIX: A Case for a Manycore System for Cloud Computing
Bell Labs Technical Journal
Using vector interfaces to deliver millions of IOPS from a networked key-value storage server
Proceedings of the Third ACM Symposium on Cloud Computing
GPUfs: integrating a file system with GPUs
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Linux block IO: introducing multi-queue SSD access on multi-core systems
Proceedings of the 6th International Systems and Storage Conference
Optimizing process creation and execution on multi-core architectures
International Journal of High Performance Computing Applications
Storage-class memory needs flexible interfaces
Proceedings of the 4th Asia-Pacific Workshop on Systems
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
VirtuOS: an operating system with kernel virtualization
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
DANBI: dynamic scheduling of irregular stream programs for many-core systems
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
NVM heaps for accelerating browser-based applications
Proceedings of the 1st Workshop on Interactions of NVM/FLASH with Operating Systems and Workloads
On the core affinity and file upload performance of Hadoop
DISCS-2013 Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems
When slower is faster: on heterogeneous multicores for reliable systems
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Disengaged scheduling for fair, protected access to fast computational accelerators
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
GPUfs: Integrating a file system with GPUs
ACM Transactions on Computer Systems (TOCS)
Shrinking the hypervisor one subsystem at a time: a userspace packet switch for virtual machines
Proceedings of the 10th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
mTCP: a highly scalable user-level TCP stack for multicore systems
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Hi-index | 0.01 |
For the past 30+ years, system calls have been the de facto interface used by applications to request services from the operating system kernel. System calls have almost universally been implemented as a synchronous mechanism, where a special processor instruction is used to yield userspace execution to the kernel. In the first part of this paper, we evaluate the performance impact of traditional synchronous system calls on system intensive workloads. We show that synchronous system calls negatively affect performance in a significant way, primarily because of pipeline flushing and pollution of key processor structures (e.g., TLB, data and instruction caches, etc.). We propose a new mechanism for applications to request services from the operating system kernel: exception-less system calls. They improve processor efficiency by enabling flexibility in the scheduling of operating system work, which in turn can lead to significantly increased temporal and spacial locality of execution in both user and kernel space, thus reducing pollution effects on processor structures. Exception-less system calls are particularly effective on multicore processors. They primarily target highly threaded server applications, such as Web servers and database servers. We present FlexSC, an implementation of exceptionless system calls in the Linux kernel, and an accompanying user-mode thread package (FlexSC-Threads), binary compatible with POSIX threads, that translates legacy synchronous system calls into exception-less ones transparently to applications. We show how FlexSC improves performance of Apache by up to 116%, MySQL by up to 40%, and BIND by up to 105% while requiring no modifications to the applications.