Algorithms for scalable synchronization on shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
Software caching and computation migration in Olden
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Exokernel: an operating system architecture for application-level resource management
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Operating system support for improving data locality on CC-NUMA compute servers
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Disco: running commodity operating systems on scalable multiprocessors
Proceedings of the sixteenth ACM symposium on Operating systems principles
Tornado: maximizing locality and concurrency in a shared memory multiprocessor operating system
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Cellular Disco: resource management using virtual clusters on shared-memory multiprocessors
Proceedings of the seventeenth ACM symposium on Operating systems principles
Dynamic computation migration in DSM systems
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Automatic measurement of memory hierarchy parameters
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Scalable locality-conscious multithreaded memory allocation
Proceedings of the 5th international symposium on Memory management
Scheduling threads for constructive cache sharing on CMPs
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Enabling scalability and performance in a large scale CMP environment
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Experience distributing objects in an SMMP OS
ACM Transactions on Computer Systems (TOCS)
Proceedings of the 44th annual Design Automation Conference
Evaluating MapReduce for Multi-core and Multiprocessor Systems
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Performance scalability of a multi-core web server
Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Factored operating systems (fos): the case for a scalable operating system for multicores
ACM SIGOPS Operating Systems Review
Multicore diversity: a software developer's nightmare
ACM SIGOPS Operating Systems Review
Sora: high performance software radio using general purpose multi-core processors
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
A scalable micro wireless interconnect structure for CMPs
Proceedings of the 15th annual international conference on Mobile computing and networking
The multikernel: a new OS architecture for scalable multicore systems
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Helios: heterogeneous multiprocessing with satellite kernels
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Proceedings of the VLDB Endowment
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Checking process-oriented operating system behaviour using CSP and refinement
ACM SIGOPS Operating Systems Review
Checking process-oriented operating system behaviour using CSP and refinement
Proceedings of the Fifth Workshop on Programming Languages and Operating Systems
Locating cache performance bottlenecks using data profiling
Proceedings of the 5th European conference on Computer systems
Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
An operating system for multicore and clouds: mechanisms and implementation
Proceedings of the 1st ACM symposium on Cloud computing
Challenges in operating-systems reengineering for many cores
Proceedings of the 3rd International Workshop on Multicore Software Engineering
PacketShader: a GPU-accelerated software router
Proceedings of the ACM SIGCOMM 2010 conference
Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Operating systems should provide transactions
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
Your computer is already a distributed system. why isn't your OS?
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
Tessellation: space-time partitioning in a manycore client OS
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
IsoStack: highly efficient network processing on dedicated cores
USENIXATC'10 Proceedings of the 2010 USENIX conference on USENIX annual technical conference
Distributed systems meet economics: pricing in the cloud
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
Sora: high-performance software radio using general-purpose multi-core processors
Communications of the ACM
Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
An analysis of Linux scalability to many cores
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
FlexSC: flexible system call scheduling with exception-less system calls
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Efficient system-enforced deterministic parallelism
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Database engines on multicores, why parallelize when you can distribute?
Proceedings of the sixth conference on Computer systems
A case for scaling applications to many-core with OS clustering
Proceedings of the sixth conference on Computer systems
SSLShader: cheap SSL acceleration with commodity processors
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Multicore OS benchmarks: we can do better
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Exploiting MISD performance opportunities in multi-core systems
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
The case for VOS: the vector operating system
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Comparison of lock thrashing avoidance methods and its performance implications for lock design
Proceedings of the third international workshop on Large-scale system and application performance
Quarantine: fault tolerance for concurrent servers with data-driven selective isolation
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
Exception-less system calls for event-driven servers
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Task synchronization and allocation for many-core real-time systems
EMSOFT '11 Proceedings of the ninth ACM international conference on Embedded software
Improving per-node efficiency in the datacenter with new OS abstractions
Proceedings of the 2nd ACM Symposium on Cloud Computing
A latency simulator for many-core systems
Proceedings of the 44th Annual Simulation Symposium
A case for secure and scalable hypervisor using safe language
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Scalable address spaces using RCU balanced trees
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Efficient system-enforced deterministic parallelism
Communications of the ACM
A case for coordinated resource management in heterogeneous multicore platforms
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
An efficient scheduler of RTOS for multi/many-core system
Computers and Electrical Engineering
The case for elastic operating system services in fos
Proceedings of the 49th Annual Design Automation Conference
Delegation and nesting in best-effort hardware transactional memory
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
A file I/O system for many-core based clusters
Proceedings of the 2nd International Workshop on Runtime and Operating Systems for Supercomputers
Toward scalable Web systems on multicore clusters: making use of virtual machines
The Journal of Supercomputing
For extreme parallelism, your OS is Sooooo last-millennium
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
Survey of scheduling techniques for addressing shared resources in multicore processors
ACM Computing Surveys (CSUR)
A real-time, energy-efficient system software suite for heterogeneous multicore platforms
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Supporting distributed execution of smartphone workloads on loosely coupled heterogeneous processors
HotPower'12 Proceedings of the 2012 USENIX conference on Power-Aware Computing and Systems
MegaPipe: a new programming interface for scalable network I/O
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
NUMA-aware graph mining techniques for performance and energy efficiency
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Chronos: predictable low latency for data center applications
Proceedings of the Third ACM Symposium on Cloud Computing
NetSlices: scalable multi-core packet processing in user-space
Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Tiled-MapReduce: Efficient and Flexible MapReduce Processing on Multicore with Tiling
ACM Transactions on Architecture and Code Optimization (TACO)
Traffic management: a holistic approach to memory placement on NUMA systems
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
A lightweight VMM on many core for high performance computing
Proceedings of the 9th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
RadixVM: scalable address spaces for multithreaded applications
Proceedings of the 8th ACM European Conference on Computer Systems
nuKernel: MicroKernel for multi-core DSP SoCs with load sharing and priority interrupts
Proceedings of the 28th Annual ACM Symposium on Applied Computing
ZSim: fast and accurate microarchitectural simulation of thousand-core systems
Proceedings of the 40th Annual International Symposium on Computer Architecture
Locality-aware task management for unstructured parallelism: a quantitative limit study
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Tessellation: refactoring the OS around explicit resource containers with continuous adaptation
Proceedings of the 50th Annual Design Automation Conference
New wine in old skins: the case for distributed operating systems in the data center
Proceedings of the 4th Asia-Pacific Workshop on Systems
Frontiers of Computer Science: Selected Publications from Chinese Universities
HARS: A hardware-assisted runtime software for embedded many-core architectures
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
The scalable commutativity rule: designing scalable software for multicore processors
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Everything you always wanted to know about synchronization but were afraid to ask
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
VirtuOS: an operating system with kernel virtualization
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
K2: a mobile operating system for heterogeneous coherence domains
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
A tool to analyze the performance of multithreaded programs on NUMA architectures
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
A virtualized separation kernel for mixed criticality systems
Proceedings of the 10th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
KV-Cache: A Scalable High-Performance Web-Object Cache for Manycore
UCC '13 Proceedings of the 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing
MultiLanes: providing virtualized storage for OS-level virtualization on many cores
FAST'14 Proceedings of the 12th USENIX conference on File and Storage Technologies
mTCP: a highly scalable user-level TCP stack for multicore systems
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Hi-index | 0.03 |
Multiprocessor application performance can be limited by the operating system when the application uses the operating system frequently and the operating system services use data structures shared and modified by multiple processing cores. If the application does not need the sharing, then the operating system will become an unnecessary bottleneck to the application's performance. This paper argues that applications should control sharing: the kernel should arrange each data structure so that only a single processor need update it, unless directed otherwise by the application. Guided by this design principle, this paper proposes three operating system abstractions (address ranges, kernel cores, and shares) that allow applications to control inter-core sharing and to take advantage of the likely abundance of cores by dedicating cores to specific operating system functions. Measurements of microbenchmarks on the Corey prototype operating system, which embodies the new abstractions, show how control over sharing can improve performance. Application benchmarks, using MapReduce and a Web server, show that the improvements can be significant for overall performance: MapReduce on Corey performs 25% faster than on Linux when using 16 cores. Hardware event counters confirm that these improvements are due to avoiding operations that are expensive on multicore machines.