This paper analyzes the scalability of seven system applications (Exim, memcached, Apache, PostgreSQL, gmake, Psearchy, and MapReduce) running on Linux on a 48-core computer. Except for gmake, all applications trigger scalability bottlenecks inside a recent Linux kernel. Using mostly standard parallel programming techniques (this paper introduces one new technique, sloppy counters), these bottlenecks can be removed from the kernel or avoided by changing the applications slightly. Modifying the kernel required a total of 3002 lines of code changes. A speculative conclusion from this analysis is that there is no scalability reason to give up on traditional operating system organizations just yet.
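The abstract names sloppy counters without describing how they work. The following is a minimal sketch of one formulation, under the assumption that a sloppy counter splits one logical reference count into a shared central counter plus per-core pools of spare references, so that most updates touch only core-local state. The class name, method names, and the `threshold` parameter are illustrative and are not taken from the paper or from the Linux kernel. The model is single-threaded, with cores represented as list indices; a real kernel version would use per-CPU variables and an atomic central counter.

```python
class SloppyCounter:
    """Illustrative single-threaded model of a sloppy counter (names are hypothetical)."""

    def __init__(self, num_cores, threshold):
        self.central = 0              # shared counter: the contended state
        self.spare = [0] * num_cores  # per-core spare references, local to each core
        self.threshold = threshold    # max spares a core holds before returning the rest

    def acquire(self, core, v=1):
        """Take v references on behalf of `core`."""
        if self.spare[core] >= v:
            self.spare[core] -= v     # fast path: only core-local state is touched
        else:
            self.central += v         # slow path: update the shared counter

    def release(self, core, v=1):
        """Give back v references; they are kept locally as spares."""
        self.spare[core] += v
        excess = self.spare[core] - self.threshold
        if excess > 0:                # too many spares: return the excess centrally
            self.spare[core] -= excess
            self.central -= excess

    def in_use(self):
        """Exact count of references held; needs to read every core's spares."""
        return self.central - sum(self.spare)
```

In this sketch the central value may overstate the true count by the number of spares cached on the cores, which is the sense in which the counter is sloppy. The trade-off is that an exact read, as in `in_use`, has to visit every core's spare count, so the design suits counters that are updated far more often than they are read precisely.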