Comprehensive Hardware and Software Support for Operating Systems to Exploit MP Memory Hierarchies

Authors:
Chun Xia; Josep Torrellas
Affiliations:
BrightInfo; Univ. of Illinois, Urbana-Champaign
Venue:
IEEE Transactions on Computers
Year:
1999

References (17):
Cache coherence protocols: evaluation using a multiprocessor simulation model. ACM Transactions on Computer Systems (TOCS).
Cache performance of operating system and multiprogramming workloads. ACM Transactions on Computer Systems (TOCS).
Achieving high instruction cache performance with an optimizing compiler. ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture.
Profile guided code positioning. PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation.
The interaction of architecture and operating system design. ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems.
Characterizing the caching and synchronization performance of a multiprocessor operating system. ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems.
The impact of operating system structure on memory system performance. SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles.
Contrasting characteristics and cache performance of technical and multi-user commercial workloads. ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems.
Memory system performance of UNIX on CC-NUMA multiprocessors. Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems.
The impact of architectural trends on operating system performance. SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles.
Instruction fetching: coping with code bloat. ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture.
Instruction prefetching of systems codes with layout optimized for reduced cache misses. ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture.
The interaction of software prefetching with ILP processors in shared-memory systems. ISCA '97 Proceedings of the 24th annual international symposium on Computer architecture.
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture.
Optimizing instruction cache performance for operating system intensive workloads. HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture.
Improving the Data Cache Performance of Multiprocessor Operating Systems. HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture.
The Impact of Instruction-Level Parallelism on Multiprocessor Performance and Simulation Methodology. HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture.

Cited by (2):
Characterizing operating system activity in SPECjvm98 Benchmarks. Workload characterization of emerging computer applications.
Run-time modeling and estimation of operating system power consumption. SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems.

Abstract

High-performance multiprocessor workstations are becoming increasingly popular. Since many of the workloads running on these machines are operating-system intensive, we are interested in exploring the types of support for the operating system that the memory hierarchy of these machines should provide. In this paper, we evaluate a comprehensive set of hardware and software supports that reduce the operating system's performance losses in a sophisticated cache hierarchy. These supports, selected from recent papers, are code layout optimization, guarded sequential instruction prefetching, instruction stream buffers, support for block operations, support for coherence activity, and software data prefetching. We evaluate these supports through simulation. We show that they have a largely complementary impact and that, when combined, they speed up the operating system by an average of 40 percent. Finally, a cost-performance comparison of these schemes suggests that the most cost-effective ones are code layout optimization and block operation support, while the least cost-effective one is software data prefetching.
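The abstract lists instruction stream buffers among the supports but does not describe how they work. As a hedged illustration only, the sketch below models a generic sequential stream buffer in the style of Jouppi's ISCA '90 prefetch-buffer proposal, placed in front of a direct-mapped instruction cache, and counts fetches that must go to memory. The class and function names (`DirectMappedICache`, `StreamBuffer`, `count_misses`), the geometry (64 sets, 32-byte lines, depth 4), and the assumption that every prefetch arrives before it is needed are all hypothetical choices for illustration, not the configuration evaluated in this paper.

```python
from collections import deque

# Hypothetical, timing-free model: a prefetched line is assumed to arrive
# before it is requested. Not the paper's simulator or parameters.


class DirectMappedICache:
    """Direct-mapped instruction cache; each set records one memory line number."""

    def __init__(self, num_sets: int, line_size: int) -> None:
        self.num_sets = num_sets
        self.line_size = line_size
        self.tags = [None] * num_sets  # full line numbers stored for simplicity

    def line_of(self, addr: int) -> int:
        return addr // self.line_size

    def contains(self, line: int) -> bool:
        return self.tags[line % self.num_sets] == line

    def fill(self, line: int) -> None:
        self.tags[line % self.num_sets] = line


class StreamBuffer:
    """FIFO of prefetched sequential lines; only the head entry is compared."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.fifo = deque()
        self.next_line = 0

    def restart(self, miss_line: int) -> None:
        # After a miss in both cache and buffer: flush, then prefetch the next `depth` lines.
        self.fifo.clear()
        self.fifo.extend(range(miss_line + 1, miss_line + 1 + self.depth))
        self.next_line = miss_line + 1 + self.depth

    def take_head(self, line: int) -> bool:
        # Hit only if the requested line sits at the head of the FIFO.
        if self.fifo and self.fifo[0] == line:
            self.fifo.popleft()
            self.fifo.append(self.next_line)  # top up with the next sequential line
            self.next_line += 1
            return True
        return False


def count_misses(trace, num_sets=64, line_size=32, depth=4, use_buffer=True) -> int:
    """Count fetches that miss in both the cache and the stream buffer."""
    cache = DirectMappedICache(num_sets, line_size)
    buf = StreamBuffer(depth)
    misses = 0
    for addr in trace:
        line = cache.line_of(addr)
        if cache.contains(line):
            continue
        if use_buffer and buf.take_head(line):
            cache.fill(line)  # line moves from buffer to cache without a memory stall
            continue
        misses += 1
        cache.fill(line)
        if use_buffer:
            buf.restart(line)
    return misses


if __name__ == "__main__":
    straight_line = range(0, 4096, 4)  # 1024 sequential 4-byte instructions, cold cache
    print(count_misses(straight_line, use_buffer=False))  # 128: one miss per 32-byte line
    print(count_misses(straight_line, use_buffer=True))   # 1: only the first line misses
```

On a single straight-line pass, the buffer converts 127 of the 128 line misses into buffer hits. Code that frequently jumps to non-sequential lines forces the buffer to flush and restart, so in this model the benefit depends heavily on how sequential the instruction fetch stream is.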