Optimal allocation of on-chip memory for multiple-API operating systems

Authors:
D. Nagle; R. Uhlig; T. Mudge; S. Sechrest
Affiliations:
Department of Electrical Engineering and Computer Science, University of Michigan (all authors)
Venue:
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Year:
1994

Abstract

The allocation of die area to different processor components is a central issue in the design of single-chip microprocessors. Chip area is occupied by both core execution logic, such as ALU and FPU datapaths, and memory structures, such as caches, TLBs, and write buffers. This work focuses on the allocation of die area to memory structures through a cost/benefit analysis. The cost of memory structures with different sizes and associativities is estimated by using an established area model for on-chip memory. The performance benefits of selecting a given structure are measured through a collection of methods including on-the-fly hardware monitoring, trace-driven simulation, and kernel-based analysis. Special consideration is given to operating systems that support multiple application programming interfaces (APIs), a software trend that substantially affects on-chip memory allocation decisions.

Results: Small adjustments in cache and TLB design parameters can significantly impact overall performance. Operating systems that support multiple APIs, such as Mach 3.0, increase the relative importance of on-chip instruction caches and TLBs when compared against single-API systems such as Ultrix.
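
The cost/benefit framing in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the candidate structures, their area costs, and their stall contributions are made-up placeholders, the function name `best_allocation` is hypothetical, and the sketch assumes that stall cycles from each structure simply add up. Under those assumptions, it enumerates every combination of instruction cache, data cache, and TLB that fits an area budget and keeps the one with the lowest total stall estimate.

```python
from itertools import product

# Hypothetical candidates: (label, area cost, stall cycles per instruction).
# Every number here is an illustrative placeholder, not data from the paper.
ICACHE = [("I-4KB", 4.0, 0.40), ("I-8KB", 8.0, 0.25), ("I-16KB", 16.0, 0.15)]
DCACHE = [("D-4KB", 4.0, 0.30), ("D-8KB", 8.0, 0.22), ("D-16KB", 16.0, 0.18)]
TLB = [("TLB-32", 1.0, 0.20), ("TLB-64", 2.0, 0.10), ("TLB-128", 4.0, 0.06)]


def best_allocation(budget, *structures):
    """Return (labels, total area, total stall) for the cheapest-stall
    combination whose area fits the budget, or None if nothing fits.

    Assumes stall contributions are independent and additive.
    """
    best = None
    for combo in product(*structures):
        area = sum(c[1] for c in combo)
        if area > budget:
            continue  # this combination does not fit on the die
        stall = sum(c[2] for c in combo)
        if best is None or stall < best[2]:
            best = (tuple(c[0] for c in combo), area, stall)
    return best


if __name__ == "__main__":
    print(best_allocation(20.0, ICACHE, DCACHE, TLB))
```

Exhaustive search is workable here only because the candidate lists are tiny. Swapping in a different set of stall estimates for another workload is a simple way to see how the preferred split of area between caches and TLB could shift.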