On the use of registers vs. cache to minimize memory traffic
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Cache performance of operating system and multiprogramming workloads
ACM Transactions on Computer Systems (TOCS)
Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems
IEEE Transactions on Computers
Performance evaluation of on-chip register and cache organizations
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A simulation study of two-level caches
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Measuring VAX 8800 performance with a histogram hardware monitor
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Characteristics of performance-optimal multi-level cache hierarchies
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Improving performance of small on-chip instruction caches
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
The interaction of architecture and operating system design
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Implementing a cache for a high-performance GaAs microprocessor
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Using continuations to implement thread management and communication in operating systems
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Inside Windows NT
Characterizing the caching and synchronization performance of a multiprocessor operating system
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Analysis of multi-megabyte secondary CPU cache memories
Analysis of multi-megabyte secondary CPU cache memories
Design tradeoffs for software-managed TLBs
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Architectural support for translation table management in large address space machines
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Effectiveness of trace sampling for performance debugging tools
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The impact of operating system structure on memory system performance
SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles
Kernel-based memory simulation (extended abstract)
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Multi-configuration simulation algorithms for the evaluation of computer architecture designs
Multi-configuration simulation algorithms for the evaluation of computer architecture designs
Performance of the VAX-11/780 translation buffer: simulation and measurement
ACM Transactions on Computer Systems (TOCS)
Proceedings of the Workshop on Micro-kernels and Other Kernel Architectures
The increasing irrelevance of IPC Performance for Micro-kernel-Based Operating Systems
Proceedings of the Workshop on Micro-kernels and Other Kernel Architectures
A Model and Prototype of VMS Using the Mach 3.0 Kernel
Proceedings of the Workshop on Micro-kernels and Other Kernel Architectures
The KeyKOS Nanokernel Architecture
Proceedings of the Workshop on Micro-kernels and Other Kernel Architectures
Data Movement in Kernelized Systems
Proceedings of the Workshop on Micro-kernels and Other Kernel Architectures
Proceedings of the Workshop on Micro-kernels and Other Kernel Architectures
Using cache memory to reduce processor-memory traffic
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Experimental evaluation of on-chip microprocessor cache memories
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Using the Mach Communication Primitives in X11
Using the Mach Communication Primitives in X11
Performance of a Software MPEG Video Decoder
Performance of a Software MPEG Video Decoder
Kernel-based memory simulation (extended abstract)
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Trap-driven simulation with Tapeworm II
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Surpassing the TLB performance of superpages with less operating system support
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Instruction fetching: coping with code bloat
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
CAT—caching address tags: a technique for reducing area cost of on-chip caches
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Instruction prefetching of systems codes with layout optimized for reduced cache misses
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Trap-driven memory simulation with Tapeworm II
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Trace-driven memory simulation: a survey
ACM Computing Surveys (CSUR)
Minimizing Area Cost of On-Chip Cache Memories by Caching Address Tags
IEEE Transactions on Computers
A look at several memory management units, TLB-refill mechanisms, and page table organizations
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Optimizing the Instruction Cache Performance of the Operating System
IEEE Transactions on Computers
Tradeoffs in the Design of Single Chip Multiprocessors
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Trace-Driven Memory Simulation: A Survey
Performance Evaluation: Origins and Directions
Systematic objective-driven computer architecture optimization
ARVLSI '95 Proceedings of the 16th Conference on Advanced Research in VLSI (ARVLSI'95)
Optimizing instruction cache performance for operating system intensive workloads
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Software prefetching and caching for translation lookaside buffers
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Hi-index | 0.01 |
The allocation of die area to different processor components is a central issue in the design of single-chip microprocessors. Chip area is occupied by both core execution logic, such as ALU and FPU datapaths, and memory structures, such as caches, TLBs, and write buffers. This work focuses on the allocation of die area to memory structures through a cost/benefit analysis. The cost of memory structures with different sizes and associativities is estimated by using an established area model for on-chip memory. The performance benefits of selecting a given structure are measured through a collection of methods including on-the-fly hardware monitoring, trace-driven simulation and kernel-based analysis. Special consideration is given to operating systems that support multiple application programming interfaces (APIs), a software trend that substantially affects on-chip memory allocation decisions.Results: Small adjustments in cache and TLB design parameters can significantly impact overall performance. Operating systems that support multiple APIs, such as Mach 3.0, increase the relative importance of on-chip instruction caches and TLBs when compared against single-APl systems such as Ultrix.