The effect of sharing on the cache and bus performance of parallel programs
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Evaluating the performance of four snooping cache coherency protocols
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
Quick and easy cache performance analysis
ACM SIGARCH Computer Architecture News
Cache performance of the integer SPEC benchmarks on a RISC
ACM SIGARCH Computer Architecture News
An analysis of the information content of address reference streams
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Evaluating Design Choices for Shared Bus Multiprocessors in a Throughput-Oriented Environment
IEEE Transactions on Computers
Eliminating the address translation bottleneck for physical address cache
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Efficient simulation of caches under optimal replacement with applications to miss characterization
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Optimizing dynamically-dispatched calls with run-time type feedback
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Architectural support for performance tuning: a case study on the SPARCcenter 2000
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Expected I-cache miss rates via the gap model
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Avoiding conflict misses dynamically in large direct-mapped caches
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Reconciling responsiveness with performance in pure object-oriented languages
ACM Transactions on Programming Languages and Systems (TOPLAS)
A quantitative analysis of loop nest locality
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Unix and the Am29000 Microprocessor
IEEE Micro
Branch Target Buffer Design and Optimization
IEEE Transactions on Computers
False Sharing and Spatial Locality in Multiprocessor Caches
IEEE Transactions on Computers
Efficient Stack Simulation for Set-Associative Virtual Address Caches With Real Tags
IEEE Transactions on Computers
Do Object-Oriented Languages Need Special Hardware Support?
ECOOP '95 Proceedings of the 9th European Conference on Object-Oriented Programming
Dynamic tracking of page miss ratio curve for memory management
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Algorithms for DMA communications
CompSysTech '04 Proceedings of the 5th international conference on Computer systems and technologies
Hi-index | 0.01 |
Techniques are developed in this dissertation to efficiently evaluate direct-mapped and set-associative caches for single-chip RISC microprocessors. This research is motivated in general by the importance of cache memories to computer performance, and more specifically by work done to design the caches in SPUR, a multiprocessor workstation designed at U. C. Berkeley. The studies focus not only on abstract measures of performance such as miss ratios, but also include, when appropriate, detailed implementation factors, such as access times and gate delays. The simulation algorithms developed compute miss ratios for numerous alternative caches with one pass through an address trace, provided all caches have the same block size, and use demand fetching and LRU replacement. One algorithm (forest simulation) simulates direct-mapped caches by relying on inclusion, a property that all larger caches contain a superset of the data in smaller caches. The other algorithm (all associativity simulation) simulates a broader class of direct-mapped and set-associative caches than could previously be studies with a one-pass algorithm, although somewhat less efficiently than forest simulation, since inclusion does not hold. The analysis of set-associative caches yields two major results. First, constant factors are obtained which relate the miss ratios for set-associative caches to miss ratios for other set-associative caches. Then those results are combined with sample cache implementations to show that above certain cache sizes, direct mapped caches have lower effective access times than set-associative caches, despite having higher miss ratios. Finally, instruction buffers and target instruction buffers are examined as organizations for instruction memory on single-chip microprocessors. The analysis focuses closely on implementation considerations, including the interaction between instruction fetches, instruction prefetches and data references, and uses the SPUR RISC design as the case study. Results show the effects of varying numerous design parameters, suggest some superior designs, and demonstrate that instruction buffers will be preferred to target instruction buffers in future RISC microprocessors implemented on single CMOS chips.