Cache design of a sub-micron CMOS system/370
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Performance tradeoffs in cache design
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A Case for Direct-Mapped Caches
Computer
Inexpensive implementations of set-associativity
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Machine organization of the IBM RISC System/6000 processor
IBM Journal of Research and Development
Characterizing the Storage Process and Its Effect on the Update of Main Memory by Write Through
Journal of the ACM (JACM)
ACM Computing Surveys (CSUR)
Aspects of cache memory and instruction buffer performance
Aspects of cache memory and instruction buffer performance
IEEE Transactions on Software Engineering
The IBM 3090 system: an overview
IBM Systems Journal
Hi-index | 0.00 |
In most high performance computers the speeds of cache accessing are critical in determining the cycle times. A classical method for designing set-associative caches is to late-select array data based on the results of cache directory Iookups. The impact on the critical path timing due to late-select will become more significant in future microprocessors with very high clock frequencies. In this paper we propose a new approach to the optimization of array access timing for set-associative caches. The basic idea is to utilize a relatively small partial address directory (PAD) for fast and accurate approximations of cache access coordinates. The PAD can speed up most cache array access by accurately predicting cache locations without having to wait for results from conventional cache directory Iookups. Occasionally when the PAD guesses wrong, a memory access can be re-issued with only 1-cycle delays. The PAD may be closely integrated with the array design. The effectiveness of the PAD method is analyzed through combinatorial and simulation studies.