Performance optimization of pipelined primary cache
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Evaluation of A+B=K Conditions Without Carry Propagation
IEEE Transactions on Computers
The SPARC architecture manual (version 9)
The SPARC architecture manual (version 9)
Shade: a fast instruction-set simulator for execution profiling
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Streamlining data cache access with fast address calculation
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Designing domain-specific processors
Proceedings of the ninth international symposium on Hardware/software codesign
Direct load: dependence-linked dataflow resolution of load address and cache coordinate
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Address-free memory access based on program syntax correlation of loads and stores
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on the 2001 international conference on computer design (ICCD)
Reducing non-deterministic loads in low-power caches via early cache set resolution
Microprocessors & Microsystems
Hi-index | 0.00 |
Load latency contributes significantly to execution time. Because most cache accesses hit, cache-hit latency becomes an important component of expected load latency. Most modern microprocessors have base+offset addressing loads; thus effective cache-hit latency includes an addition as well as the RAM access.This paper introduces a new technique used in the UltraSPARC III microprocessor, Sum-Addressed Memory (SAM), which performs true addition using the decoder of the RAM array, with very low latency. We compare SAM with other methods for reducing the add part of load latency. These methods include sum-prediction with recovery, and bitwise indexing with duplicate-tolerance. The results demonstrate the superior performance of SAM.