How not to lie with statistics: the correct way to summarize benchmark results
Communications of the ACM - The MIT Press scientific computation series
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Line (block) size choice for CPU cache memories
IEEE Transactions on Computers
Branch folding in the CRISP microprocessor: reducing branch delay to zero
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Series 32000 programmer's reference manual
A Case for Direct-Mapped Caches
Computer
Computer architecture: a quantitative approach
Performance comparison of load/store and symmetric instruction set architectures
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ACM Computing Surveys (CSUR)
Executing compressed programs on an embedded RISC architecture
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Micro-operation cache: a power aware frontend for variable instruction length ISA
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power
A Decoded INstruction Cache (DINC) serves as a buffer between the instruction decoder and the other instruction-pipeline stages. In this paper we explain how techniques that reduce the branch penalty based on such a cache can improve CPU performance. We analyze the impact of some DINC design parameters on variable instruction-length computers, e.g., CISC machines. Our study indicates that tuning the function that maps instructions into the cache can improve performance substantially; this tuning must be based on the instruction-length distribution of the specific architecture. In addition, the degree of associativity has a greater effect on a DINC's performance than on that of regular caches. We also discuss how the performance of DINCs differs from that of other caches when longer cache lines are used. The results presented were obtained both by analytical study and by trace-driven simulations of several integer UNIX applications.
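To make the abstract's two levers concrete, the following sketch is a minimal trace-driven simulator of a set-associative cache whose index (mapping) function can be swapped out. The trace, cache geometry, and both index functions are illustrative assumptions, not taken from the paper; the "tuned" function merely shifts away low address bits as a stand-in for a mapping chosen from the instruction-length distribution.

```python
# Minimal trace-driven simulator for a set-associative decoded-instruction
# cache. Illustrates how the choice of mapping function changes the hit
# rate when instruction lengths vary. All parameters are illustrative.
from collections import deque


class SetAssocCache:
    def __init__(self, num_sets, ways, index_fn):
        self.num_sets = num_sets
        self.index_fn = index_fn  # maps an instruction address to a set index
        # Each set is an LRU list; a full deque evicts its oldest entry.
        self.sets = [deque(maxlen=ways) for _ in range(num_sets)]
        self.hits = self.accesses = 0

    def access(self, addr):
        self.accesses += 1
        s = self.sets[self.index_fn(addr) % self.num_sets]
        if addr in s:
            s.remove(addr)
            s.append(addr)  # LRU: move to most-recently-used position
            self.hits += 1
        else:
            s.append(addr)  # miss: insert, evicting the LRU entry if full

    def hit_rate(self):
        return self.hits / self.accesses if self.accesses else 0.0


# Two candidate mapping functions for variable-length instructions
# (byte addresses): plain modulo vs. one that drops low bits, assuming
# a ~4-byte typical instruction length (a hypothetical tuning choice).
naive_index = lambda addr: addr
tuned_index = lambda addr: addr >> 2

# Toy trace: a loop over five variable-length instructions, repeated.
trace = [0, 3, 5, 9, 12] * 100

for name, fn in [("naive", naive_index), ("tuned", tuned_index)]:
    cache = SetAssocCache(num_sets=4, ways=1, index_fn=fn)
    for a in trace:
        cache.access(a)
    print(name, round(cache.hit_rate(), 3))  # naive 0.198, tuned 0.594
```

With a direct-mapped (1-way) cache the naive mapping makes three of the five instructions collide in two sets, while the tuned mapping leaves only one conflicting pair; raising `ways` to 2 absorbs the remaining conflict either way, which is one way to see why associativity interacts so strongly with the mapping function.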