This paper proposes the use of very small instruction caches, called micro-caches (μ-caches), consisting of tens to hundreds of bytes, at the bottom of the instruction delivery hierarchy in chip multiprocessors (CMPs). Multi-core architectures place a novel emphasis on the performance/area efficiency of processor cores, and we note that traditional instruction cache sizes reflect an emphasis on hit-rate performance rather than efficiency. In brief, μ-caches reduce the area footprint of individual cores, thus allowing additional cores to fit within a given die area. We use commercial design tools and a commercial processor core to evaluate this tradeoff in the context of high-performance networking, where CMP architectures have had their greatest commercial impact to date. Our results suggest that the use of μ-caches can yield a 25% improvement in efficiency relative to traditional hierarchies. In our evaluation, we consider a range of architectural options (cluster organization, non-blocking caches, cache parameters) and justify our conclusions while accounting for the errors inherent in die area estimates.
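The tradeoff at the heart of the abstract can be sketched with a back-of-envelope model: shrinking each core's instruction cache lowers per-core throughput slightly (worse hit rate) but frees die area for additional cores, so aggregate throughput per die can rise. The sketch below is a minimal illustration of that reasoning; all areas, performance figures, and the die budget are hypothetical placeholders, not numbers from the paper.

```python
# Back-of-envelope model of the area/performance tradeoff behind mu-caches.
# All numeric values are hypothetical illustrations, not results from the paper.

def cores_per_die(die_area_mm2: float, core_area_mm2: float) -> int:
    """How many whole cores fit in a fixed die-area budget."""
    return int(die_area_mm2 // core_area_mm2)

def aggregate_throughput(n_cores: int, per_core_perf: float) -> float:
    """Total throughput, assuming perfectly parallel packet-style workloads."""
    return n_cores * per_core_perf

die = 100.0               # mm^2 die budget (hypothetical)

# Baseline: core with a conventionally sized instruction cache.
base_core_area = 4.0      # mm^2 per core (hypothetical)
base_perf = 1.0           # normalized per-core throughput

# Mu-cache core: smaller area, with a modest per-core performance
# penalty from the reduced instruction-cache hit rate.
ucache_core_area = 3.0    # mm^2 per core (hypothetical area savings)
ucache_perf = 0.95        # hypothetical hit-rate penalty

base = aggregate_throughput(cores_per_die(die, base_core_area), base_perf)
ucache = aggregate_throughput(cores_per_die(die, ucache_core_area), ucache_perf)

print(f"baseline: {base:.2f}, mu-cache: {ucache:.2f}, ratio: {ucache / base:.3f}")
```

With these placeholder numbers, 25 baseline cores give throughput 25.0 while 33 μ-cache cores give 31.35, a ratio of about 1.25 — the same shape of result as the paper's reported 25% efficiency gain, though the real evaluation rests on commercial design-tool area estimates rather than a toy model.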