The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Compiler-Directed Array Interleaving for Reducing Energy in Multi-Bank Memories
ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
The Alpha 21264 Microprocessor Architecture
ICCD '98 Proceedings of the International Conference on Computer Design
Static Energy Reduction Techniques for Microprocessor Caches
ICCD '01 Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors
Static energy reduction techniques for microprocessor caches
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on the 2001 international conference on computer design (ICCD)
Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Optimizing Replication, Communication, and Capacity Allocation in CMPs
Proceedings of the 32nd annual international symposium on Computer Architecture
Exploring the limits of leakage power reduction in caches
ACM Transactions on Architecture and Code Optimization (TACO)
Power reduction techniques for microprocessor systems
ACM Computing Surveys (CSUR)
Interconnect design considerations for large NUCA caches
Proceedings of the 34th annual international symposium on Computer architecture
A Domain-Specific On-Chip Network Design for Large Scale Cache Systems
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Quantitative analysis and optimization techniques for on-chip cache leakage power
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A NUCA Substrate for Flexible CMP Cache Sharing
IEEE Transactions on Parallel and Distributed Systems
LRU-PEA: a smart replacement policy for non-uniform cache architectures on chip multiprocessors
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
The auction: optimizing banks usage in Non-Uniform Cache Architectures
Proceedings of the 24th ACM International Conference on Supercomputing
International Journal of High Performance Systems Architecture
A power-efficient migration mechanism for D-NUCA caches
Proceedings of the Conference on Design, Automation and Test in Europe
Comparing last-level cache designs for CMP architectures
Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies
The migration prefetcher: Anticipating data promotion in dynamic NUCA caches
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Replacement techniques for dynamic NUCA cache designs on CMPs
The Journal of Supercomputing
Exploiting replication to improve performances of NUCA-based CMP systems
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Hi-index | 0.00 |
NUCA caches are large L2 on-chip cache memories characterized by multi-bank partitioning and designed to hide wire delay effects. They exhibit high hit rates while keeping access latency low. Proposed designs for such caches are Static NUCA, in which data are statically allocated to the cache banks, and Dynamic NUCA, in which data may reside in different banks, and a migration mechanism is introduced to better tolerate wire delay effects. The two architectures permit to achieve different performances by acting on architectural parameters and data management policies, at the cost of different balances between static and dynamic power consumption and energy dissipation. In this work, we propose preliminary results of the characterization of such balances, by presenting an evaluation of performance and energy consumption of conventional UCAs, and Static and Dynamic NUCA caches. All the considered caches architectures are equal sized and they are supposed to be used in an aggressive high frequency system running some applications from the SPEC CPU2000 and the NAS Parallel Benchmarks suites. The experimental results obtained indicate that, although the migration of data contributes to increase the dynamic energy consumption in Dynamic NUCA caches, the higher IPC achieved permits to save static energy, which dominates the power/energy balance in all the considered architectures. As a consequence, such results would designate NUCA caches as the most performing and energy saving architectures. Besides, according to the obtained results, future power improvements for NUCA caches should concentrate on static energy, while, for the dynamic energy, the on-chip network is the most critical element. Migration of data is acceptable, since it has a positive impact on performance, and the increased dynamic energy is overwhelmed by the static energy savings resulting from the shorter execution time. In order to give a general validity to such statements, we need to explore more design space points for each architecture (by varying the running clock rate and other design parameters) and to evaluate them considering a larger set of benchmarks.