Continuous improvements in integration scale have made it possible to include several processor cores on the same chip. Such designs, known as chip-multiprocessors (CMPs), are a good alternative to traditional monolithic designs for several reasons, among them better performance, scalability, and performance/energy ratio. On the other hand, higher clock frequencies and the increasing number of transistors available on a single chip have revealed energy consumption as a critical design issue in current and future microarchitectures. In these architectures, the design of the on-chip interconnection network has proven to have a significant impact on overall system performance and energy consumption, and the wires used in such an interconnect can be designed with varying latency, bandwidth, and power characteristics.

In this work, we present a detailed characterization of the energy efficiency of a CMP for parallel scientific applications using Sim-PowerCMP, a detailed architectural-level power-performance simulation tool for CMP architectures. Sim-PowerCMP integrates several well-known contemporary simulators (RSIM, HotLeakage, and Orion) into a single framework that allows precise analysis and optimization of power dissipation (both dynamic and static) while taking performance into account. In this characterization, we pay special attention to the energy consumed in the interconnection network. Results for 8- and 16-core CMPs show that the most power-consuming messages are the replies that carry data: they account for almost 70% on average of the total energy consumed in the interconnect, although they represent 30% of the total number of messages. Furthermore, we show that using on-chip wires with varying latency, bandwidth, and energy characteristics can reduce the energy dissipated by the links of the interconnection network by about 65%, with an average impact of 10% on execution time.
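The headline percentages above can be turned into two derived figures with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the inputs are the rounded averages quoted in the abstract (70% of interconnect energy, 30% of messages, 65% link-energy reduction, 10% slowdown), and combining link energy with execution time into an energy-delay product is an assumption made here, not a metric reported in the abstract.

```python
def per_message_energy_ratio(energy_share, message_share):
    """Average energy per message in one class, relative to the
    average energy per message of all remaining messages."""
    in_class = energy_share / message_share
    others = (1.0 - energy_share) / (1.0 - message_share)
    return in_class / others


def relative_energy_delay(energy_reduction, slowdown):
    """Energy-delay product relative to the baseline (1.0 = unchanged).
    Assumption: link energy and execution time are simply multiplied."""
    return (1.0 - energy_reduction) * (1.0 + slowdown)


# Data-carrying replies: ~70% of interconnect energy, 30% of messages.
ratio = per_message_energy_ratio(0.70, 0.30)

# Heterogeneous wires: ~65% less link energy, ~10% longer execution time.
edp = relative_energy_delay(0.65, 0.10)

print(f"reply-with-data vs. other messages, energy per message: {ratio:.2f}x")
print(f"relative link energy-delay product: {edp:.3f}")
```

Under these rounded inputs, a data-carrying reply costs on average roughly 5.4 times the interconnect energy of any other message, and the link energy-delay product falls to about 0.385 of the baseline. Both numbers inherit the rounding of the quoted averages and should be read as rough indications.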