Route packets, not wires: on-chip inteconnection networks
Proceedings of the 38th annual Design Automation Conference
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Orion: a power-performance simulator for interconnection networks
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
A Delay Model and Speculative Architecture for Pipelined Routers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Design space exploration for 3D architectures
ACM Journal on Emerging Technologies in Computing Systems (JETC)
PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Die Stacking (3D) Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 45th annual Design Automation Conference
Architecting phase change memory as a scalable dram alternative
Proceedings of the 36th annual international symposium on Computer architecture
Scalable high performance main memory system using phase-change memory technology
Proceedings of the 36th annual international symposium on Computer architecture
Enhancing lifetime and security of PCM-based main memory with start-gap wear leveling
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Energy reduction for STT-RAM using early write termination
Proceedings of the 2009 International Conference on Computer-Aided Design
Resistive computation: avoiding the power wall with low-leakage, STT-MRAM based computing
Proceedings of the 37th annual international symposium on Computer architecture
Energy- and endurance-aware design of phase change memory caches
Proceedings of the Conference on Design, Automation and Test in Europe
Modeling, Architecture, and Applications for Emerging Memory Technologies
IEEE Design & Test
Proceedings of the 49th Annual Design Automation Conference
Design space exploration of workload-specific last-level caches
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
On-chip caches built on multilevel spin-transfer torque RAM cells and its optimizations
ACM Journal on Emerging Technologies in Computing Systems (JETC) - Special issue on memory technologies
SPaC: a segment-based parallel compression for backup acceleration in nonvolatile processors
Proceedings of the Conference on Design, Automation and Test in Europe
A network congestion-aware memory subsystem for manycore
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Wireless Health Systems, On-Chip and Off-Chip Network Architectures
Optimal placement of vertical connections in 3D Network-on-Chip
Journal of Systems Architecture: the EUROMICRO Journal
DHeating: dispersed heating repair for self-healing NAND flash memory
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Hi-index | 0.00 |
Emerging memory technologies such as STT-RAM, PCRAM, and resistive RAM are being explored as potential replacements to existing on-chip caches or main memories for future multi-core architectures. This is due to the many attractive features these memory technologies posses: high density, low leakage, and non-volatility. However, the latency and energy overhead associated with the write operations of these emerging memories has become a major obstacle in their adoption. Previous works have proposed various circuit and architectural level solutions to mitigate the write overhead. In this paper, we study the integration of STT-RAM in a 3D multi-core environment and propose solutions at the on-chip network level to circumvent the write overhead problem in the cache architecture with STT-RAM technology. Our scheme is based on the observation that instead of staggering requests to a write-busy STT-RAM bank, the network should schedule requests to other idle cache banks for effectively hiding the latency. Thus, we prioritize cache accesses to the idle banks by delaying accesses to the STT-RAM cache banks that are currently serving long latency write requests. Through a detailed characterization of the cache access patterns of 42 applications, we propose an efficient mechanism to facilitate such delayed writes to cache banks by (a) accurately estimating the busy time of each cache bank through logical partitioning of the cache layer and (b) prioritizing packets in a router requesting accesses to idle banks. Evaluations on a 3D architecture, consisting of 64 cores and 64 STT-RAM cache banks, show that our proposed approach provides 14% average IPC improvement for multi-threaded benchmarks, 19% instruction throughput benefits for multi-programmed workloads, and 6% latency reduction compared to a recently proposed write buffering mechanism.