Eager writeback - a technique for improving bandwidth utilization
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Symbiotic jobscheduling with priorities for a simultaneous multithreading processor
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Architecting phase change memory as a scalable dram alternative
Proceedings of the 36th annual international symposium on Computer architecture
A durable and energy efficient main memory using phase change memory technology
Proceedings of the 36th annual international symposium on Computer architecture
Scalable high performance main memory system using phase-change memory technology
Proceedings of the 36th annual international symposium on Computer architecture
Enhancing lifetime and security of PCM-based main memory with start-gap wear leveling
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
The virtual write queue: coordinating DRAM and last-level cache policies
Proceedings of the 37th annual international symposium on Computer architecture
Use ECP, not ECC, for hard failures in resistive memories
Proceedings of the 37th annual international symposium on Computer architecture
Proceedings of the 37th annual international symposium on Computer architecture
Practical and secure PCM systems by online detection of malicious write streams
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Delta-compressed caching for overcoming the write bandwidth limitation of hybrid main memory
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Compiler directed write-mode selection for high performance low power volatile PCM
Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Phase change memory in enterprise storage systems: silver bullet or snake oil?
Proceedings of the 1st Workshop on Interactions of NVM/FLASH with Operating Systems and Workloads
NVM duet: unified working memory and persistent store architecture
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
WADE: Writeback-aware dynamic cache management for NVM-based main memory system
ACM Transactions on Architecture and Code Optimization (TACO)
C1C: A configurable, compiler-guided STT-RAM L1 cache
ACM Transactions on Architecture and Code Optimization (TACO)
FAST'14 Proceedings of the 12th USENIX conference on File and Storage Technologies
Hi-index | 0.00 |
Phase Change Memory (PCM) is a promising technology for building future main memory systems. A prominent characteristic of PCM is that it has write latency much higher than read latency. Servicing such slow writes causes significant contention for read requests. For our baseline PCM system, the slow writes increase the effective read latency by almost 2X, causing significant performance degradation. This paper alleviates the problem of slow writes by exploiting the fundamental property of PCM devices that writes are slow only in one direction (SET operation) and are almost as fast as reads in the other direction (RESET operation). Therefore, a write operation to a line in which all memory cells have been SET prior to the write, will incur much lower latency. We propose PreSET, an architectural technique that leverages this property to pro-actively SET all the bits in a given memory line well in advance of the anticipated write to that memory line. Our proposed design initiates a PreSET request for a memory line as soon as that line becomes dirty in the cache, thereby allowing a large window of time for the PreSET operation to complete. Our evaluations show that PreSET is more effective and incurs lower storage overhead than previously proposed write cancellation techniques. We also describe static and dynamic throttling schemes to limit the rate of PreSET operations. Our proposal reduces effective read latency from 982 cycles to 594 cycles and increases system performance by 34%, while improving the energy-delay-product by 25%.