Proposed die-stacked DRAMs combine multiple layers of dense memory cells with a base logic layer that implements peripheral circuitry (decoders, sense amps), interface logic, and test structures. Even with these features in place, the base logic layer still contains significant unused area, which provides an opportunity to add more functionality to the memory stack. One seemingly obvious approach is to add a cache to the base layer, which could provide faster memory access while reducing the number of slow and power-hungry row buffer activations and closings. However, once the internal DRAM buses and the timing constraints imposed by modern DRAM technologies are properly modeled, a conventional cache provides only a modest performance benefit. This work proposes a "file-managed" row buffer cache (FM-RB$) approach inspired by traditional register allocation and peephole optimization ideas from compiler design. By explicitly managing the allocation and deallocation of the row buffer "registers," the FM-RB$ can deliver performance benefits beyond those of a conventional cache.
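As a rough illustration of why a cache on the base layer can reduce row activations, the following minimal sketch counts activations for a short trace of row accesses, with and without a small LRU cache of recently used rows. It is a toy model under stated assumptions, not the FM-RB$ mechanism itself: the function name `count_activations`, the LRU policy, the cache size, and the trace are all hypothetical, and the sketch deliberately ignores the internal bus and timing constraints that the abstract identifies as limiting a conventional cache.

```python
from collections import OrderedDict


def count_activations(row_trace, cache_entries):
    """Count row activations for a trace of row IDs (hypothetical toy model).

    cache_entries=0 models a bank with no row cache, where only the single
    currently open row can be reused without a new activation.
    """
    open_row = None
    cache = OrderedDict()  # row ID -> True, ordered from least to most recent
    activations = 0
    for row in row_trace:
        if row in cache:
            # Hit in the assumed base-layer cache: no activation needed.
            cache.move_to_end(row)
            continue
        if row != open_row:
            # Close the current row and activate the requested one.
            activations += 1
            open_row = row
        if cache_entries > 0:
            cache[row] = True
            if len(cache) > cache_entries:
                cache.popitem(last=False)  # evict the least recently used row
    return activations


# Illustrative trace that ping-pongs between a few rows of one bank.
trace = [0, 1, 0, 1, 2, 0, 1, 2]
baseline = count_activations(trace, cache_entries=0)
cached = count_activations(trace, cache_entries=4)
print(baseline, cached)
```

In this trace every access conflicts with the open row when no cache is present, while the cached variant activates each distinct row only once. Whether such savings translate into performance depends on the bus and timing effects that this sketch omits.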