The Chinese remainder theorem and the prime memory system
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Proceedings of the 27th annual international symposium on Computer architecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Memory controller policies for DRAM power management
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Symbiotic jobscheduling with priorities for a simultaneous multithreading processor
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Scheduler-based DRAM energy management
Proceedings of the 39th annual Design Automation Conference
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
DRAM Energy Management Using Sof ware and Hardware Directed Power Mode Control
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Reducing DRAM Latencies with an Integrated Memory Hierarchy Design
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Dynamic tracking of page miss ratio curve for memory management
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Performance directed energy management for main memory and disks
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Improving energy efficiency by making DRAM less randomly accessed
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Modern dram memory systems: performance analysis and scheduling algorithm
Modern dram memory systems: performance analysis and scheduling algorithm
The M5 Simulator: Modeling Networked Systems
IEEE Micro
Design and implementation of power-aware virtual memory
ATEC '03 Proceedings of the annual conference on USENIX Annual Technical Conference
Power provisioning for a warehouse-sized computer
Proceedings of the 34th annual international symposium on Computer architecture
Limiting the power consumption of main memory
Proceedings of the 34th annual international symposium on Computer architecture
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Software thermal management of dram memory for multicore systems
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Decoupled DIMM: building high-bandwidth memory system using low-speed DRAM devices
Proceedings of the 36th annual international symposium on Computer architecture
Future scaling of processor-memory interfaces
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Micro-pages: increasing DRAM efficiency with locality-aware data placement
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Virtualized and flexible ECC for main memory
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Rethinking DRAM design and organization for energy-constrained multi-cores
Proceedings of the 37th annual international symposium on Computer architecture
Rank-aware cache replacement and write buffering to improve DRAM energy efficiency
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
Handling the problems and opportunities posed by multiple on-chip memory controllers
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Understanding the Energy Consumption of Dynamic Random Access Memories
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
MemScale: active low-power modes for main memory
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Coordinating processor and main memory for efficientserver power control
Proceedings of the international conference on Supercomputing
Adaptive granularity memory systems: a tradeoff between storage efficiency and throughput
Proceedings of the 38th annual international symposium on Computer architecture
Utilizing RF-I and intelligent scheduling for better throughput/watt in a mobile GPU memory system
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Improving System Energy Efficiency with Memory Rank Subsetting
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 49th Annual Design Automation Conference
Multiple sub-row buffers in DRAM: unlocking performance and energy improvement opportunities
Proceedings of the 26th ACM international conference on Supercomputing
MultiScale: memory system DVFS with multiple memory controllers
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
RAIDR: Retention-Aware Intelligent DRAM Refresh
Proceedings of the 39th Annual International Symposium on Computer Architecture
PARDIS: a programmable memory controller for the DDRx interfacing standards
Proceedings of the 39th Annual International Symposium on Computer Architecture
BOOM: enabling mobile memory based low-power server DIMMs
Proceedings of the 39th Annual International Symposium on Computer Architecture
Towards energy-proportional datacenter memory with mobile DRAM
Proceedings of the 39th Annual International Symposium on Computer Architecture
LOT-ECC: localized and tiered reliability mechanisms for commodity memory systems
Proceedings of the 39th Annual International Symposium on Computer Architecture
A case for exploiting subarray-level parallelism (SALP) in DRAM
Proceedings of the 39th Annual International Symposium on Computer Architecture
The dynamic granularity memory system
Proceedings of the 39th Annual International Symposium on Computer Architecture
ViPZonE: OS-level memory variability-driven physical address zoning for energy savings
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
RAMZzz: rank-aware dram power management with dynamic migrations and demotions
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A survey of architectural techniques for DRAM power management
International Journal of High Performance Systems Architecture
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Rethinking DRAM Power Modes for Energy Proportionality
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Conservative row activation to improve memory power efficiency
Proceedings of the 27th international ACM conference on International conference on supercomputing
Reducing memory access latency with asymmetric DRAM bank organizations
Proceedings of the 40th Annual International Symposium on Computer Architecture
Resilient die-stacked DRAM caches
Proceedings of the 40th Annual International Symposium on Computer Architecture
XDRA: exploration and optimization of last-level cache for energy reduction in DDR DRAMs
Proceedings of the 50th Annual Design Automation Conference
Exploring DRAM organizations for energy-efficient and resilient exascale memories
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Effect of page frame allocation pattern on bank conflicts in multi-core systems
Proceedings of the 2013 Research in Adaptive and Convergent Systems
Coordinate page allocation and thread group for improving main memory power efficiency
Proceedings of the Workshop on Power-Aware Computing and Systems
Racing and pacing to idle: an evaluation of heuristics for energy-aware resource allocation
Proceedings of the Workshop on Power-Aware Computing and Systems
A programmable memory controller for the DDRx interfacing standards
ACM Transactions on Computer Systems (TOCS)
A locality-aware memory hierarchy for energy-efficient GPU architectures
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
RowClone: fast and energy-efficient in-DRAM bulk data copy and initialization
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
ACM Transactions on Architecture and Code Optimization (TACO)
Reducing DRAM row activations with eager read/write clustering
ACM Transactions on Architecture and Code Optimization (TACO)
A generalized software framework for accurate and efficient management of performance goals
Proceedings of the Eleventh ACM International Conference on Embedded Software
Hi-index | 0.00 |
The widespread use of multicore processors has dramatically increased the demand on high memory bandwidth and large memory capacity. As DRAM subsystem designs stretch to meet the demand, memory power consumption is now approaching that of processors. However, the conventional DRAM architecture prevents any meaningful power and performance trade-offs for memory-intensive workloads. We propose a novel idea called mini-rank for DDRx (DDR/DDR2/DDR3) DRAMs, which uses a small bridge chip on each DRAM DIMM to break a conventional DRAM rank into multiple smaller mini-ranks so as to reduce the number of devices involved in a single memory access. The design dramatically reduces the memory power consumption with only a slight increase on the memory idle latency. It does not change the DDRx bus protocol and its configuration can be adapted for the best performance-power trade-offs. Our experimental results using four-core multiprogramming workloads show that using x32 mini-ranks reduces memory power by 27.0% with 2.8% performance penalty and using x16 mini-ranks reduces memory power by 44.1% with 7.4% performance penalty on average for memory-intensive workloads, respectively.