Decoupled sectored caches: conciliating low tag implementation cost
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A data cache with multiple caching strategies tuned to different types of locality
ICS '95 Proceedings of the 9th international conference on Supercomputing
Exploiting spatial locality in data caches using spatial footprints
Proceedings of the 25th annual international symposium on Computer architecture
The pool of subsectors cache design
ICS '99 Proceedings of the 13th international conference on Supercomputing
Adapting cache line size to application behavior
ICS '99 Proceedings of the 13th international conference on Supercomputing
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Proceedings of the 27th annual international symposium on Computer architecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
IEEE Transactions on Computers
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Accurate and Complexity-Effective Spatial Pattern Prediction
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
DRAMsim: a memory system simulator
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Proceedings of the 33rd annual international symposium on Computer Architecture
IEEE Transactions on Computers
Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
The Cray BlackWidow: a highly scalable vector multiprocessor
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Mini-rank: Adaptive DRAM architecture for improving memory power efficiency
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
DRAM errors in the wild: a large-scale field study
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Decoupled DIMM: building high-bandwidth memory system using low-speed DRAM devices
Proceedings of the 36th annual international symposium on Computer architecture
Multicore DIMM: an Energy Efficient Memory Module with Independently Controlled DRAMs
IEEE Computer Architecture Letters
Future scaling of processor-memory interfaces
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Structural aspects of the system/360 model 85: II the cache
IBM Systems Journal
Error-correcting codes for semiconductor memory applications: a state-of-the-art review
IBM Journal of Research and Development
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Virtualized and flexible ECC for main memory
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Proceedings of the 49th Annual Design Automation Conference
Multiple sub-row buffers in DRAM: unlocking performance and energy improvement opportunities
Proceedings of the 26th ACM international conference on Supercomputing
BOOM: enabling mobile memory based low-power server DIMMs
Proceedings of the 39th Annual International Symposium on Computer Architecture
The dynamic granularity memory system
Proceedings of the 39th Annual International Symposium on Computer Architecture
MAGE: adaptive granularity and ECC for resilient and power efficient memory systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A survey of architectural techniques for DRAM power management
International Journal of High Performance Systems Architecture
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Reducing memory access latency with asymmetric DRAM bank organizations
Proceedings of the 40th Annual International Symposium on Computer Architecture
Rethinking algorithm-based fault tolerance with a cooperative software-hardware approach
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Effect of page frame allocation pattern on bank conflicts in multi-core systems
Proceedings of the 2013 Research in Adaptive and Convergent Systems
Decoupled compressed cache: exploiting spatial locality for energy-optimized compressed caching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
A locality-aware memory hierarchy for energy-efficient GPU architectures
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
We propose adaptive granularity to combine the best of fine-grained and coarse-grained memory accesses. We augment virtual memory to allow each page to specify its preferred granularity of access based on spatial locality and error-tolerance tradeoffs. We use sector caches and sub-ranked memory systems to implement adaptive granularity. We also show how to incorporate adaptive granularity into memory access scheduling. We evaluate our architecture with and without ECC using memory intensive benchmarks from the SPEC, Olden, PARSEC, SPLASH2, and HPCS benchmark suites and micro-benchmarks. The evaluation shows that performance is improved by 61% without ECC and 44% with ECC in memory-intensive applications, while the reduction in memory power consumption (29% without ECC and 14% with ECC) and traffic (78% without ECC and 66% with ECC) is significant.