Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
An effective programmable prefetch engine for on-chip caches
Proceedings of the 28th annual international symposium on Microarchitecture
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Selective cache ways: on-demand cache resource allocation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Smart Memories: a modular reconfigurable architecture
Proceedings of the 27th annual international symposium on Computer architecture
Reconfigurable caches and their application to media processing
Proceedings of the 27th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Let caches decay: reducing leakage energy via exploitation of cache generational behavior
ACM Transactions on Computer Systems (TOCS)
Reuse Distance-Based Cache Hint Selection
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Predicting whole-program locality through reuse distance analysis
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Cross-architecture performance predictions for scientific applications using parameterized models
Proceedings of the joint international conference on Measurement and modeling of computer systems
Single-vDD and single-vT super-drowsy techniques for low-leakage high-performance instruction caches
Proceedings of the 2004 international symposium on Low power electronics and design
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
BioBench: A Benchmark Suite of Bioinformatics Applications
ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Pinpointing and Exploiting Opportunities for Enhancing Data Reuse
ISPASS '08 Proceedings of the ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and software
Evaluation techniques for storage hierarchies
IBM Systems Journal
Understanding the Energy Consumption of Dynamic Random Access Memories
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Communications of the ACM
All-window profiling and composable models of cache sharing
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Computer Architecture, Fifth Edition: A Quantitative Approach
Computer Architecture, Fifth Edition: A Quantitative Approach
Bundled execution of recurring traces for energy-efficient general purpose processing
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
QsCores: trading dark silicon for scalable energy efficiency with quasi-specific cores
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Power Limitations and Dark Silicon Challenge the Future of Multicore
ACM Transactions on Computer Systems (TOCS)
Application data prefetching on the IBM blue gene/Q supercomputer
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
The locality-aware adaptive cache coherence protocol
Proceedings of the 40th Annual International Symposium on Computer Architecture
Hi-index | 0.00 |
The end of Dennard scaling has made energy-efficiency a critical challenge in the continued increase of computing performance. An important approach to increasing energy-efficiency is hardware customization. In this study we explore the opportunity for energy-efficiency via memory hierarchy customization and present a methodology to identify application-specific energy efficient configurations. Using a workload of 37 diverse benchmarks, we address three key questions: 1) How much energy saving is possible?, 2) How much reconfiguration is required?, and 3) Can we use application characterization to automatically select an energy-optimal memory hierarchy configuration? Our results show that the potential benefit is large -- average reductions close to 70% in memory hierarchy energy with no performance loss. Further, our results show that the number of configurations need not be large; 13 carefully chosen configurations can deliver 93% of this benefit (64% energy reduction), and even coarse-grain reconfigurations of an existing hierarchy can deliver 81% of this benefit (56% energy reduction), suggesting that reconfigurable hierarchies may be practically realizable. Finally, as a first step towards automatic reconfiguration, we explore application characterization via reuse distance as a guide to select the best memory hierarchy configuration; we show that reuse distance can effectively predict the application-specific configuration which will both maintain performance and deliver energy efficiency.