Multi-core designs have become the industry imperative, replacing our reliance on increasingly complicated microarchitectural designs and VLSI improvements to deliver higher performance within lower power budgets. The performance of these multi-core chips will be limited by the DRAM memory system: we demonstrate this by modeling a cycle-accurate DDR2 memory controller running SPLASH-2 workloads. Surprisingly, benchmarks that appear to scale well with the number of processors fail to do so when memory is modeled accurately. We frequently find that the most efficient configuration is not the one with the most threads. By choosing the most efficient number of threads for each benchmark, we improve energy-delay efficiency by an average factor of 3.39 and performance by an average of 19.7%. We also introduce a shadow row of sense amplifiers, an alternative to cached DRAM, to explore its potential power and performance impacts. The shadow row works in conjunction with the L2 cache to leverage temporal and spatial locality across memory accesses, achieving average and peak speedups of 13% and 43%, respectively, over a state-of-the-art DRAM memory scheduler.
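Choosing "the most efficient number of threads" per benchmark can be illustrated with a small sketch. It assumes efficiency is measured as the energy-delay product (EDP, energy multiplied by execution time, lower is better), which is how "energy-delay efficiency" is usually read. The names `RunResult`, `energy_delay`, `most_efficient`, and `edp_improvement` are hypothetical, and the measurements are made-up illustrative values, not results from this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """One (hypothetical) measured run of a benchmark at a given thread count."""
    threads: int
    runtime_s: float  # execution time, i.e. the "delay"
    energy_j: float   # total energy consumed by the run


def energy_delay(r: RunResult) -> float:
    """Energy-delay product (EDP); lower is better."""
    return r.energy_j * r.runtime_s


def most_efficient(results: list[RunResult]) -> RunResult:
    """Return the configuration with the smallest energy-delay product."""
    return min(results, key=energy_delay)


def edp_improvement(baseline: RunResult, chosen: RunResult) -> float:
    """Factor by which EDP improves over the baseline (> 1 means better)."""
    return energy_delay(baseline) / energy_delay(chosen)


if __name__ == "__main__":
    # Illustrative numbers only: runtime stops shrinking past 4 threads
    # (e.g. because of memory contention) while energy keeps growing.
    runs = [
        RunResult(threads=1, runtime_s=8.0, energy_j=40.0),
        RunResult(threads=2, runtime_s=4.5, energy_j=45.0),
        RunResult(threads=4, runtime_s=3.0, energy_j=60.0),
        RunResult(threads=8, runtime_s=3.2, energy_j=110.0),
    ]
    max_threads = max(runs, key=lambda r: r.threads)
    best = most_efficient(runs)
    # Compare the EDP-optimal run against simply using the most threads.
    print(best.threads, round(edp_improvement(max_threads, best), 2))
```

With these illustrative inputs the script prints `4 1.96`: the 4-thread run has the lowest EDP (180 versus 352 for 8 threads), so using every available thread is not the most efficient choice. Averaging `edp_improvement` across benchmarks would give an aggregate factor of the kind the abstract reports.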