Configurations of contemporary DRAM memory systems are becoming increasingly complex. A recent study shows that application performance is highly sensitive to the choice of configuration, and suggests that tuning burst sizes and channel configurations is an effective way to optimize DRAM performance for a given memory-intensive workload. However, this approach is workload dependent. In this study we show that, by utilizing fine-grain priority access scheduling, we are able to find a workload-independent configuration that achieves optimal performance on a multi-channel memory system. Our approach exploits the high concurrency and high bandwidth available on such memory systems, and effectively reduces the memory stall time of memory-intensive applications. Using execution-driven simulation of a 4-way issue, 2 GHz processor, we show that optimized fine-grain priority scheduling improves the average performance of fifteen memory-intensive SPEC2000 programs by about 13% and 8% on 2-channel and 4-channel Direct Rambus DRAM memory systems, respectively, compared with gang scheduling. Compared with burst scheduling, the average performance improvement is 16% and 14% for the 2-channel and 4-channel memory systems, respectively.
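To illustrate the general idea of fine-grain priority access scheduling, the sketch below models a multi-channel memory system in which requests are interleaved across channels by address and each channel independently issues its highest-priority pending request every cycle. This is a minimal, hypothetical sketch for intuition only — the class and function names are invented here, lower numbers are assumed to mean higher priority, and it does not reproduce the paper's actual scheduler or the Direct Rambus timing model.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Request:
    """A memory request; ordering is by (priority, arrival), lower = sooner."""
    priority: int                       # assumed: lower value = higher priority
    arrival: int                        # arrival time, breaks priority ties
    addr: int = field(compare=False)    # address, used only for channel mapping

def fine_grain_schedule(requests, num_channels):
    """Return the order in which request addresses are serviced.

    Requests are distributed to channels by address interleaving.
    Each 'cycle', every channel with pending work issues its
    highest-priority request, so channels proceed concurrently
    rather than in lock-step (as gang scheduling would).
    """
    channels = [[] for _ in range(num_channels)]
    for r in requests:
        heapq.heappush(channels[r.addr % num_channels], r)

    order = []
    while any(channels):            # some channel still has pending requests
        for q in channels:          # one issue slot per channel per cycle
            if q:
                order.append(heapq.heappop(q).addr)
    return order
```

For example, with two channels, low-priority requests on one channel never delay high-priority requests on the other, which is the source of the concurrency benefit the abstract describes.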