On systems with multi-core processors, the memory access scheduling scheme plays an important role not only in utilizing the limited memory bandwidth but also in balancing program execution across all cores. In this study, we propose a scheme, called ME-LREQ, that considers the utilization of both the processor cores and the memory subsystem. It weighs both the long-term and short-term gains of serving a memory request by prioritizing requests that hit in the row buffers and requests from cores that use memory more efficiently and have fewer pending requests. We have also thoroughly evaluated a set of memory scheduling schemes that differentiate and prioritize requests from different cores. Our simulation results show that, for memory-intensive multiprogrammed workloads on a four-core processor, the new policy improves overall performance by 10.7% on average and by up to 17.7% when compared with a scheme that serves row-buffer-hit requests first and allows memory reads to bypass writes, and by 6.4% on average and up to 9.2% when compared with a scheme that serves requests from the core with the fewest pending requests first.
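The prioritization described above can be illustrated with a minimal Python sketch. It is not the ME-LREQ implementation from the paper: the abstract does not give the exact priority formula, so the sketch assumes a row-buffer hit always wins, and that cores are then ranked by a hypothetical score equal to a per-core memory-efficiency value divided by that core's pending request count. The names `Request`, `pick_next`, `open_rows`, and `efficiency` are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    core: int     # issuing core
    bank: int     # target DRAM bank
    row: int      # target row within the bank
    arrival: int  # arrival time, used only as a tie-breaker


def pick_next(queue, open_rows, efficiency):
    """Pick the next request to serve, or None if the queue is empty.

    open_rows:  dict mapping bank -> currently open row (assumed input)
    efficiency: dict mapping core -> memory-efficiency value (assumed input;
                how it is measured is not specified in the abstract)
    """
    if not queue:
        return None

    # Count pending requests per core.
    pending = {}
    for r in queue:
        pending[r.core] = pending.get(r.core, 0) + 1

    def key(r):
        row_hit = open_rows.get(r.bank) == r.row
        # Hypothetical score: favor efficient cores with few pending requests.
        core_score = efficiency[r.core] / pending[r.core]
        # min() prefers: row hits, then higher core score, then older requests.
        return (not row_hit, -core_score, r.arrival)

    return min(queue, key=key)
```

With equal efficiency values and no row-buffer hits, this sketch reduces to serving the core with the fewest pending requests first, which is the baseline the abstract compares against. A real controller would also have to respect DRAM timing constraints and read/write ordering, which are omitted here.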