Regularities considered harmful: forcing randomness to memory accesses to reduce row buffer conflicts for multi-core, multi-bank systems

Authors:
Heekwon Park;Seungjae Baek;Jongmoo Choi;Donghee Lee;Sam H. Noh
Affiliations:
University of Pittsburgh, Pittsburgh, PA, USA;University of Pittsburgh, Pittsburgh, PA, USA;Dankook University, Yongin, South Korea;University of Seoul, Seoul, South Korea;Hongik University, Seoul, South Korea
Venue:
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Year:
2013

Citing 20
Cited 3

Memory access scheduling

Proceedings of the 27th annual international symposium on Computer architecture
A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Memory Controller Optimizations for Web Servers

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Fast allocation and deallocation with an improved buddy system

Acta Informatica
DRAMsim: a memory system simulator

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Operating system multilevel load balancing

Proceedings of the 2006 ACM symposium on Applied computing
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
The PARSEC benchmark suite: characterization and architectural implications

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Mini-rank: Adaptive DRAM architecture for improving memory power efficiency

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Future scaling of processor-memory interfaces

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Improving memory bank-level parallelism in the presence of prefetching

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Micro-pages: increasing DRAM efficiency with locality-aware data placement

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive granularity memory systems: a tradeoff between storage efficiency and throughput

Proceedings of the 38th annual international symposium on Computer architecture
Memory access pattern-aware DRAM performance model for multi-core systems

ISPASS '11 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software
Thread Tranquilizer: Dynamically reducing performance variation

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Minimalist open-page: a DRAM page-mode scheduling policy for the many-core era

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Reducing memory interference in multicore systems via application-aware memory channel partitioning

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Balancing DRAM locality and parallelism in shared memory CMP systems

HPCA '12 Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture
Performance analysis of thread mappings with a holistic view of the hardware resources

ISPASS '12 Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software

Optimizing RAM-latency dominated applications

Proceedings of the 4th Asia-Pacific Workshop on Systems
Effect of page frame allocation pattern on bank conflicts in multi-core systems

Proceedings of the 2013 Research in Adaptive and Convergent Systems
BPM/BPM+: Software-based dynamic memory partitioning mechanisms for mitigating DRAM bank-/channel-level interferences in multicore systems

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Visualization

Abstract

We propose a novel kernel-level memory allocator, called M3 (M-cube, Multi-core Multi-bank Memory allocator), that has the following two features. First, it introduces and makes use of a notion of a memory container, which is defined as a unit of memory that comprises the minimum number of page frames that can cover all the banks of the memory organization, by exclusively assigning a container to a core so that each core achieves bank parallelism as much as possible. Second, it orchestrates page frame allocation so that pages that threads access are dispersed randomly across multiple banks so that each thread's access pattern is randomized. The development of M3 is based on a tool that we develop to fully understand the architectural characteristics of the underlying memory organization. Using an extension of this tool, we observe that the same application that accesses pages in a random manner outperforms one that accesses pages in a regular pattern such as sequential or same ordered accesses. This is because such randomized accesses reduces inter-thread access interference on the row-buffer in memory banks. We implement M3 in the Linux kernel version 2.6.32 on the Intel Xeon system that has 16 cores and 32GB DRAM. Performance evaluation with various workloads show that M3 improves the overall performance for memory intensive benchmarks by up to 85% with an average of about 40%.