Achieving predictable performance through better memory controller placement in many-core CMPs

Authors:
Dennis Abts; Natalie D. Enright Jerger; John Kim; Dan Gibson; Mikko H. Lipasti
Affiliations:
Google Inc, Madison, WI, USA; University of Toronto, Toronto, ON, Canada; KAIST, Daejeon, South Korea; University of Wisconsin - Madison, Madison, WI, USA; University of Wisconsin - Madison, Madison, WI, USA
Venue:
Proceedings of the 36th annual international symposium on Computer architecture
Year:
2009



Abstract

In the near term, Moore's law will continue to provide an increasing number of transistors and therefore an increasing number of on-chip cores. Limited pin bandwidth, however, prevents the integration of a large number of memory controllers on-chip. With many cores and few memory controllers, where to place the memory controllers in the on-chip interconnection fabric becomes an important and as-yet-unexplored question. In this paper we show how the placement of the memory controllers can reduce contention (hot spots) in the on-chip fabric and lower the variance in reference latency. This in turn provides predictable performance for memory-intensive applications regardless of the processing core on which a thread is scheduled. We explore the design space of on-chip fabrics to find optimal memory controller placement relative to different topologies (i.e., mesh and torus), routing algorithms, and workloads.
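
To make the hot-spot argument concrete, the following is a minimal toy model, not the authors' simulator or methodology. It assumes an 8x8 mesh, 16 memory controllers, dimension-order (X then Y) routing, and uniform response traffic in which every controller sends one reply to every tile. The mesh size, the controller count, the two placements (`rows` and `diagonal`), and all function names are illustrative assumptions that do not come from the abstract.

```python
from collections import Counter
from itertools import product

N = 8  # assumed mesh width and height (hypothetical, not stated in the abstract)

def xy_route(src, dst):
    """Directed links used by dimension-order routing: travel in X first, then Y."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def summarize(controllers):
    """Return (peak link load, mean hop count) for uniform controller-to-tile replies."""
    loads = Counter()
    for mc in controllers:
        for tile in product(range(N), repeat=2):
            loads.update(xy_route(mc, tile))
    pairs = len(controllers) * N * N
    return max(loads.values()), sum(loads.values()) / pairs

# Two hypothetical placements of 16 controllers each.
rows = [(x, 0) for x in range(N)] + [(x, N - 1) for x in range(N)]
diagonal = [(i, i) for i in range(N)] + [(i, N - 1 - i) for i in range(N)]

for name, placement in (("rows", rows), ("diagonal", diagonal)):
    peak, mean_hops = summarize(placement)
    print(f"{name}: peak link load = {peak}, mean hops = {mean_hops}")
```

Under these assumptions, the placement that packs controllers into the top and bottom rows funnels replies through the horizontal links of those two rows, while the diagonal placement spreads the same traffic across every row and column, giving a lower peak link load and a lower mean hop count. Real results depend on the routing algorithm, topology, and workload, which this sketch does not model.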