A performance comparison of contemporary DRAM architectures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Genetic Algorithms and Grouping Problems
Genetic Algorithms and Grouping Problems
Worst-case traffic for oblivious routing functions
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
IEEE Transactions on Parallel and Distributed Systems
Redeeming IPC as a Performance Metric for Multithreaded Programs
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Thermal-Aware IP Virtualization and Placement for Networks-on-Chip Architecture
ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Near-Optimal Worst-Case Throughput Routing for Two-Dimensional Mesh Networks
Proceedings of the 32nd annual international symposium on Computer Architecture
A technique for low energy mapping and routing in network-on-chip architectures
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Energy-aware mapping for tile-based NoC architectures under performance constraints
ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
The AMD Opteron Northbridge Architecture
IEEE Micro
Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Flattened Butterfly Topology for On-Chip Networks
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
An Evaluation of Server Consolidation Workloads for Multi-Core Designs
IISWC '07 Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization
The era of many-modules SoC: revisiting the NoC mapping problem
Proceedings of the 2nd International Workshop on Network on Chip Architectures
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
Proceedings of the 37th annual international symposium on Computer architecture
Handling the problems and opportunities posed by multiple on-chip memory controllers
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
An efficient distributed memory interface for many-core platform with 3D stacked DRAM
Proceedings of the Conference on Design, Automation and Test in Europe
SigNet: network-on-chip filtering for coarse vector directories
Proceedings of the Conference on Design, Automation and Test in Europe
Netrace: dependency-driven trace-based network-on-chip simulation
Proceedings of the Third International Workshop on Network on Chip Architectures
Process scheduling for future multicore processors
Proceedings of the Fifth International Workshop on Interconnection Network Architecture: On-Chip, Multi-Chip
Throughput-Effective On-Chip Networks for Manycore Accelerators
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Probabilistic Distance-Based Arbitration: Providing Equality of Service for Many-Core CMPs
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
A case for heterogeneous on-chip interconnects for CMPs
Proceedings of the 38th annual international symposium on Computer architecture
Optimal memory controller placement for chip multiprocessor
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A minimal average accessing time scheduler for multicore processors
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part II
TM2C: a software transactional memory for many-cores
Proceedings of the 7th ACM european conference on Computer Systems
Cost-efficient buffer sizing in shared-memory 3D-MPSoCs using wide I/O interfaces
Proceedings of the 49th Annual Design Automation Conference
MultiScale: memory system DVFS with multiple memory controllers
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Energy-guided exploration of on-chip network design for exa-scale computing
Proceedings of the International Workshop on System Level Interconnect Prediction
Cost-effective contention avoidance in a CMP with shared memory controllers
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Regional cache organization for NoC based many-core processors
Journal of Computer and System Sciences
A network congestion-aware memory subsystem for manycore
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Wireless Health Systems, On-Chip and Off-Chip Network Architectures
A heterogeneous multiple network-on-chip design: an application-aware approach
Proceedings of the 50th Annual Design Automation Conference
Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special Section on Networks on Chip: Architecture, Tools, and Methodologies
Efficient programming paradigm for video streaming processing on TILE64 platform
The Journal of Supercomputing
Designing on-chip networks for throughput accelerators
ACM Transactions on Architecture and Code Optimization (TACO)
Design space exploration of on-chip ring interconnection for a CPU-GPU heterogeneous architecture
Journal of Parallel and Distributed Computing
Proceedings of the International Conference on Computer-Aided Design
Direct distributed memory access for CMPs
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
In the near term, Moore's law will continue to provide an increasing number of transistors and therefore an increasing number of on-chip cores. Limited pin bandwidth prevents the integration of a large number of memory controllers on-chip. With many cores, and few memory controllers, where to locate the memory controllers in the on-chip interconnection fabric becomes an important and as yet unexplored question. In this paper we show how the location of the memory controllers can reduce contention (hot spots) in the on-chip fabric and lower the variance in reference latency. This in turn provides predictable performance for memory-intensive applications regardless of the processing core on which a thread is scheduled. We explore the design space of on-chip fabrics to find optimal memory controller placement relative to different topologies (i.e. mesh and torus), routing algorithms, and workloads.