On-chip traffic regulation to reduce coherence protocol cost on a microthreaded many-core architecture with distributed caches

Authors:
Qiang Yang; Jian Fu; Raphael Poss; Chris Jesshope
Affiliations:
University of Amsterdam, Amsterdam, Netherlands (all four authors)
Venue:
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Year:
2014

Abstract

When hardware cache coherence scales to many cores on a chip, oversaturated traffic in the shared memory system may offset the benefit of massive hardware concurrency. In this article, we investigate the cost of a write-update protocol in terms of on-chip memory network traffic, and its adverse effects on system performance, on a multithreaded many-core architecture with distributed caches. We discuss possible software and hardware solutions to alleviate the network pressure. We find that, in the context of massive concurrency, adding a write-merging buffer with 0.46% area overhead to each core improves the performance of applications with good locality and concurrency by 18.74% on average. Other applications also benefit from this addition, achieving a throughput increase of 5.93%. In addition, this improvement indicates that higher levels of concurrency per core can be exploited without impacting performance, thus tolerating latency better and giving higher processor efficiency than other solutions.
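
The abstract gives no implementation details for the write-merging buffer, so the following is only a toy software model of the general idea: stores to the same cache line are coalesced in a small per-core buffer, and one update message is emitted per line when the line is evicted or flushed, instead of one message per store. The class name, the 64-byte line size, the four-entry capacity, and the FIFO eviction policy are all assumptions made for illustration and are not taken from the article.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed, not from the article)


class WriteMergingBuffer:
    """Toy model: coalesce stores per cache line before sending updates."""

    def __init__(self, capacity=4, line_size=LINE_SIZE):
        self.capacity = capacity        # number of line entries (assumed)
        self.line_size = line_size
        self.entries = OrderedDict()    # line address -> {offset: byte value}
        self.messages = []              # update messages sent to the network

    def store(self, addr, value):
        line = addr - addr % self.line_size
        if line not in self.entries:
            if len(self.entries) >= self.capacity:
                self._evict_oldest()    # make room (FIFO policy, assumed)
            self.entries[line] = {}
        # Merge: a later store to the same byte overwrites the earlier one.
        self.entries[line][addr - line] = value

    def _evict_oldest(self):
        line, data = self.entries.popitem(last=False)
        self.messages.append((line, data))  # one message per merged line

    def flush(self):
        while self.entries:
            self._evict_oldest()


def count_update_messages(addresses, capacity=4):
    """Number of update messages produced for a sequence of byte stores."""
    buf = WriteMergingBuffer(capacity=capacity)
    for addr in addresses:
        buf.store(addr, 0xFF)
    buf.flush()
    return len(buf.messages)


if __name__ == "__main__":
    # 256 consecutive byte stores touch 4 lines: 4 messages instead of 256.
    print(count_update_messages(range(256)))
    # 256 stores at a 64-byte stride touch 256 lines: no merging is possible.
    print(count_update_messages(range(0, 256 * 64, 64)))
```

In this model, the access pattern with good spatial locality collapses to one message per line, while the strided pattern sends as many messages as it would without the buffer. This is consistent with the abstract's observation that applications with good locality benefit most, although the model says nothing about the measured figures.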