A workload-adaptive and reconfigurable bus architecture for multicore processors

Authors:
Shoaib Akram;Alexandros Papakonstantinou;Rakesh Kumar;Deming Chen
Affiliations:
Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL;Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL;Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL;Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
International Journal of Reconfigurable Computing
Year:
2010

Abstract

Interconnection networks for multicore processors are traditionally designed to serve a diverse range of workloads. However, different workloads, or even different execution phases of the same workload, may benefit from different interconnect configurations. In this paper, we first motivate the need for workload-adaptive interconnection networks. We then describe an interconnection network framework based on reconfigurable switches for use in medium-scale (up to 32 cores) shared memory multicore processors. Our cost-effective reconfigurable interconnection network is implemented on a traditional shared bus interconnect with snoopy-based coherence, and it enables improved multicore performance. The proposed interconnect architecture distributes the cores of the processor into clusters, with reconfigurable logic between clusters to support workload-adaptive policies for inter-cluster communication. Our interconnection scheme is complemented by interconnect-aware scheduling and additional interconnect optimizations, which help boost the performance of multiprogramming and multithreaded workloads. Experimental results show that the overall throughput of multiprogramming workloads (consisting of two and four programs) can be improved by up to 60% with our configurable bus architecture. Further experiments show that similar gains can also be achieved for multithreaded applications. Finally, we present the performance sensitivity of the proposed interconnect architecture to shared memory bandwidth availability.