A class of compatible cache consistency protocols and their support by the IEEE futurebus
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Hierarchical cache/bus architecture for shared memory multiprocessors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Design of the Munin distributed shared memory system
Journal of Parallel and Distributed Computing - Special issue on distributed shared memory systems
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Memory sharing predictor: the key to a speculative coherent DSM
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Route packets, not wires: on-chip inteconnection networks
Proceedings of the 38th annual Design Automation Conference
Power and energy reduction via pipeline balancing
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Proceedings of the twentieth annual ACM symposium on Principles of distributed computing
Architecture and CAD for Deep-Submicron FPGAs
Architecture and CAD for Deep-Submicron FPGAs
FPGA and CPLD Architectures: A Tutorial
IEEE Design & Test
Networks on Silicon: Blessing or Nightmare?
DSD '02 Proceedings of the Euromicro Symposium on Digital Systems Design
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Proceedings of the 30th annual international symposium on Computer architecture
JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Characterizing and Predicting Program Behavior and its Variability
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Memory coherence activity prediction in commercial workloads
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence
Proceedings of the 32nd annual international symposium on Computer Architecture
Improving Multiprocessor Performance with Coarse-Grain Coherence Tracking
Proceedings of the 32nd annual international symposium on Computer Architecture
Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling
Proceedings of the 32nd annual international symposium on Computer Architecture
POWER5 System microarchitecture
IBM Journal of Research and Development - POWER5 and packaging
The M5 Simulator: Modeling Networked Systems
IEEE Micro
Design tradeoffs for tiled CMP on-chip networks
Proceedings of the 20th annual international conference on Supercomputing
Coherence Ordering for Ring-based Chip Multiprocessors
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Predicting reconfigurable interconnect performance in distributed shared-memory systems
Integration, the VLSI Journal
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Dynamic thermal management via architectural adaptation
Proceedings of the 46th Annual Design Automation Conference
A dynamically reconfigurable interconnect for array processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A hybrid NoC design for cache coherence optimization for chip multiprocessors
Proceedings of the 49th Annual Design Automation Conference
Hi-index | 0.00 |
Interconnection networks for multicore processors are traditionally designed to serve a diversity of workloads. However, different workloads or even different execution phases of the same workload may benefit from different interconnect configurations. In this paper, we first motivate the need for workload-adaptive interconnection networks. Subsequently, we describe an interconnection network framework based on reconfigurable switches for use in medium-scale (up to 32 cores) shared memory multicore processors. Our cost-effective reconfigurable interconnection network is implemented on a traditional shared bus interconnect with snoopy-based coherence, and it enables improvedmulticore performance. The proposed interconnect architecture distributes the cores of the processor into clusters with reconfigurable logic between clusters to support workload-adaptive policies for inter-cluster communication. Our interconnection scheme is complemented by interconnect-aware scheduling and additional interconnect optimizations which help boost the performance of multiprogramming and multithreaded workloads. We provide experimental results that show that the overall throughput of multiprogramming workloads (consisting of two and four programs) can be improved by up to 60% with our configurable bus architecture. Similar gains can be achieved also for multithreaded applications as shown by further experiments. Finally, we present the performance sensitivity of the proposed interconnect architecture on shared memory bandwidth availability.