Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures

Authors:
Michael Bedford Taylor;Walter Lee;Saman Amarasinghe;Anant Agarwal
Venue:
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Year:
2003

Abstract

The bypass paths and multiported register files in microprocessors serve as an implicit interconnect to communicate operand values among pipeline stages and multiple ALUs. Previous superscalar designs implemented this interconnect using centralized structures that do not scale with increasing ILP demands. In search of scalability, recent microprocessor designs in industry and academia exhibit a trend towards distributed resources such as partitioned register files, banked caches, multiple independent compute pipelines, and even multiple program counters. Some of these partitioned microprocessor designs have begun to implement bypassing and operand transport using point-to-point interconnects rather than centralized networks. We call interconnects optimized for scalar data transport, whether centralized or distributed, scalar operand networks. Although these networks share many of the challenges of multiprocessor networks, such as scalability and deadlock avoidance, they have many unique requirements, including ultra-low latencies (a few cycles versus tens of cycles) and ultra-fast operation-operand matching. This paper discusses the unique properties of scalar operand networks, examines alternative ways of implementing them, and describes in detail the implementation of one such network in the Raw microprocessor. The paper analyzes the performance of these networks for ILP workloads and the sensitivity of overall ILP performance to network properties.