Anatomy of a message in the Alewife multiprocessor
ICS '93 Proceedings of the 7th international conference on Supercomputing
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Partitioned register file for TTAs
Proceedings of the 28th annual international symposium on Microarchitecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
iWarp: anatomy of a parallel computing system
iWarp: anatomy of a parallel computing system
Space-time scheduling of instruction-level parallelism on a raw machine
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Maps: a compiler-managed memory system for raw machines
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Smart Memories: a modular reconfigurable architecture
Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
A VLSI Architecture for Concurrent Data Structures
A VLSI Architecture for Concurrent Data Structures
An instruction set and microarchitecture for instruction level distributed processing
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A design space evaluation of grid processor architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Increasing and Detecting Memory Address Congruence
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
The RAW benchmark suite: computation structures for general purpose computing
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Proceedings of the 30th annual international symposium on Computer architecture
Energy characterization of a tiled architecture processor with on-chip networks
Proceedings of the 2003 international symposium on Low power electronics and design
A fast parallel reed-solomon decoder on a reconfigurable architecture
Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Power-driven Design of Router Microarchitectures in On-chip Networks
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams
Proceedings of the 31st annual international symposium on Computer architecture
Efficient orchestration of sub-word parallelism in media processors
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
High-level power analysis for on-chip networks
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
A Technology-Aware and Energy-Oriented Topology Exploration for On-Chip Networks
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Near-Optimal Worst-Case Throughput Routing for Two-Dimensional Mesh Networks
Proceedings of the 32nd annual international symposium on Computer Architecture
A reconfigurable architecture for load-balanced rendering
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
A Distributed Control Path Architecture for VLIW Processors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Hardware-modulated parallelism in chip multiprocessors
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Mapping and configuration methods for multi-use-case networks on chips
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Modeling instruction placement on a spatial architecture
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
Distributed Microarchitectural Protocols in the TRIPS Prototype Processor
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Code and data partitioning for fine-grain parallelism
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Implementation and Evaluation of a Dynamically Routed Processor Operand Network
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Trends toward on-chip networked microsystems
International Journal of High Performance Computing and Networking
Virtual Circuit Tree Multicasting: A Case for On-Chip Hardware Multicast Support
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
NOCS '08 Proceedings of the Second ACM/IEEE International Symposium on Networks-on-Chip
Application-specific Processor Architecture: Then and Now
Journal of Signal Processing Systems
Adaptive data compression for high-performance low-power on-chip networks
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Efficient unicast and multicast support for CMPs
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Recursive partitioning multicast: A bandwidth-efficient routing for Networks-on-Chip
NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
Exploring concentration and channel slicing in on-chip network router
NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
Mesh-of-trees and alternative interconnection networks for single-chip parallelism
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
rMPI: message passing on multicore processors with on-chip interconnect
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
An overview of achieving energy efficiency in on-chip networks
International Journal of Communication Networks and Distributed Systems
Efficient lookahead routing and header compression for multicasting in networks-on-chip
Proceedings of the 6th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
Weighted random oblivious routing on torus networks
Proceedings of the 5th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
A novel 3D layer-multiplexed on-chip network
Proceedings of the 5th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
A low-area multi-link interconnect architecture for GALS chip multiprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A pattern for efficient parallel computation on multicore processors with scalar operand networks
Proceedings of the 2010 Workshop on Parallel Programming Patterns
Reducing Network-on-Chip energy consumption through spatial locality speculation
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
3D NOC for many-core processors
Microelectronics Journal
Hardware support for OpenMP collective operations
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Exploring topologies for source-synchronous ring-based network-on-chip
Proceedings of the Conference on Design, Automation and Test in Europe
Randomized partially-minimal routing: near-optimal oblivious routing for 3-D mesh networks
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The sharing architecture: sub-core configurability for IaaS clouds
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Dual partitioning multicasting for high-performance on-chip networks
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
The bypass paths and multiported register files in microprocessors serve as an implicit interconnect to communicate operand values among pipeline stages and multiple ALUs. Previous superscalar designs implemented this interconnect using centralized structures that do not scale with increasing ILP demands. In search of scalability, recent microprocessor designs in industry and academia exhibit a trend towards distributed resources such as partitioned register files, banked caches, multiple independent computer pipelines, and evenmultiple program counters. Some of these partitioned microprocessor designs have begun to implement bypassing and operand transport using point-to-point interconnects rather than centralized networks. We call interconnects optimized for scalar data transport, whether centralized or distributed, scalar operand networks. Although these networks share many of the challenges of multiprocessor networks such as scalability and deadlock avoidance, they have many unique requirements, including ultra-low latencies (a few cycles versus tens of cycles) and ultra-fast operation-operand matching. This paper discusses the unique properties of scalar operand networks, examines alternative ways of implementing them, and describes in detail the implementation of one such network in the Raw microprocessor. The paper analyzes the performance of these networks for ILP workloads and the sensitivity of over all ILP performance to network properties.