The latency of direct networks is modeled, taking into account both switch and wire delays. A simple closed-form expression for contention in buffered, direct networks is derived and found to agree closely with simulations. The model includes the effects of packet size and communication locality. Network analysis under various constraints and under different workload parameters reveals that performance is highly sensitive to these constraints and workloads. A two-dimensional network is shown to have the lowest latency only when switch delays and network contention are ignored; three- or four-dimensional networks are favored otherwise. If communication locality exists, two-dimensional networks regain their advantage. Communication locality decreases both the base network latency and the network bandwidth requirements of applications. It is shown that a much larger fraction of the resulting performance improvement arises from the reduction in bandwidth requirements than from the decrease in latency.
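
To make the dimensionality trade-off concrete, the sketch below counts average hops only; it is not the model summarized above, whose closed-form contention expression is not reproduced here. It assumes a k-ary n-cube with unidirectional wraparound links and uniformly random destinations (the source itself included), so each dimension contributes (k - 1)/2 hops on average. The function names and the 4096-node machine size are illustrative assumptions.

```python
def radix_for(nodes: int, n: int) -> int:
    """Return k such that k**n == nodes, or raise if no integer radix exists."""
    k = round(nodes ** (1.0 / n))
    if k ** n != nodes:
        raise ValueError(f"{nodes} nodes cannot form a k-ary {n}-cube")
    return k


def avg_hops(k: int, n: int) -> float:
    """Mean hop count in a k-ary n-cube with unidirectional wraparound links.

    Assumes uniform random traffic: in each dimension the offset to the
    destination is uniform over 0..k-1, giving a mean of (k - 1) / 2.
    """
    return n * (k - 1) / 2


if __name__ == "__main__":
    nodes = 4096  # illustrative machine size (assumption)
    for n in (2, 3, 4, 6):
        k = radix_for(nodes, n)
        print(f"n={n}  k={k:2d}  average hops={avg_hops(k, n):.1f}")
```

Hop count falls quickly as dimensions are added, which matters most when switch delay dominates. The sketch leaves out wire length, channel width constraints, and contention, which are the factors that decide the actual ranking.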