Small n × n switches are key components of the interconnection networks used in multiprocessors and multicomputers. The architecture of these switches, particularly their internal buffers, is critical for achieving high-throughput, low-latency communication with cost-effective implementations. Several buffer structures are discussed and compared in terms of implementation complexity, inter-switch handshaking requirements, and ability to handle variations in traffic patterns and message lengths. A buffer design is presented that provides non-FIFO message handling and efficient storage allocation for variable-size packets, using linked lists managed by a simple on-chip controller. The new design is evaluated against several alternative designs in the context of a multistage interconnection network. Modeling and simulation show that the new buffer outperforms the alternatives and can thus improve the performance of a wide variety of systems that currently use less efficient buffers.
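The buffer organization described in the abstract can be illustrated with a small software sketch: a shared pool of fixed-size blocks, a free list, and one linked-list queue per output port, so that a packet bound for a free port is not held behind a packet waiting for a busy one. This is only a minimal model under stated assumptions, not the hardware design evaluated in the paper. The class name `DamqBuffer`, its methods, and the block size are hypothetical and chosen for illustration.

```python
class DamqBuffer:
    """Illustrative model: shared block pool, one linked-list queue per output port.

    All names and sizes are assumptions for this sketch, not taken from the paper.
    """

    def __init__(self, num_blocks, num_ports, block_size=8):
        assert num_blocks >= 1 and num_ports >= 1
        self.block_size = block_size
        self.data = [b""] * num_blocks            # payload of each block
        self.last = [False] * num_blocks          # True on a packet's final block
        # next[] links blocks; initially it chains every block into the free list.
        self.next = list(range(1, num_blocks)) + [-1]
        self.free_head = 0
        self.free_count = num_blocks
        self.head = [-1] * num_ports              # first block of each port's queue
        self.tail = [-1] * num_ports              # last block of each port's queue

    def _alloc(self):
        # Pop one block from the free list.
        slot = self.free_head
        self.free_head = self.next[slot]
        self.next[slot] = -1
        self.free_count -= 1
        return slot

    def _release(self, slot):
        # Push a block back onto the free list.
        self.next[slot] = self.free_head
        self.free_head = slot
        self.free_count += 1

    def enqueue(self, port, packet):
        """Store a variable-size packet for `port`; return False if it does not fit."""
        bs = self.block_size
        chunks = [packet[i:i + bs] for i in range(0, len(packet), bs)] or [b""]
        if len(chunks) > self.free_count:
            return False
        for i, chunk in enumerate(chunks):
            slot = self._alloc()
            self.data[slot] = chunk
            self.last[slot] = (i == len(chunks) - 1)
            if self.tail[port] == -1:
                self.head[port] = slot            # queue was empty
            else:
                self.next[self.tail[port]] = slot # append to the port's list
            self.tail[port] = slot
        return True

    def dequeue(self, port):
        """Remove and return the oldest packet queued for `port`, or None."""
        if self.head[port] == -1:
            return None
        parts = []
        while True:
            slot = self.head[port]
            parts.append(self.data[slot])
            done = self.last[slot]
            self.head[port] = self.next[slot]
            if self.head[port] == -1:
                self.tail[port] = -1
            self._release(slot)
            if done:
                return b"".join(parts)
```

In this sketch, storage is allocated per block rather than per port, so a port with heavy traffic can use most of the pool, and `dequeue(1)` can return a packet that arrived after one still waiting in the queue for port 0. A single FIFO buffer would have to deliver packets strictly in arrival order.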