Due to advances in fiber-optic and VLSI technology, interconnection networks that allow multiple simultaneous broadcasts are becoming feasible. This paper presents the multiprocessor architecture of the Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus) and examines the performance of representative algorithms for matrix operations, merging, and sorting under the message-passing and distributed-shared-memory (DSM) paradigms. It shows that simple enhancements to the network interface and to the cache and directory controllers can yield O(1) communication time for the matrix-vector multiplication algorithm using DSM. The SOME-Bus is a low-latency, high-bandwidth, fiber-optic interconnection network that directly links arbitrary pairs of processor nodes without contention and can efficiently interconnect over 100 nodes. It contains a dedicated channel for the data output of each node, which eliminates the need for global arbitration and provides bandwidth that scales directly with the number of nodes in the system. Each of the P nodes has an array of receivers, with one receiver dedicated to each node output channel. No node is ever blocked from transmitting by another transmitter or by contention for shared switching logic. The entire array of P receivers can be integrated on a single chip at comparatively minor cost, resulting in O(P) complexity. The SOME-Bus offers more functionality than a crossbar because it supports multiple simultaneous broadcasts of messages, which allows cache consistency protocols to complete much faster.
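The channel-per-node organization described above can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the paper's implementation: the names `Node`, `broadcast_round`, and `matvec` are hypothetical, the matrix is assumed to be distributed by blocks of rows, and the model counts communication rounds rather than real latency. It uses explicit broadcasts rather than the DSM mechanism the abstract refers to, and shows only why one simultaneous broadcast round can suffice for matrix-vector multiplication when every node has a dedicated receiver for every output channel.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of one node: one receiver buffer per source channel."""
    node_id: int
    num_nodes: int
    receivers: list = field(init=False)

    def __post_init__(self):
        # receivers[src] collects everything node `src` sends on its own channel.
        self.receivers = [[] for _ in range(self.num_nodes)]


def broadcast_round(nodes, outgoing):
    """Every node broadcasts outgoing[src] on its dedicated channel at once.

    Because each receiver listens to exactly one channel, no message waits
    for another; the whole exchange is counted as a single round.
    """
    for src, message in enumerate(outgoing):
        for node in nodes:
            node.receivers[src].append(message)
    return 1  # number of communication rounds used


def matvec(A, x, P):
    """Compute y = A x on P simulated nodes (assumes len(x) divisible by P)."""
    n = len(x)
    assert n % P == 0, "sketch assumes an even block distribution"
    block = n // P
    nodes = [Node(i, P) for i in range(P)]

    # Node i owns x[i*block:(i+1)*block] and the matching rows of A.
    outgoing = [x[i * block:(i + 1) * block] for i in range(P)]
    rounds = broadcast_round(nodes, outgoing)

    y = []
    for node in nodes:
        # Reassemble the full vector from the P receiver buffers.
        full_x = [v for src in range(P) for v in node.receivers[src][0]]
        for r in range(node.node_id * block, (node.node_id + 1) * block):
            y.append(sum(a * b for a, b in zip(A[r], full_x)))
    return y, rounds


if __name__ == "__main__":
    A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    x = [1, 0, -1, 2]
    print(matvec(A, x, P=2))
```

In this model the number of rounds does not depend on P, whereas a network without simultaneous broadcasts would need the vector blocks to be forwarded in several steps. The size of each message and the per-node computation still depend on the problem size.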