The rice parallel processing testbed
SIGMETRICS '88 Proceedings of the 1988 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Performance Analysis of k-ary n-cube Interconnection Networks
IEEE Transactions on Computers
The Stanford Dash Multiprocessor
Computer
Architectural requirements of parallel scientific applications with explicit communication
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Working sets, cache sizes, and node granularity issues for large-scale multiprocessors
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Mechanisms for cooperative shared memory
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The Wisconsin Wind Tunnel: virtual prototyping of parallel computers
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Evaluating multigauge architectures for computer vision
Journal of Parallel and Distributed Computing - Special issue on heterogeneous processing
Modeling communication in parallel algorithms: a fruitful interaction between theory and systems?
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
An approach to scalability study of shared memory parallel systems
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
A simulation-based scalability study of parallel systems
Journal of Parallel and Distributed Computing - Special issue on scalability of parallel algorithms and architectures
ICS '90 Proceedings of the 4th international conference on Supercomputing
Limits on Interconnection Network Performance
IEEE Transactions on Parallel and Distributed Systems
The Scalability of FFT on Parallel Computers
IEEE Transactions on Parallel and Distributed Systems
Performance Analysis of Mesh Interconnection Networks with Deterministic Routing
IEEE Transactions on Parallel and Distributed Systems
A large scale, homogeneous, fully distributed parallel machine, I
ISCA '77 Proceedings of the 4th annual symposium on Computer architecture
Abstracting network characteristics and locality properties of parallel systems
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
PROTEUS: A HIGH-PERFORMANCE PARALLEL-ARCHITECTURE SIMULATOR
PROTEUS: A HIGH-PERFORMANCE PARALLEL-ARCHITECTURE SIMULATOR
SPLASH: Stanford parallel applications for shared-memory
SPLASH: Stanford parallel applications for shared-memory
The complexity of parallel computations
The complexity of parallel computations
Execution-driven simulators for parallel systems design
Proceedings of the 29th conference on Winter simulation
An Application-Driven Study of Parallel System Overheads and Network Bandwidth Requirements
IEEE Transactions on Parallel and Distributed Systems
The Journal of Supercomputing - Special issue on embedded fault-tolerance systems
Communication in Parallel Applications: Characterization and Sensitivity Analysis
ICPP '97 Proceedings of the international Conference on Parallel Processing
Hi-index | 0.00 |
Synthesizing architectural requirements from an application viewpoint can help in making important architectural design decisions towards building large scale parallel machines. In this paper, we quantify the link bandwidth requirement on a binary hypercube topology for a set of five parallel applications. We use an execution-driven simulator called SPASM to collect data points for system sizes that are feasible to be simulated. These data points are then used in a regression analysis for projecting the link bandwidth requirements for larger systems. The requirements are projected as a function of the following system parameters: number of processors, CPU clock speed, and problem size. These results are also used to project the link bandwidths for other network topologies. Our study quantifies the link bandwidth that has to be made available to limit the network overhead in an application to a specified tolerance level. The results show that typical link bandwidths (200-300 MBytes/sec) found in current commercial parallel architectures (such as Intel Paragon and Cray T3D) would have fairly low network overhead for the applications considered in this study. For two of the applications, this overhead is negligible. For the other applications, this overhead can be limited to about 30% of the execution time provided the problem sizes are increased commensurate with the processor clock speed. The technique presented can be useful to a system architect to synthesize the bandwidth requirements for realizing well-balanced parallel architectures.