Performance analysis of multiple-processor systems
Performance analysis of the FFT algorithm on a shared-memory parallel architecture
IBM Journal of Research and Development
Best worst mappings for the omega network
IBM Journal of Research and Development
The influence of parallel decomposition strategies on the performance of multiprocessor systems
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
The NYU Ultracomputer - Designing an MIMD Shared Memory Parallel Computer
IEEE Transactions on Computers
Interconnections Between Processors and Memory Modules Using the Shuffle-Exchange Network
IEEE Transactions on Computers
Access and Alignment of Data in an Array Processor
IEEE Transactions on Computers
On the Impact of Communication Complexity on the Design of Parallel Numerical Algorithms
IEEE Transactions on Computers
Modeling the Weather with a Data Flow Supercomputer
IEEE Transactions on Computers
SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
Iterative Algorithms for Solution of Large Sparse Systems of Linear Equations on Hypercubes
IEEE Transactions on Computers
Exploiting variable grain parallelism at runtime
PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
Implementation and performance analysis of parallel assignment algorithms on a hypercube computer
C3P Proceedings of the third conference on Hypercube concurrent computers and applications - Volume 2
Measuring the scalability of parallel computer systems
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Shared Block Contention in a Cache Coherence Protocol
IEEE Transactions on Computers
The KYKLOS Multicomputer Network: Interconnection Strategies, Properties, and Applications
IEEE Transactions on Computers
Improved Algorithms for Mapping Pipelined and Parallel Computations
IEEE Transactions on Computers
Models of machines and computation for mapping in multicomputers
ACM Computing Surveys (CSUR)
Fault simulation in a distributed environment
DAC '88 Proceedings of the 25th ACM/IEEE Design Automation Conference
Extended Hypercube: A Hierarchical Interconnection Network of Hypercubes
IEEE Transactions on Parallel and Distributed Systems
Synchronization and Communication Costs of Loop Partitioning on Shared-Memory Multiprocessor Systems
IEEE Transactions on Parallel and Distributed Systems
Analysis of Macro-Dataflow Dynamic Scheduling on Nonuniform Memory Access Architectures
IEEE Transactions on Parallel and Distributed Systems
Scheduling DAG's for Asynchronous Multiprocessor Execution
IEEE Transactions on Parallel and Distributed Systems
Parallelism in a Main-Memory DBMS: The Performance of PRISMA/DB
VLDB '92 Proceedings of the 18th International Conference on Very Large Data Bases
A Non-Uniform Data Fragmentation Strategy for Parallel Main-Memory Database Systems
VLDB '95 Proceedings of the 21st International Conference on Very Large Data Bases
Parallel program performance prediction using deterministic task graph analysis
ACM Transactions on Computer Systems (TOCS)
International Journal of Networking and Virtual Organisations
HPCS'09 Proceedings of the 23rd international conference on High Performance Computing Systems and Applications
In this paper we analyze the effects of problem decomposition, the allocation of subproblems to processors, and the grain size of subproblems on the performance of a multiple-processor shared-memory architecture. Our results indicate that for algorithms where both the computation and the communication overhead can be fully decomposed among N processors, the speedup is a nondecreasing function of the level of granularity for an arbitrary interconnection structure and allocation of subproblems to processors. For these algorithms, the speedup is an increasing function of the level of granularity provided that the interconnection bandwidth is greater than unity; if the bandwidth is equal to unity, the speedup converges to the ratio of processing time to communication time. For algorithms where the computation is decomposable but the communication overhead cannot be decomposed, the speedup is a nondecreasing function of the level of granularity only for the best-case bandwidth. If the bandwidth is less than N, the speedup reaches a maximum and then decreases, approaching zero as the level of granularity grows. For algorithms where the computation consists of parallel and serial sections of code and the communication overhead is fully decomposable, the speedup converges to a value inversely proportional to the fraction of time spent in the serial code, even for the best-case interconnection bandwidth.
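The limiting behavior for code with a serial section matches Amdahl's law: with serial fraction f of the execution time, speedup on N processors is bounded by 1/(f + (1-f)/N) and converges to 1/f as N grows. A minimal sketch of that bound (the function name and parameter values are illustrative, not taken from the paper's model):

```python
def amdahl_speedup(f, n):
    """Upper bound on speedup with serial fraction f on n processors.

    Assumes the parallel fraction (1 - f) divides perfectly among the
    n processors and ignores communication overhead entirely.
    """
    return 1.0 / (f + (1.0 - f) / n)

if __name__ == "__main__":
    # With a 5% serial fraction, adding processors helps less and less:
    for n in (1, 10, 100, 10**6):
        print(n, amdahl_speedup(0.05, n))
    # As n grows, the speedup approaches 1/f = 20 but never reaches it.
```

This illustrates only the serial-fraction limit from the last sentence of the abstract; the paper's granularity and bandwidth effects would require a more detailed communication model.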