This paper considers the general problem of partitioning a matrix operation among the processors of a parallel system in an optimal, load-balanced way that avoids potential memory contention. The parallel system is characterized by several features, the most important of which is a virtual shared memory divided into segments. If a partitioning causes processors to access the same memory segment in parallel, and at least one of them writes to that segment, contention arises between the processors and performance degrades. To eliminate this situation, the class of admissible partitionings is restricted so that no two processors write data to the same segment. On the resulting class of contention-free partitionings, a load-balanced optimal partitioning is defined as one that satisfies independent minimax criteria. The main result of the paper is an algorithm that finds the optimal partitioning by solving the respective minimax problems analytically. The paper also discusses implementation and performance issues related to the algorithm, based on experience at Kendall Square Research Corporation, where the partitioning algorithm was used to create high-performance parallel matrix libraries.
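
The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the contention-free restriction, not the paper's method. It assumes a one-dimensional output array, a uniform cost per element, and a fixed segment size measured in elements. The names `contention_free_partition`, `max_load`, `segment`, and `procs` are hypothetical. Under these assumptions, each processor receives a contiguous range whose boundaries fall on segment boundaries, and whole segments are spread as evenly as possible so that the largest share is as small as possible.

```python
def contention_free_partition(n, segment, procs):
    """Split indices 0..n-1 into `procs` contiguous half-open ranges.

    Every range boundary is a multiple of `segment` (or equals n), so no
    memory segment is written by two different processors. Illustrative
    sketch only; it assumes uniform work per element.
    """
    if n < 0 or segment <= 0 or procs <= 0:
        raise ValueError("n must be >= 0; segment and procs must be > 0")

    n_segments = -(-n // segment)            # ceiling division
    base, extra = divmod(n_segments, procs)  # whole segments per processor

    ranges = []
    first_seg = 0
    for p in range(procs):
        # The last `extra` processors take one additional segment, so the
        # possibly partial final segment lands in one of the larger shares.
        count = base + (1 if p >= procs - extra else 0)
        start = min(first_seg * segment, n)
        stop = min((first_seg + count) * segment, n)
        ranges.append((start, stop))
        first_seg += count
    return ranges


def max_load(ranges):
    """Largest number of elements assigned to any single processor."""
    return max(stop - start for start, stop in ranges)


if __name__ == "__main__":
    parts = contention_free_partition(n=100, segment=16, procs=3)
    print(parts, max_load(parts))
```

For `n=100`, `segment=16`, and `procs=3`, the sketch yields the ranges `(0, 32)`, `(32, 64)`, and `(64, 100)`, with a maximum load of 36 elements. An unconstrained even split would reach a maximum load of 34, but its boundaries would fall inside segments and two processors would write to the same segment. The difference is the load-balance price of the contention-free restriction in this toy setting.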