Parallelization and performance evaluation of circuit simulation on a shared-memory multiprocessor

Authors:
P. Sadayappan; V. Visvanathan
Affiliations:
Ohio State Univ., Columbus; AT&T Bell Laboratories, Murray Hill, NJ
Venue:
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Year:
1988

Abstract

Circuit simulation is a widely used but computationally demanding tool for VLSI design. This paper addresses the considerations in achieving performance improvement through parallelization on a shared-memory multiprocessor. The two main components that make up the computational bulk of circuit simulation, namely matrix assembly and sparse matrix solution, raise very different issues in their parallelization. Parallelizing matrix assembly involves using a sequence of lock-synchronized parallel loops. A theoretical prediction of the performance of such loops is developed, and this prediction is then compared to actual performance on a variety of circuits. Two approaches to parallel sparse matrix solution are contrasted: 1) an efficient implementation of an earlier proposed fine-grained model that captures parallelism at the elemental-operation level, and 2) a newly proposed medium-grained scheme that represents the computation at the row-operation level. A performance-evaluation framework is developed to interpret measured speedup in terms of various relevant factors. While the fine-grained approach achieves somewhat better load balancing and also slightly lower scheduling overheads due to judicious task clustering, the medium-grained approach is shown to be consistently superior for large circuit matrices due to lower operand access costs and better vectorization potential. The techniques developed have been incorporated into a prototype parallel implementation of the production circuit simulator ADVICE on the Alliant FX/8 multiprocessor.
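
As a rough illustration of what a lock-synchronized parallel loop for matrix assembly might look like, the sketch below stamps two-terminal conductances into a small dense matrix from several threads. It is not the ADVICE implementation and is not taken from the paper: the names `stamp_conductance` and `assemble`, the one-lock-per-row granularity, the dense list-of-lists storage, and the use of node `-1` for ground are all assumptions made for this example. Python threads also do not yield real speedup for this kind of work, so the code only shows the synchronization pattern.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def stamp_conductance(matrix, locks, a, b, g):
    """Add conductance g between nodes a and b (node -1 means ground)."""
    for row, col, val in ((a, a, g), (b, b, g), (a, b, -g), (b, a, -g)):
        if row < 0 or col < 0:
            continue  # the ground row and column are not stored
        with locks[row]:  # assumed granularity: one lock per matrix row
            matrix[row][col] += val


def assemble(n_nodes, devices, n_workers=4):
    """Assemble the conductance matrix with a parallel loop over devices."""
    matrix = [[0.0] * n_nodes for _ in range(n_nodes)]
    locks = [threading.Lock() for _ in range(n_nodes)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # list() waits for every task and re-raises any worker exception
        list(pool.map(lambda d: stamp_conductance(matrix, locks, *d), devices))
    return matrix


# Hypothetical three-node circuit: (node_a, node_b, conductance)
devices = [(0, 1, 1.0), (1, 2, 2.0), (2, -1, 4.0), (0, -1, 0.5)]
G = assemble(3, devices)
```

Several devices can touch the same matrix entry (here, both devices attached to node 1 update `G[1][1]`), which is why each update is guarded by a lock. Finer or coarser lock granularity trades contention against locking overhead.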