Performance Enhancement on Microprocessors with Hierarchical Memory Systems for Solving Large Sparse Linear Systems

Authors:
G. Wang; Danesh K. Tafti
Affiliations:
National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, U.S.A. (both authors)
Venue:
International Journal of High Performance Computing Applications
Year:
1999

Abstract

In recent years, scientific computing has been driven by microprocessor-based architectures. Most architectural designs are characterized by fast processors, fast but small caches, and large but slow memories. As a result, small problems that fit in cache perform exceedingly well, whereas the performance of larger problems is limited by the speed of memory. In this paper, the authors study the performance characteristics of several iterative kernels for solving sparse linear systems on several popular microprocessors. Given the performance limitations that slow memory imposes on large problems, the authors show the effectiveness of using domain decomposition methods of the additive Schwarz type to enhance performance on single microprocessors.
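
The idea of applying an additive Schwarz method on a single processor can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 1-D Poisson matrix tridiag(-1, 2, -1), uses zero overlap (so the method reduces to block Jacobi), and runs as a plain stationary iteration rather than as a preconditioner. The names `thomas_poisson`, `residual`, `schwarz_solve`, and the parameter `block` are hypothetical. In a compiled code, `block` would be chosen so that one subdomain's data fits in cache; pure Python will not show that effect, so the sketch only demonstrates the structure of the algorithm.

```python
# Illustrative sketch only (assumed model problem, hypothetical names):
# zero-overlap additive Schwarz, i.e. block Jacobi, for tridiag(-1, 2, -1) x = b.

def thomas_poisson(rhs):
    """Direct solve of tridiag(-1, 2, -1) y = rhs on one subdomain."""
    m = len(rhs)
    c = [0.0] * m                      # modified super-diagonal
    d = [0.0] * m                      # modified right-hand side
    c[0] = -0.5
    d[0] = rhs[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    y = [0.0] * m
    y[-1] = d[-1]
    for i in range(m - 2, -1, -1):     # back substitution
        y[i] = d[i] - c[i] * y[i + 1]
    return y

def residual(x, b):
    """r = b - A x for A = tridiag(-1, 2, -1) with zero boundary values."""
    n = len(x)
    r = [0.0] * n
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        r[i] = b[i] - (2.0 * x[i] - left - right)
    return r

def schwarz_solve(b, block, tol=1e-10, max_iters=20000):
    """Iterate x += sum_k R_k^T A_k^{-1} R_k r over contiguous subdomains."""
    n = len(b)
    x = [0.0] * n
    b_norm = max(abs(v) for v in b) or 1.0
    for it in range(max_iters):
        r = residual(x, b)             # one global residual per sweep
        if max(abs(v) for v in r) <= tol * b_norm:
            return x, it
        # Additive step: every subdomain uses the same residual r,
        # so the local solves are independent of one another.
        for start in range(0, n, block):
            stop = min(start + block, n)
            corr = thomas_poisson(r[start:stop])
            for i, ci in enumerate(corr):
                x[start + i] += ci
    return x, max_iters

n = 32
b = [1.0] + [0.0] * (n - 2) + [1.0]    # exact solution is all ones
x, iters = schwarz_solve(b, block=8)
print("iterations:", iters)
```

Each local solve touches only `block` contiguous unknowns, which is the property a cache-aware implementation would exploit. Smaller subdomains improve locality but slow convergence, so the subdomain size is a trade-off.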