In recent years, scientific computing has been driven by microprocessor-based architectures. Most such designs combine fast processors and fast but small caches with large but slow main memories. As a result, small problems that fit in cache perform exceedingly well, whereas the performance of larger problems is limited by memory speed. In this paper, the authors study the performance characteristics of several iterative kernels for solving sparse linear systems on several popular microprocessors. Given the limits that slow memory imposes on large problems, the authors show that domain decomposition methods of the additive Schwarz type are effective at enhancing performance on single microprocessors.
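To make the additive Schwarz idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a 1D Poisson model problem with matrix tridiag(-1, 2, -1), overlapping contiguous subdomains, exact local solves, and an outer preconditioned conjugate gradient iteration. All function names and parameters are hypothetical. The point it illustrates is that each local solve touches only a subdomain-sized slice of the residual, which can be chosen small enough to stay in cache.

```python
def apply_poisson(x):
    """Return A x for the assumed model matrix A = tridiag(-1, 2, -1)."""
    n = len(x)
    y = [2.0 * xi for xi in x]
    for i in range(n):
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def solve_local_poisson(rhs):
    """Exact solve with the local tridiag(-1, 2, -1) block (Thomas algorithm)."""
    m = len(rhs)
    c = [0.0] * m  # modified super-diagonal
    d = [0.0] * m  # modified right-hand side
    c[0] = -0.5
    d[0] = rhs[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    x = [0.0] * m
    x[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def make_subdomains(n, num_blocks, overlap):
    """Split 0..n-1 into contiguous index ranges, each extended by `overlap`."""
    size = n // num_blocks
    subs = []
    for b in range(num_blocks):
        lo = max(0, b * size - overlap)
        hi = n if b == num_blocks - 1 else min(n, (b + 1) * size + overlap)
        subs.append((lo, hi))
    return subs


def additive_schwarz(r, subdomains):
    """z = sum_i R_i^T A_i^{-1} R_i r: restrict, solve locally, add back."""
    z = [0.0] * len(r)
    for lo, hi in subdomains:
        local = solve_local_poisson(r[lo:hi])  # works on a cache-sized slice
        for k, v in enumerate(local):
            z[lo + k] += v
    return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg_additive_schwarz(b, subdomains, tol=1e-10, max_iter=500):
    """Conjugate gradients on A x = b, preconditioned by additive Schwarz."""
    x = [0.0] * len(b)
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    r = list(b)
    z = additive_schwarz(r, subdomains)
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = apply_poisson(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = additive_schwarz(r, subdomains)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

A call such as `pcg_additive_schwarz([1.0] * 64, make_subdomains(64, 4, 2))` returns the approximate solution and the iteration count. In a cache-oriented setting one would presumably size the subdomains so that each local block and its slice of the residual fit in cache; the block count and overlap used here are arbitrary illustrative values.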