This paper proposes a technique for improving the cache performance of some blocked algorithms. Drawing on results from number theory, we present an array-padding principle under which accesses to array sub-blocks generate no conflict misses. The technique is applied to LU factorization and matrix multiplication and tested on a shared-memory multiprocessor. The measured results agree with the theoretical analysis, and performance improves by 20% to 150%.
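The effect of padding on conflict misses can be illustrated with a small model. The sketch below is not the paper's padding principle, which the abstract does not spell out. It is a brute-force check under assumed conditions: a direct-mapped cache, a column-major array whose first element is line-aligned, and hypothetical names (`conflicting_lines`, `smallest_conflict_free_pad`) and cache parameters chosen only for illustration.

```python
def conflicting_lines(ld, block, num_sets, line_elems):
    """Count the memory lines of a block x block sub-block that cannot
    coexist in a direct-mapped cache with the rest of the sub-block.

    ld         -- leading dimension of the column-major array, in elements
    block      -- sub-block edge length, in elements
    num_sets   -- number of lines in the (assumed direct-mapped) cache
    line_elems -- array elements per cache line
    """
    lines = set()
    for j in range(block):            # column of the sub-block
        for i in range(block):        # row of the sub-block
            lines.add((i + j * ld) // line_elems)
    sets = {line % num_sets for line in lines}
    # Every line beyond the number of occupied sets must share a set.
    return len(lines) - len(sets)


def smallest_conflict_free_pad(n, block, num_sets, line_elems, max_pad=64):
    """Search for the smallest pad (extra elements per column) that makes
    the sub-block conflict-free; return None if none is found."""
    for pad in range(max_pad + 1):
        if conflicting_lines(n + pad, block, num_sets, line_elems) == 0:
            return pad
    return None


# Hypothetical cache: 1024 lines of 4 elements each (4096 elements total).
unpadded = conflicting_lines(1024, 32, 1024, 4)
padded = conflicting_lines(1032, 32, 1024, 4)
print(unpadded, padded)  # 224 0
```

In this model, a 32 x 32 sub-block of an array with leading dimension 1024 touches 256 memory lines but only 32 cache sets, so 224 lines collide. Padding each column by 8 elements spreads the same sub-block over 256 distinct sets. A number-theoretic rule of the kind the abstract describes would presumably replace the brute-force search with a closed-form choice of pad.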