Performance tradeoffs in cache design
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
A set of level 3 basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
Cache considerations for multiprocessor programmers
Communications of the ACM
High-performance computer architecture (2nd ed.)
High-performance computer architecture (2nd ed.)
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Automatic transformation of FORTRAN loops to reduce cache conflicts
ICS '91 Proceedings of the 5th international conference on Supercomputing
Cache replacement with dynamic exclusion
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Reducing cache conflicts in data cache prefetching
ACM SIGARCH Computer Architecture News - Special issue on input/output in parallel computer systems
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Tolerating latency through software-controlled data prefetching
Tolerating latency through software-controlled data prefetching
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
An Iteration Partition Approach for Cache or Local Memory Thrashing on Parallel Processing
IEEE Transactions on Computers
Iteration Space Tiling for Memory Hierarchies
Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing
Software assistance for data caches
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Optimizing instruction cache performance for operating system intensive workloads
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Reducing Cache Conflicts by Multi-Level Cache Partitioning and Array Elements Mapping
The Journal of Supercomputing
Embedded Systems Design
Hi-index | 0.00 |
In parallel processor systems, the performance of individual processors is a key factor in overall performance. Processor performance is strongly affected by the behavior of cache memory in that high hit rates are essential for high performance. Hit rates are lowered when collisions on placing lines in the cache force a cache line to be replaced before it has been used to best effect. Spatial cache collisions occur if data structures and data access patterns are misaligned. We describe a mathematical scheme to improve alignment and enhance performance in applications which have moderate-to-large numbers of arrays, where various dimensionalities are involved in localized computation and array access patterns are sequential. These properties are common in many computational modeling applications. Furthermore, the scheme provides a single solution when an application is targeted to run on various numbers of processors in power-of-two sizes. The applicability of the proposed scheme is demonstrated on testbed code for an air quality modeling problem.