Over the last decade, processor speed has increased dramatically, whereas the speed of the memory subsystem has improved only modestly. Because cache miss latency, measured in processor cycles, has grown as a result, processors stall on cache misses for a significant portion of their execution time. Multithreaded processors have been proposed in the literature to reduce the processor stall time due to cache misses. Although multithreading improves processor utilization, it may also increase cache miss rates, because multiple threads share the same cache, which effectively reduces the cache size available to each individual thread. Together, the increased processor utilization and the higher cache miss rate demand higher memory bandwidth. This paper presents a novel compiler optimization method that improves data locality within each thread and enhances data sharing among the threads. The method is based on loop transformation theory and optimizes both spatial and temporal data locality. The resulting threads exhibit a high level of intra-thread and inter-thread data locality, which effectively reduces both the data cache miss rates and the total execution time of numerically intensive computations running on a multithreaded processor.