Effective stream-based and execution-based data prefetching
Proceedings of the 18th annual international conference on Supercomputing
Design and Implementation of a Compiler Framework for Helper Threading on Multi-core Processors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
On the importance of optimizing the configuration of stream prefetchers
Proceedings of the 2005 workshop on Memory system performance
Hi-index | 0.00 |
As microprocessor speeds increase, a large fraction of the execution time is often lost to cache miss penalties. This loss can be particularly severe in processors such as the UltraSPARC-IIICu which have in-order execution and block on cache misses. Such processors rely greatly on the compiler to reduce stalls and achieve high performance. This paper describes a compiler technique for software prefetching that is aware of the specific prefetch behaviors of the target processor. The implementation targets loops containing control-flow and strided or irregular memory access patterns. A two phase locality analysis, capable of handling complex subscript expressions, is used for enhanced identification of prefetch candidates. Prefetch instructions are scheduled with careful consideration of the prefetch behaviors in the target system. Compared to a previous implementation, our technique produced performance improvements of 9% on the geometric mean, and up to 44% on individual tests, in Sunýs first UltraSPARC-IIICu based SPEC CPU2000 submission [5] and has been used in all later submissions to date.