ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Data access microarchitectures for superscalar processors with compiler-assisted data prefetching
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
The Stanford Dash Multiprocessor
Computer
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Interprocedural modification side effect analysis with pointer aliasing
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Context-sensitive interprocedural points-to analysis in the presence of function pointers
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
SUIF: an infrastructure for research on parallelizing and optimizing compilers
ACM SIGPLAN Notices
Supporting dynamic data structures on distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Tolerating latency through software-controlled data prefetching
Tolerating latency through software-controlled data prefetching
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Compiler techniques for data prefetching on the PowerPC
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Prefetching using Markov predictors
Proceedings of the 24th annual international symposium on Computer architecture
Data prefetching on the HP PA-8000
Proceedings of the 24th annual international symposium on Computer architecture
Tolerating latency in multiprocessors through compiler-inserted prefetching
ACM Transactions on Computer Systems (TOCS)
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
New methods for dynamic storage allocation (Fast Fits)
SOSP '83 Proceedings of the ninth ACM symposium on Operating systems principles
SPLASH: Stanford parallel applications for shared-memory
SPLASH: Stanford parallel applications for shared-memory
Optimizing the cache performance of non-numeric applications
Optimizing the cache performance of non-numeric applications
Improving index performance through prefetching
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Content-Based Prefetching: Initial Results
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Stride prefetching by dynamically inspecting objects
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Improving Hash Join Performance through Prefetching
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Design and Implementation of a Compiler Framework for Helper Threading on Multi-core Processors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Data trace cache: an application specific cache architecture
MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications , systems and architecture
Reducing Cache Pollution via Dynamic Data Prefetch Filtering
IEEE Transactions on Computers
Improving hash join performance through prefetching
ACM Transactions on Database Systems (TODS)
Exploring the performance limits of simultaneous multithreading for memory intensive applications
The Journal of Supercomputing
Fast indexing for blocked array layouts to reduce cache misses
International Journal of High Performance Computing and Networking
Analysis and performance results of computing betweenness centrality on IBM Cyclops64
The Journal of Supercomputing
Automatically generating symbolic prefetches for distributed transactional memories
Proceedings of the ACM/IFIP/USENIX 11th International Conference on Middleware
Diagnosis and optimization of application prefetching performance
Proceedings of the 27th international ACM conference on International conference on supercomputing
C1C: A configurable, compiler-guided STT-RAM L1 cache
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
As the disparity between processor and memory speeds continues to grow, memory latency is becoming an increasingly important performance bottleneck. While software-controlled prefetching is an attractive technique for tolerating this latency, its success has been limited thus far to array-based numeric codes. In this paper, we expand the scope of automatic compiler-inserted prefetching to also include the recursive data structures commonly found in pointer-based applications.We propose three compiler-based prefetching schemes, and automate the most widely applicable scheme (greedy prefetching) in an optimizing research compiler. Our experimental results demonstrate that compiler-inserted prefetching can offer significant performance gains on both uniprocessors and large-scale shared-memory multiprocessors.