Java is becoming a viable choice for numerical algorithms due to the software engineering benefits of object-oriented programming. Because these programs still use large arrays that do not fit in the cache, they continue to suffer from poor memory performance. To hide memory latency, we describe a new unified compile-time analysis for software prefetching of arrays and linked structures in Java. Our previous work used data-flow analysis to discover linked data structure accesses; here we present a more general version that also identifies loop induction variables used in array accesses. Our algorithm schedules prefetches for all array references that contain induction variables. We evaluate our technique using a simulator of an out-of-order superscalar processor running a set of array-based Java programs. Across all our programs, prefetching reduces execution time by a geometric mean of 23%, and the largest improvement is 58%. We also evaluate prefetching on a PowerPC processor, where it reduces execution time by a geometric mean of 17%. Traditional software prefetching algorithms for C and Fortran use locality analysis and sophisticated loop transformations. Because our analysis is much simpler and quicker, it is suitable for inclusion in a just-in-time compiler. We further show that the additional loop transformations and careful scheduling of prefetches used in previous work are not always necessary for modern architectures and Java programs.
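As a rough illustration of the idea that every array reference indexed by a loop induction variable receives a prefetch, the sketch below shows what such a loop might look like if the transformation were written by hand. It is not the paper's implementation: Java source cannot issue a prefetch instruction, so the hypothetical `prefetch` helper only records the index a compiler would target, and the prefetch distance `DISTANCE` is an assumed value, not one taken from the paper.

```java
import java.util.ArrayList;
import java.util.List;

class PrefetchSketch {
    // Assumed prefetch distance in loop iterations (illustrative, not from the paper).
    static final int DISTANCE = 8;

    // Indices that a compiler-inserted prefetch would target. This list stands in
    // for the real prefetch instruction, which Java source cannot express.
    static final List<Integer> prefetched = new ArrayList<>();

    // Hypothetical helper: records in-bounds prefetch targets and ignores the rest.
    static void prefetch(double[] a, int index) {
        if (index < a.length) {
            prefetched.add(index);
        }
    }

    // Sums the array; i is the induction variable, so the reference a[i]
    // is paired with a prefetch of a[i + DISTANCE].
    static double sum(double[] a) {
        double total = 0.0;
        for (int i = 0; i < a.length; i++) {
            prefetch(a, i + DISTANCE);
            total += a[i];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] data = new double[32];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println("sum = " + sum(data));
        System.out.println("prefetches recorded = " + prefetched.size());
    }
}
```

No locality analysis or loop transformation appears here: the prefetch is placed next to the reference in every iteration, which mirrors the abstract's claim that a simpler scheme can suffice. In a real system the just-in-time compiler would emit the prefetch itself rather than a method call.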