The program dependence graph and its use in optimization
ACM Transactions on Programming Languages and Systems (TOPLAS)
Estimating interlock and improving balance for pipelined architectures
Journal of Parallel and Distributed Computing
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Efficiently computing static single assignment form and the control dependence graph
ACM Transactions on Programming Languages and Systems (TOPLAS)
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The intrinsic bandwidth requirements of ordinary programs
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Olden: parallelizing programs with dynamic data structures on distributed-memory machines
Olden: parallelizing programs with dynamic data structures on distributed-memory machines
Data-centric multi-level blocking
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
New tiling techniques to improve cache temporal locality
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Quantifying loop nest locality using SPEC'95 and the perfect benchmarks
ACM Transactions on Computer Systems (TOCS)
Proceedings of the 14th international conference on Supercomputing
Transforming loops to recursion for multi-level memory hierarchies
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Tools for application-oriented performance tuning
ICS '01 Proceedings of the 15th international conference on Supercomputing
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Conversion of control dependence to data dependence
POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Computers and Intractability: A Guide to the Theory of NP-Completeness
Computers and Intractability: A Guide to the Theory of NP-Completeness
International Journal of Parallel Programming
Achieving Scalable Locality with Time Skewing
International Journal of Parallel Programming
Collective Loop Fusion for Array Contraction
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Blocking Linear Algebra Codes for Memory Hierarchies
Proceedings of the Fourth SIAM Conference on Parallel Processing for Scientific Computing
Compiler-directed run-time monitoring of program data access
Proceedings of the 2002 workshop on Memory system performance
Compile-time composition of run-time data and iteration reorderings
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Predicting whole-program locality through reuse distance analysis
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Aspects of cache memory and instruction buffer performance
Aspects of cache memory and instruction buffer performance
Miss Rate Prediction across All Program Inputs
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Improving effective bandwidth through compiler enhancement of global cache reuse
Journal of Parallel and Distributed Computing
Cost effective dynamic program slicing
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Cross-architecture performance predictions for scientific applications using parameterized models
Proceedings of the joint international conference on Measurement and modeling of computer systems
Restructuring computations for temporal data cache locality
International Journal of Parallel Programming
Metrics and models for reordering transformations
MSP '04 Proceedings of the 2004 workshop on Memory system performance
Reuse-distance-based miss-rate prediction on a per instruction basis
MSP '04 Proceedings of the 2004 workshop on Memory system performance
The history of Fortran I, II, and III
History of programming languages I
Locality approximation using time
Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Feedback-directed thread scheduling with memory considerations
Proceedings of the 16th international symposium on High performance distributed computing
Task ordering and memory management problem for degree of parallelism estimation
COCOON'11 Proceedings of the 17th annual international conference on Computing and combinatorics
MiniTasking: improving cache performance for multiple query workloads
WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management
Optimization of dense matrix multiplication on IBM cyclops-64: challenges and experiences
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Hi-index | 0.00 |
Improving program locality has become increasingly important on modern computer systems. An effective strategy is to group computations on the same data so that once the data are loaded into cache, the program performs all their operations before the data are evicted. However, computation regrouping is difficult to automate for programs with complex data and control structures. This paper studies the potential of locality improvement through trace-driven computation regrouping. First, it shows that maximizing the locality is different from maximizing the parallelism or maximizing the cache utilization. The problem is NP-hard even without considering data dependences and cache organization. Then the paper describes a tool that performs constrained computation regrouping on program traces. The new tool is unique because it measures the exact control dependences and applies complete memory renaming and re-allocation. Using the tool, the paper measures the potential locality improvement in a set of commonly used benchmark programs written in C.