Optimizing compilers, particularly parallel compilers, are constrained by their ability to predict the performance consequences of the transformations they apply. Many factors, such as unknown values in control structures, the dynamic behavior of programs, and the complexity of the underlying hardware, make it very difficult for compilers to estimate the performance of transformations accurately and efficiently. In this paper, we present a performance prediction framework that combines several innovative approaches to this problem. First, the framework employs a detailed, architecture-specific, but portable cost model that can be used to estimate the cost of straight-line code efficiently. Second, the aggregated costs of loops and conditional statements are computed and represented symbolically. This avoids unnecessary, premature guesses and preserves the precision of the prediction. Third, symbolic comparison allows compilers to choose the best transformation dynamically and systematically. We also discuss methodologies for applying the framework in optimizing parallel compilers to support automatic, performance-guided program restructuring.
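Three of the ideas above (a cost model for straight-line code, symbolic aggregation of loop costs, and symbolic comparison of alternative transformations) can be illustrated with a small Python sketch. It is not the paper's framework. The per-operation cost table, the naive sum-of-costs block model (which ignores superscalar overlap), the assumed per-iteration loop overhead, and the coefficient-sign comparison rule are all illustrative assumptions, and conditionals are omitted. Costs are kept as polynomials in nonnegative loop trip counts, so no trip count has to be guessed.

```python
from collections import defaultdict

# Symbolic cost = polynomial over nonnegative integer symbols (e.g. loop trip counts).
# Keys are sorted tuples of symbol names (a monomial); () is the constant term.

def const(c):
    return {(): c} if c else {}

def sym(name):
    return {(name,): 1}

def add(a, b):
    out = defaultdict(int)
    for poly in (a, b):
        for mono, coeff in poly.items():
            out[mono] += coeff
    return {m: c for m, c in out.items() if c != 0}

def mul(a, b):
    out = defaultdict(int)
    for ma, ca in a.items():
        for mb, cb in b.items():
            out[tuple(sorted(ma + mb))] += ca * cb
    return {m: c for m, c in out.items() if c != 0}

def sub(a, b):
    return add(a, mul(const(-1), b))

# Hypothetical per-operation cycle counts: illustrative only, not data for a real machine.
OP_COST = {"load": 2, "store": 1, "fadd": 3, "fmul": 3, "iadd": 1, "branch": 1}
LOOP_OVERHEAD = ("iadd", "branch")  # assumed per-iteration increment + branch

def block_cost(ops):
    # Naive straight-line model: sum of op costs, ignoring instruction overlap.
    return const(sum(OP_COST[op] for op in ops))

def loop_cost(trip_count, body):
    # Aggregate a loop symbolically: trip_count * (body + per-iteration overhead).
    return mul(trip_count, add(body, block_cost(LOOP_OVERHEAD)))

def at_least_one(poly, name):
    # Substitute name := name + 1, so the (still nonnegative) symbol encodes name >= 1.
    out = {}
    for mono, coeff in poly.items():
        term = const(coeff)
        for s in mono:
            term = mul(term, add(sym(s), const(1)) if s == name else sym(s))
        out = add(out, term)
    return out

def compare(a, b):
    # Sound but incomplete: decide a vs. b for ALL nonnegative symbol values,
    # using only the signs of the coefficients of a - b.
    d = sub(a, b)
    if not d:
        return "=="
    if all(c >= 0 for c in d.values()):
        return ">="
    if all(c <= 0 for c in d.values()):
        return "<="
    return "unknown"

N, M = sym("N"), sym("M")
# Original nest: for i < N: for j < M: a[j] += x * y   (x * y is invariant in j)
original = loop_cost(N, loop_cost(M, block_cost(["fmul", "load", "fadd", "store"])))
# Transformed: hoist the invariant fmul out of the inner loop.
hoisted = loop_cost(N, add(block_cost(["fmul"]),
                           loop_cost(M, block_cost(["load", "fadd", "store"]))))

print(compare(original, hoisted))  # unknown
print(compare(at_least_one(original, "M"), at_least_one(hoisted, "M")))  # >=
```

The first comparison is undecided because hoisting adds a fixed cost per outer iteration that pays off only when the inner loop runs at least once. Encoding the range fact M ≥ 1 lets the same coefficient test conclude that the original nest costs at least as much as the hoisted one for every N, without committing to a guessed value for N or M.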