Precise compile-time performance prediction for superscalar-based computers

Authors:
Ko-Yang Wang
Affiliations:
IBM T. J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY, USA
Venue:
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Year:
1994

Abstract

Optimizing compilers (particularly parallel compilers) are limited by how well they can predict the performance consequences of the transformations they apply. Many factors, such as unknowns in control structures, the dynamic behavior of programs, and the complexity of the underlying hardware, make it very difficult for compilers to estimate the performance of transformations accurately and efficiently. In this paper, we present a performance prediction framework that combines several innovative approaches to solve this problem. First, the framework employs a detailed, architecture-specific, but portable cost model that can be used to estimate the cost of straight-line code efficiently. Second, aggregated costs of loops and conditional statements are computed and represented symbolically, which avoids unnecessary, premature guesses and preserves the precision of the prediction. Third, symbolic comparison allows compilers to choose the best transformation dynamically and systematically. We also discuss methodologies for applying the framework in optimizing parallel compilers to support automatic, performance-guided program restructuring.
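
To make the second and third ideas concrete, here is a minimal Python sketch of symbolic cost aggregation and comparison. It is an illustration written for this summary, not the paper's implementation: the `Cost` class, the `loop`, `branch`, and `compare` helpers, and all cycle counts are hypothetical. Costs are kept as polynomials in unknowns such as a trip count `n` or a branch probability `p`, and the comparison only uses a simple sufficient condition (all unknowns are assumed non-negative).

```python
from collections import defaultdict
from fractions import Fraction


class Cost:
    """Polynomial in non-negative unknowns, e.g. 8*n + 2 (in cycles)."""

    def __init__(self, terms=None):
        # Maps a sorted tuple of variable names to a coefficient;
        # the empty tuple () holds the constant term.
        self.terms = {m: c for m, c in (terms or {}).items() if c != 0}

    @staticmethod
    def const(value):
        return Cost({(): Fraction(value)})

    @staticmethod
    def var(name):
        return Cost({(name,): Fraction(1)})

    def __add__(self, other):
        out = defaultdict(Fraction, self.terms)
        for mono, coeff in other.terms.items():
            out[mono] += coeff
        return Cost(out)

    def __sub__(self, other):
        return self + other * Cost.const(-1)

    def __mul__(self, other):
        out = defaultdict(Fraction)
        for m1, c1 in self.terms.items():
            for m2, c2 in other.terms.items():
                out[tuple(sorted(m1 + m2))] += c1 * c2
        return Cost(out)


def loop(trip_count, body, overhead=Cost.const(0)):
    """Aggregate a loop: the trip count stays symbolic instead of guessed."""
    return trip_count * body + overhead


def branch(prob_taken, then_cost, else_cost):
    """Aggregate a conditional as an expectation over a symbolic probability."""
    return prob_taken * then_cost + (Cost.const(1) - prob_taken) * else_cost


def compare(a, b):
    """Return '<=', '>=', or 'unknown' for a vs. b (assumption: unknowns >= 0)."""
    diff = list((a - b).terms.values())
    if all(c <= 0 for c in diff):
        return "<="
    if all(c >= 0 for c in diff):
        return ">="
    return "unknown"  # the sign depends on the unknowns; decide at run time


n = Cost.var("n")
p = Cost.var("p")

# Hypothetical cycle counts for one loop and two transformed variants.
original = loop(n, Cost.const(8), Cost.const(2))    # 8n + 2
tightened = loop(n, Cost.const(7), Cost.const(2))   # 7n + 2
unrolled = loop(n, Cost.const(6), Cost.const(10))   # 6n + 10
guarded = branch(p, Cost.const(4), Cost.const(10))  # 10 - 6p

verdict_static = compare(tightened, original)   # decidable at compile time
verdict_dynamic = compare(original, unrolled)   # depends on the value of n
```

In this sketch, `verdict_static` is `"<="` because `7n + 2` never exceeds `8n + 2` for non-negative `n`, so a compiler could pick that variant outright. `verdict_dynamic` is `"unknown"` because the difference `2n - 8` changes sign at `n = 4`; one plausible reading of choosing a transformation "dynamically" is that the compiler would then emit both versions behind a run-time test on `n`.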