A VLIW architecture for a trace scheduling compiler
ASPLOS II Proceedings of the second international conference on Architectural support for programming languages and operating systems
Efficient (stack) algorithms for analysis of write-back and sector memories
ACM Transactions on Computer Systems (TOCS)
Determining average program execution times and their variance
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
Machine Characterization Based on an Abstract High-Level Language Machine
IEEE Transactions on Computers
Efficient trace-driven simulation methods for cache performance analysis
ACM Transactions on Computer Systems (TOCS)
Using profile information to assist classic code optimizations
Software—Practice & Experience
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Set-associative cache simulation using generalized binomial trees
ACM Transactions on Computer Systems (TOCS)
Precise miss analysis for program transformations with caches of arbitrary associativity
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Dependence Analysis
Automatic Performance Prediction of Parallel Programs
Parallel Programming with Polaris
Computer
Measuring Cache and TLB Performance and Their Effect on Benchmark Runtimes
IEEE Transactions on Computers
SvPablo: A Multi-language Performance Analysis System
TOOLS '98 Proceedings of the 10th International Conference on Computer Performance Evaluation: Modelling Techniques and Tools
FlexRAM: Toward an Advanced Intelligent Memory System
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
Analysis of Benchmark Characteristics and Benchmark Performance
Trace Scheduling: A Technique for Global Microcode Compaction
IEEE Transactions on Computers
Evaluation techniques for storage hierarchies
IBM Systems Journal
Adaptively Mapping Code in an Intelligent Memory Architecture
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Predicting Cache Space Contention in Utility Computing Servers
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11
Understanding the behavior and implications of context switch misses
ACM Transactions on Architecture and Code Optimization (TACO)
A framework for an automatic hybrid MPI+OpenMP code generation
Proceedings of the 19th High Performance Computing Symposia
Adaptively increasing performance and scalability of automatically parallelized programs
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
PSnAP: accurate synthetic address streams through memory profiles
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Validating model-driven performance predictions on random software systems
QoSA'10 Proceedings of the 6th international conference on Quality of Software Architectures: research into Practice - Reality and Gaps
Resource optimization in distributed real-time multimedia applications
Multimedia Tools and Applications
Survey of scheduling techniques for addressing shared resources in multicore processors
ACM Computing Surveys (CSUR)
The Journal of Supercomputing
In this paper we present results obtained by using a compiler to predict the performance of scientific codes. The compiler, Polaris [3], is both the primary tool for estimating the performance of a range of codes and the beneficiary of the results obtained from predicting program behavior at compile time. We show that a simple compile-time model, augmented with profiling data gathered through very lightweight instrumentation, can predict performance to within 20% of measured values, on average, for codes using both dense and sparse computational methods.
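As an illustration of the general approach the abstract describes, the following sketch shows one simple way a compile-time cost model can be combined with profiled trip counts: the compiler statically estimates a per-iteration cost for each loop, and lightweight profiling supplies the loop trip counts that the compiler cannot determine statically. The operation latencies and loop data below are assumed values for illustration, not the paper's actual model or numbers.

```python
# Hypothetical per-operation latencies in cycles (assumed, for illustration).
OP_COST = {"fadd": 1, "fmul": 2, "load": 3, "store": 3}

def loop_cost(op_counts, trip_count):
    """Predicted cycles for one loop: static per-iteration cost x profiled trips."""
    per_iter = sum(OP_COST[op] * n for op, n in op_counts.items())
    return per_iter * trip_count

def predict(loops):
    """Sum predicted cycles over all profiled loops in the program."""
    return sum(loop_cost(ops, trips) for ops, trips in loops)

# Example: two loops whose trip counts come from lightweight profiling.
loops = [
    ({"fadd": 2, "load": 2, "store": 1}, 1000),  # 2*1 + 2*3 + 1*3 = 11 cycles/iter
    ({"fmul": 1, "load": 1}, 500),               # 2 + 3 = 5 cycles/iter
]
predicted = predict(loops)  # 11*1000 + 5*500 = 13500 cycles
```

The prediction error would then be the relative difference between `predicted` and the measured cycle count; the abstract's 20% average accuracy refers to that kind of comparison.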