Estimating interlock and improving balance for pipelined architectures

Authors:
David Callahan; John Cocke; Ken Kennedy
Venue:
Journal of Parallel and Distributed Computing
Year:
1988

Citing 0
Cited 48

Overlapped loop support in the Cydra 5
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems

Analysis and transformation in the ParaScope editor
ICS '91 Proceedings of the 5th international conference on Supercomputing

Interprocedural transformations for parallel code generation
Proceedings of the 1991 ACM/IEEE conference on Supercomputing

Unexpected side effects of inline substitution: a case study
ACM Letters on Programming Languages and Systems (LOPLAS)

Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems

Improving the ratio of memory operations to floating-point operations in loops
ACM Transactions on Programming Languages and Systems (TOPLAS)

Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)

Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)

Combining loop transformations considering caches and scheduling
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture

Unroll-and-jam using uniformly generated sets
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture

Code transformations to improve memory parallelism
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture

Tools for application-oriented performance tuning
ICS '01 Proceedings of the 15th international conference on Supercomputing

Loop Transformations for Architectures with Partitioned Register Banks
OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems

Register tiling in nonrectangular iteration spaces
ACM Transactions on Programming Languages and Systems (TOPLAS)

HPCVIEW: A Tool for Top-down Analysis of Node Performance
The Journal of Supercomputing

Combining Loop Transformations Considering Caches and Scheduling
International Journal of Parallel Programming

Achieving Scalable Locality with Time Skewing
International Journal of Parallel Programming

Interactive Parallel Programming using the ParaScope Editor
IEEE Transactions on Parallel and Distributed Systems

Time Skewing for Parallel Computers
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing

Compiler-Directed Dynamic Frequency and Voltage Scheduling
PACS '00 Proceedings of the First International Workshop on Power-Aware Computer Systems-Revised Papers

Optimizing Loop Performance for Clustered VLIW Architectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques

On the Parallel Execution Time of Tiled Loops
IEEE Transactions on Parallel and Distributed Systems

Predicting whole-program locality through reuse distance analysis
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation

Improving effective bandwidth through compiler enhancement of global cache reuse
Journal of Parallel and Distributed Computing

An experimental evaluation of scalar replacement on scientific benchmarks
Software—Practice & Experience

Improving register allocation for subscripted variables
ACM SIGPLAN Notices - Best of PLDI 1979-1999

The Energy Impact of Aggressive Loop Fusion
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques

The Potential of Computation Regrouping for Improving Locality
Proceedings of the 2004 ACM/IEEE conference on Supercomputing

Complementing software pipelining with software thread integration
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems

Optimizing Sparse Matrix-Vector Product Computations Using Unroll and Jam
International Journal of High Performance Computing Applications

On the decidability of phase ordering problem in optimizing compilation
Proceedings of the 3rd conference on Computing frontiers

Reaching fast code faster: using modeling for efficient software thread integration on a VLIW DSP
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems

A metric space for computer programs and the principle of computational least action
The Journal of Supercomputing

Hierarchical hybrid grids: achieving TERAFLOP performance on large scale finite element simulations
International Journal of Parallel, Emergent and Distributed Systems

A Case Study of Some Issues in the Optimization of Fortran 90 Array Notation
Scientific Programming

Computational forces in the Linpack benchmark
Journal of Parallel and Distributed Computing

Roofline: an insightful visual performance model for multicore architectures
Communications of the ACM - A Direct Path to Dependable Software

Computational forces in the SAGE benchmark
Journal of Parallel and Distributed Computing

Program locality analysis using reuse distance
ACM Transactions on Programming Languages and Systems (TOPLAS)

Instruction balance and its relation to program energy consumption
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing

Dimensional analysis applied to a parallel QR algorithm
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics

Self-similarity of parallel machines
Parallel Computing

Balance principles for algorithm-architecture co-design
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism

On the communication complexity of 3D FFTs and its implications for Exascale
Proceedings of the 26th ACM international conference on Supercomputing

A coldness metric for cache optimization
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness

Optimizing the performance of streaming numerical kernels on the IBM Blue Gene/P PowerPC 450 processor
International Journal of High Performance Computing Applications

Software thread integration for instruction-level parallelism
ACM Transactions on Embedded Computing Systems (TECS)

Computer performance analysis and the Pi Theorem
Computer Science - Research and Development

Abstract