Field-testing IMPACT EPIC research results in Itanium 2

Authors:
John W. Sias; Sain-zee Ueng; Geoff A. Kent; Ian M. Steiner; Erik M. Nystrom; Wen-mei W. Hwu
Affiliations:
University of Illinois at Urbana-Champaign; University of Illinois at Urbana-Champaign; University of Illinois at Urbana-Champaign; University of Illinois at Urbana-Champaign; University of Illinois at Urbana-Champaign; University of Illinois at Urbana-Champaign
Venue:
Proceedings of the 31st annual international symposium on Computer architecture
Year:
2004

Abstract

Explicitly-Parallel Instruction Computing (EPIC) provides architectural features, including predication and explicit control speculation, intended to enhance the compiler's ability to expose instruction-level parallelism (ILP) in control-intensive programs. Aggressive structural transformations using these features, though described in the literature, have not yet been fully characterized in complete systems. Using the Intel Itanium 2 microprocessor, the SPECint2000 benchmarks, and the IMPACT Compiler for IA-64 (a research compiler competitive with the best commercial compilers on the platform), we provide an in situ evaluation of code generated using aggressive, EPIC-enabled techniques in a reality-constrained microarchitecture. Our work shows a 1.13 average speedup (up to 1.50) due to these compilation techniques, relative to traditionally-optimized code at the same inlining and pointer analysis levels, and a 1.55 speedup (up to 2.30) relative to GNU GCC, a solid traditional compiler. Detailed results show that the structural compilation approach provides benefits far beyond a decrease in branch misprediction penalties, and that it affects instruction cache performance both positively and negatively. We also demonstrate the increasing significance of runtime effects, such as data cache and TLB behavior, in determining end performance, and the interaction of these effects with control speculation.