Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Interprocedural dataflow analysis in an executable optimizer
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Continuous profiling: where have all the cycles gone?
Proceedings of the sixteenth ACM symposium on Operating systems principles
System support for automatic profiling and optimization
Proceedings of the sixteenth ACM symposium on Operating systems principles
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Optimizing alpha executables on Windows NT with spike
Digital Technical Journal
Alto: a link-time optimizer for the Compaq alpha
Software—Practice & Experience
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Profile-guided post-link stride prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
Optimizations to prevent cache penalties for the Intel® Itanium® 2 Processor
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Optimization opportunities created by global data reordering
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Instrumentation and optimization of Win32/intel executables using Etch
NT'97 Proceedings of the USENIX Windows NT Workshop on The USENIX Windows NT Workshop 1997
Reducing program image size by extracting frozen code and data
Proceedings of the 4th ACM international conference on Embedded software
A Dependency Chain Clustered Microarchitecture
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Fast and efficient partial code reordering: taking advantage of dynamic recompilatior
Proceedings of the 5th international symposium on Memory management
Dynamic code management: improving whole program code locality in managed runtimes
Proceedings of the 2nd international conference on Virtual execution environments
Link-time compaction and optimization of ARM executables
ACM Transactions on Embedded Computing Systems (TECS)
Performance driven data cache prefetching in a dynamic software optimization system
Proceedings of the 21st annual international conference on Supercomputing
PFetch: software prefetching exploiting temporal predictability of memory access streams
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Improving TriMedia cache performance by profile guided code reordering
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
A study of the performance potential for dynamic instruction hints selection
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Issues and support for dynamic register allocation
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
MAO -- An extensible micro-architectural optimizer
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Entropy-based profile characterization and classification for automatic profile management
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
A compiler-level intermediate representation based binary analysis and rewriting system
Proceedings of the 8th ACM European Conference on Computer Systems
Hi-index | 0.00 |
Ispike is post-link optimizer developed for theIntel®Itanium Processor Family (IPF) processors.TheIPF architecture poses both opportunities and challenges topost-link optimizations.IPF offers a rich set of performancecounters to collect detailed profile information at a low cost,which is essential to post-link optimization being practical.At the same time, the prediction and bundling features onIPF make post-link code transformation more challengingthan on other architectures.In Ispike, we have implementedoptimizations like code layout, instruction prefetching, datalayout, and data prefetching that exploit the IPF advantages,and strategies that cope with the IPF-specific challenges.Using SPEC CINT2000 as benchmarks, we showthat Ispike improves performance by as much as 40% on theItanium®2 processor, with average improvement of 8.5%and 9.9% over executables generated by the Intel®Electroncompiler and by the Gcc compiler, respectively.We alsodemonstrate that statistical profiles collected via IPF performancecounters and complete profiles collected via instrumentationproduce equal performance benefit, but theprofiling overhead is significantly lower for performancecounters.