SIP: Performance Tuning through Source Code Interdependence

Authors:
Erik Berg; Erik Hagersten
Venue:
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Year:
2002

Abstract

The gap between CPU peak performance and achieved application performance widens as CPU complexity, as well as the gap between CPU cycle time and DRAM access time, increases. While advanced compilers can perform many optimizations to better utilize the cache system, the application programmer is still required to do some of the optimizations needed for efficient execution. Therefore, profiling should be performed on optimized binary code and performance problems reported to the programmer in an intuitive way. Existing performance tools do not have adequate functionality to address these needs. Here we introduce source interdependence profiling, SIP, as a paradigm to collect and present performance data to the programmer. SIP identifies the performance problems that remain after the compiler optimization and gives intuitive hints at the source-code level as to how they can be avoided. Instead of just collecting information about the events directly caused by each source-code statement, SIP also presents data about events from some interdependent statements of source code. A first SIP prototype tool has been implemented. It supports both C and Fortran programs. We describe how the tool was used to improve the performance of the SPEC CPU2000 183.equake application by 59 percent.
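
As an illustration only, the idea of reporting events from interdependent statements might be sketched as follows. This is not the SIP tool or its algorithm: the direct-mapped cache model, the constants `LINE_SIZE` and `NUM_SETS`, the function `profile`, and the statement labels `"A"` and `"B"` are all assumptions made for the example. The sketch counts the misses each statement causes and, for each miss, records which other statement evicted the cache line.

```python
from collections import defaultdict

LINE_SIZE = 64  # bytes per cache line (assumed for illustration)
NUM_SETS = 4    # tiny direct-mapped cache (assumed for illustration)


def profile(trace):
    """Hypothetical sketch, not the SIP implementation.

    trace: iterable of (statement_label, byte_address) pairs.
    Returns (misses, evicted_by) where
      misses[s]        = cache misses caused directly by statement s
      evicted_by[s][t] = misses at s on lines that statement t evicted
    """
    cache = {}    # set index -> resident line number
    evictor = {}  # line number -> statement whose access evicted it
    misses = defaultdict(int)
    evicted_by = defaultdict(lambda: defaultdict(int))

    for stmt, addr in trace:
        line = addr // LINE_SIZE
        idx = line % NUM_SETS
        resident = cache.get(idx)
        if resident == line:
            continue  # cache hit, nothing to record
        misses[stmt] += 1
        if line in evictor:
            # This line was resident earlier; blame the statement that evicted it.
            evicted_by[stmt][evictor.pop(line)] += 1
        if resident is not None:
            evictor[resident] = stmt  # remember who displaced the old line
        cache[idx] = line

    return dict(misses), {s: dict(d) for s, d in evicted_by.items()}


# Two statements whose addresses map to the same cache set and evict each other.
trace = [("A", 0), ("B", 256), ("A", 0), ("B", 256)]
misses, evicted_by = profile(trace)
print(misses)      # {'A': 2, 'B': 2}
print(evicted_by)  # {'A': {'B': 1}, 'B': {'A': 1}}
```

A per-statement miss count alone would show that `A` and `B` each miss twice. The second table adds the interdependence: half of those misses come from the two statements evicting each other's data, which is the kind of hint that points at a source-level fix such as changing the data layout.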