MemSpy: analyzing memory system bottlenecks in programs
SIGMETRICS '92/PERFORMANCE '92 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Integrating performance monitoring and communication in parallel computers
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Using the SimOS machine simulator to study complex computer systems
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Continuous profiling: where have all the cycles gone?
ACM Transactions on Computer Systems (TOCS)
ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Using hardware performance monitors to isolate memory bottlenecks
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Tools for application-oriented performance tuning
ICS '01 Proceedings of the 15th international conference on Supercomputing
Removing the overhead from software-based shared memory
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
SimICS/sun4m: a virtual workstation
ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference
Fast data-locality profiling of native execution
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Intermediately executed code is the key to find refactorings that improve temporal data locality
Proceedings of the 3rd conference on Computing frontiers
Detailed cache simulation for detecting bottleneck, miss reason and optimization potentialities
valuetools '06 Proceedings of the 1st international conference on Performance evaluation methodolgies and tools
Cache optimizations for iterative numerical codes aware of hardware prefetching
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
RDVIS: a tool that visualizes the causes of low locality and hints program optimizations
ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II
CacheIn: a toolset for comprehensive cache inspection
ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II
Collecting and exploiting cache-reuse metrics
ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II
Comprehensive cache inspection with hardware monitors
PaCT'05 Proceedings of the 8th international conference on Parallel Computing Technologies
Hi-index | 0.00 |
The gap between CPU peak performance and achieved application performance widens as CPU complexity, as well as the gap between CPU cycle time and DRAM access time, increases. While advanced compilers can perform many optimizations to better utilize the cache system, the application programmer is still required to do some of the optimizations needed for efficient execution. Therefore, profiling should be performed on optimized binary code and performance problems reported to the programmer in an intuitive way. Existing performance tools do not have adequate functionality to address these needs. Here we introduce source interdependence profiling, SIP, as a paradigm to collect and present performance data to the programmer. SIP identifies the performance problems that remain after the compiler optimization and gives intuitive hints at the source-code level as to how they can be avoided. Instead of just collecting information about the events directly caused by each source-code statement, SIP also presents data about events from some interdependent statements of source code.A first SIP prototype tool has been implemented. It supports both C and Fortran programs. We describe how the tool was used to improve the performance of the SPEC CPU2000 183.equake application by 59 percent.