TotalProf: a fast and accurate retargetable source code profiler

  • Authors:
  • Lei Gao;Jia Huang;Jianjiang Ceng;Rainer Leupers;Gerd Ascheid;Heinrich Meyr

  • Affiliations:
  • RWTH Aachen University, Aachen, Germany;RWTH Aachen University, Aachen, Germany;RWTH Aachen University, Aachen, Germany;RWTH Aachen University, Aachen, Germany;RWTH Aachen University, Aachen, Germany;RWTH Aachen University, Aachen, Germany

  • Venue:
  • CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Profilers play an important role in software/hardware design, optimization, and verification. Various approaches have been proposed to implement profilers. The most widespread approach adopted in the embedded domain is Instruction Set Simulation (ISS) based profiling, which provides uncompromised accuracy but limited execution speed. Source code profilers, on the contrary, are fast but less accurate. This paper introduces TotalProf, a fast and accurate source code cross profiler that estimates the performance of an application from three aspects: First, code optimization and a novel virtual compiler backend are employed to resemble the course of target compilation. Second, an optimistic static scheduler is introduced to estimate the behavior of the target processor's datapath. Last but not least, dynamic events, such as cache misses, bus contention and branch prediction failures, are simulated at runtime. With an abstract architecture description, the tool can be easily retargeted in a performance characteristics oriented way to estimate different processor architectures, including DSPs and VLIW machines. Multiple instances of TotalProf can be integrated with SystemC to support heterogeneous Multi-Processor System-on-Chip (MPSoC) profiling. With only about a 5 to 15% error rate introduced to the major performance metrics, such as cycle count, memory accesses and cache misses, a more than one Giga-Instruction-Per-Second (GIPS) execution speed is achieved.