Efficient hardware-based nonintrusive dynamic application profiling

Authors: Ajay Nair, Karthik Shankar, Roman Lysecky
Affiliation: University of Arizona, Tucson, AZ (all authors)
Venue: ACM Transactions on Embedded Computing Systems (TECS)
Year: 2011

Abstract

Application profiling, the process of monitoring an application to determine how frequently specific regions execute, is an essential step in the design of many software and hardware systems. Profiling is often critical to hardware/software partitioning, where it is used to identify the critical kernels of an application. In this article, we present a nonintrusive dynamic application profiler (DAProf) that profiles an executing application by monitoring its short backward branches, function calls, and function returns. The resulting profile accurately characterizes the application's frequently executed loops and separates loop executions from loop iterations per execution. DAProf achieves an average accuracy of 98% for loop executions, 97% for average iterations per execution, and 95% for percentage of execution time. In addition, the profiler incurs as little as 11% area overhead compared to an ARM9 microprocessor. DAProf is well suited to rapid profiling of software applications and to dynamic optimization approaches, such as dynamic hardware/software partitioning, in which detailed loop execution information is needed to provide accurate performance estimates.
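
To make the distinction between loop executions and loop iterations concrete, the following is a minimal software sketch, not the DAProf hardware design. It assumes a hypothetical branch trace of `(pc, target, taken)` tuples with one entry per branch evaluation, and an assumed offset threshold `MAX_SBB_OFFSET` for what counts as a "short" backward branch. None of these names or values come from the article, and the function call and return monitoring mentioned in the abstract is left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed threshold (in bytes) for a "short" backward branch; illustrative only.
MAX_SBB_OFFSET = 256


@dataclass
class LoopStats:
    executions: int = 0  # times the loop was entered and ran to its exit
    iterations: int = 0  # loop-body iterations summed over all executions

    @property
    def avg_iterations(self) -> float:
        """Average iterations per execution (0.0 if the loop never finished)."""
        return self.iterations / self.executions if self.executions else 0.0


def is_short_backward_branch(pc: int, target: int) -> bool:
    """True if the branch jumps backward by at most MAX_SBB_OFFSET bytes."""
    return 0 < pc - target <= MAX_SBB_OFFSET


def profile(trace):
    """Build per-loop statistics keyed by the address of the loop's back branch.

    trace: iterable of (pc, target, taken) tuples, one per branch evaluation.
    """
    loops = defaultdict(LoopStats)
    for pc, target, taken in trace:
        if not is_short_backward_branch(pc, target):
            continue  # forward or long branches are not treated as loops here
        stats = loops[pc]
        stats.iterations += 1      # every evaluation closes one iteration
        if not taken:
            stats.executions += 1  # falling through ends this execution
    return dict(loops)


# Hypothetical trace: the loop ending at 0x120 runs twice (3, then 2 iterations).
trace = (
    [(0x120, 0x100, True)] * 2 + [(0x120, 0x100, False)]
    + [(0x200, 0x400, True)]  # forward branch, ignored by the profiler
    + [(0x120, 0x100, True)] + [(0x120, 0x100, False)]
)
result = profile(trace)
```

In this sketch, `result[0x120]` reports two executions and five iterations, so `avg_iterations` is 2.5. Keeping the two counters separate is what lets a partitioning tool tell a loop that runs often with few iterations apart from one that runs rarely with many.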