Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Continuous profiling: where have all the cycles gone?
ACM Transactions on Computer Systems (TOCS)
System support for automatic profiling and optimization
Proceedings of the sixteenth ACM symposium on Operating systems principles
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Common-case computation: a high-level technique for power and performance optimization
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
A low power hardware/software partitioning approach for core-based embedded systems
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Dynamo: a transparent dynamic optimization system
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Performance analysis using the MIPS R10000 performance counters
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
A scalable cross-platform infrastructure for application performance tuning using hardware counters
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Automatic source code specialization for energy reduction
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Dynamic Binary Translation and Optimization
IEEE Transactions on Computers
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
A fast on-chip profiler memory
Proceedings of the 39th annual Design Automation Conference
FX!32: A Profile-Directed Binary Translator
IEEE Micro
Pentium 4 Performance-Monitoring Features
IEEE Micro
A Programmable Co-processor for Profiling
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
A compiled accelerator for biological cell signaling simulations
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
ACM Transactions on Embedded Computing Systems (TECS)
Competitive algorithms for the dynamic selection of component implementations
IBM Systems Journal
Optimized Generation of Data-Path from C Codes for FPGAs
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Owl: next generation system monitoring
Proceedings of the 2nd conference on Computing frontiers
Frequent Loop Detection Using Efficient Nonintrusive On-Chip Hardware
IEEE Transactions on Computers
Exploiting Fixed Programs in Embedded Systems: A Loop Cache Example
IEEE Computer Architecture Letters
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Proceedings of the 41st annual Design Automation Conference
A dynamic binary instrumentation engine for the ARM architecture
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Non-intrusive dynamic application profiler for detailed loop execution characterization
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
MSEPT'12 Proceedings of the 2012 international conference on Multicore Software Engineering, Performance, and Tools
Hi-index | 0.00 |
Application profiling—the process of monitoring an application to determine the frequency of execution within specific regions—is an essential step within the design process for many software and hardware systems. Profiling is often a critical step within hardware/software partitioning utilized to determine the critical kernels of an application. In this article, we present an innovative, nonintrusive dynamic application profiler (DAProf) capable of profiling an executing application by monitoring the application's short backward branches, function calls, and function returns. The resulting profile information provides an accurate characterization of the frequently executed loops within the application providing a breakdown of loop executions versus loop iterations per execution. DAProf achieves excellent profiling accuracy with an average accuracy of 98% for loop executions, 97% for average iterations per execution, and 95% for percentage of execution time. In addition, the presented dynamic application profiler incurs as little as 11% area overhead compared to an ARM9 microprocessor. DAProf is ideally suited for rapidly profiling software applications and dynamic optimization approaches such as dynamic hardware/software partitioning in which detailed loop execution information is needed to provide accurate performance estimates.