Loops constitute the most frequently executed segments of programs and are therefore the best candidates for hardware/software partitioning. We present a set of profiling tools that are dedicated to loop profiling and that support combined function and loop profiling. One tool relies on an instruction set simulator and can therefore be extended with architectural and micro-architectural simulation. The other is based on compile-time instrumentation in gcc and therefore incurs very little slowdown compared to the original program. We use the profiling results to identify the compute core of each benchmark and to study the effect of compile-time optimization on the distribution of cores in a program. As an example application of these tools to hardware/software partitioning, we also study the potential speedup achievable on a configurable system-on-chip consisting of a CPU embedded in an FPGA.
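The step from a loop profile to a speedup estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tools. It assumes a hypothetical profile record (`LoopProfile`) that stores exclusive (self) cycles per loop, so nested loops are not double-counted. It also assumes that the "compute core" is the smallest set of loops covering a chosen fraction of run time. The 90% threshold is an assumption, not a value taken from the paper. Finally, it applies Amdahl's law to bound the overall speedup when that core is moved to FPGA hardware with a hypothetical uniform acceleration factor `accel`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopProfile:
    """Hypothetical per-loop profile record (assumed format).

    `cycles` holds exclusive (self) cycles, so records for nested loops
    do not overlap and can be summed safely.
    """
    name: str
    cycles: int


def compute_core(profile: list[LoopProfile], total_cycles: int,
                 coverage: float = 0.9) -> list[LoopProfile]:
    """Return the fewest loops whose cycles reach `coverage` of total run time.

    Taking loops hottest-first is optimal for minimizing the number of
    loops needed to reach a cycle threshold. The default 0.9 threshold
    is an illustrative assumption.
    """
    core: list[LoopProfile] = []
    covered = 0
    for loop in sorted(profile, key=lambda l: l.cycles, reverse=True):
        if covered >= coverage * total_cycles:
            break
        core.append(loop)
        covered += loop.cycles
    return core


def amdahl_speedup(fraction: float, accel: float) -> float:
    """Amdahl's law: overall speedup when `fraction` of run time runs `accel`x faster."""
    if not 0.0 <= fraction <= 1.0 or accel <= 0:
        raise ValueError("fraction must be in [0, 1] and accel must be > 0")
    return 1.0 / ((1.0 - fraction) + fraction / accel)


if __name__ == "__main__":
    # Made-up example profile: 1000 total cycles, 40 of them outside any loop.
    profile = [
        LoopProfile("loop_A", 600),
        LoopProfile("loop_B", 250),
        LoopProfile("loop_C", 80),
        LoopProfile("loop_D", 30),
    ]
    total = 1000
    core = compute_core(profile, total)
    print([l.name for l in core])  # ['loop_A', 'loop_B', 'loop_C']
    f = sum(l.cycles for l in core) / total
    # Assume, hypothetically, that the FPGA runs the core 10x faster.
    print(f"core fraction = {f:.2f}, speedup = {amdahl_speedup(f, 10.0):.2f}")
```

With the made-up numbers above, the core covers 93% of execution. Even an infinitely fast accelerator would be capped at 1 / 0.07, or about 14.3x overall. This cap is why the tools concentrate on capturing as much of the run time as possible within a small number of loops.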