A fast on-chip profiler memory using a pipelined binary tree
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Profiling an application executing on a microprocessor is part of the solution to numerous software and hardware optimization and design automation problems. Most current profiling techniques suffer from runtime overhead, inaccuracy, or slowness, and the traditional non-intrusive method of using a logic analyzer does not work for today's systems-on-a-chip with embedded cores. We introduce a novel on-chip memory architecture that overcomes these limitations. The architecture, which we call ProMem, is based on a pipelined binary tree structure. It achieves single-cycle throughput, so it can keep up with today's fastest pipelined processors. It can also be laid out efficiently and scales very well, becoming more efficient as it grows larger. The memory can be used in a wide variety of common profiling situations, such as instruction profiling, value profiling, and network traffic profiling, which in turn can guide numerous design automation tasks.
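The idea of a pipelined binary tree with single-cycle throughput can be illustrated with a small software model. The sketch below is not the ProMem design itself, which the abstract does not describe in detail. It assumes a balanced binary search tree stored level by level, with one pipeline stage per tree level, so that a new key can be issued every cycle while earlier lookups are still descending. The names PipelinedTreeProfiler, InFlight, clock, and run are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InFlight:
    """One lookup travelling down the pipeline (hypothetical model type)."""
    key: int
    node: Optional[int]  # heap index to visit next; None once the search ended
    hit: bool = False


class PipelinedTreeProfiler:
    """Cycle-level software model, assuming one pipeline stage per tree level."""

    def __init__(self, keys: List[int]) -> None:
        ordered = sorted(set(keys))
        # A median-split tree over n keys has exactly n.bit_length() levels.
        self.depth = max(1, len(ordered).bit_length())
        self.tree: List[Optional[int]] = [None] * ((1 << self.depth) - 1)
        self._fill(ordered, 0)
        self.counts = {k: 0 for k in ordered}
        self.stages: List[Optional[InFlight]] = [None] * self.depth
        self.cycles = 0

    def _fill(self, ordered: List[int], index: int) -> None:
        # Store the median at this node, then recurse into both halves.
        if not ordered:
            return
        mid = len(ordered) // 2
        self.tree[index] = ordered[mid]
        self._fill(ordered[:mid], 2 * index + 1)
        self._fill(ordered[mid + 1:], 2 * index + 2)

    def _visit(self, item: Optional[InFlight]) -> Optional[InFlight]:
        # One comparison at the current level; finished lookups pass through.
        if item is None or item.node is None:
            return item
        stored = self.tree[item.node]
        if stored is None:
            item.node = None  # empty slot: the key is not being profiled
        elif stored == item.key:
            item.hit, item.node = True, None
        else:
            child = 2 * item.node + (1 if item.key < stored else 2)
            item.node = child if child < len(self.tree) else None
        return item

    def clock(self, key: Optional[int] = None) -> None:
        """Advance one cycle and optionally issue one new key into stage 0."""
        self.cycles += 1
        done = self.stages[-1]
        if done is not None and done.hit:
            self.counts[done.key] += 1  # retire: bump the matching counter
        # Each stage s only ever touches tree level s, so all stages can
        # work in the same cycle on different lookups.
        for s in range(self.depth - 1, 0, -1):
            self.stages[s] = self._visit(self.stages[s - 1])
        self.stages[0] = None if key is None else self._visit(InFlight(key, 0))

    def run(self, trace: List[int]) -> None:
        for key in trace:
            self.clock(key)      # one new lookup per cycle
        for _ in range(self.depth):
            self.clock()         # drain the lookups still in flight
```

In this model a trace of N keys completes in N plus depth cycles: the depth term is the fixed pipeline latency, and the per-key cost stays at one cycle regardless of how many keys are tracked. Keys that are not in the tree, such as an address outside the profiled set, simply fall off the tree and are not counted.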