Proceedings of the 41st annual Design Automation Conference
We introduce a novel memory architecture that counts the occurrences of patterns on a system's bus, a task known as profiling. Such profiling can serve a variety of purposes, such as detecting a microprocessor's software hot spots or frequently used data values, and the results can be used to optimize various aspects of the system. The memory, which we call ProMem, is based on a pipelined binary search tree structure. This structure yields several beneficial features: nonintrusiveness, accurate counts, excellent size and power efficiency, very fast access times, and the use of standard memories with only simple additional logic. The main limitation is that the set of potential patterns must be preloaded into the memory. We describe the ProMem architecture and show that it has excellent size and performance advantages over designs based on content-addressable memory (CAM).
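
The counting idea can be illustrated with a small software model. This is only a sketch under stated assumptions, not the ProMem hardware design: the names `PatternCounter`, `observe`, `count`, and `profile` are hypothetical, the patterns are modelled as plain integers, and a sorted list with binary search stands in for the pipelined binary search tree. It shows the two properties the abstract names: patterns are preloaded, and each observed bus value is matched in about log2(n) comparisons, each of which could correspond to one pipeline stage in hardware.

```python
from bisect import bisect_left


class PatternCounter:
    """Software model only (hypothetical names): preloaded patterns, one counter each."""

    def __init__(self, patterns):
        # Patterns must be known up front, mirroring the preload limitation.
        self.patterns = sorted(set(patterns))
        self.counts = [0] * len(self.patterns)

    def _find(self, value):
        # Binary search over the sorted patterns: about log2(n) comparisons.
        i = bisect_left(self.patterns, value)
        if i < len(self.patterns) and self.patterns[i] == value:
            return i
        return -1

    def observe(self, value):
        # Count the value if it is a preloaded pattern; otherwise ignore it.
        i = self._find(value)
        if i < 0:
            return False
        self.counts[i] += 1
        return True

    def count(self, pattern):
        # Return the exact count for a pattern, or 0 if it was never preloaded.
        i = self._find(pattern)
        return self.counts[i] if i >= 0 else 0


def profile(patterns, bus_trace):
    # Feed every observed bus value to the counter and report per-pattern counts.
    counter = PatternCounter(patterns)
    for value in bus_trace:
        counter.observe(value)
    return {p: counter.count(p) for p in counter.patterns}
```

For example, `profile([0x400, 0x100, 0x200], [0x100, 0x200, 0x100, 0x999, 0x100])` counts three occurrences of `0x100`, one of `0x200`, and none of `0x400`; the value `0x999` is ignored because it was not preloaded.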