A fast on-chip profiler memory

Authors:
Roman Lysecky; Susan Cotterell; Frank Vahid
Affiliations:
University of California, Riverside; University of California, Riverside; University of California, Riverside
Venue:
Proceedings of the 39th annual Design Automation Conference
Year:
2002

Abstract

Profiling an application executing on a microprocessor is part of the solution to numerous software and hardware optimization and design automation problems. Most current profiling techniques suffer from runtime overhead, inaccuracy, or slowness, and the traditional non-intrusive method of using a logic analyzer does not work for today's systems-on-a-chip with embedded cores. We introduce a novel on-chip memory architecture that overcomes these limitations. The architecture, which we call ProMem, is based on a pipelined binary tree structure. It achieves single-cycle throughput, so it can keep up with today's fastest pipelined processors. It can also be laid out efficiently and scales very well, becoming more efficient the larger it gets. The memory can be used in a wide variety of common profiling situations, such as instruction profiling, value profiling, and network traffic profiling, which in turn can guide numerous design automation tasks.
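
The abstract does not describe the internals of ProMem, so the following is only a rough software sketch of the general idea of a pipelined binary tree used as a profiling memory, not the authors' circuit. It assumes that a fixed set of tracked patterns (for example, instruction addresses) is stored as a balanced binary search tree in heap order, that each tree level is handled by its own pipeline stage, and that a per-node counter is incremented on a match. The names `build_tree` and `profile` are hypothetical. Because each stage touches only its own level, one new input can enter the pipeline every cycle, which is the sense in which such a structure could sustain single-cycle throughput.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def build_tree(sorted_keys: List[int]) -> Tuple[List[Optional[int]], int]:
    """Lay sorted keys out as a balanced BST in heap order.

    Node i has children 2*i + 1 and 2*i + 2; unused slots hold None.
    (Assumed layout for this sketch, not taken from the paper.)
    """
    depth = max(1, len(sorted_keys).bit_length())
    tree: List[Optional[int]] = [None] * (2 ** depth - 1)

    def fill(lo: int, hi: int, node: int) -> None:
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        tree[node] = sorted_keys[mid]
        fill(lo, mid, 2 * node + 1)
        fill(mid + 1, hi, 2 * node + 2)

    fill(0, len(sorted_keys), 0)
    return tree, depth


def profile(keys: Sequence[int], stream: Sequence[int]) -> Tuple[Dict[int, int], int]:
    """Count occurrences of tracked keys in stream with a simulated pipeline.

    Returns (counts per tracked key, number of simulated cycles).
    """
    tree, depth = build_tree(sorted(set(keys)))
    counts = [0] * len(tree)
    # stages[level] holds the (value, node index) currently at that tree level.
    stages: List[Optional[Tuple[int, int]]] = [None] * depth
    pos = 0
    cycles = 0

    while pos < len(stream) or any(item is not None for item in stages):
        nxt: List[Optional[Tuple[int, int]]] = [None] * depth
        for level, item in enumerate(stages):
            if item is None:
                continue
            value, node = item
            key = tree[node]
            if key is None:
                continue  # empty slot: value is not tracked
            if value == key:
                counts[node] += 1  # match: bump this node's counter
                continue
            child = 2 * node + 1 if value < key else 2 * node + 2
            if level + 1 < depth:
                nxt[level + 1] = (value, child)  # hand off to the next stage
        if pos < len(stream):
            nxt[0] = (stream[pos], 0)  # one new input enters per cycle
            pos += 1
        stages = nxt
        cycles += 1

    result = {k: counts[i] for i, k in enumerate(tree) if k is not None}
    return result, cycles


if __name__ == "__main__":
    tracked = [0x100, 0x104, 0x108, 0x200, 0x204]
    trace = [0x100, 0x104, 0x100, 0x300, 0x204, 0x100]
    print(profile(tracked, trace))
```

In this model the total cycle count is bounded by the stream length plus the tree depth, so the per-input cost stays at one cycle no matter how many patterns are tracked; only the latency of an individual lookup grows with depth.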