Frequent loop detection using efficient non-intrusive on-chip hardware

Authors:
Ann Gordon-Ross; Frank Vahid
Affiliations:
University of California, Riverside, Riverside, CA; University of California, Riverside, Riverside, CA
Venue:
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Year:
2003

Abstract

Dynamic software optimization methods are becoming increasingly popular for improving software performance and reducing power consumption. The first step in dynamic optimization is detecting frequently executed code, or "critical regions." Previous critical region detectors have targeted desktop processors. We introduce a critical region detector targeted to embedded processors that is uniquely size- and power-efficient and completely non-intrusive to software execution, features needed in timing-sensitive embedded systems. Our detector not only finds the critical regions but also determines their relative frequencies, a potentially important feature for selecting among alternative dynamic optimization methods. Our detector uses a tiny cache coupled with a small amount of logic. We provide results of extensive explorations across seventeen embedded system benchmarks, showing that highly accurate results can be achieved with only a 0.02% power overhead and acceptable size overhead. Our detector is currently being used as part of a dynamic hardware/software partitioning approach, but is applicable to a wide variety of situations.
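The abstract does not spell out how "a tiny cache coupled with a small amount of logic" finds loops and their relative frequencies. The Python model below is a minimal sketch under stated assumptions, not the authors' design: it assumes loops are recognized by taken backward branches snooped from the instruction stream, that the cache is a small fully associative table of saturating counters indexed by the branch target, that a miss evicts the entry with the smallest count, and that when any counter saturates every counter is halved so relative frequencies roughly survive. The names `FrequentLoopDetector`, `observe_branch`, and `relative_frequencies` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    loop_addr: int  # target address of the backward branch (assumed loop start)
    count: int      # saturating frequency counter


class FrequentLoopDetector:
    """Software model of a small counter cache fed by branch outcomes.

    Assumptions (illustrative only): taken backward branches mark loop
    iterations, the table is fully associative, misses evict the entry with
    the smallest count, and saturation halves every counter.
    """

    def __init__(self, num_entries: int = 16, counter_bits: int = 16) -> None:
        self.num_entries = num_entries
        self.max_count = (1 << counter_bits) - 1
        self.entries: list[Entry] = []

    def observe_branch(self, branch_pc: int, target: int, taken: bool) -> None:
        # The model only watches branch outcomes and never alters execution,
        # mirroring the non-intrusive property described in the abstract.
        if not taken or target >= branch_pc:
            return  # not a taken backward branch, so not counted as a loop iteration
        for entry in self.entries:
            if entry.loop_addr == target:
                entry.count += 1
                if entry.count >= self.max_count:
                    # Halve all counters at once to bound counter width while
                    # roughly preserving the ratios between loops.
                    for e in self.entries:
                        e.count >>= 1
                return
        new_entry = Entry(loop_addr=target, count=1)
        if len(self.entries) < self.num_entries:
            self.entries.append(new_entry)
        else:
            # Assumed replacement policy: evict the least frequently seen loop.
            victim = min(range(len(self.entries)), key=lambda i: self.entries[i].count)
            self.entries[victim] = new_entry

    def relative_frequencies(self) -> list[tuple[int, float]]:
        """Return (loop address, share of counted iterations), most frequent first."""
        total = sum(e.count for e in self.entries)
        if total == 0:
            return []
        ranked = sorted(self.entries, key=lambda e: e.count, reverse=True)
        return [(e.loop_addr, e.count / total) for e in ranked]


if __name__ == "__main__":
    detector = FrequentLoopDetector(num_entries=4, counter_bits=8)
    # Hypothetical trace: loop A (starting at 0x100) runs three iterations
    # for every one iteration of loop B (starting at 0x200).
    for _ in range(100):
        for _ in range(3):
            detector.observe_branch(branch_pc=0x120, target=0x100, taken=True)
        detector.observe_branch(branch_pc=0x240, target=0x200, taken=True)
    detector.observe_branch(branch_pc=0x300, target=0x340, taken=True)  # forward branch: ignored
    for addr, share in detector.relative_frequencies():
        print(f"loop at {addr:#x}: {share:.1%} of counted iterations")
```

With 8-bit counters, loop A saturates partway through this trace, so every counter is halved once; loop A still ranks first with roughly three times loop B's share. Halving all counters together, rather than resetting one, is what lets narrow counters (cheap in hardware) keep reporting relative frequencies instead of just a single hottest loop.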