Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Continuous profiling: where have all the cycles gone?
Proceedings of the sixteenth ACM symposium on Operating systems principles
System support for automatic profiling and optimization
Proceedings of the sixteenth ACM symposium on Operating systems principles
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A hardware-driven profiling scheme for identifying program hot spots to support runtime optimization
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
A power reduction technique with object code merging for application specific embedded processors
DATE '00 Proceedings of the conference on Design, automation and test in Europe
A low power unified cache architecture providing power and performance flexibility (poster session)
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Dynamo: a transparent dynamic optimization system
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Performance analysis using the MIPS R10000 performance counters
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Energy efficient frequent value data cache design
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Dynamic hardware/software partitioning: a first approach
Proceedings of the 40th annual Design Automation Conference
Proceedings of the 40th annual Design Automation Conference
Profiling tools for hardware/software partitioning of embedded applications
Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
Gprof: A call graph execution profiler
SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction
Control Speculation in Multithreaded Processors through Dynamic Loop Detection
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Designing the M·CORETM M3 CPU Architecture
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
Energy and Performance Improvements in Microprocessor Design Using a Loop Cache
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
A codesigned on-chip logic minimizer
Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
SpixTools: Introduction and User's Manual
SpixTools: Introduction and User's Manual
Exploiting Fixed Programs in Embedded Systems: A Loop Cache Example
IEEE Computer Architecture Letters
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Efficient remote profiling for resource-constrained devices
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 41st annual Design Automation Conference
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Low-power warp processor for power efficient high-performance embedded systems
Proceedings of the conference on Design, automation and test in Europe
A smart random code injection to mask power analysis based side channel attacks
CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Design and implementation of a MicroBlaze-based warp processor
ACM Transactions on Embedded Computing Systems (TECS)
Scalability and parallel execution of warp processing: dynamic hardware/software partitioning
International Journal of Parallel Programming
ACM Transactions on Design Automation of Electronic Systems (TODAES)
WISTP'08 Proceedings of the 2nd IFIP WG 11.2 international conference on Information security theory and practices: smart devices, convergence and next generation networks
Enabling large decoded instruction loop caching for energy-aware embedded processors
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Binary acceleration using coarse-grained reconfigurable architecture
ACM SIGARCH Computer Architecture News
Randomized Instruction Injection to Counter Power Analysis Attacks
ACM Transactions on Embedded Computing Systems (TECS)
Adaptive loop caching using lightweight runtime control flow analysis
ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
DLIC: Decoded loop instructions caching for energy-aware embedded processors
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
Dynamic software optimization methods are becoming increasingly popular for improving software performance and power. The first step in dynamic optimization consists of detecting frequently executed code, or "critical regions." Previous critical region detectors have been targeted to desktop processors. We introduce a critical region detector targeted to embedded processors, with the unique features of being very size and power efficient, and being completely non-intrusive to the software's execution - features needed in timing-sensitive embedded systems. Our detector not only finds the critical regions, but also determines their relative frequencies, a potentially important feature for selecting among alternative dynamic optimization methods. Our detector uses a tiny cache coupled with a small amount of logic. We provide results of extensive explorations across seventeen embedded system benchmarks. We show that highly accurate results can be achieved with only a 0.02% power overhead and acceptable size overhead. Our detector is currently being used as part of a dynamic hardware/software partitioning approach, but is applicable to a wide-variety of situations.