Introduction to algorithms
IMPACT: an architectural framework for multiple-instruction-issue processors
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Efficient software performance estimation methods for hardware/software codesign
DAC '96 Proceedings of the 33rd annual Design Automation Conference
Using the SimOS machine simulator to study complex computer systems
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Memory-CPU size optimization for embedded system designs
DAC '97 Proceedings of the 34th annual Design Automation Conference
Fast out-of-order processor simulation using memoization
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Instruction set selection for ASIP design
CODES '99 Proceedings of the seventh international workshop on Hardware/software codesign
Synthesis of Application Specific Instructions for Embedded DSP Software
IEEE Transactions on Computers
A method to derive application-specific embedded processing cores
CODES '00 Proceedings of the eighth international workshop on Hardware/software codesign
IEEE Transactions on Computers
The effect of reconfigurable units in superscalar processors
FPGA '01 Proceedings of the 2001 ACM/SIGDA ninth international symposium on Field programmable gate arrays
Designing domain-specific processors
Proceedings of the ninth international symposium on Hardware/software codesign
Source-level execution time estimation of C programs
Proceedings of the ninth international symposium on Hardware/software codesign
FPGA prototyping of a RISC processor core for embedded applications
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Dynamic hardware plugins in an FPGA with partial run-time reconfiguration
Proceedings of the 39th annual Design Automation Conference
Efficient architecture/compiler co-exploration for ASIPs
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Towards automatic synthesis of a class of application-specific sensor networks
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
High-level modeling and FPGA prototyping of microprocessors
FPGA '03 Proceedings of the 2003 ACM/SIGDA eleventh international symposium on Field programmable gate arrays
The Garp Architecture and C Compiler
Computer
A Flexible DSP Core for Embedded Systems
IEEE Design & Test
Pentium 4 Performance-Monitoring Features
IEEE Micro
Compiler-directed customization of ASIP cores
Proceedings of the tenth international symposium on Hardware/software codesign
Automatic application-specific instruction-set extensions under microarchitectural constraints
Proceedings of the 40th annual Design Automation Conference
The Chimaera reconfigurable functional unit
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Garp: a MIPS processor with a reconfigurable coprocessor
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
The NAPA Adaptive Processing Architecture
FCCM '98 Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines
Emulator Environment Based on an FPGA Prototyping Board
RSP '00 Proceedings of the 11th IEEE International Workshop on Rapid System Prototyping (RSP 2000)
Experiences and Lessons Learned with a Portable Interface to Hardware Performance Counters
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Evolvable Internet Hardware Platforms
EH '01 Proceedings of the The 3rd NASA/DoD Workshop on Evolvable Hardware
The case for reconfigurable hardware in wearable computing
Personal and Ubiquitous Computing
Automatic generation of application specific processors
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Using reconfigurability to achieve real-time profiling for hardware/software codesign
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
A Portable Programming Interface for Performance Evaluation on Modern Processors
International Journal of High Performance Computing Applications
The next generation software program
International Journal of Parallel Programming - Special issue: The next generation software program
Empirical performance assessment using soft-core processors on reconfigurable hardware
Proceedings of the 2007 workshop on Experimental computer science
Empirical performance assessment using soft-core processors on reconfigurable hardware
ecs'07 Experimental computer science on Experimental computer science
Visions for application development on hybrid computing systems
Parallel Computing
Vision for liquid architecture
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Automatic application-specific microarchitecture reconfiguration
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Hi-index | 0.00 |
Applications for constrained embedded systems require careful attention to the match between the application and the support offered by an architecture, at the ISA and microarchitecture levels. Generic processors, such as ARM and Power PC, are inexpensive, but with respect to a given application, they often overprovision in areas that are unimportant for the application's performance. Moreover, while application-specific, customized logic could dramatically improve the performance of an application, that approach is typically too expensive to justify its cost for most applications. In this paper, we describe our experience using reconfigurable architectures to develop an understanding of an application's performance and to enhance its performance with respect to customized, constrained logic. We begin with a standard ISA currently in use for embedded systems. We modify its core to measure performance characteristics, obtaining a system that provides cycle-accurate timings and presents results in the style of gprof, but with absolutely no software overhead. We then provide cache-behavior statistics that are typically unavailable in a generic processor. In contrast with simulation, our approach executes the program at full speed and delivers statistics based on the actual behavior of the cache subsystem. Finally, in response to the performance profile developed on our platform, we evaluate various uses of the FPGA-realized instruction and data caches in terms of the application's performance.