Synthesis of application-specific memories for power optimization in embedded systems
Proceedings of the 37th Annual Design Automation Conference
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
A low-overhead coherence solution for multiprocessors with private cache memories
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Assigning Program and Data Objects to Scratchpad for Energy Reduction
Proceedings of the conference on Design, automation and test in Europe
Scalable custom instructions identification for instruction-set extensible processors
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Automated Custom Instruction Generation for Domain-Specific Processor Acceleration
IEEE Transactions on Computers
An integer linear programming approach for identifying instruction-set extensions
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Customizable Embedded Processors: Design Technologies and Applications
Customizable Embedded Processors: Design Technologies and Applications
Rethinking custom ISE identification: a new processor-agnostic method
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Exact and approximate algorithms for the extension of embedded processor instruction sets
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Introduction of Architecturally Visible Storage in Instruction Set Extensions
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Way Stealing: cache-assisted automatic instruction set extensions
Proceedings of the 46th Annual Design Automation Conference
Proceedings of the 2009 International Conference on Computer-Aided Design
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Hi-index | 0.00 |
Instruction set extensions (ISEs) can accelerate embedded processor performance. Many algorithms for ISE generation have shown good potential; some of them have recently been expanded to include Architecturally Visible Storage (AVS) - compiler-controlled memories, similar to scratchpads, that are accessible only to ISEs. To achieve a speedup using AVS, Direct Memory Access (DMA) transfers are required to move data from the main memory to the AVS; unfortunately, this creates coherence problems between the AVS and the cache, which previous methods for ISEs with AVS failed to address; additionally, these methods need to leave many conservative DMA transfers in place, whose execution significantly limits the achievable speedup. This paper presents a memory coherence scheme for ISEs with AVS, which can ensure execution correctness and memory consistency with minimal area overhead. We also present a method that speculatively removes redundant DMA transfers. Cycle-accurate experimental results were obtained using an FPGA-emulation platform. These results show that the application-specific instruction-set extended processors with speculative DMA-enhanced AVS gain significantly over previous techniques, despite the overhead of the coherence mechanism.