A reconfigurable multi-function computing cache architecture. FPGA '00: Proceedings of the 2000 ACM/SIGDA Eighth International Symposium on Field Programmable Gate Arrays.
Application-specific memory management for embedded systems using software-controlled caches. Proceedings of the 37th Annual Design Automation Conference.
Reconfigurable caches and their application to media processing. Proceedings of the 27th Annual International Symposium on Computer Architecture.
A multi-level memory system architecture for high-performance DSP applications. ICCD '00: Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors.
A low-power accelerator for the SPHINX 3 speech recognition system. Proceedings of the 2003 International Conference on Compilers, Architecture and Synthesis for Embedded Systems.
Comprehensively and efficiently protecting the heap. Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems.
Software-based instruction caching for embedded processors. Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems.
Integrated network interfaces for high-bandwidth TCP/IP. Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems.
Compiler-managed partitioned data caches for low power. Proceedings of the 2007 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems.
QoS policies and architecture for cache/memory in CMP platforms. Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems.
Instruction cache energy saving through compiler way-placement. Proceedings of the Conference on Design, Automation and Test in Europe.
Scaling the bandwidth wall: challenges in and avenues for CMP scaling. Proceedings of the 36th Annual International Symposium on Computer Architecture.
Way Stealing: cache-assisted automatic instruction set extensions. Proceedings of the 46th Annual Design Automation Conference.
Performance measurement of an integrated NIC architecture with 10GbE. HOTI '09: Proceedings of the 2009 17th IEEE Symposium on High Performance Interconnects.
Architecture support for improving bulk memory copying and initialization performance. PACT '09: Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
Performance characterization and optimization of mobile augmented reality on handheld platforms. IISWC '09: Proceedings of the 2009 IEEE International Symposium on Workload Characterization.
Accelerating mobile augmented reality on a handheld platform. ICCD '09: Proceedings of the 2009 IEEE International Conference on Computer Design.
Architectural framework for supporting operating system survivability. HPCA '11: Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
Cost-effectively offering private buffers in SoCs and CMPs. Proceedings of the International Conference on Supercomputing.
BiN: a buffer-in-NUCA scheme for accelerator-rich CMPs. Proceedings of the 2012 ACM/IEEE International Symposium on Low Power Electronics and Design.
Energy and performance exploration of accelerator coherency port using Xilinx ZYNQ. Proceedings of the 10th FPGAworld Conference.
In an SoC, building dedicated local storage into each accelerator is area-inefficient because average buffer utilization is low. In this paper, we present the design and implementation of Buffer-integrated-Caching (BiC), which allows many buffers to be instantiated simultaneously in caches. BiC lets cores use portions of the shared SRAM as cache while accelerators access other portions of the same SRAM as private buffers. We demonstrate the cost-effectiveness of BiC on a recognition MPSoC that includes two Pentium(TM) cores, an augmented-reality accelerator, and a speech recognition accelerator. With only 3% extra area added to the baseline L2 cache, BiC eliminates the need for 215KB of dedicated accelerator SRAM while increasing total cache misses by no more than 0.3%.
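The cache/buffer partitioning idea in the abstract can be illustrated with a minimal behavioral sketch. This is not the paper's hardware design: the class name, the way-granularity allocation policy, and all parameters are assumptions chosen for exposition. It models a set-associative SRAM array in which whole ways can be reallocated from cache use to an accelerator's private buffer and later returned.

```python
# Behavioral sketch (not the BiC RTL) of an SRAM array whose ways can be
# reallocated between cache operation and accelerator-private buffers.
# All names and parameters here are illustrative assumptions.

class BiCArray:
    def __init__(self, num_sets=256, num_ways=8, line_bytes=64):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.line_bytes = line_bytes
        # Per-way owner: "cache", or an accelerator id that claimed the way.
        self.owner = ["cache"] * num_ways
        # Tag store for cache-owned ways (None = invalid line).
        self.tags = [[None] * num_ways for _ in range(num_sets)]

    def alloc_buffer(self, acc_id, size_bytes):
        """Claim enough whole ways to hold size_bytes as a private buffer."""
        way_bytes = self.num_sets * self.line_bytes
        need = -(-size_bytes // way_bytes)  # ceiling division
        free = [w for w, o in enumerate(self.owner) if o == "cache"]
        if len(free) < need:
            raise MemoryError("not enough cache ways available for buffer")
        claimed = free[:need]
        for w in claimed:
            self.owner[w] = acc_id
            # A stolen way must be flushed/invalidated before buffer use.
            for s in range(self.num_sets):
                self.tags[s][w] = None
        return claimed

    def free_buffer(self, acc_id):
        """Return an accelerator's claimed ways to normal cache operation."""
        for w, o in enumerate(self.owner):
            if o == acc_id:
                self.owner[w] = "cache"

    def cache_ways(self):
        """Ways currently usable by the cores as cache."""
        return [w for w, o in enumerate(self.owner) if o == "cache"]
```

With the default parameters, one way holds 256 sets x 64 bytes = 16KB, so a 64KB accelerator buffer claims four of the eight ways, leaving a 4-way cache for the cores; freeing the buffer restores full associativity. Real designs must also handle the effective-associativity loss and flush cost that this sketch only hints at.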