Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
ASR: Adaptive Selective Replication for CMP Caches
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
An energy-efficient adaptive hybrid cache
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Buffer-integrated-Cache: a cost-effective SRAM architecture for handheld and embedded platforms
Proceedings of the 48th Design Automation Conference
The accelerator store: A shared memory framework for accelerator-based systems
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Combined loop transformation and hierarchy allocation for data reuse optimization
Proceedings of the International Conference on Computer-Aided Design
Architecture support for accelerator-rich CMPs
Proceedings of the 49th Annual Design Automation Conference
High-Level Synthesis for FPGAs: From Prototyping to Deployment
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
CHARM: a composable heterogeneous accelerator-rich microprocessor
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Stream arbitration: Towards efficient bandwidth utilization for emerging on-chip interconnects
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Replacement techniques for dynamic NUCA cache designs on CMPs
The Journal of Supercomputing
Optimization of interconnects between accelerators and shared memories in dark silicon
Proceedings of the International Conference on Computer-Aided Design
Hi-index | 0.00 |
As the number of on-chip accelerators grows rapidly to improve power-efficiency, the buffer size required by accelerators drastically increases. Existing solutions allow the accelerators to share a common pool of buffers or/and allocate buffers in cache. In this paper we propose a Buffer-in-NUCA (BiN) scheme with the following contributions: (1) a dynamic interval-based global buffer allocation method to assign shared buffer spaces to accelerators that can best utilize the additional buffer space, and (2) a flexible and low-overhead paged buffer allocation method to limit the impact of buffer fragmentation in a shared buffer, especially when allocating buffers in a non-uniform cache architecture (NUCA) with distributed cache banks. Experimental results show that, when compared to two representative schemes from the prior work, BiN improves performance by 32% and 35% and reduces energy by 12% and 29%, respectively.