Smart Memories: a modular reconfigurable architecture
Proceedings of the 27th annual international symposium on Computer architecture
Hardware support for real-time embedded multiprocessor system-on-a-chip memory management
Proceedings of the tenth international symposium on Hardware/software codesign
Performance Evaluation and Design Trade-Offs for Network-on-Chip Interconnect Architectures
IEEE Transactions on Computers
Measuring the gap between FPGAs and ASICs
Proceedings of the 2006 ACM/SIGDA 14th international symposium on Field programmable gate arrays
Proceedings of the 43rd annual Design Automation Conference
Unified microprocessor core storage
Proceedings of the 4th international conference on Computing frontiers
Architectural implications of brick and mortar silicon manufacturing
Proceedings of the 34th annual international symposium on Computer architecture
Merge: a programming model for heterogeneous multi-core systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Disaggregated memory for expansion and sharing in blade servers
Proceedings of the 36th annual international symposium on Computer architecture
Understanding sources of inefficiency in general-purpose chips
Proceedings of the 37th annual international symposium on Computer architecture
The Accelerator Store framework for high-performance, low-power accelerator-based systems
IEEE Computer Architecture Letters
BiN: a buffer-in-NUCA scheme for accelerator-rich CMPs
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Optimization of interconnects between accelerators and shared memories in dark silicon
Proceedings of the International Conference on Computer-Aided Design
Hi-index | 0.00 |
This paper presents the many-accelerator architecture, a design approach combining the scalability of homogeneous multi-core architectures and system-on-chip's high performance and power-efficient hardware accelerators. In preparation for systems containing tens or hundreds of accelerators, we characterize a diverse pool of accelerators and find each contains significant amounts of SRAM memory (up to 90% of their area). We take advantage of this discovery and introduce the accelerator store, a scalable architectural component to minimize accelerator area by sharing its memories between accelerators. We evaluate the accelerator store for two applications and find significant system area reductions (30%) in exchange for small overheads (2% performance, 0%--8% energy). The paper also identifies new research directions enabled by the accelerator store and the many-accelerator architecture.