A reconfigurable multi-function computing cache architecture. FPGA '00: Proceedings of the 2000 ACM/SIGDA Eighth International Symposium on Field Programmable Gate Arrays.
Application-specific memory management for embedded systems using software-controlled caches. Proceedings of the 37th Annual Design Automation Conference.
Reconfigurable caches and their application to media processing. Proceedings of the 27th Annual International Symposium on Computer Architecture.
A multi-level memory system architecture for high-performance DSP applications. ICCD '00: Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors.
A low-power accelerator for the SPHINX 3 speech recognition system. Proceedings of the 2003 International Conference on Compilers, Architecture and Synthesis for Embedded Systems.
Comprehensively and efficiently protecting the heap. Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems.
Software-based instruction caching for embedded processors. Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems.
Integrated network interfaces for high-bandwidth TCP/IP. Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems.
Compiler-managed partitioned data caches for low power. Proceedings of the 2007 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems.
QoS policies and architecture for cache/memory in CMP platforms. Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems.
Instruction cache energy saving through compiler way-placement. Proceedings of the Conference on Design, Automation and Test in Europe.
Scaling the bandwidth wall: challenges in and avenues for CMP scaling. Proceedings of the 36th Annual International Symposium on Computer Architecture.
Way Stealing: cache-assisted automatic instruction set extensions. Proceedings of the 46th Annual Design Automation Conference.
Performance measurement of an integrated NIC architecture with 10GbE. HOTI '09: Proceedings of the 2009 17th IEEE Symposium on High Performance Interconnects.
Architecture support for improving bulk memory copying and initialization performance. PACT '09: Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
Performance characterization and optimization of mobile augmented reality on handheld platforms. IISWC '09: Proceedings of the 2009 IEEE International Symposium on Workload Characterization.
Accelerating mobile augmented reality on a handheld platform. ICCD '09: Proceedings of the 2009 IEEE International Conference on Computer Design.
Architectural framework for supporting operating system survivability. HPCA '11: Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
Cost-effectively offering private buffers in SoCs and CMPs. Proceedings of the International Conference on Supercomputing.
BiN: a buffer-in-NUCA scheme for accelerator-rich CMPs. Proceedings of the 2012 ACM/IEEE International Symposium on Low Power Electronics and Design.
Energy and performance exploration of accelerator coherency port using Xilinx ZYNQ. Proceedings of the 10th FPGAworld Conference.
In an SoC, building dedicated local storage into each accelerator is area-inefficient because average buffer utilization is low. In this paper, we present the design and implementation of Buffer-integrated-Caching (BiC), which allows many buffers to be instantiated simultaneously in caches. BiC lets cores use portions of the shared SRAM as cache while accelerators access other portions of the same SRAM as private buffers. We demonstrate the cost-effectiveness of BiC on a recognition MPSoC that includes two Pentium(TM) cores, an augmented-reality accelerator, and a speech recognition accelerator. With only 3% extra area added to the baseline L2 cache, BiC eliminates the need for 215KB of dedicated accelerator SRAM while increasing total cache misses by no more than 0.3%.
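The cache/buffer partitioning idea in the abstract can be illustrated with a minimal behavioral sketch. This is not the paper's hardware design: the class name, the way-granularity allocation policy, and all parameters are assumptions chosen for exposition. It models a set-associative SRAM array in which whole ways can be reallocated from cache use to an accelerator's private buffer and later returned.

```python
# Behavioral sketch (not the BiC RTL) of an SRAM array whose ways can be
# reallocated between cache operation and accelerator-private buffers.
# All names and parameters here are illustrative assumptions.

class BiCArray:
    def __init__(self, num_sets=256, num_ways=8, line_bytes=64):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.line_bytes = line_bytes
        # Per-way owner: "cache", or an accelerator id that claimed the way.
        self.owner = ["cache"] * num_ways
        # Tag store for cache-owned ways (None = invalid line).
        self.tags = [[None] * num_ways for _ in range(num_sets)]

    def alloc_buffer(self, acc_id, size_bytes):
        """Claim enough whole ways to hold size_bytes as a private buffer."""
        way_bytes = self.num_sets * self.line_bytes
        need = -(-size_bytes // way_bytes)  # ceiling division
        free = [w for w, o in enumerate(self.owner) if o == "cache"]
        if len(free) < need:
            raise MemoryError("not enough cache ways available for buffer")
        claimed = free[:need]
        for w in claimed:
            self.owner[w] = acc_id
            # A stolen way must be flushed/invalidated before buffer use.
            for s in range(self.num_sets):
                self.tags[s][w] = None
        return claimed

    def free_buffer(self, acc_id):
        """Return an accelerator's claimed ways to normal cache operation."""
        for w, o in enumerate(self.owner):
            if o == acc_id:
                self.owner[w] = "cache"

    def cache_ways(self):
        """Ways currently usable by the cores as cache."""
        return [w for w, o in enumerate(self.owner) if o == "cache"]
```

With the default parameters, one way holds 256 sets x 64 bytes = 16KB, so a 64KB accelerator buffer claims four of the eight ways, leaving a 4-way cache for the cores; freeing the buffer restores full associativity. Real designs must also handle the effective-associativity loss and flush cost that this sketch only hints at.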