Buffer-integrated-Cache: a cost-effective SRAM architecture for handheld and embedded platforms
Proceedings of the 48th Design Automation Conference
High-performance SoCs and CMPs integrate multiple cores alongside hardware accelerators such as network interface devices and speech recognition engines. Cores use on-chip SRAM organized as caches, whereas accelerators use SRAM as special-purpose storage: FIFOs, scratchpad memories, or other private buffers. Dedicated private buffers offer benefits such as deterministic access, but they are area-inefficient because the average utilization of the total available storage is low. We propose Buffer-integrated-Caching (BiC), which integrates private buffers and a traditional cache into a single shared SRAM block. Much as shared caches improve SRAM utilization on CMPs, the BiC architecture generalizes this advantage to a heterogeneous mix of cores and accelerators in future SoCs and CMPs. We demonstrate the cost-effectiveness of BiC on SoC-based low-power servers and on CMP-based servers with an on-chip NIC. With only a small area added to the baseline cache, BiC eliminates the need for large dedicated SRAMs at minimal performance cost.
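The core idea — one SRAM block whose capacity is shared between accelerator buffers and a set-associative cache — can be illustrated with a small behavioral model. This is a minimal sketch, not the paper's implementation: the class name, the way-reservation scheme, and all parameters are hypothetical, and it models only occupancy bookkeeping (LRU cache lookups plus FIFO buffer regions carved out of the same ways).

```python
# Hypothetical sketch of a Buffer-integrated-Cache: a shared SRAM block
# whose ways can be reserved as accelerator FIFO storage, with the
# remaining ways serving as a set-associative cache. Illustrative only.
from collections import OrderedDict, deque

class BufferIntegratedSRAM:
    def __init__(self, num_sets=4, num_ways=8):
        self.num_sets = num_sets
        self.num_ways = num_ways
        # Per-set count of ways reserved for buffers (unusable by the cache).
        self.reserved_ways = [0] * num_sets
        # Cache side: per-set LRU-ordered map of cached addresses.
        self.cache = [OrderedDict() for _ in range(num_sets)]
        # Buffer side: one FIFO per allocated buffer region.
        self.buffers = {}

    def alloc_buffer(self, name, ways):
        """Carve `ways` ways out of every set for a private FIFO buffer."""
        if any(r + ways >= self.num_ways for r in self.reserved_ways):
            raise ValueError("must leave at least one way for the cache")
        for s in range(self.num_sets):
            self.reserved_ways[s] += ways
            # Evict LRU lines that no longer fit in the shrunken cache.
            cap = self.num_ways - self.reserved_ways[s]
            while len(self.cache[s]) > cap:
                self.cache[s].popitem(last=False)
        self.buffers[name] = deque(maxlen=ways * self.num_sets)

    def access(self, addr):
        """Cache lookup: True on hit; on miss, insert with LRU replacement."""
        s = addr % self.num_sets
        ways_avail = self.num_ways - self.reserved_ways[s]
        if addr in self.cache[s]:
            self.cache[s].move_to_end(addr)   # refresh LRU position
            return True
        if len(self.cache[s]) >= ways_avail:
            self.cache[s].popitem(last=False)  # evict LRU line
        self.cache[s][addr] = True
        return False

    def buf_push(self, name, item):
        self.buffers[name].append(item)

    def buf_pop(self, name):
        return self.buffers[name].popleft()
```

A typical use would be to reserve part of the SRAM as a NIC packet FIFO while the rest continues to serve core traffic as a cache with reduced associativity; when the buffer region is freed, the full associativity is again available, which is how the shared block avoids the low average utilization of dedicated SRAMs.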