Buffer-integrated-Cache: a cost-effective SRAM architecture for handheld and embedded platforms
Proceedings of the 48th Design Automation Conference
High-performance SoCs and CMPs integrate multiple cores alongside hardware accelerators such as network interface devices and speech recognition engines. Cores use on-chip SRAM organized as caches, whereas accelerators use SRAM as special-purpose storage: FIFOs, scratchpad memories, or other private buffers. Dedicated private buffers offer benefits such as deterministic access, but they are area-inefficient because the average utilization of the total available storage is low. We propose Buffer-integrated-Caching (BiC), which integrates private buffers and a traditional cache into a single shared SRAM block. Much as shared caches improve SRAM utilization on CMPs, the BiC architecture generalizes this advantage to a heterogeneous mix of cores and accelerators in future SoCs and CMPs. We demonstrate the cost-effectiveness of BiC on SoC-based low-power servers and on CMP-based servers with an on-chip NIC. With only a small area added to the baseline cache, BiC eliminates the need for large dedicated SRAMs at minimal performance cost.
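The core idea — one SRAM block whose capacity is shared between accelerator buffers and a set-associative cache — can be illustrated with a small behavioral model. This is a minimal sketch, not the paper's implementation: the class name, the way-reservation scheme, and all parameters are hypothetical, and it models only occupancy bookkeeping (LRU cache lookups plus FIFO buffer regions carved out of the same ways).

```python
# Hypothetical sketch of a Buffer-integrated-Cache: a shared SRAM block
# whose ways can be reserved as accelerator FIFO storage, with the
# remaining ways serving as a set-associative cache. Illustrative only.
from collections import OrderedDict, deque

class BufferIntegratedSRAM:
    def __init__(self, num_sets=4, num_ways=8):
        self.num_sets = num_sets
        self.num_ways = num_ways
        # Per-set count of ways reserved for buffers (unusable by the cache).
        self.reserved_ways = [0] * num_sets
        # Cache side: per-set LRU-ordered map of cached addresses.
        self.cache = [OrderedDict() for _ in range(num_sets)]
        # Buffer side: one FIFO per allocated buffer region.
        self.buffers = {}

    def alloc_buffer(self, name, ways):
        """Carve `ways` ways out of every set for a private FIFO buffer."""
        if any(r + ways >= self.num_ways for r in self.reserved_ways):
            raise ValueError("must leave at least one way for the cache")
        for s in range(self.num_sets):
            self.reserved_ways[s] += ways
            # Evict LRU lines that no longer fit in the shrunken cache.
            cap = self.num_ways - self.reserved_ways[s]
            while len(self.cache[s]) > cap:
                self.cache[s].popitem(last=False)
        self.buffers[name] = deque(maxlen=ways * self.num_sets)

    def access(self, addr):
        """Cache lookup: True on hit; on miss, insert with LRU replacement."""
        s = addr % self.num_sets
        ways_avail = self.num_ways - self.reserved_ways[s]
        if addr in self.cache[s]:
            self.cache[s].move_to_end(addr)   # refresh LRU position
            return True
        if len(self.cache[s]) >= ways_avail:
            self.cache[s].popitem(last=False)  # evict LRU line
        self.cache[s][addr] = True
        return False

    def buf_push(self, name, item):
        self.buffers[name].append(item)

    def buf_pop(self, name):
        return self.buffers[name].popleft()
```

A typical use would be to reserve part of the SRAM as a NIC packet FIFO while the rest continues to serve core traffic as a cache with reduced associativity; when the buffer region is freed, the full associativity is again available, which is how the shared block avoids the low average utilization of dedicated SRAMs.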