Vector class on limited local memory (LLM) multi-core processors

Authors:
Ke Bai;Di Lu;Aviral Shrivastava
Affiliations:
Arizona State University, Tempe, AZ, USA;Arizona State University, Tempe, AZ, USA;Arizona State University, Tempe, AZ, USA
Venue:
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Year:
2011

Citing 10
Cited 6

Scratchpad memory: design alternative for cache on-chip memory in embedded systems

Proceedings of the tenth international symposium on Hardware/software codesign
Support for parallel generic programming

Support for parallel generic programming
Optimizing Compiler for the CELL Processor

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
MiBench: A free, commercially representative embedded benchmark suite

WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Prefetching irregular references for software cache on cell

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Operation and data mapping for CGRAs with multi-bank memory

Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
STAPL: standard template adaptive parallel library

Proceedings of the 3rd Annual Haifa Experimental Systems Conference
Heap data management for limited local memory (LLM) multi-core processors

CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Stack data management for Limited Local Memory (LLM) multi-core processors

ASAP '11 Proceedings of the ASAP 2011 - 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors
MCSTL: the multi-core standard template library

Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing

Integrating software caches with scratch pad memory

Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Automatic and efficient heap data management for limited local memory multicore architectures

Proceedings of the Conference on Design, Automation and Test in Europe
SSDM: smart stack data management for software managed multicores (SMMs)

Proceedings of the 50th Annual Design Automation Conference
A software-only scheme for managing heap data on limited local memory(LLM) multicore processors

ACM Transactions on Embedded Computing Systems (TECS)
CMSM: an efficient and effective code management for software managed multicores

Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
SPM-Sieve: a framework for assisting data partitioning in scratch pad memory based systems

Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Limited Local Memory (LLM) multi-core architecture is a promising solution for scalable memory hierarchy. LLM architecture, e.g., IBM Cell/B.E. is a purely distributed memory architecture in which each core can directly access only its small local memory, and that is why it is extremely power-efficient. Vector is a popular container class in the C++ Standard Template Library (STL), which provides the functionality similar to a dynamic array. Due to the small non-virtualized memory in the LLM architecture, vector library implementation cannot be used as it is. In this paper, we propose and implement a scheme to manage vector class in the local memory present in each core of LLM multi-core architecture. Our scalable solution can transparently maintain vector data between the shared global memory and the local memories. In addition, different data transfer granularities are provided by our vector class to achieve better performance. We also propose a mechanism to ensure the validity of pointers-to-elements when the vector elements are moved into the global memory. Experimental result shows that our vector class can improve the programmability of vector class significantly while the overhead can be contained within 7%.