DMA-circular: an enhanced high level programmable DMA controller for optimized management of on-chip local memories

Authors:
Nikola Vujic;Lluc Alvarez;Marc Gonzalez;Xavier Martorell;Eduard Ayguade
Affiliations:
Barcelona Supercomputing Center, Barcelona, Spain;Barcelona Supercomputing Center and Universitat Politecnica de Catalunya, Barcelona, Spain;Universitat Politecnica de Catalunya, Barcelona, Spain;Barcelona Supercomputing Center and Universitat Politecnica de Catalunya, Barcelona, Spain;Barcelona Supercomputing Center and Universitat Politecnica de Catalunya, Barcelona, Spain
Venue:
Proceedings of the 9th conference on Computing Frontiers
Year:
2012

Citing 17
Cited 0

The NAS parallel benchmarks—summary and preliminary results

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
A safe approximate algorithm for interprocedural aliasing

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Interprocedural may-alias analysis for pointers: beyond k-limiting

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Efficient context-sensitive pointer analysis for C programs

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
A data cache with multiple caching strategies tuned to different types of locality

ICS '95 Proceedings of the 9th international conference on Supercomputing
Wattch: a framework for architectural-level power analysis and optimizations

Proceedings of the 27th annual international symposium on Computer architecture
Efficient and precise array access analysis

ACM Transactions on Programming Languages and Systems (TOPLAS)
Direct addressed caches for reduced power consumption

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Cool-cache for hot multimedia

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Scratchpad memory: design alternative for cache on-chip memory in embedded systems

Proceedings of the tenth international symposium on Hardware/software codesign
Optimizing Compiler for the CELL Processor

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Using advanced compiler technology to exploit the performance of the Cell Broadband EngineTM architecture

IBM Systems Journal
Dynamic data scratchpad memory management for a memory subsystem with an MMU

Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Compiler-managed partitioned data caches for low power

Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Hybrid access-specific software cache techniques for the cell BE architecture

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
DBDB: optimizing DMATransfer for the cell be architecture

Proceedings of the 23rd international conference on Supercomputing
Optimizing the use of static buffers for DMA on a CELL chip

LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper presents DMA-circular, a novel DMA controller for optimized memory management for on-chip local memories. DMA-circular embeds the functionality of caches into the DMA controller and applies aggressive optimizations using novel hardware. DMA-circular anticipates the computation requirements in terms of data transfers and performs buffer management for data that is mapped to the local memory. The explicit hardware support accelerates the most common actions related to the management of a local memory while the cache functionalities enable a high level of programmability for the DMA-circular. The evaluation is done on several high performance kernels from the NAS benchmark suite. Compared to traditional DMA controllers, results show speedups from 1.20x to 2x, keeping the control code overhead under 15% of the kernels' execution time and also reducing the energy consumption up to 40%.