Direct memory access usage optimization in network applications for reduced memory latency and energy consumption

Authors:
Alexandros Bartzas; Miguel Peon-Quiros; Stylianos Mamagkakis; Francky Catthoor; Dimitrios Soudris; Jose M. Mendias
Affiliations:
(Corresponding e-mail: ampartza@ee.duth.gr) VLSI Design Center, Democritus University of Thrace, 67100 Xanthi, Greece; DACYA/UCM, Avda. Complutense s/n, 28040 Madrid, Spain; IMEC vzw, Kapeldreef 75, 3001 Heverlee, Belgium; Professor at Katholieke Universiteit Leuven, Belgium; Microprocessors and Digital Systems Lab, School of Electrical and Computer Engineering, National Technical University of Athens, 15780 Zografou, Greece; DACYA/UCM, Avda. Complutense s/n, 28040 Madrid, Spain
Venue:
Journal of Embedded Computing - PATMOS 2007 selected papers on low power electronics
Year:
2009

Abstract

Today, wireless networks are becoming increasingly ubiquitous. Typically, several complex multi-threaded applications are mapped onto a single embedded system, and each of them is triggered by a different input stream, according to the run-time behaviour of the user and the environment. This dynamism makes a full design-time analysis of such systems very complex, if not impossible, so run-time information has to be used to produce an efficient design. This introduces new challenges, especially for embedded system designers who use a Direct Memory Access (DMA) module: they must know the memory transfer behaviour of the whole system in advance in order to design and program the DMA efficiently. This is particularly important in embedded systems with DRAM memories, because concurrent accesses from different processing elements can adversely affect the page-based architecture of these memories. Moreover, the increasingly common use of dynamic data types further complicates the problem, because the exact location of data instances in memory is unknown at design time. In this paper we propose a system-level optimization methodology that adapts the DMA usage parameters automatically at run time, according to online information. The proposed approach reduces the mean latency of memory transfers by more than 18%, thereby reducing the average number of cycles that processing elements or DMAs waste waiting for data from main memory, while also optimizing energy consumption and system responsiveness. We evaluate the approach using a set of real-life applications and real wireless dynamic streams.