Memory-constrained block processing for DSP software optimization

Authors:
Ming-Yung Ko;Chung-Ching Shen;Shuvra S. Bhattacharyya
Affiliations:
Sandbridge Technologies Inc., White Plains, NY;Electrical and Computer Engineering Department, University of Maryland, College Park, MD;Electrical and Computer Engineering Department, University of Maryland, College Park, MD
Venue:
Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP
Year:
2008

Abstract

Digital signal processing (DSP) applications involve processing long streams of input data. It is important to take this form of processing into account when implementing embedded software for DSP systems. Task-level vectorization, or block processing, is a useful dataflow graph transformation that can significantly improve execution performance by allowing subsequences of data items to be processed through individual task invocations. Several benefits can be obtained in this way, including reduced context-switch overhead, increased memory locality, improved utilization of processor pipelines, and use of more efficient DSP-oriented addressing modes. On the other hand, block processing generally results in increased memory requirements, since it effectively increases the sizes of the input and output values associated with processing tasks. In this paper, we investigate the memory-performance trade-off associated with block processing. We develop novel block processing algorithms that carefully take memory constraints into account to achieve efficient block processing configurations within given memory space limitations. Our experimental results indicate that these methods derive optimal memory-constrained block processing solutions most of the time. We demonstrate the advantages of our block processing techniques on practical kernel functions and applications in the DSP domain.
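
The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the algorithm developed in the paper. It assumes a simple chain of tasks that all share one block factor, a fixed per-invocation overhead, and buffers that grow linearly with the block factor. All function and parameter names are hypothetical.

```python
# Toy model of the memory-performance trade-off in block processing.
# Assumptions (not from the paper): one uniform block factor for all tasks,
# a constant overhead per task invocation, and one buffer per edge that
# must hold exactly `block_factor` tokens.

def buffer_memory(edge_token_sizes, block_factor):
    """Total buffer bytes when every edge holds block_factor tokens."""
    return sum(size * block_factor for size in edge_token_sizes)


def invocation_overhead(num_tasks, stream_length, block_factor, overhead_per_call):
    """Total call overhead when each invocation consumes block_factor items."""
    calls_per_task = -(-stream_length // block_factor)  # ceiling division
    return num_tasks * calls_per_task * overhead_per_call


def largest_block_factor(edge_token_sizes, memory_limit):
    """Largest uniform block factor whose buffers fit in memory_limit.

    Returns 0 if even unit-sized blocks do not fit.
    """
    bytes_per_unit_block = buffer_memory(edge_token_sizes, 1)
    return memory_limit // bytes_per_unit_block


if __name__ == "__main__":
    edges = [4, 8]   # hypothetical token sizes in bytes, one per edge
    limit = 100      # hypothetical memory budget in bytes
    b = largest_block_factor(edges, limit)
    print("block factor:", b)
    print("buffer bytes:", buffer_memory(edges, b))
    print("overhead, unit blocks:", invocation_overhead(3, 1000, 1, 50))
    print("overhead, blocked:", invocation_overhead(3, 1000, b, 50))
```

In this simplified model, a larger block factor always lowers invocation overhead and always raises buffer memory, so the best uniform choice is the largest factor that fits the budget. Real dataflow graphs allow different block factors per task and have rate constraints between tasks, which is what makes the constrained problem non-trivial.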