Optimizing computations for effective block-processing
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Code Optimization Techniques for Embedded Processors: Methods, Algorithms, and Tools
Code Optimization Techniques for Embedded Processors: Methods, Algorithms, and Tools
Fast Algorithms for Digital Signal Processing
Fast Algorithms for Digital Signal Processing
Software Synthesis from Dataflow Graphs
Software Synthesis from Dataflow Graphs
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Software synthesis from the dataflow interchange format
SCOPES '05 Proceedings of the 2005 workshop on Software and compilers for embedded systems
Shared buffer implementations of signal processing systems using lifetime analysis techniques
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Integration of Dataflow-Based Heterogeneous Multiprocessor Scheduling Techniques in GNU Radio
Journal of Signal Processing Systems
Hi-index | 0.00 |
Digital signal processing (DSP) applications involve processing long streams of input data. It is important to take into account this form of processing when implementing embedded software for DSP systems. Task-level vectorization, or block processing, is a useful dataflow graph transformation that can significantly improve execution performance by allowing subsequences of data items to be processed through individual task invocations. In this way, several benefits can be obtained, including reduced context switch overhead, increased memory locality, improved utilization of processor pipelines, and use of more efficient DSP oriented addressing modes. On the other hand, block processing generally results in increased memory requirements since it effeclively increases the sizes of the input and output values associated with processing tasks. In this paper, we investigate the memory-performance trade-off associated with block processing. We develop novel block processing algorithms that carefully take into account memory constraints to achieve efficient block processing configurations within given memory space limitations. Our experimental results indicate that these methods derive optimal memory-constrained block processing solutions most of the time. We demonstrate the advantages of our block processing techniques on practical kernel functions and applications in the DSP domain.