Partitioning and scheduling DSP applications with maximal memory access hiding

Authors:
Zhong Wang;Edwin Hsing-Mean Sha;Yuke Wang
Affiliations:
Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN;Department of Computer Science, University of Texas at Dallas, Richardson, TX;Department of Computer Science, University of Texas at Dallas, Richardson, TX
Venue:
EURASIP Journal on Applied Signal Processing
Year:
2002

Citing 16
Cited 4

An effective on-chip preloading scheme to reduce data access penalty

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Register requirements of pipelined processors

ICS '92 Proceedings of the 6th international conference on Supercomputing
Iterative modulo scheduling: an algorithm for software pipelining loops

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Data prefetching for high-performance processors

Data prefetching for high-performance processors
Tolerating latency in multiprocessors through compiler-inserted prefetching

ACM Transactions on Computer Systems (TOCS)
Eliminating conflict misses for high performance architectures

ICS '98 Proceedings of the 12th international conference on Supercomputing
Scheduling of uniform multidimensional systems under resource constraints

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A tile selection algorithm for data locality and cache interference

ICS '99 Proceedings of the 13th international conference on Supercomputing
Minimizing Average Schedule Length under Memory Constraints by Optimal Partitioning and Prefetching

Journal of VLSI Signal Processing Systems
Sequential Hardware Prefetching in Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
An adaptive sequential prefetching scheme in shared-memory multiprocessors

ICPP '97 Proceedings of the international Conference on Parallel Processing
Combining Loop Fusion with Prefetching on Shared-memory Multiprocessors

ICPP '97 Proceedings of the international Conference on Parallel Processing
Automatic Data Layout Using 0-1 Integer Programming

PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Loop Scheduling and Partitions for Hiding Memory Latencies

Proceedings of the 12th international symposium on System synthesis
Optimizing DSP flow graphs via schedule-based multidimensionalretiming

IEEE Transactions on Signal Processing

Loop Scheduling with Complete Memory Latency Hiding on Multi-core Architecture

ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
ILP optimal scheduling for multi-module memory

CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Iterational retiming with partitioning: Loop scheduling with complete memory latency hiding

ACM Transactions on Embedded Computing Systems (TECS)
Variable Partitioning and Scheduling for MPSoC with Virtually Shared Scratch Pad Memory

Journal of Signal Processing Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper presents an iteration space partitioning scheme to reduce the CPU idle time due to the long memory access latency. We take into consideration both the data accesses of intermediate and initial data. An algorithm is proposed to find the largest overlap for initial data to reduce the entire memory traffic. In order to efficiently hide the memory latency, another algorithm is developed to balance the ALU and memory schedules. The experiments on DSP benchmarks show that the algorithms significantly outperform the known existing methods.