Optimizing scheduling and intercluster connection for application-specific DSP processors

Authors:
Cathy Qun Xu; Chun Jason Xue; Jingtong Hu; Edwin Hsing-Mean Sha
Affiliations:
Department of Computer Science, University of Texas at Dallas, Richardson, TX; Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong; Department of Computer Science, University of Texas at Dallas, Richardson, TX; Department of Computer Science, University of Texas at Dallas, Richardson, TX
Venue:
IEEE Transactions on Signal Processing
Year:
2009

Abstract

Signal processing applications have high instruction-level parallelism (ILP) and real-time performance requirements. Embedded, application-specific multicluster architectures are desirable for providing the large computation power these applications need. As technology scales to the deep-submicron level, it becomes more important, and more challenging, to design an efficient intercluster connection network that satisfies rapidly growing intercluster data-transfer needs under power and cost constraints. This paper addresses the automatic generation of intercluster connection networks built from partially connected buses. An application-specific approach is proposed that, for a given schedule, determines in polynomial time the minimum number of partially connected buses required without performance degradation. The intercluster connection topology is then generated with this minimum number of buses so as to minimize the number of bus segments. Further, a scheduling algorithm is presented that minimizes the intercluster communication needs of the given application and reduces the minimum number of partially connected buses required in the intercluster connection network under a schedule-length constraint. Experimental results show an average reduction of up to 50.6% in the minimum number of required buses and an average reduction of 64.5% in bus segments, compared with commonly used intercluster-communication-aware scheduling techniques and the as-soon-as-possible (ASAP) data-transfer scheme.
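The idea of deriving a minimum bus count from a fixed schedule can be illustrated with a small sketch. It is not the authors' algorithm, which the abstract does not spell out. It makes three assumptions:

* Each intercluster transfer holds one bus for a half-open interval of control steps.
* A partially connected bus can be wired to any subset of clusters.
* Clusters sit in a row, so a bus spanning clusters lo..hi costs hi - lo segments.

Under these assumptions, the sketch applies the classic greedy interval-partitioning technique. That technique uses exactly as many buses as the maximum number of simultaneous transfers, so no transfer is delayed. The names `Transfer`, `assign_buses`, and `bus_topology` are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One intercluster transfer from a fixed schedule (hypothetical model)."""
    src: int    # source cluster index
    dst: int    # destination cluster index
    start: int  # first control step on which the transfer occupies a bus
    end: int    # first control step after the transfer (half-open interval)


def assign_buses(transfers):
    """Greedy interval partitioning over transfers sorted by start step.

    Reusing the bus that frees up earliest, and opening a new bus only when
    none is free, yields a bus count equal to the maximum number of
    overlapping transfers. That is the lower bound for delay-free execution.
    Returns (number_of_buses, bus index for each transfer).
    """
    order = sorted(range(len(transfers)), key=lambda i: transfers[i].start)
    free_at = []  # min-heap of (step at which the bus becomes free, bus id)
    assignment = [None] * len(transfers)
    n_buses = 0
    for i in order:
        t = transfers[i]
        if free_at and free_at[0][0] <= t.start:
            _, bus = heapq.heappop(free_at)  # reuse the earliest-free bus
        else:
            bus = n_buses  # every existing bus is busy, so add one
            n_buses += 1
        assignment[i] = bus
        heapq.heappush(free_at, (t.end, bus))
    return n_buses, assignment


def bus_topology(transfers, n_buses, assignment):
    """Clusters each bus must connect, plus segment counts under the
    assumed linear layout (a bus spanning clusters lo..hi needs hi - lo
    segments). This only evaluates one assignment and does not minimize."""
    clusters = [set() for _ in range(n_buses)]
    for t, bus in zip(transfers, assignment):
        clusters[bus].update((t.src, t.dst))
    segments = [max(c) - min(c) for c in clusters]
    return clusters, segments
```

For four transfers that overlap at most two at a time, `assign_buses` returns two buses. `bus_topology` then reports which clusters each bus touches and the resulting segments. Different assignments with the same bus count can need different numbers of segments. That is why the abstract treats topology generation as a separate minimization step, which this sketch does not attempt.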