Signal processing applications have high instruction-level parallelism (ILP) and real-time performance requirements. Embedded, application-specific multicluster architectures are desirable because they provide the large computation power these applications need. As technology moves to the deep-submicron level, it becomes both more important and more challenging to design an efficient intercluster connection network that satisfies rapidly growing intercluster data transfer needs under power and cost constraints. This paper addresses the automatic generation of intercluster connection networks built from partially connected buses. An application-specific approach is proposed that determines, in polynomial time, the minimum number of partially connected buses required for a given schedule without performance degradation. The intercluster connection topology is then generated with that minimum number of buses so as to minimize the number of connection bus segments. Further, a scheduling algorithm is presented that minimizes the intercluster communication needs of the given application and reduces the minimum number of partially connected buses required in the intercluster connection network under a schedule length constraint. Experimental results indicate that an average reduction of up to 50.6% in the minimum number of required buses and an average reduction of 64.5% in bus segments can be achieved compared with commonly used intercluster-communication-aware scheduling techniques and the as-soon-as-possible (ASAP) data transfer scheme.
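To make the bus-count question concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a simplified model that the abstract does not state: every intercluster transfer takes one cycle on one bus, may run in any cycle of a window `[release, deadline)` derived from the given schedule, and any bus can carry any transfer (partial connectivity is ignored). The names `asap_buses`, `feasible`, and `min_buses` are hypothetical. Under these assumptions, an earliest-deadline-first check decides in polynomial time whether a given number of buses suffices, and the result can be compared with the peak demand of an ASAP transfer scheme.

```python
import heapq
from typing import List, Tuple

# Hypothetical model: a transfer (release, deadline) needs one bus for exactly
# one cycle t with release <= t < deadline. Unit-length transfers and fully
# interchangeable buses are assumptions made for this illustration only.
Transfer = Tuple[int, int]


def asap_buses(transfers: List[Transfer]) -> int:
    """Buses needed if every transfer fires in its release cycle (ASAP)."""
    load = {}
    for release, _ in transfers:
        load[release] = load.get(release, 0) + 1
    return max(load.values(), default=0)


def feasible(transfers: List[Transfer], buses: int) -> bool:
    """Earliest-deadline-first check: do `buses` buses meet every deadline?"""
    if not transfers:
        return True
    if buses <= 0:
        return False
    pending = sorted(transfers)      # ordered by release cycle
    ready: List[int] = []            # min-heap of deadlines of released transfers
    i, t = 0, pending[0][0]
    while i < len(pending) or ready:
        if not ready and pending[i][0] > t:
            t = pending[i][0]        # skip cycles with nothing to send
        while i < len(pending) and pending[i][0] <= t:
            heapq.heappush(ready, pending[i][1])
            i += 1
        # Serve the most urgent transfers first, at most one per bus per cycle.
        for _ in range(min(buses, len(ready))):
            if heapq.heappop(ready) <= t:
                return False         # this transfer's window has already closed
        t += 1
    return True


def min_buses(transfers: List[Transfer]) -> int:
    """Smallest bus count that keeps every transfer inside its window."""
    for buses in range(len(transfers) + 1):
        if feasible(transfers, buses):
            return buses
    raise ValueError("some transfer has an empty window")


# Three transfers are ready at cycle 0 but not needed until cycle 3;
# a fourth must be sent during cycle 1.
example = [(0, 3), (0, 3), (0, 3), (1, 2)]
print(asap_buses(example), min_buses(example))
```

In this toy instance, ASAP sends all three early transfers at once, while spreading them across their slack lets fewer buses serve the same schedule without lengthening it. Modeling which clusters each bus actually connects, as the partially connected buses in the abstract require, would need additional constraints that this sketch omits.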