High instruction-level parallelism in DSP and media applications demands highly clustered architectures. It is a challenge to design an efficient, flexible, yet cost-saving interconnection network that satisfies the rapidly increasing need for inter-cluster data transfers. This paper presents a computation and data transfer co-scheduling technique that minimizes the number of partially connected interconnection buses required for a given embedded application while also minimizing its schedule length. Previous research in this area focused on scheduling computations to minimize the number of inter-cluster data transfers. The proposed co-scheduling technique not only schedules computations to reduce the number of inter-cluster data transfers, but also schedules the inter-cluster data transfers themselves to minimize the number of partially connected buses required for the inter-cluster connection network. Experimental results indicate that 39.4% fewer buses are required compared to the current best-known technique, while achieving the same schedule length minimization.