Synthesis algorithm for application-specific homogeneous processor networks

Authors:
Jason Cong;Karthik Gururaj;Guoling Han;Wei Jiang
Affiliations:
Computer Science Department, University of California, Los Angeles, CA;Computer Science Department, University of California, Los Angeles, CA;Computer Science Department, University of California, Los Angeles, CA;Computer Science Department, University of California, Los Angeles, CA
Venue:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Year:
2009

Abstract

The application-specific multiprocessor system-on-a-chip is a promising design alternative because of its high degree of flexibility, short development time, and potentially high performance attributed to application-specific optimizations. However, designing an optimal application-specific multiprocessor system remains challenging because several important metrics, such as throughput, latency, and resource usage, must be explored and optimized. This paper addresses the problem of synthesizing an application-specific multiprocessor system for stream-oriented embedded applications so as to minimize system latency under a throughput constraint. We employ a novel framework for this problem, similar to that of technology mapping in the logic synthesis domain, and develop a set of efficient algorithms, including labeling and clustering, for efficient generation of a multiprocessor architecture with application-specific optimized latency. Specifically, the result of our algorithm is latency-optimal for directed acyclic task graphs. Application of our approach to the Motion JPEG example on Xilinx's Virtex II Pro platform FPGA shows interesting design tradeoffs.
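
To make the two metrics in the abstract concrete, the following is a minimal sketch that evaluates one candidate assignment of tasks to processors on a directed acyclic task graph. It is not the paper's synthesis algorithm: it only scores a given clustering. The function name `evaluate_mapping`, its parameters, the uniform `comm_delay` on edges that cross processors, and the rule that a processor's total load must not exceed `period` (the inverse of the required throughput) are all assumptions made for illustration. The latency estimate also ignores contention between tasks that share a processor.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def evaluate_mapping(exec_time, edges, cluster_of, comm_delay, period):
    """Score one hypothetical task-to-processor mapping.

    exec_time  : dict task -> execution time
    edges      : list of (producer, consumer) pairs forming a DAG
    cluster_of : dict task -> processor id
    comm_delay : assumed delay on any edge that crosses processors
    period     : assumed maximum load per processor (1 / throughput)
    Returns (latency, feasible).
    """
    preds = defaultdict(list)
    for producer, consumer in edges:
        preds[consumer].append(producer)

    # Throughput check (assumption): each processor's summed
    # execution time must fit within one period.
    load = defaultdict(float)
    for task, cost in exec_time.items():
        load[cluster_of[task]] += cost
    feasible = all(total <= period for total in load.values())

    # Labeling pass: visit tasks in topological order and record the
    # earliest completion time of each, charging comm_delay only on
    # edges whose endpoints sit on different processors.
    order = TopologicalSorter({t: preds[t] for t in exec_time}).static_order()
    label = {}
    for task in order:
        arrival = 0.0
        for p in preds[task]:
            hop = 0.0 if cluster_of[p] == cluster_of[task] else comm_delay
            arrival = max(arrival, label[p] + hop)
        label[task] = arrival + exec_time[task]

    return max(label.values()), feasible


# Hypothetical four-task diamond graph with made-up costs.
exec_time = {"a": 2, "b": 3, "c": 1, "d": 2}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
two_procs = {"a": 0, "b": 0, "c": 1, "d": 1}
four_procs = {"a": 0, "b": 1, "c": 2, "d": 3}

print(evaluate_mapping(exec_time, edges, two_procs, comm_delay=1, period=5))
print(evaluate_mapping(exec_time, edges, four_procs, comm_delay=1, period=5))
```

In this toy instance, grouping tasks onto two processors removes communication delay from some edges and shortens the critical path compared with one task per processor, while tightening `period` can make the denser grouping infeasible. That tension between latency and the throughput constraint is what a clustering step has to resolve.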