Compile-time partitioning and scheduling of parallel programs
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Optimal clustering for delay minimization
DAC '93 Proceedings of the 30th international Design Automation Conference
Task scheduling in parallel and distributed systems
Task scheduling in parallel and distributed systems
The definition of dependence distance
ACM Transactions on Programming Languages and Systems (TOPLAS)
An architectural co-synthesis algorithm for distributed, embedded computing systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Proceedings of the 6th international workshop on Hardware/software codesign
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Simultaneous circuit partitioning/clustering with retiming for performance optimization
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Embedded Multiprocessors: Scheduling and Synchronization
Embedded Multiprocessors: Scheduling and Synchronization
Design of heterogenous multi-processor embedded systems: applying functional pipelining
PACT '97 Proceedings of the 1997 International Conference on Parallel Architectures and Compilation Techniques
An automated exploration framework for FPGA-based soft multiprocessor systems
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Exact and approximate task assignment algorithms for pipelined software synthesis
Proceedings of the conference on Design, automation and test in Europe
Synthesis of heterogeneous pipelined multiprocessor systems using ILP: jpeg case study
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
Throughput-driven synthesis of embedded software for pipelined execution on multicore architectures
ACM Transactions on Embedded Computing Systems (TECS)
A design flow for application specific heterogeneous pipelined multiprocessor systems
Proceedings of the 46th Annual Design Automation Conference
Synthesis algorithm for application-specific homogeneous processor networks
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Versatile task assignment for heterogeneous soft dual-processor platforms
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Proceedings of the 47th Design Automation Conference
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Resource-constrained multiprocessor synthesis for floating-point applications on FPGAs
ACM Transactions on Design Automation of Electronic Systems (TODAES)
System-Level Synthesis for Wireless Sensor Node Controllers: A Complete Design Flow
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Postscheduling buffer management trade-offs in streaming software synthesis
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special section on verification challenges in the concurrent world
Hi-index | 0.00 |
The application-specific multiprocessor System-on-a-Chip is a promising design alternative because of its high degree of flexibility, short development time, and potentially high performance attributed to application-specific optimizations. However, designing an optimal application-specific multiprocessor system is still challenging because there are a number of important metrics, such as throughput, latency, and resource usage, that need to be explored and optimized. This paper addresses the problem of synthesizing the application-specific multiprocessor system to minimize latency and resource usage under the throughput constraint. We employ a novel framework for this problem, similar to that of technology mapping in the logic synthesis domain, and develop a set of efficient algorithms, including labeling, clustering and packing, for efficient generation of the multiprocessor architecture with application-specific optimized latency and resources. Specifically, the result of our algorithm is latency-optimal for directed acyclic task graphs. Application of our approach to the Motion JPEG example on Xilinx's Virtex II Pro platform FPGA shows interesting design tradeoffs.