Discrete cosine transform: algorithms, advantages, applications
Discrete cosine transform: algorithms, advantages, applications
Partitioning by regularity extraction
DAC '92 Proceedings of the 29th ACM/IEEE Design Automation Conference
Estimation of lower bounds in scheduling algorithms for high-level synthesis
ACM Transactions on Design Automation of Electronic Systems (TODAES)
A bandwidth-efficient architecture for media processing
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Lower bound on latency for VLIW ASIP datapaths
ICCAD '99 Proceedings of the 1999 IEEE/ACM international conference on Computer-aided design
Accelerator Data-Path Synthesis for High-Throughput Signal Processing Applications
Accelerator Data-Path Synthesis for High-Throughput Signal Processing Applications
Retargetable Compilers for Embedded Core Processors: Methods and Experience in Industrial Applications
Synthesis and Optimization of Digital Circuits
Synthesis and Optimization of Digital Circuits
Code Generation for Embedded Processors
Code Generation for Embedded Processors
Digital Signal Processing: A Practical Approach
Digital Signal Processing: A Practical Approach
Billion-Transistor Architectures
Computer
High-quality operation binding for clustered VLIW datapaths
Proceedings of the 38th annual Design Automation Conference
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors
PATMOS '02 Proceedings of the 12th International Workshop on Integrated Circuit Design. Power and Timing Modeling, Optimization and Simulation
Dual-pipeline heterogeneous ASIP design
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Customization of application specific heterogeneous multi-pipeline processors
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Application specific forwarding network and instruction encoding for multi-pipe ASIPs
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Impact of intercluster communication mechanisms on ILP in clustered VLIW architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Platform-based resource binding using a distributed register-file microarchitecture
Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Design of a low power pre-synchronization ASIP for multimode SDR terminals
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
Hi-index | 0.00 |
VLIW ASIPs provide an attractive solution for increasingly pervasive real-time multimedia and signal processing embedded applications. In this paper we propose an algorithm to support trade-off exploration during the early phases of the design/specialization of VLIW ASIPs with clustered datapaths. For purposes of an early exploration step, we define a parameterized family of clustered datapaths D(m,n), where m and n denote interconnect capacity and cluster capacity constraints on the family. Given a kernel, the proposed algorithm explores the space of feasible clustered datapaths and returns: a datapath configuration; a binding and scheduling for the operations; and a corresponding estimate for the best achievable latency over the specified family. Moreover, we show how the parameters m and n, as well as a target latency optionally specified by the designer, can be used to effectively explore trade-offs among delay, power/energy, and latency. Extensive empirical evidence is provided showing that the proposed approach is strikingly effective at attacking this complex optimization problem.