Memory bank and register allocation in software synthesis for ASIPs
ICCAD '95 Proceedings of the 1995 IEEE/ACM international conference on Computer-aided design
Design of an ASIP architecture for low-level visual elaborations
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Realization of a programmable parallel DSP for high performance image processing applications
DAC '98 Proceedings of the 35th annual Design Automation Conference
Minimizing the required memory bandwidth in VLSI system realizations
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Optimizing Power in ASIC Behavioral Synthesis
IEEE Design & Test
A Partitioning Programming Environment for a Novel Parallel Architecture
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Simultaneous Scheduling, Binding and Floorplanning for Interconnect Power Optimization
VLSID '99 Proceedings of the 12th International Conference on VLSI Design - 'VLSI for the Information Appliance'
High-level DSP synthesis using concurrent transformations, scheduling, and allocation
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hi-index | 0.00 |
The design of high-throughput large-state Viterbi decoders relies on the use of multiple arithmetic units. The global communication channels among these parallel processors often consist of long interconnect wires, resulting in large area and high power consumption. In this paper, we propose a data-transfer oriented design methodology to implement a low-power 256-state rate-1/3 IS95 Viterbi decoder. Our architectural level scheme uses operation partitioning, packing, and scheduling to analyze and optimize interconnect effects in early design stages. In comparison with other published Viterbi decoders, our approach reduces the global data transfers by up to 75% and decreases the amount of global buses by up to 48%, while enabling the use of deeply pipelined datapaths with no data forwarding. In the RTL implementation of the individual processors, we apply precomputation in conjunction with saturation arithmetic to further reduce power dissipation with provably no coding performance degradation. Designed using a 0.25 &mgr; standard cell library, our decoder achieves a throughput of 20 Mbps in simulation and dissipates only 450 mW.