The design of high-throughput, large-state Viterbi decoders relies on the use of multiple arithmetic units. The global communication channels among these parallel processors often consist of long interconnect wires, resulting in large area and high power consumption. In this paper, we propose a data-transfer-oriented design methodology to implement a low-power 256-state, rate-1/3 Viterbi decoder. Our architecture-level scheme uses operation partitioning, packing, and scheduling to analyze and optimize interconnect effects in the early design stages. In comparison with other published Viterbi decoders, our approach reduces global data transfers by up to 75% and decreases the number of global buses by up to 48%, while enabling the use of deeply pipelined datapaths with no data forwarding. In the register-transfer level (RTL) implementation, we apply precomputation in conjunction with saturation arithmetic to further reduce power dissipation, with provably no degradation in coding performance. Designed using a 0.25 µm standard cell library, our decoder achieves a throughput of 20 Mb/s in simulation and dissipates only 0.45 W.
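For readers unfamiliar with the arithmetic involved, the following is a minimal Python sketch of one add-compare-select (ACS) step with saturating path metrics, the basic operation a Viterbi decoder repeats for every trellis state. It is an illustration under stated assumptions, not the authors' RTL design: the function names `saturating_add` and `acs`, the 8-bit metric width, and the tie-breaking rule are all hypothetical, and the abstract does not specify how precomputation and saturation are combined.

```python
def saturating_add(a: int, b: int, width: int = 8) -> int:
    """Add two non-negative metrics, clamping the sum at the largest
    value an unsigned `width`-bit register can hold (assumed width)."""
    limit = (1 << width) - 1
    return min(a + b, limit)


def acs(pm0: int, bm0: int, pm1: int, bm1: int, width: int = 8):
    """One add-compare-select step for a single trellis state.

    pm0, pm1: path metrics of the two predecessor states.
    bm0, bm1: branch metrics of the two incoming transitions.
    Returns (survivor_metric, decision_bit); ties pick branch 0,
    which is an arbitrary choice made for this sketch.
    """
    cand0 = saturating_add(pm0, bm0, width)
    cand1 = saturating_add(pm1, bm1, width)
    if cand0 <= cand1:
        return cand0, 0
    return cand1, 1


if __name__ == "__main__":
    print(acs(10, 3, 12, 0))      # branch 1 has the smaller sum
    print(acs(250, 10, 100, 200)) # both sums clamp at the 8-bit limit
```

In a 256-state decoder this step runs once per state per received symbol, which is why the number of arithmetic units and the wiring between them dominate area and power.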