VLSI circuit design concept for parallel iterative algorithms in nanoscale

Authors:
Chi-Chia Sun; Jürgen Götze
Affiliations:
Dortmund University of Technology, Information Processing Lab, Dortmund, Germany; Dortmund University of Technology, Information Processing Lab, Dortmund, Germany
Venue:
ISCIT'09 Proceedings of the 9th international conference on Communications and information technologies
Year:
2009

Abstract

Modern VLSI manufacturing technology has continued to shrink rapidly, down to the nanoscale. Integration with advanced nanotechnology now makes it possible to realize advanced parallel iterative algorithms directly, which was almost impossible 10 years ago. In this paper, we discuss the influence of evolving VLSI technologies on iterative algorithms and present design strategies from an algorithmic and architectural point of view. The parallel implementation of an iterative algorithm (i.e., the processor elements of the multiprocessor array) can be simplified in any way as long as convergence is guaranteed. However, modifying the algorithm (the processors) usually increases the number of required iterations, which in turn increases the switching activity of the interconnects. To further study the trade-off between the performance/complexity of the processors and the load/throughput of the interconnects, we implemented a 3×3 Jacobi EVD array with the µ-CORDIC PE in both 0.18 µm and 45 nm technologies. Our experimental results show that the µ-CORDIC PE is beneficial with respect to the design criteria: it yields smaller chip area, shorter overall computation time, and lower energy consumption per operation than the Full CORDIC PE.
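
The trade-off between simpler processors and more iterations can be illustrated with a small software model. The following is a minimal sketch, not the paper's µ-CORDIC PE: it runs a cyclic Jacobi EVD on a symmetric 3×3 matrix, once with the exact rotation and once with an approximate rotation whose tangent is rounded down to a power of two (an assumed stand-in for a shift-based rotation). The function names, the example matrix `A`, the tolerance, and the rounding rule are all assumptions made for illustration.

```python
import math


def off_norm(a):
    """Frobenius norm of the off-diagonal part of a square matrix."""
    n = len(a)
    return math.sqrt(sum(a[i][j] ** 2
                         for i in range(n) for j in range(n) if i != j))


def rotation_tangent(a, p, q, approximate):
    """Tangent of the Jacobi rotation angle for the pivot pair (p, q)."""
    b = a[p][q]
    if b == 0.0:
        return 0.0
    tau = (a[q][q] - a[p][p]) / (2.0 * b)
    # Smaller-magnitude root of t^2 + 2*tau*t - 1 = 0 (exact annihilation).
    t = math.copysign(1.0, tau) / (abs(tau) + math.hypot(1.0, tau))
    if t == 0.0:
        return 0.0
    if approximate:
        # Assumed simplification: largest power of two not exceeding |t|.
        _, e = math.frexp(abs(t))
        t = math.copysign(math.ldexp(1.0, e - 1), t)
    return t


def rotate(a, p, q, t):
    """In-place similarity transform A <- J^T A J in the (p, q) plane."""
    c = 1.0 / math.hypot(1.0, t)
    s = t * c
    n = len(a)
    for k in range(n):  # update columns p and q (A <- A J)
        akp, akq = a[k][p], a[k][q]
        a[k][p] = c * akp - s * akq
        a[k][q] = s * akp + c * akq
    for k in range(n):  # update rows p and q (A <- J^T A)
        apk, aqk = a[p][k], a[q][k]
        a[p][k] = c * apk - s * aqk
        a[q][k] = s * apk + c * aqk


def jacobi_evd(matrix, approximate=False, tol=1e-12, max_sweeps=200):
    """Return (sorted eigenvalues, number of sweeps) for a symmetric matrix."""
    a = [row[:] for row in matrix]
    n = len(a)
    sweeps = 0
    while off_norm(a) > tol and sweeps < max_sweeps:
        for p in range(n - 1):
            for q in range(p + 1, n):
                t = rotation_tangent(a, p, q, approximate)
                if t != 0.0:
                    rotate(a, p, q, t)
        sweeps += 1
    return sorted(a[i][i] for i in range(n)), sweeps


# Hypothetical symmetric test matrix.
A = [[4.0, 1.0, 2.0],
     [1.0, 3.0, 0.0],
     [2.0, 0.0, 1.0]]

if __name__ == "__main__":
    for label, flag in (("exact", False), ("approximate", True)):
        eigenvalues, sweeps = jacobi_evd(A, approximate=flag)
        print(label, eigenvalues, "sweeps:", sweeps)
```

Under these assumptions the approximate variant still converges to the same eigenvalues, because each rotation still shrinks the targeted off-diagonal entry, but it typically needs more sweeps than the exact variant. In an array implementation, more sweeps mean more data exchanged over the interconnects.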