A bandwidth-efficient architecture for media processing
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Kernel scheduling in reconfigurable computing
DATE '99 Proceedings of the conference on Design, automation and test in Europe
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
IEEE Transactions on Computers
Design and Implementation of the MorphoSys Reconfigurable ComputingProcessor
Journal of VLSI Signal Processing Systems - Special issue on VLSI on custom computing technology
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Microprocessors: Fundamentals and Applications
Microprocessors: Fundamentals and Applications
Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Design and Analysis of a Programmable Single-Chip Architecture for DVB-T Base-Band Receiver
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
MaRS: a macro-pipelined reconfigurable system
Proceedings of the 1st conference on Computing frontiers
Vectorization of Reed solomon decoding and mapping on the EVP
Proceedings of the conference on Design, automation and test in Europe
Reed-Solomon codec for a reconfigurable baseband processing platform
WiCOM'09 Proceedings of the 5th International Conference on Wireless communications, networking and mobile computing
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hi-index | 0.00 |
This paper presents a software implementation of a very fast parallel Reed-Solomon decoder on the second generation of MorphoSys reconfigurable computation platform, which is targeting on streamed applications such as multimedia and DSP. Numerous modifications of the first-generation of the architecture have made a scalable computation and communication intensive architecture capable of extracting parallelisms of fine grain in instruction level. Many algorithms and the whole Digital Video Broadcasting base-band receiver as well, have been mapped onto the second architecture with impressing performance. The mapping of a Reed-Solomon decoder proposed in this paper highly parallelizes all of its sub-algorithms, including Syndrome Computation, Berlekamp Algorithm, Chein Search, and Error Value Computation, in a SIMD fashion. The mapping is tested on a cycle-accurate simulator, "Mulate", and the performance is encouragingly better than other architectures. The decoding speed of the RS (255,239,16) decoder using two different methods of GF multiplication can be 1.319Gbps and 2.534Gbps, respectively. Furthermore, since there is no functionality specifically tailored to Reed-Solomon decoder, the result has demonstrated the capability of MorphoSys architecture to extracting Instruction Level Parallelism from streamed applications.