During the last two decades, Single Instruction Multiple Data (SIMD) processors have become important architectures in embedded systems for image processing applications, mainly because of their area and energy efficiency. Often the processing elements (PEs) of an SIMD processor are only locally connected, so each PE can access only its direct neighbors, which may create a communication bottleneck. One way to solve this is to use a fully connected communication network between the PEs (FC-SIMD). However, this solution leads to excessive communication area cost, low utilization of the communication network, and scalability problems. For example, the area overhead of an FC-SIMD exceeds 100% when the number of PEs grows beyond 64. In this paper, we introduce a new type of SIMD architecture, called RC-SIMD, with a reconfigurable communication network. It uses a delay line in the instruction bus, which distributes the accesses to the communication network over time. This architecture requires only a very cheap communication network while performing almost as well as expensive FC-SIMD architectures. However, the new architecture causes irregular resource conflicts. We therefore introduce a conflict model that existing schedulers are able to cope with. Experimental results show that, compared with locally connected SIMDs, RC-SIMD requires on average 21% fewer cycles than the architecture without the delay line, while the area overhead is at most 10%.
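
The effect of the delay line can be illustrated with a small timing sketch. This is not the paper's conflict model or scheduler; it assumes a simplified, hypothetical timing rule in which PE i receives instruction k at cycle k + i * delay, and the names `comm_accesses_per_cycle`, `peak_accesses`, and the `"COMM"`/`"ALU"` opcodes are invented for the illustration. It counts how many PEs access the communication network in the same cycle.

```python
from collections import Counter


def comm_accesses_per_cycle(program, num_pes, delay):
    """Count communication-network accesses in each cycle.

    Assumed (hypothetical) timing model: PE i executes instruction k
    at cycle k + i * delay. With delay == 0 all PEs run in lockstep.
    """
    accesses = Counter()
    for k, op in enumerate(program):
        if op == "COMM":
            for pe in range(num_pes):
                accesses[k + pe * delay] += 1
    return accesses


def peak_accesses(program, num_pes, delay):
    """Largest number of simultaneous network accesses in any cycle."""
    accesses = comm_accesses_per_cycle(program, num_pes, delay)
    return max(accesses.values(), default=0)


# Communication instructions spaced at least num_pes cycles apart.
spaced = ["ALU", "COMM", "ALU", "ALU", "ALU", "COMM", "ALU"]
# Communication instructions closer together than num_pes cycles.
dense = ["COMM", "ALU", "COMM"]

print(peak_accesses(spaced, num_pes=4, delay=0))  # lockstep: all PEs at once
print(peak_accesses(spaced, num_pes=4, delay=1))  # accesses spread over time
print(peak_accesses(dense, num_pes=4, delay=1))   # overlapping accesses
```

Under these assumptions, the lockstep case needs the network to serve all four PEs in one cycle, while the delayed case never sees more than one access per cycle for the spaced program. The dense program shows two PEs colliding in the same cycle, which is the kind of irregular resource conflict a scheduler would have to account for.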