To improve the performance of on-chip data communication in SIMD (Single Instruction Multiple Data) architectures, we propose an efficient and modular interconnection architecture called the Broadcast and Permutation Mesh network (BP-Mesh). The BP-Mesh architecture offers not only low complexity and high bandwidth, but also good flexibility and scalability. The paper discusses the hardware implementation in detail and evaluates the proposed architecture in terms of area cost and performance.