Several parallel convolution algorithms are presented for array processors with $N^2$ processing elements (PEs) connected by mesh, hypercube, and shuffle-exchange topologies. The computation time complexity is the same across these interconnection networks. The communication time complexity, however, varies from network to network and is the main focus. It is shown that, by using the inter-PE communication networks efficiently, each PE requires only a small local memory, many unnecessary data transmissions are eliminated, and the overall time complexity of the algorithms (computation plus communication) is reduced to $O(M^2)$.
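To make the mesh case concrete, here is a minimal Python sketch of how a shift-and-accumulate scheme can reach a step count proportional to M^2. It is an illustration under stated assumptions, not the algorithm from the paper. The assumptions are: the image is N x N with one pixel per PE, the window is M x M with M odd, the mesh has wraparound links (so the result is circular), and the window is applied in correlation form (flip the window for strict convolution). The names `shift` and `mesh_convolve` are hypothetical.

```python
def shift(plane, di, dj):
    """One unit-distance transfer: every PE (i, j) receives the value held by
    PE (i + di, j + dj). Wraparound links are an assumption of this sketch."""
    n = len(plane)
    return [[plane[(i + di) % n][(j + dj) % n] for j in range(n)]
            for i in range(n)]


def mesh_convolve(image, kernel):
    """Simulate an n x n mesh of PEs applying an m x m window (m odd).

    Each PE keeps only three values: its accumulator, the pixel currently
    passing through it, and the broadcast weight. Returns the result plane,
    the number of multiply-accumulate steps, and the number of unit shifts.
    """
    n, m = len(image), len(kernel)
    r = m // 2
    plane = image
    shifts = 0

    # Move the image plane to the starting offset (-r, -r).
    for _ in range(r):
        plane = shift(plane, -1, 0)
        shifts += 1
    for _ in range(r):
        plane = shift(plane, 0, -1)
        shifts += 1

    acc = [[0] * n for _ in range(n)]
    steps = 0
    for a in range(m):
        # Snake through the window so consecutive offsets are one shift apart.
        forward = (a % 2 == 0)
        cols = range(m) if forward else range(m - 1, -1, -1)
        for idx, b in enumerate(cols):
            w = kernel[a][b]  # weight broadcast to all PEs
            acc = [[acc[i][j] + w * plane[i][j] for j in range(n)]
                   for i in range(n)]  # all PEs update in lockstep
            steps += 1
            if idx < m - 1:
                plane = shift(plane, 0, 1 if forward else -1)
                shifts += 1
        if a < m - 1:
            plane = shift(plane, 1, 0)
            shifts += 1
    return acc, steps, shifts


if __name__ == "__main__":
    img = [[4 * i + j for j in range(4)] for i in range(4)]
    box = [[1] * 3 for _ in range(3)]
    out, steps, shifts = mesh_convolve(img, box)
    print(steps, shifts)
```

In this sketch the number of multiply-accumulate steps is M^2 and the number of unit shifts is M^2 - 1 plus the initial positioning, independent of N, which is the kind of bound the abstract reports. The hypercube and shuffle-exchange cases would need different routing and are not modeled here.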