One particular result is that, to balance an array of p linearly connected processing elements (PEs) for matrix computations such as matrix multiplication and matrix triangularization, the size of each PE's local memory must grow linearly with p. Thus, the larger the array, the larger each PE's local memory must be.
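
The linear growth can be illustrated with a small back-of-the-envelope sketch. It assumes a model the text above does not spell out: that a matrix computation with M words of local memory performs on the order of sqrt(M) operations per word of I/O, that a linear array exchanges data with the outside world only at its boundary (so external I/O bandwidth stays fixed as p grows), and that compute bandwidth scales with p. The function names and the `base_memory` parameter are hypothetical, introduced only for illustration.

```python
def required_total_memory(p: int, base_memory: float = 1.0) -> float:
    """Aggregate local memory that keeps a p-PE linear array balanced.

    Assumed model (not taken from the text): operations per word of I/O
    scale as sqrt(M). Compute bandwidth grows by a factor of p while the
    external I/O bandwidth stays constant, so balance requires
    sqrt(M_total / base_memory) == p, i.e. M_total = base_memory * p**2.
    """
    if p < 1:
        raise ValueError("p must be a positive number of PEs")
    return base_memory * p ** 2


def required_memory_per_pe(p: int, base_memory: float = 1.0) -> float:
    """Per-PE share of the aggregate memory; grows linearly with p."""
    return required_total_memory(p, base_memory) / p


if __name__ == "__main__":
    # base_memory is the (hypothetical) memory that balances a single PE.
    for p in (1, 2, 4, 8, 16):
        print(p, required_memory_per_pe(p))
```

Under these assumptions, doubling the number of PEs quadruples the aggregate memory requirement and therefore doubles the memory each PE needs, which matches the linear growth stated above.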