We explore the opportunities that current and forthcoming VLSI technologies offer for on-chip multiprocessing applied to Quantum Chromodynamics (QCD), a computational grand challenge for which over half a dozen specialized machines have been developed over the last two decades. Based on a careful study of the information exchange requirements of QCD, both across the network and within the memory system, we derive the optimal partition of die area between storage and functional units. We show that a scalable chip organization holds the promise of delivering from hundreds to thousands of flops per cycle as VLSI feature size scales down from 90 nm to 20 nm over the next dozen years.