The potential of on-chip multiprocessing for QCD machines

Authors:
Gianfranco Bilardi;Andrea Pietracaprina;Geppino Pucci;Fabio Schifano;Raffaele Tripiccione
Affiliations:
Dipartimento di Ingegneria dell’Informazione, Università di Padova, Padova, Italy;Dipartimento di Ingegneria dell’Informazione, Università di Padova, Padova, Italy;Dipartimento di Ingegneria dell’Informazione, Università di Padova, Padova, Italy;Dipartimento di Fisica, Università di Ferrara, and INFN, Ferrara, Italy;Dipartimento di Fisica, Università di Ferrara, and INFN, Ferrara, Italy
Venue:
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Year:
2005

Abstract

We explore the opportunities that current and forthcoming VLSI technologies offer for on-chip multiprocessing in Quantum Chromodynamics (QCD), a computational grand challenge for which over half a dozen specialized machines have been developed over the last two decades. Based on a careful study of the information exchange requirements of QCD, both across the network and within the memory system, we derive the optimal partition of die area between storage and functional units. We show that a scalable chip organization holds the promise of delivering from hundreds to thousands of flops per cycle as VLSI feature size scales down from 90 nm to 20 nm over the next dozen years.