The path towards realizing peta-scale computing is increasingly dependent on scaling up to unprecedented numbers of processors. To prevent the interconnect architecture between processors from dominating the overall cost of such systems, there is a critical need for interconnect solutions that both deliver performance to ultra-scale applications and have costs that scale linearly with system size. In this work we propose the Hybrid Flexibly Assignable Switch Topology (HFAST) infrastructure. The HFAST approach uses both passive (circuit switch) and active (packet switch) commodity switch components to deliver all of the flexibility and fault tolerance of a fully interconnected network (such as a fat-tree), while preserving the nearly linear cost scaling associated with traditional low-degree interconnect networks. To understand the applicability of this technology, we perform an in-depth study of communication requirements across a broad spectrum of important scientific applications, whose computational methods include finite-difference, lattice Boltzmann, particle-in-cell, sparse linear algebra, particle-mesh Ewald, and FFT-based solvers. We use the IPM (Integrated Performance Monitoring) profiling layer to gather detailed messaging statistics with minimal impact on code performance. This profiling provides sufficiently detailed communication topology and message volume data to evaluate these applications in the context of the proposed hybrid interconnect. Overall, the results show that HFAST is a promising approach for practically addressing the interconnect requirements of future peta-scale systems.