Journal of Parallel and Distributed Computing - Special issue on parallel computing with optical interconnects
Design tradeoffs for tiled CMP on-chip networks
Proceedings of the 20th annual international conference on Supercomputing
Larrabee: a many-core x86 architecture for visual computing
ACM SIGGRAPH 2008 papers
A survey of the research on power management techniques for high-performance systems
Software—Practice & Experience
Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Silicon Nanophotonic Network-on-Chip Using TDM Arbitration
HOTI '10 Proceedings of the 2010 18th IEEE Symposium on High Performance Interconnects
Communication Requirements and Interconnect Optimization for High-End Scientific Applications
IEEE Transactions on Parallel and Distributed Systems
Achieving exascale computing through hardware/software co-design
EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface
Virtual I/O caching: dynamic storage cache management for concurrent workloads
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Towards scalable I/O architecture for exascale systems
Proceedings of the 2011 ACM international workshop on Many task computing on grids and supercomputers
Efficient SIMD code generation for irregular kernels
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
In-situ I/O processing: a case for location flexibility
Proceedings of the sixth workshop on Parallel Data Storage
Towards a codelet-based runtime for exascale computing: position paper
Proceedings of the 2nd International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Energy-guided exploration of on-chip network design for exa-scale computing
Proceedings of the International Workshop on System Level Interconnect Prediction
NUMA-aware graph mining techniques for performance and energy efficiency
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Communication-avoiding parallel strassen: implementation and performance
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Turbine: a distributed-memory dataflow engine for extreme-scale many-task applications
Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies
Moving from petaflops to petadata
Communications of the ACM
Memory-conscious collective I/O for extreme scale HPC systems
Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers
The Journal of Supercomputing
A communications simulation methodology for AMR codes using task dependency analysis
IA^3 '13 Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential
ACM Transactions on Architecture and Code Optimization (TACO)
Turbine: A Distributed-memory Dataflow Engine for High Performance Many-task Applications
Fundamenta Informaticae - Scalable Workflow Enactment Engines and Technology
Interactive ray casting of geodesic grids
EuroVis '13 Proceedings of the 15th Eurographics Conference on Visualization
Hi-index | 0.02 |
High Performance Computing architectures are expected to change dramatically in the next decade as power and cooling constraints limit increases in microprocessor clock speeds. Consequently computer companies are dramatically increasing on-chip parallelism to improve performance. The traditional doubling of clock speeds every 18-24 months is being replaced by a doubling of cores or other parallelism mechanisms. During the next decade the amount of parallelism on a single microprocessor will rival the number of nodes in early massively parallel supercomputers that were built in the 1980s. Applications and algorithms will need to change and adapt as node architectures evolve. In particular, they will need to manage locality to achieve performance. A key element of the strategy as we move forward is the co-design of applications, architectures and programming environments. There is an unprecedented opportunity for application and algorithm developers to influence the direction of future architectures so that they meet DOE mission needs. This article will describe the technology challenges on the road to exascale, their underlying causes, and their effect on the future of HPC system design.