A platform for developing adaptable multicore applications
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
International Journal of High Performance Computing Applications
Towards optimizing energy costs of algorithms for shared memory architectures
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
FAWNdamentally power-efficient clusters
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
Cache injection for parallel applications
Proceedings of the 20th international symposium on High performance distributed computing
Hardware-software coherence protocol for the coexistence of caches and local memories
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
International Journal of Distributed Systems and Technologies
Assessing the effects of data compression in simulations using physically motivated metrics
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.01 |
Since the first vector supercomputers in the mid-1970's, the largest scale applications have traditionally been floating point oriented numerical codes, which can be broadly characterized as the simulation of physics on a computer. Supercomputer architectures have evolved to meet the needs of those applications. Specifically, the computational work of the application tends to be floating point oriented, and the decomposition of the problem two or three dimensional. Today, an emerging class of critical applications may change those assumptions: they are combinatorial in nature, integer oriented, and irregular. The performance of both classes of applications is dominated by the performance of the memory system. This paper compares the memory performance sensitivity of both traditional and emerging HPC applications, and shows that the new codes are significantly more sensitive to memory latency and bandwidth than their traditional counterparts. Additionally, these codes exhibit lower base-line performance, which only exacerbates the problem. As a result, the construction of future supercomputer architectures to support these applications will most likely be different from those used to support traditional codes. Quantitatively understanding the difference between the two workloads will form the basis for future design choices.