Investigating the TLB Behavior of High-end Scientific Applications on Commodity Microprocessors

Authors:
Collin McCurdy; Alan L. Cox; Jeffrey Vetter
Affiliations:
Future Technologies Group, Oak Ridge National Laboratory, cmccurdy@ornl.gov;Department of Computer Science, Rice University, alc@rice.edu;Future Technologies Group, Oak Ridge National Laboratory, vetter@ornl.gov
Venue:
ISPASS '08 Proceedings of the ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and Software
Year:
2008

Abstract

The floating-point portion of the SPEC CPU suite and the HPC Challenge suite are widely recognized and used as benchmarks that represent scientific application behavior. In this work we show that while these benchmark suites may be representative of the cache behavior of production scientific applications, they do not accurately represent the TLB behavior of these applications. Furthermore, we demonstrate that the difference can have a significant impact on performance. In the first part of the paper we present results from implementation-independent, trace-based simulations, which demonstrate that across a range of page sizes the benchmarks exhibit significantly different TLB behavior from a representative set of production applications. In the second part we validate these results on the AMD Opteron implementation of the x86 architecture, showing that false conclusions about the choice of page size, drawn from benchmark performance, can result in performance degradations approaching 50% for the production applications we investigated.
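
As a rough illustration of what a trace-based TLB simulation over several page sizes involves, the following minimal sketch counts misses for a fully associative LRU TLB. It is not the authors' simulator: the function names, the 32-entry TLB size, and the synthetic strided trace are all assumptions chosen for illustration, and the 4 KB and 2 MB page sizes are simply the standard x86 options.

```python
from collections import OrderedDict

KB = 1024
MB = 1024 * KB


def tlb_misses(addresses, page_size, entries):
    """Count misses of a fully associative LRU TLB over a virtual-address trace.

    Hypothetical model for illustration; real TLBs are often set-associative
    and multi-level.
    """
    tlb = OrderedDict()  # keys are virtual page numbers, oldest first
    misses = 0
    for addr in addresses:
        vpn = addr // page_size  # virtual page number for this page size
        if vpn in tlb:
            tlb.move_to_end(vpn)  # hit: mark as most recently used
        else:
            misses += 1
            if len(tlb) >= entries:
                tlb.popitem(last=False)  # evict the least recently used entry
            tlb[vpn] = True
    return misses


def strided_trace(n_accesses, stride, footprint):
    """Synthetic trace (an assumption): fixed-stride sweep over a footprint."""
    return [(i * stride) % footprint for i in range(n_accesses)]


# One access per 4 KB across a 16 MB footprint.
trace = strided_trace(n_accesses=4096, stride=4 * KB, footprint=16 * MB)

small = tlb_misses(trace, page_size=4 * KB, entries=32)  # 4096 misses
large = tlb_misses(trace, page_size=2 * MB, entries=32)  # 8 misses

print(f"4 KB pages: {small} misses, 2 MB pages: {large} misses")
```

With this synthetic sweep every access touches a new 4 KB page, whereas the whole footprint fits in eight 2 MB pages. Real application traces mix access patterns, which is why the preferred page size can differ between benchmarks and production codes.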