Characterizing the d-TLB behavior of SPEC CPU2000 benchmarks

Authors:
Gokul B. Kandiraju;Anand Sivasubramaniam
Affiliations:
The Pennsylvania State University, University Park, PA;The Pennsylvania State University, University Park, PA
Venue:
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Year:
2002

Abstract

Despite the numerous optimization and evaluation studies that have been conducted with TLBs over the years, an in-depth understanding of TLB characteristics from an application angle is still lacking. This paper presents a detailed characterization study of the TLB behavior of the SPEC CPU2000 benchmark suite. The contributions of this work are in identifying important application characteristics for TLB studies, quantifying the SPEC2000 application behavior for these characteristics, and making pronouncements and suggestions for future research based on these results.

Around one-fourth of the SPEC2000 applications (ammp, apsi, galgel, lucas, mcf, twolf and vpr) have significant TLB miss rates. Both capacity and associativity influence miss rates, though they do not necessarily go hand-in-hand. Multi-level TLBs are definitely useful for these applications in cutting down access times without significant miss rate degradation. Superpaging to combine TLB entries may not be rewarding for many of these applications. Software management of TLBs, in terms of determining what entries to prefetch, what entries to replace, and what entries to pin, has a lot of potential to cut down miss rates considerably. Specifically, the potential benefits of prefetching TLB entries are examined, and Distance Prefetching is shown to give good prediction accuracy for these applications.
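As an illustration of the observation that capacity and associativity affect miss rates separately, the following is a minimal sketch of a set-associative TLB model with LRU replacement. It is not the simulation infrastructure used in the paper; the function name `tlb_miss_rate`, the 4 KB page size, the LRU policy, and the synthetic trace are all assumptions chosen for the example.

```python
from collections import OrderedDict


def tlb_miss_rate(addresses, entries, ways, page_size=4096):
    """Miss rate of a hypothetical set-associative TLB with LRU replacement.

    entries   -- total number of TLB entries (capacity)
    ways      -- associativity; ways == entries means fully associative
    page_size -- assumed page size in bytes (4 KB here, an assumption)
    """
    if entries % ways != 0:
        raise ValueError("entries must be a multiple of ways")
    num_sets = entries // ways
    # One OrderedDict per set; insertion order doubles as LRU order.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    total = 0
    for addr in addresses:
        total += 1
        vpn = addr // page_size          # virtual page number
        tlb_set = sets[vpn % num_sets]   # simple modulo set indexing
        if vpn in tlb_set:
            tlb_set.move_to_end(vpn)     # hit: mark as most recently used
        else:
            misses += 1
            if len(tlb_set) >= ways:
                tlb_set.popitem(last=False)  # evict least recently used
            tlb_set[vpn] = True
    return misses / total if total else 0.0


# Synthetic trace (an assumption): four pages, eight pages apart, visited 10 times.
trace = [page * 4096 for _ in range(10) for page in (0, 8, 16, 24)]

direct_mapped = tlb_miss_rate(trace, entries=8, ways=1)
fully_assoc = tlb_miss_rate(trace, entries=8, ways=8)
print(direct_mapped, fully_assoc)
```

With the same eight-entry capacity, the direct-mapped configuration misses on every reference of this trace because all four pages fall into one set, while the fully associative configuration misses only on the four cold references. Real benchmark traces would of course behave less extremely than this contrived pattern.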