Page placement algorithms for large real-indexed caches

Authors:
R. E. Kessler; Mark D. Hill
Venue:
ACM Transactions on Computer Systems (TOCS)
Year:
1992


Abstract

When a computer system supports both paged virtual memory and large real-indexed caches, cache performance depends in part on where pages are placed in main memory. To date, most operating systems place a page by selecting an arbitrary page frame from the pool of frames made available by the page replacement algorithm. We give a simple model showing that this naive (arbitrary) page placement leads to up to 30% unnecessary cache conflicts. We develop several page placement algorithms, called careful-mapping algorithms, that try to select a page frame from the pool of available frames that is likely to reduce cache contention. Using trace-driven simulation, we find that careful mapping results in 10–20% fewer (dynamic) cache misses than naive mapping for a direct-mapped, real-indexed, multimegabyte cache. Our results thus suggest that careful mapping by the operating system can achieve about half the cache miss reduction obtained by doubling the cache size (or associativity).
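The contrast between naive and careful placement can be illustrated with a small sketch. This is not one of the paper's algorithms: it is a simplified illustration that assumes a direct-mapped real-indexed cache divided into page-sized bins, and all names (`cache_bin`, `naive_place`, `careful_place`, `place_pages`, `excess_pages`) and sizes are hypothetical. The "careful" policy here simply prefers a free frame whose bin currently holds the fewest mapped pages.

```python
import random

# Assumed, illustrative parameters (not taken from the paper).
PAGE_SIZE = 4096                     # bytes per page
CACHE_SIZE = 1 << 20                 # 1 MiB direct-mapped, real-indexed cache
NUM_BINS = CACHE_SIZE // PAGE_SIZE   # page-sized regions of the cache


def cache_bin(frame):
    """Cache bin that a physical page frame maps to (direct-mapped cache)."""
    return frame % NUM_BINS


def naive_place(free_frames, bin_load, rng):
    """Naive mapping: take an arbitrary frame from the free pool."""
    return rng.choice(free_frames)


def careful_place(free_frames, bin_load, rng):
    """Hypothetical careful mapping: prefer the least-loaded cache bin."""
    return min(free_frames, key=lambda f: (bin_load[cache_bin(f)], f))


def place_pages(n_pages, n_frames, policy, seed=0):
    """Place n_pages pages and return the number of pages mapped to each bin."""
    rng = random.Random(seed)
    free_frames = list(range(n_frames))
    bin_load = [0] * NUM_BINS
    for _ in range(n_pages):
        frame = policy(free_frames, bin_load, rng)
        free_frames.remove(frame)
        bin_load[cache_bin(frame)] += 1
    return bin_load


def excess_pages(bin_load):
    """Pages that share a cache bin with at least one earlier page."""
    return sum(load - 1 for load in bin_load if load > 1)


if __name__ == "__main__":
    for name, policy in (("naive", naive_place), ("careful", careful_place)):
        load = place_pages(NUM_BINS, 4096, policy)
        print(name, "pages sharing a bin:", excess_pages(load))
```

When the number of placed pages equals the number of bins, the careful policy in this sketch leaves no two pages in the same bin, while arbitrary placement leaves some bins crowded and others empty. Counting shared bins is only a static proxy for contention; the paper's results rest on trace-driven simulation of dynamic misses.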