When a computer system supports both paged virtual memory and large real-indexed caches, cache performance depends in part on where pages are placed in main memory. To date, most operating systems place a page by selecting an arbitrary page frame from the pool of frames made available by the page replacement algorithm. We give a simple model showing that this naive (arbitrary) page placement leads to up to 30% unnecessary cache conflicts. We develop several page placement algorithms, called careful-mapping algorithms, that try to select, from the pool of available page frames, a frame that is likely to reduce cache contention. Using trace-driven simulation, we find that careful mapping results in 10–20% fewer (dynamic) cache misses than naive mapping for a direct-mapped, real-indexed, multimegabyte cache. Our results thus suggest that careful mapping by the operating system can achieve about half the cache-miss reduction obtained by doubling the cache size (or associativity).
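The contrast between naive and careful placement can be illustrated with a small sketch. It is not one of the paper's algorithms: it assumes a direct-mapped real-indexed cache in which a frame's cache bin is its frame number modulo the number of page-sized bins, and it uses a hypothetical "least-loaded bin" heuristic as a stand-in for careful mapping. The constants PAGE_SIZE and CACHE_SIZE and all function names are illustrative assumptions.

```python
import random
from collections import Counter

# Assumed parameters (not taken from the paper).
PAGE_SIZE = 4096                    # bytes per page
CACHE_SIZE = 1 << 20                # 1 MiB direct-mapped, real-indexed cache
NUM_BINS = CACHE_SIZE // PAGE_SIZE  # page-sized regions ("bins") in the cache


def cache_bin(frame):
    """Cache bin that a physical page frame maps to (assumed modulo indexing)."""
    return frame % NUM_BINS


def naive_pick(free_frames, bin_load, rng):
    """Naive mapping: take an arbitrary free frame, ignoring the cache."""
    return rng.choice(free_frames)


def careful_pick(free_frames, bin_load, rng):
    """Hypothetical careful mapping: prefer the frame whose bin holds the fewest pages."""
    return min(free_frames, key=lambda f: bin_load[cache_bin(f)])


def place_pages(num_pages, free_frames, pick, seed=0):
    """Place num_pages pages one at a time; return the frames used and per-bin load."""
    rng = random.Random(seed)
    free = list(free_frames)
    bin_load = Counter()
    placed = []
    for _ in range(num_pages):
        frame = pick(free, bin_load, rng)
        free.remove(frame)
        bin_load[cache_bin(frame)] += 1
        placed.append(frame)
    return placed, bin_load


def overloaded_pages(bin_load):
    """Pages beyond the first in each bin; these compete for the same cache lines."""
    return sum(n - 1 for n in bin_load.values() if n > 1)


if __name__ == "__main__":
    pool = range(4 * NUM_BINS)  # four free frames available per bin
    for name, pick in (("naive", naive_pick), ("careful", careful_pick)):
        _, load = place_pages(NUM_BINS, pool, pick)
        print(name, "pages sharing a bin:", overloaded_pages(load))
```

The sketch counts static bin sharing only; it does not model dynamic miss rates, which the paper measures with trace-driven simulation.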