Reducing TLB and memory overhead using online superpage promotion
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
L1 data cache decomposition for energy efficiency
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Practical, transparent operating system support for superpages
ACM SIGOPS Operating Systems Review - OSDI '02: Proceedings of the 5th symposium on Operating systems design and implementation
Energy efficient D-TLB and data cache using semantic-aware multilateral partitioning
Proceedings of the 2003 international symposium on Low power electronics and design
Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
Multiple Page Size Modeling and Optimization
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
General purpose operating system support for multiple page sizes
ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference
Implementation of multiple pagesize support in HP-UX
ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference
Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
OS execution on multi-cores: is out-sourcing worthwhile?
ACM SIGOPS Operating Systems Review
Investigating the TLB Behavior of High-end Scientific Applications on Commodity Microprocessors
ISPASS '08 Proceedings of the ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and software
SOS: A Software-Oriented Distributed Shared Cache Management Approach for Chip Multiprocessors
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
ORION 2.0: a fast and accurate NoC power and area model for early-stage design space exploration
Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
Different virtual memory regions (e.g., stack and heap) have different properties and characteristics. For example, stack data are thread-private by definition while heap data can be shared between threads. Compared with heap memory, stack memory tends to take a large number of accesses to a rather small number of pages. These facts have been largely ignored by designers. In this paper, we propose two novel designs that exploit stack memory's unique characteristics to optimize the on-chip memory system. The first design is Anticipatory Superpaging - automatically create superpages for stack memory at the first page fault in a potential superpage, increasing TLB reach and reducing TLB misses. It is transparent to applications and does not require kernel to employ online analysis algorithms and page copying. The second design is Stack-Aware Cache Placement - stack accesses are routed to their local slices in a distributed shared cache, while non-stack accesses are still routed using cacheline interleaving. The primary benefit of this mechanism is reduced power consumption of the on-chip interconnect. Our simulation shows that the first innovation reduces TLB misses by 10% - 20%, and the second one reduces interconnect power consumption by over 14%.