Many commercial microprocessor architectures have added translation lookaside buffer (TLB) support for superpages. Superpages differ from segments because their size must be a power-of-two multiple of the base page size and they must be aligned in both virtual and physical address spaces. Very large superpages (e.g., 1MB) are clearly useful for mapping special structures, such as kernel data or frame buffers. This paper considers the architectural and operating system support required to exploit medium-sized superpages (e.g., 64KB, i.e., sixteen times a 4KB base page size). First, we show that superpages improve TLB performance only after invasive operating system modifications that introduce considerable overhead.

We then propose two subblock TLB designs as alternative ways to improve TLB performance. Analogous to a subblock cache, a complete-subblock TLB associates a tag with a superpage-sized region but has valid bits, physical page number, attributes, etc., for each possible base page mapping. A partial-subblock TLB entry is much smaller than a complete-subblock TLB entry, because it shares the physical page number and attribute fields across base page mappings. A drawback of a partial-subblock TLB is that base page mappings can share a TLB entry only if they map to consecutive physical pages and have the same attributes. We propose a physical memory allocation algorithm, page reservation, that makes this sharing more likely. When page reservation is used, experimental results show that partial-subblock TLBs perform better than superpage TLBs, while requiring simpler operating system changes. If operating system changes are inappropriate, however, complete-subblock TLBs perform best.
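
The sharing rule for a partial-subblock TLB entry, and the way page reservation makes sharing likely, can be illustrated with a minimal Python sketch. This is not the paper's implementation. The names `PartialSubblockEntry`, `ReservingAllocator`, and `SUBBLOCK_FACTOR` are hypothetical, and the sketch assumes a subblock factor of 16 (64KB regions over 4KB base pages), assumes "consecutive physical pages" means each base page sits at its region offset from a single stored physical page number, and assumes free memory never runs out, so reservations are never reclaimed.

```python
from dataclasses import dataclass, field

SUBBLOCK_FACTOR = 16  # assumed: base pages per superpage-sized region (64KB / 4KB)


@dataclass
class PartialSubblockEntry:
    """One entry: a tag, one physical page number, and one attribute field,
    shared by all base pages in the region; only valid bits are per page."""
    tag: int        # virtual page number // SUBBLOCK_FACTOR
    base_ppn: int   # physical page that subblock index 0 would map to
    attrs: str      # attributes common to every mapping in this entry
    valid: list = field(default_factory=lambda: [False] * SUBBLOCK_FACTOR)

    def can_share(self, vpn, ppn, attrs):
        """A mapping fits only if it is in this region, lands at the matching
        physical offset, and carries the same attributes."""
        idx = vpn % SUBBLOCK_FACTOR
        return (vpn // SUBBLOCK_FACTOR == self.tag
                and ppn - idx == self.base_ppn
                and attrs == self.attrs)

    def add(self, vpn, ppn, attrs):
        """Set the valid bit for vpn if the mapping can share this entry."""
        if not self.can_share(vpn, ppn, attrs):
            return False  # would need a separate TLB entry
        self.valid[vpn % SUBBLOCK_FACTOR] = True
        return True

    def translate(self, vpn):
        """Return the physical page number on a hit, or None on a miss."""
        idx = vpn % SUBBLOCK_FACTOR
        if vpn // SUBBLOCK_FACTOR == self.tag and self.valid[idx]:
            return self.base_ppn + idx
        return None


class ReservingAllocator:
    """Toy page reservation: the first fault in a virtual region reserves an
    aligned block of physical frames; later faults in that region receive the
    frame at the same offset, so their mappings can share one TLB entry."""

    def __init__(self, first_free_block=0):
        self.next_block = first_free_block  # next unreserved aligned block
        self.reserved = {}                  # region tag -> base_ppn of block

    def allocate(self, vpn):
        tag, idx = divmod(vpn, SUBBLOCK_FACTOR)
        if tag not in self.reserved:
            self.reserved[tag] = self.next_block * SUBBLOCK_FACTOR
            self.next_block += 1
        return self.reserved[tag] + idx
```

In this sketch, two faults at virtual pages `0x123` and `0x127` fall in the same region, so a `ReservingAllocator` hands them frames from the same aligned block and both mappings fit in one `PartialSubblockEntry`. A mapping whose frame lies outside that block, or whose attributes differ, is rejected by `add` and would occupy an entry of its own.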