LimitLESS directories: A scalable cache coherence scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Exceeding the dataflow limit via value prediction
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 24th annual international symposium on Computer architecture
The predictability of data values
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Highly accurate data value prediction using hybrid predictors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Reconfigurable caches and their application to media processing
Proceedings of the 27th annual international symposium on Computer architecture
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Predictor-directed stream buffers
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
rePLay: A Hardware Framework for Dynamic Optimization
IEEE Transactions on Computers
Timekeeping in the memory system: predicting and optimizing memory behavior
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A stateless, content-directed data prefetching mechanism
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Speculative Sequential Consistency with Little Custom Storage
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Pointer cache assisted prefetching
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling
Proceedings of the 30th annual international symposium on Computer architecture
Guided region prefetching: a cooperative hardware/software approach
Proceedings of the 30th annual international symposium on Computer architecture
Xen and the art of virtualization
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Specialized Dynamic Optimizations for High-Performance Energy-Efficient Microarchitecture
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Power Awareness through Selective Dynamically Optimized Traces
Proceedings of the 31st annual international symposium on Computer architecture
ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
Temporal Streaming of Shared Memory
Proceedings of the 32nd annual international symposium on Computer Architecture
Virtualizing Transactional Memory
Proceedings of the 32nd annual international symposium on Computer Architecture
Data Cache Prefetching Using a Global History Buffer
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
An Event-Driven Multithreaded Dynamic Optimization Framework
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
A Case for MLP-Aware Cache Replacement
Proceedings of the 33rd annual international symposium on Computer Architecture
Proceedings of the 33rd annual international symposium on Computer Architecture
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Live migration of virtual machines
NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2
Enhancing Multiprocessor Architecture Simulation Speed Using Matched-Pair Comparison
ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Phantom-BTB: a virtualized branch target buffer design
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Temporal instruction fetch streaming
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Spatio-temporal memory streaming
Proceedings of the 36th annual international symposium on Computer architecture
EXACT: explicit dynamic-branch prediction with active updates
Proceedings of the 7th ACM international conference on Computing frontiers
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Design space exploration of hybrid ultra low power branch predictors
ARCS'12 Proceedings of the 25th international conference on Architecture of Computing Systems
Reducing OLTP instruction misses with thread migration
DaMoN '12 Proceedings of the Eighth International Workshop on Data Management on New Hardware
Pointy: a hybrid pointer prefetcher for managed runtime systems
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Toward virtualizing branch direction prediction
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Linearizing irregular memory accesses for improved correlated prefetching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
SHIFT: shared history instruction fetch for lean-core server processors
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
A novel architecture for ahead branch prediction
Frontiers of Computer Science: Selected Publications from Chinese Universities
Hi-index | 0.00 |
Many hardware optimizations rely on collecting information about program behavior at runtime. This information is stored in lookup tables. To be accurate and effective, these optimizations usually require large dedicated on-chip tables. Although technology advances offer an increased amount of on-chip resources, these resources are allocated to increase the size of on-chip conventional cache hierarchies. This work proposes Predictor Virtualization, a technique that uses the existing memory hierarchy to emulate large predictor tables. We demonstrate the benefits of this technique by virtualizing a state-of-the-art data prefetcher. Full-system, cycle-accurate simulations demonstrate that the virtualized prefetcher preserves the performance benefits of the original design, while reducing the on-chip storage dedicated to the predictor table from 60KB down to less than one kilobyte.