On the inclusion properties for multi-level cache hierarchies
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Available instruction-level parallelism for superscalar and superpipelined machines
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Improving performance of small on-chip instruction caches
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Architectural and organizational tradeoffs in the design of the MultiTitan CPU
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
A Characterization of Processor Performance in the vax-11/780
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Aspects of cache memory and instruction buffer performance
Aspects of cache memory and instruction buffer performance
Paradigm: A Highly Scalable Shared-Memory Multicomputer Architecture
Computer - Special issue on cryptography
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
The effect of context switches on cache performance
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
An architecture for software-controlled data prefetching
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Dynamic base register caching: a technique for reducing address bus width
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Cache replacement with dynamic exclusion
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Software support for speculative loads
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
A performance evaluation of optimal hybrid cache coherency protocols
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Closing the window of vulnerability in multiphase memory transactions
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Pseudo vector processor based on register-windowed superscalar pipeline
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Improving the cache locality of memory allocation
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Evaluating performance of prefetching second level caches
ACM SIGMETRICS Performance Evaluation Review
A case for two-way skewed-associative caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Column-associative caches: a technique for reducing the miss rate of direct-mapped caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Cache write policies and performance
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Limitations of cache prefetching on a bus-based multiprocessor
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
ICS '93 Proceedings of the 7th international conference on Supercomputing
A scalar architecture for pseudo vector processing based on slide-windowed registers
ICS '93 Proceedings of the 7th international conference on Supercomputing
Efficient simulation of caches under optimal replacement with applications to miss characterization
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Memory access coalescing: a technique for eliminating redundant memory accesses
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Using virtual lines to enhance locality exploitation
ICS '94 Proceedings of the 8th international conference on Supercomputing
Reducing cache conflicts in data cache prefetching
ACM SIGARCH Computer Architecture News - Special issue on input/output in parallel computer systems
Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tradeoffs in two-level on-chip caching
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A performance study of software and hardware data prefetching schemes
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Software-extended coherent shared memory: performance and cost
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A unified architectural tradeoff methodology
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Resource allocation in a high clock rate microprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Contrasting characteristics and cache performance of technical and multi-user commercial workloads
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Avoiding conflict misses dynamically in large direct-mapped caches
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Effective cache prefetching on bus-based multiprocessors
ACM Transactions on Computer Systems (TOCS)
ACM SIGARCH Computer Architecture News
Memory system performance of UNIX on CC-NUMA multiprocessors
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Instruction fetching: coping with code bloat
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Instruction cache fetch policies for speculative execution
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Hardware implementation issues of data prefetching
ICS '95 Proceedings of the 9th international conference on Supercomputing
A data cache with multiple caching strategies tuned to different types of locality
ICS '95 Proceedings of the 9th international conference on Supercomputing
Direct-mapped versus set-associative pipelined caches
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
A modified approach to data cache management
Proceedings of the 28th annual international symposium on Microarchitecture
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Missing the memory wall: the case for processor/memory integration
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Increasing cache port efficiency for dynamic superscalar microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Informing memory operations: providing memory performance feedback in modern processors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Instruction prefetching of systems codes with layout optimized for reduced cache misses
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
A quantitative analysis of loop nest locality
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Design and evaluation of dynamic access ordering hardware
ICS '96 Proceedings of the 10th international conference on Supercomputing
Examination of a memory access classification scheme for pointer-intensive and numeric programs
ICS '96 Proceedings of the 10th international conference on Supercomputing
Optimizing primary data caches for parallel scientific applications: the pool buffer approach
ICS '96 Proceedings of the 10th international conference on Supercomputing
ACM SIGARCH Computer Architecture News
Architecture Technique Trade-Offs Using Mean Memory Delay Time
IEEE Transactions on Computers
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Tango: a hardware-based data prefetching technique for superscalar processors
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Trace-driven memory simulation: a survey
ACM Computing Surveys (CSUR)
A microarchitectural performance evaluation of a 3.2 Gbyte/s microprocessor bus
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Adaptive page replacement based on memory reference behavior
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Improving data cache performance by pre-executing instructions under a cache miss
ICS '97 Proceedings of the 11th international conference on Supercomputing
Eliminating cache conflict misses through XOR-based placement functions
ICS '97 Proceedings of the 11th international conference on Supercomputing
A victim cache for vector registers
ICS '97 Proceedings of the 11th international conference on Supercomputing
Designing high bandwidth on-chip caches
Proceedings of the 24th annual international symposium on Computer architecture
Prefetching using Markov predictors
Proceedings of the 24th annual international symposium on Computer architecture
Run-time adaptive cache hierarchy management via reference analysis
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Run-time spatial locality detection and optimization
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The design and performance of a conflict-avoiding cache
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Prediction caches for superscalar processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory data organization for improved cache performance in embedded processor applications
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Data transformations for eliminating conflict misses
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Eliminating conflict misses for high performance architectures
ICS '98 Proceedings of the 12th international conference on Supercomputing
Characterization and improvement of load/store cache-based prefetching
ICS '98 Proceedings of the 12th international conference on Supercomputing
Utilizing reuse information in data cache management
ICS '98 Proceedings of the 12th international conference on Supercomputing
Informing memory operations: memory performance feedback mechanisms and their applications
ACM Transactions on Computer Systems (TOCS)
Increasing TLB reach using superpages backed by shadow memory
Proceedings of the 25th annual international symposium on Computer architecture
Exploiting spatial locality in data caches using spatial footprints
Proceedings of the 25th annual international symposium on Computer architecture
A Software Approach to Avoiding Spatial Cache Collisions in Parallel Processor Systems
IEEE Transactions on Parallel and Distributed Systems
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Hardware-software trade-offs in a direct Rambus implementation of the RAMpage memory hierarchy
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Dependence based prefetching for linked data structures
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Investigating optimal local memory performance
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Capturing dynamic memory reference behavior with adaptive cache topology
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Performance of database workloads on shared-memory systems with out-of-order processors
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Power and performance tradeoffs using various caching strategies
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
Functional Implementation Techniques for CPU Cache Memories
IEEE Transactions on Computers - Special issue on cache memory and related problems
Prefetching Using Markov Predictors
IEEE Transactions on Computers - Special issue on cache memory and related problems
An Algorithm for Optimally Exploiting Spatial and Temporal Locality in Upper Memory Levels
IEEE Transactions on Computers - Special issue on cache memory and related problems
Randomized Cache Placement for Eliminating Conflicts
IEEE Transactions on Computers - Special issue on cache memory and related problems
Simultaneous subordinate microthreading (SSMT)
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
On the use of trace sampling for architectural studies of desktop applications
SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Comprehensive Hardware and Software Support for Operating Systems to Exploit MP Memory Hierarchies
IEEE Transactions on Computers
A locality sensitive multi-module cache with explicit management
ICS '99 Proceedings of the 13th international conference on Supercomputing
Reducing cache misses using hardware and software page placement
ICS '99 Proceedings of the 13th international conference on Supercomputing
An Integrated Hardware/Software Data Prefetching Scheme for Shared-Memory Multiprocessors
International Journal of Parallel Programming
Fetch directed instruction prefetching
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Hardware identification of cache conflict misses
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Active Management of Data Caches by Exploiting Reuse Information
IEEE Transactions on Computers
IEEE Transactions on Computers
Hardware-only stream prefetching and dynamic access ordering
Proceedings of the 14th international conference on Supercomputing
Push vs. pull: data movement for linked data structures
Proceedings of the 14th international conference on Supercomputing
Hardware spatial forwarding for widely shared data
Proceedings of the 14th international conference on Supercomputing
A fully associative software-managed cache design
Proceedings of the 27th annual international symposium on Computer architecture
Proceedings of the 27th annual international symposium on Computer architecture
Smart Memories: a modular reconfigurable architecture
Proceedings of the 27th annual international symposium on Computer architecture
The Journal of Supercomputing
Frequent value locality and value-centric data cache design
ACM SIGPLAN Notices
ACM Computing Surveys (CSUR)
Predictor-directed stream buffers
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Improving BTB performance in the presence of DLLs
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Frequent value compression in data caches
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Hint-based cooperative caching
ACM Transactions on Computer Systems (TOCS)
Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Proceedings of the 27th annual international symposium on Computer architecture
SODA '01 Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms
Architectural and compiler support for effective instruction prefetching: a cooperative approach
ACM Transactions on Computer Systems (TOCS)
Dynamic Access Ordering for Streamed Computations
IEEE Transactions on Computers
High Bandwidth On-Chip Cache Design
IEEE Transactions on Computers
Optimizations Enabled by a Decoupled Front-End Architecture
IEEE Transactions on Computers
Data and memory optimization techniques for embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Analytical cache models with applications to cache partitioning
ICS '01 Proceedings of the 15th international conference on Supercomputing
ICS '01 Proceedings of the 15th international conference on Supercomputing
A novel renaming mechanism that boosts software prefetching
ICS '01 Proceedings of the 15th international conference on Supercomputing
Frequent value locality and value-centric data cache design
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Dynamically allocating processor resources between nearby and distant ILP
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Dead-block prediction & dead-block correlating prefetchers
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Morphable Cache Architectures: Potential Benefits
OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
APEX: access pattern based memory architecture exploration
Proceedings of the 14th international symposium on Systems synthesis
Runtime identification of cache conflict misses: The adaptive miss buffer
ACM Transactions on Computer Systems (TOCS)
Improving Latency Tolerance of Multithreading through Decoupling
IEEE Transactions on Computers
IEEE Transactions on Computers
IEEE Transactions on Computers
Designing a Modern Memory Hierarchy with Hardware Prefetching
IEEE Transactions on Computers
ADir_pNB: A Cost-Effective Way to Implement Full Map Directory-Based Cache Coherence Protocols
IEEE Transactions on Computers
Dynamic hot data stream prefetching for general-purpose programs
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Profile-guided post-link stride prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
Using a user-level memory thread for correlation prefetching
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Going the distance for TLB prefetching: an application-driven study
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Timekeeping in the memory system: predicting and optimizing memory behavior
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
DSTRIDE: data-cache miss-address-based stride prefetching scheme for multimedia processors
ACSAC '01 Proceedings of the 6th Australasian conference on Computer systems architecture
Dynamic speculative precomputation
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
An integrated approach to reducing power dissipation in memory hierarchies
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Application-adaptive intelligent cache memory system
ACM Transactions on Embedded Computing Systems (TECS)
MIST: an algorithm for memory miss traffic management
Proceedings of the 2000 IEEE/ACM international conference on Computer-aided design
Transactional lock-free execution of lock-based programs
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
ACM Transactions on Embedded Computing Systems (TECS)
Access pattern-based memory and connectivity architecture exploration
ACM Transactions on Embedded Computing Systems (TECS)
Splitting the Data Cache: A Survey
IEEE Concurrency
Random-Access Data Storage Components in Customized Architectures
IEEE Design & Test
Performance Implications of Tolerating Cache Faults
IEEE Transactions on Computers
A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches
IEEE Transactions on Computers
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
Selective Victim Caching: A Method to Improve the Performance of Direct-Mapped Caches
IEEE Transactions on Computers
Increasing hardware data prefetching performance using the second-level cache
Journal of Systems Architecture: the EUROMICRO Journal
A Decoupled Predictor-Directed Stream Prefetching Architecture
IEEE Transactions on Computers
Partitioned instruction cache architecture for energy efficiency
ACM Transactions on Embedded Computing Systems (TECS)
Online paging with arbitrary associativity
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Memory Architectures for Embedded Systems-On-Chip
HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
Stride-directed Prefetching for Secondary Caches
ICPP '97 Proceedings of the international Conference on Parallel Processing
Experiments with Sequential Prefetching
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
A Memory Controller for Improved Performance of Streamed Computations on Symmetric Multiprocessors
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Cost-Effective Compiler Directed Memory Prefetching and Bypassing
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Using the Compiler to Improve Cache Replacement Decisions
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
A Programmable Memory Hierarchy for Prefetching Linked Data Structures
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Caches with Compositional Performance
Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation - SAMOS
On the Performance of Fetch Engines Running DSS Workloads
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Exploiting the Prefetching Effect Provided by Executing Mispredicted Load Instructions
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Design Considerations of High Performance Data Cache with Prefetching
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
A Power Efficient Cache Structure for Embedded Processors Based on the Dual Cache Structure
LCTES '00 Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems
Reordering Memory Bus Transactions for Reduced Power Consumption
LCTES '00 Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems
Tradeoffs in the Design of Single Chip Multiprocessors
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
The Multi-Queue Replacement Algorithm for Second Level Buffer Caches
Proceedings of the General Track: 2002 USENIX Annual Technical Conference
Trace-Driven Memory Simulation: A Survey
Performance Evaluation: Origins and Directions
Value-Profile Guided Stride Prefetching for Irregular Code
CC '02 Proceedings of the 11th International Conference on Compiler Construction
ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms
Content-Based Prefetching: Initial Results
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Caches with compositional performance
Embedded processor design challenges
A banked-promotion translation lookaside buffer system
Journal of Systems Architecture: the EUROMICRO Journal
Pointer cache assisted prefetching
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
TEST: a tracer for extracting speculative threads
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Stride prefetching by dynamically inspecting objects
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Systematic objective-driven computer architecture optimization
ARVLSI '95 Proceedings of the 16th Conference on Advanced Research in VLSI (ARVLSI'95)
Improving Performance for Software MPEG Players
COMPCON '96 Proceedings of the 41st IEEE International Computer Conference
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
A Design Frame for Hybrid Access Cashes
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Software assistance for data caches
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
TCP: Tag Correlating Prefetchers
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
An improved lookahead instruction prefetching
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Performance Modeling Using Object-Oriented Execution-Driven Simulation}
SS '96 Proceedings of the 29th Annual Simulation Symposium (SS '96)
DRAM-Page Based Prediction and Prefetching
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
A Study of Channeled DRAM Memory Architectures
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
A Selective Temporal and Aggressive Spatial Cache System Based on Time Interval
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Caches versus object allocation
IWOOOS '96 Proceedings of the 5th International Workshop on Object Orientation in Operating Systems (IWOOOS '96)
Memory Organization for Improved Data Cache Performance in Embedded Processors
ISSS '96 Proceedings of the 9th international symposium on System synthesis
Guided region prefetching: a cooperative hardware/software approach
Proceedings of the 30th annual international symposium on Computer architecture
Correlation Prefetching with a User-Level Memory Thread
IEEE Transactions on Parallel and Distributed Systems
Dynamic class-based queue management for scalable media servers
Journal of Systems and Software
A low-cost memory architecture with NAND XIP for mobile embedded systems
Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Call graph prefetching for database applications
ACM Transactions on Computer Systems (TOCS)
Hierarchical disk cache management in RAID 5 controller
Journal of Computing Sciences in Colleges
Tiny instruction caches for low power embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Optimal Replacement Is NP-Hardfor Nonstandard Caches
IEEE Transactions on Computers
IPStash: a Power-Efficient Memory Architecture for IP-lookup
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
IEEE Transactions on Computers
An energy efficient cache memory architecture for embedded systems
Proceedings of the 2004 ACM symposium on Applied computing
Using a Victim Buffer in an Application-Specific Memory Hierarchy
Proceedings of the conference on Design, automation and test in Europe - Volume 1
Processor-memory coexploration using an architecture description language
ACM Transactions on Embedded Computing Systems (TECS)
Selective stack prefetch method
CompSysTech '03 Proceedings of the 4th international conference conference on Computer systems and technologies: e-Learning
Self-correcting LRU replacement policies
Proceedings of the 1st conference on Computing frontiers
Dynamic techniques to reduce memory traffic in embedded systems
Proceedings of the 1st conference on Computing frontiers
Reducing traffic generated by conflict misses in caches
Proceedings of the 1st conference on Computing frontiers
ACM Transactions on Computer Systems (TOCS)
Second-Level Buffer Cache Management
IEEE Transactions on Parallel and Distributed Systems
Cluster miss prediction for instruction caches in embedded networking applications
Proceedings of the 14th ACM Great Lakes symposium on VLSI
Design and Optimization of Large Size and Low Overhead Off-Chip Caches
IEEE Transactions on Computers
Myths and realities: the performance impact of garbage collection
Proceedings of the joint international conference on Measurement and modeling of computer systems
Effective stream-based and execution-based data prefetching
Proceedings of the 18th annual international conference on Supercomputing
Enhancing data cache reliability by the addition of a small fully-associative replication cache
Proceedings of the 18th annual international conference on Supercomputing
CQoS: a framework for enabling QoS in shared caches of CMP platforms
Proceedings of the 18th annual international conference on Supercomputing
X-RAY: A Non-Invasive Exclusive Caching Mechanism for RAIDs
Proceedings of the 31st annual international symposium on Computer architecture
Effectively sharing a cache among threads
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Journal of Scheduling - Special issue: On-line algorithm part I
An Integrated Approach for Improving Cache Behavior
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
A case for shared instruction cache on chip multiprocessors running OLTP
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
A proposal for input-sensitivity analysis of profile-driven optimizations on embedded applications
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Cache Refill/Access Decoupling for Vector Machines
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Theoretical Computer Science - Special issue: Online algorithms in memoriam, Steve Seiden
Tolerating memory latency through push prefetching for pointer-intensive applications
ACM Transactions on Architecture and Code Optimization (TACO)
The Impact of Incorrectly Speculated Memory Operations in a Multithreaded Architecture
IEEE Transactions on Parallel and Distributed Systems
Effective Instruction Prefetching via Fetch Prestaging
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Addressing mode driven low power data caches for embedded processors
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Memory predecryption: hiding the latency overhead of memory encryption
ACM SIGARCH Computer Architecture News - Special issue: Workshop on architectural support for security and anti-virus (WASSA)
Clustered Loop Buffer Organization for Low Energy VLIW Embedded Processors
IEEE Transactions on Computers
A new NAND-type flash memory package with smart buffer system for spatial and temporal localities
Journal of Systems Architecture: the EUROMICRO Journal
Skewed caches from a low-power perspective
Proceedings of the 2nd conference on Computing frontiers
Exploring the limits of prefetching
IBM Journal of Research and Development - Electrochemical technology in microelectronics
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
The V-Way Cache: Demand Based Associativity via Global Replacement
Proceedings of the 32nd annual international symposium on Computer Architecture
Memory Performance Optimizations For Real-Time Software HDTV Decoding
Journal of VLSI Signal Processing Systems
Generating cache hints for improved program efficiency
Journal of Systems Architecture: the EUROMICRO Journal
On the performance of trace locality of reference
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Distance-aware L2 cache organizations for scalable multiprocessor systems
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Reconfigurable embedded systems: Synthesis, design and application
A NUCA substrate for flexible CMP cache sharing
Proceedings of the 19th annual international conference on Supercomputing
Characterization of TCC on Chip-Multiprocessors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Store-Ordered Streaming of Shared Memory
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
The TM3270 Media-Processor Data Cache
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
IEEE Transactions on Computers
Replication Cache: A Small Fully Associative Cache to Improve Data Cache Reliability
IEEE Transactions on Computers
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
MEDEA '04 Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications , systems and architecture
Speculative execution for hiding memory latency
MEDEA '04 Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications , systems and architecture
Exploiting the replication cache to improve performance for multiple-issue microprocessors
ACM SIGARCH Computer Architecture News - Special issue: MEDEA 2004 workshop
Journal of Systems Architecture: the EUROMICRO Journal
On the importance of optimizing the configuration of stream prefetchers
Proceedings of the 2005 workshop on Memory system performance
Reducing cache misses by application-specific re-configurable indexing
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Optimizing instruction cache performance of embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Spectral prefetcher: An effective mechanism for L2 cache prefetching
ACM Transactions on Architecture and Code Optimization (TACO)
Memory access pattern analysis and stream cache design for multimedia applications
ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
A Self-Repairing Prefetcher in an Event-Driven Dynamic Optimization Framework
Proceedings of the International Symposium on Code Generation and Optimization
Kilo-instruction processors, runahead and prefetching
Proceedings of the 3rd conference on Computing frontiers
Routing Table Partitioning for Speedy Packet Lookups in Scalable Routers
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches
Proceedings of the 33rd annual international symposium on Computer Architecture
CAVA: Using checkpoint-assisted value prediction to hide L2 misses
ACM Transactions on Architecture and Code Optimization (TACO)
Pattern-driven prefetching for multimedia applications on embedded processors
Journal of Systems Architecture: the EUROMICRO Journal
International Journal of Parallel Programming
Making a case for split data caches for embedded applications
MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications , systems and architecture
Exploiting the replication cache to improve cache read bandwidth cost effectively
MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications , systems and architecture
Design space exploration for 3D architectures
ACM Journal on Emerging Technologies in Computing Systems (JETC)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Efficient emulation of hardware prefetchers via event-driven helper threading
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Overlapping dependent loads with addressless preload
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
SNAPI '03 Proceedings of the international workshop on Storage network architecture and parallel I/Os
Managing bounded code caches in dynamic binary optimization systems
ACM Transactions on Architecture and Code Optimization (TACO)
Data prefetching in a cache hierarchy with high bandwidth and capacity
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
Victim management in a cache hierarchy
IBM Journal of Research and Development - Advanced silicon technology
Geiger: monitoring the buffer cache in a virtual machine environment
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Future execution: A prefetching mechanism that uses multiple cores to speed up single threads
ACM Transactions on Architecture and Code Optimization (TACO)
Memory Prefetching Using Adaptive Stream Detection
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Accelerating sequential programs on Chip Multiprocessors via Dynamic Prefetching Thread
Microprocessors & Microsystems
M-TREE: a high efficiency security architecture for protecting integrity and privacy of software
Journal of Parallel and Distributed Computing - Special issue: Security in grid and distributed systems
Instruction buffering exploration for low energy embedded processors
Journal of Embedded Computing - Low-power Embedded Systems
A cache design for high performance embedded systems
Journal of Embedded Computing - Cache exploitation in embedded systems
Impulse: Memory system support for scientific applications
Scientific Programming
Analysis of hardware prefetching across virtual page boundaries
Proceedings of the 4th international conference on Computing frontiers
Performance/area efficiency in chip multiprocessors with micro-caches
Proceedings of the 4th international conference on Computing frontiers
Reconfigurable split data caches: a novel scheme for embedded systems
Proceedings of the 2007 ACM symposium on Applied computing
A One's Complement Cache Memory
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Software prefetching and caching for translation lookaside buffers
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Proceedings of the 21st annual international conference on Supercomputing
Experiences integrating research tools and projects into computer architecture courses
WCAE '00 Proceedings of the 2000 workshop on Computer architecture education
WCAE '06 Proceedings of the 2006 workshop on Computer architecture education: held in conjunction with the 33rd International Symposium on Computer Architecture
Fragment cache management for dynamic binary translators in embedded systems with scratchpad
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Heterogeneous associative cache for multimedia applications
IMSA'07 IASTED European Conference on Proceedings of the IASTED European Conference: internet and multimedia systems and applications
The impact of wrong-path memory references in cache-coherent multiprocessor systems
Journal of Parallel and Distributed Computing
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Data prefetching in a cache hierarchy with high bandwidth and capacity
ACM SIGARCH Computer Architecture News
Reducing cache misses through programmable decoders
ACM Transactions on Architecture and Code Optimization (TACO)
Adaptive set pinning: managing shared caches in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Improving SDRAM access energy efficiency for low-power embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Prefetching irregular references for software cache on cell
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Communications of the ACM - Web science
Proceedings of the 5th conference on Computing frontiers
Tiny split data-caches make big performance impact for embedded applications
Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
Focused prefetching: performance oriented prefetching based on commit stalls
Proceedings of the 22nd annual international conference on Supercomputing
Miss reduction in embedded processors through dynamic, power-friendly cache design
Proceedings of the 45th annual Design Automation Conference
Hiding cache miss penalty using priority-based execution for embedded processors
Proceedings of the conference on Design, automation and test in Europe
The performance of pollution control victim cache for embedded systems
Proceedings of the 21st annual symposium on Integrated circuits and system design
Guided Prefetching Based on Runtime Access Patterns
ICCS '08 Proceedings of the 8th international conference on Computational Science, Part III
Low-Cost Adaptive Data Prefetching
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Efficient code caching to improve performance and energy consumption for java applications
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Characterizing and modeling the behavior of context switch misses
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
PFetch: software prefetching exploiting temporal predictability of memory access streams
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A novel cache architecture with enhanced performance and security
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Temporal instruction fetch streaming
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Evaluating the effects of cache redundancy on profit
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Less reused filter: improving l2 cache performance via filtering less reused lines
Proceedings of the 23rd international conference on Supercomputing
Next high performance and low power flash memory package structure
Journal of Computer Science and Technology
Two new techniques integrated for energy-efficient TLB design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Time-predictable computer architecture
EURASIP Journal on Embedded Systems - FPGA supercomputing platforms, architectures, and techniques for accelerating computationally complex algorithms
Spatio-temporal memory streaming
Proceedings of the 36th annual international symposium on Computer architecture
Stream chaining: exploiting multiple levels of correlation in data prefetching
Proceedings of the 36th annual international symposium on Computer architecture
Efficient Data Access Management for FPGA-Based Image Processing SoCs
RSP '09 Proceedings of the 2009 IEEE/IFIP International Symposium on Rapid System Prototyping
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Selective wordline voltage boosting for caches to manage yield under process variations
Proceedings of the 46th Annual Design Automation Conference
A load-instruction unit for pipelined processors
IBM Journal of Research and Development
Coordinated control of multiple prefetchers in multi-core systems
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive line placement with the set balancing cache
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Heterogeneous associative cache for multimedia applications
EurolMSA '07 Proceedings of the Third IASTED European Conference on Internet and Multimedia Systems and Applications
Micro-pages: increasing DRAM efficiency with locality-aware data placement
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
An analysis of on-chip interconnection networks for large-scale chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
An intelligent memory card For LBS
WiCOM'09 Proceedings of the 5th International Conference on Wireless communications, networking and mobile computing
Customized placement for high performance embedded processor caches
ARCS'07 Proceedings of the 20th international conference on Architecture of computing systems
The bandwidth expansion effectiveness of cache levels block prefetch
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Reducing energy in instruction caches by using multiple line buffers with prediction
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Global management of cache hierarchies
Proceedings of the 7th ACM international conference on Computing frontiers
Impact analysis of performance faults in modern microprocessors
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Software data spreading: leveraging distributed caches to improve single thread performance
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
The auction: optimizing banks usage in Non-Uniform Cache Architectures
Proceedings of the 24th ACM International Conference on Supercomputing
A real-time Java chip-multiprocessor
ACM Transactions on Embedded Computing Systems (TECS)
Branch target buffer design for embedded processors
Microprocessors & Microsystems
Replica victim caching to improve cache reliability against transient errors
International Journal of High Performance Systems Architecture
An Adaptive Data Prefetcher for High-Performance Processors
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Journal of Parallel and Distributed Computing
WAYPOINT: scaling coherence to thousand-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Using dead blocks as a virtual victim cache
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Limiting the number of dirty cache lines
Proceedings of the Conference on Design, Automation and Test in Europe
Adaptive prefetching for shared cache based chip multiprocessors
Proceedings of the Conference on Design, Automation and Test in Europe
A memory interface for multi-purpose multi-stream accelerators
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Dynamic, non-linear cache architecture for power-sensitive mobile processors
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Design exploration of hybrid caches with disparate memory technologies
ACM Transactions on Architecture and Code Optimization (TACO)
Understanding the behavior and implications of context switch misses
ACM Transactions on Architecture and Code Optimization (TACO)
Efficient address mapping of shared cache for on-chip many-core architecture
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Thread owned block cache: managing latency in many-core architecture
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Two management approaches of the split data cache in multiprocessor systems
EURO-PDP'00 Proceedings of the 8th Euromicro conference on Parallel and distributed processing
Conjugate gradient sparse solvers: performance-power characteristics
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Coterminous locality and coterminous group data prefetching on chip-multiprocessors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Helper thread prefetching for loosely-coupled multiprocessor systems
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Dynamic vectorization in the E2 dynamic multicore architecture
ACM SIGARCH Computer Architecture News
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
The ZCache: Decoupling Ways and Associativity
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Many-Thread Aware Prefetching Mechanisms for GPGPU Applications
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Extended histories: improving regularity and performance in correlation prefetchers
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
RVC: a mechanism for time-analyzable real-time processors with faulty caches
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Cache equalizer: a placement mechanism for chip multiprocessor distributed shared caches
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Inter-core prefetching for multicore processors using migrating helper threads
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Template-based memory access engine for accelerators in SoCs
Proceedings of the 16th Asia and South Pacific Design Automation Conference
Virtual memory window for application-specific reconfigurable coprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Resolving a L2-prefetch-caused parallel nonscaling on Intel Core microarchitecture
Journal of Parallel and Distributed Computing
Cache index-aware memory allocation
Proceedings of the international symposium on Memory management
Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Bypass and insertion algorithms for exclusive last-level caches
Proceedings of the 38th annual international symposium on Computer architecture
Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors
ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
An energy-efficient adaptive hybrid cache
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs
ACM Transactions on Architecture and Code Optimization (TACO)
CATCH: A mechanism for dynamically detecting cache-content-duplication in instruction caches
ACM Transactions on Architecture and Code Optimization (TACO)
Bandwidth constrained coordinated HW/SW prefetching for multicores
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Dynamic, multi-core cache coherence architecture for power-sensitive mobile processors
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A hybrid intelligent system to improve predictive accuracy for cache prefetching
Expert Systems with Applications: An International Journal
Global-aware and multi-order context-based prefetching for high-performance processors
International Journal of High Performance Computing Applications
A high performance heterogeneous architecture and its optimization design
HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
Do trace cache, value prediction and prefetching improve SMT throughput?
ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
An effective instruction cache prefetch policy by exploiting cache history information
EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
A space-efficient on-chip compressed cache organization for high performance computing
ISPA'04 Proceedings of the Second international conference on Parallel and Distributed Processing and Applications
Embedded Systems Design
Flux caches: what are they and are they useful?
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
PACMan: prefetch-aware cache management for high performance caching
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Hot-and-Cold: using criticality in the design of energy-efficient caches
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
Reducing off-chip memory traffic by selective cache management scheme in GPGPUs
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
A refactoring method for cache-efficient swarm intelligence algorithms
Information Sciences: an International Journal
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Low-overhead core swapping for thermal management
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
Energy and throughput efficient transactional memory for embedded multicore systems
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
IOMMU: strategies for mitigating the IOTLB bottleneck
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
New memory organizations for 3d DRAM and PCMs
ARCS'12 Proceedings of the 25th international conference on Architecture of Computing Systems
Reducing OLTP instruction misses with thread migration
DaMoN '12 Proceedings of the Eighth International Workshop on Data Management on New Hardware
Hardware support for enforcing isolation in lock-based parallel programs
Proceedings of the 26th ACM international conference on Supercomputing
Adopting TLB index-based tagging to data caches for tag energy reduction
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Do we need a crystal ball for task migration?
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
Memory Latency Hiding by Load Value Speculation for Reconfigurable Computers
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Making data prefetch smarter: adaptive prefetching on POWER7
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Transactional prefetching: narrowing the window of contention in hardware transactional memory
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Optimal bypass monitor for high performance last-level caches
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
The evicted-address filter: a unified mechanism to address both cache pollution and thrashing
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Revisiting level-0 caches in embedded processors
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
A software-hardware architecture for self-protecting data
Proceedings of the 2012 ACM conference on Computer and communications security
Application data prefetching on the IBM blue gene/Q supercomputer
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Exploiting single-usage for effective memory management
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Data cache organization for accurate timing analysis
Real-Time Systems
Survey of Low-Energy Techniques for Instruction Memory Organisations in Embedded Systems
Journal of Signal Processing Systems
A multi-core memory organization for 3-d DRAM as main memory
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Cache-Conscious Wavefront Scheduling
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
SLICC: Self-Assembly of Instruction Cache Collectives for OLTP Workloads
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Diagnosis and optimization of application prefetching performance
Proceedings of the 27th international ACM conference on International conference on supercomputing
Replacement techniques for dynamic NUCA cache designs on CMPs
The Journal of Supercomputing
Navigating big data with high-throughput, energy-efficient data partitioning
Proceedings of the 40th Annual International Symposium on Computer Architecture
STREX: boosting instruction cache reuse in OLTP workloads through stratified transaction execution
Proceedings of the 40th Annual International Symposium on Computer Architecture
Orchestrated scheduling and prefetching for GPGPUs
Proceedings of the 40th Annual International Symposium on Computer Architecture
APPLE: adaptive performance-predictable low-energy caches for reliable hybrid voltage operation
Proceedings of the 50th Annual Design Automation Conference
Run-time reconfiguration of expandable cache for embedded systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
LP-NUCA: networks-in-cache for high-performance low-power embedded processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
On the Impact of Performance Faults in Modern Microprocessors
Journal of Electronic Testing: Theory and Applications
Data filter cache with word selection cache for low power embedded processor
Proceedings of the 2013 Research in Adaptive and Convergent Systems
Application-aware adaptive cache architecture for power-sensitive mobile processors
ACM Transactions on Embedded Computing Systems (TECS)
MLP-aware dynamic instruction window resizing for adaptively exploiting both ILP and MLP
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Linearizing irregular memory accesses for improved correlated prefetching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
RDIP: return-address-stack directed instruction prefetching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Dynamic bandwidth scaling for embedded DSPs with 3D-stacked DRAM and wide I/Os
Proceedings of the International Conference on Computer-Aided Design
Practical models for energy-efficient prefetching in mobile embedded systems
Microprocessors & Microsystems
An early memory hierarchy evaluation simulator for multimedia applications
Microprocessors & Microsystems
Hi-index | 0.07 |
Projections of computer technology forecast processors with peak performance of 1,000 MIPS in the relatively near future. These processors could easily lose half or more of their performance in the memory hierarchy if the hierarchy design is based on conventional caching techniques. This paper presents hardware techniques to improve the performance of caches.Miss caching places a small fully-associative cache between a cache and its refill path. Misses in the cache that hit in the miss cache have only a one cycle miss penalty, as opposed to a many cycle miss penalty without the miss cache. Small miss caches of 2 to 5 entries are shown to be very effective in removing mapping conflict misses in first-level direct-mapped caches.Victim caching is an improvement to miss caching that loads the small fully-associative cache with the victim of a miss and not the requested line. Small victim caches of 1 to 5 entries are even more effective at removing conflict misses than miss caching.Stream buffers prefetch cache lines starting at a cache miss address. The prefetched data is placed in the buffer and not in the cache. Stream buffers are useful in removing capacity and compulsory cache misses, as well as some instruction cache conflict misses. Stream buffers are more effective than previously investigated prefetch techniques at using the next slower level in the memory hierarchy when it is pipelined. An extension to the basic stream buffer, called multi-way stream buffers, is introduced. Multi-way stream buffers are useful for prefetching along multiple intertwined data reference streams.Together, victim caches and stream buffers reduce the miss rate of the first level in the cache hierarchy by a factor of two to three on a set of six large benchmarks.