Memory access buffering in multiprocessors
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Multiprocessor cache design considerations
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Throughput Analysis of Cache-Based Multiprocessors with Multiple Buses
IEEE Transactions on Computers
The design of a lockup-free cache for high-performance multiprocessors
Proceedings of the 1988 ACM/IEEE conference on Supercomputing
Memory Access Dependencies in Shared-Memory Multiprocessors
IEEE Transactions on Software Engineering
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Performance evaluation of memory consistency models for shared-memory multiprocessors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
An architecture for software-controlled data prefetching
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Comparative evaluation of latency reducing and tolerating techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Data access microarchitectures for superscalar processors with compiler-assisted data prefetching
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Delayed consistency and its effects on the miss rate of parallel programs
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
A performance study of memory consistency models
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Hiding memory latency using dynamic scheduling in shared-memory multiprocessors
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Reducing memory latency via non-blocking and prefetching caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Closing the window of vulnerability in multiphase memory transactions
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Tradeoffs in processor/memory interfaces for superscalar processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Pseudo vector processor based on register-windowed superscalar pipeline
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Balanced scheduling: instruction scheduling when memory latency is uncertain
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Limitations of cache prefetching on a bus-based multiprocessor
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
A scalar architecture for pseudo vector processing based on slide-windowed registers
ICS '93 Proceedings of the 7th international conference on Supercomputing
Effects of memory latencies on non-blocking processor/cache architectures
ICS '93 Proceedings of the 7th international conference on Supercomputing
Reducing cache conflicts in data cache prefetching
ACM SIGARCH Computer Architecture News - Special issue on input/output in parallel computer systems
Complexity/performance tradeoffs with non-blocking loads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A performance study of software and hardware data prefetching schemes
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Resource allocation in a high clock rate microprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Interleaving: a multithreading technique targeting multiprocessors and workstations
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The effectiveness of multiple hardware contexts
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Effective cache prefetching on bus-based multiprocessors
ACM Transactions on Computer Systems (TOCS)
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Performance evaluation of the PowerPC 620 microarchitecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Hardware implementation issues of data prefetching
ICS '95 Proceedings of the 9th international conference on Supercomputing
Increasing superscalar performance through multistreaming
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Cache miss heuristics and preloading techniques for general-purpose programs
Proceedings of the 28th annual international symposium on Microarchitecture
Evaluation of Hardware-Based Stride and Sequential Prefetching in Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Evaluation of design alternatives for a multiprocessor microprocessor
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Increasing cache port efficiency for dynamic superscalar microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
An evaluation of memory consistency models for shared-memory systems with ILP processors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Compiler synthesized dynamic branch prediction
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Wrong-path instruction prefetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Active memory: a new abstraction for memory system simulation
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Multithreading with Distributed Functional Units
IEEE Transactions on Computers
A microarchitectural performance evaluation of a 3.2 Gbyte/s microprocessor bus
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Improving data cache performance by pre-executing instructions under a cache miss
ICS '97 Proceedings of the 11th international conference on Supercomputing
Adaptive data prefetching using cache information
ICS '97 Proceedings of the 11th international conference on Supercomputing
Designing high bandwidth on-chip caches
Proceedings of the 24th annual international symposium on Computer architecture
Memory-system design considerations for dynamically-scheduled processors
Proceedings of the 24th annual international symposium on Computer architecture
The interaction of software prefetching with ILP processors in shared-memory systems
Proceedings of the 24th annual international symposium on Computer architecture
Data prefetching on the HP PA-8000
Proceedings of the 24th annual international symposium on Computer architecture
The design and performance of a conflict-avoiding cache
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Prediction caches for superscalar processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Tolerating latency in multiprocessors through compiler-inserted prefetching
ACM Transactions on Computer Systems (TOCS)
Evaluation of high performance multicache parallel texture mapping
ICS '98 Proceedings of the 12th international conference on Supercomputing
Utilizing reuse information in data cache management
ICS '98 Proceedings of the 12th international conference on Supercomputing
Using prediction to accelerate coherence protocols
Proceedings of the 25th annual international symposium on Computer architecture
Analytic evaluation of shared-memory systems with ILP processors
Proceedings of the 25th annual international symposium on Computer architecture
Memory access buffering in multiprocessors
25 years of the international symposia on Computer architecture (selected papers)
Weak ordering—a new definition
25 years of the international symposia on Computer architecture (selected papers)
The MIT Alewife machine: architecture and performance
25 years of the international symposia on Computer architecture (selected papers)
Using generational garbage collection to implement cache-conscious data placement
Proceedings of the 1st international symposium on Memory management
Hardware-software trade-offs in a direct Rambus implementation of the RAMpage memory hierarchy
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Performance of database workloads on shared-memory systems with out-of-order processors
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Randomized Cache Placement for Eliminating Conflicts
IEEE Transactions on Computers - Special issue on cache memory and related problems
The Impact of Exploiting Instruction-Level Parallelism on Shared-Memory Multiprocessors
IEEE Transactions on Computers - Special issue on cache memory and related problems
A performance comparison of contemporary DRAM architectures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
A program-driven simulation model of an MIMD multiprocessor
ANSS '91 Proceedings of the 24th annual symposium on Simulation
An Integrated Hardware/Software Data Prefetching Scheme for Shared-Memory Multiprocessors
International Journal of Parallel Programming
Fetch directed instruction prefetching
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Active Management of Data Caches by Exploiting Reuse Information
IEEE Transactions on Computers
IEEE Transactions on Computers
Weak ordering—a new definition
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Issues related to MIMD shared-memory computers: the NYU ultracomputer approach
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
IEEE Transactions on Computers
Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
ACM Computing Surveys (CSUR)
Eager writeback - a technique for improving bandwidth utilization
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling for a fully-distributed clustered VLIW architecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
A novel renaming mechanism that boosts software prefetching
ICS '01 Proceedings of the 15th international conference on Supercomputing
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Measuring experimental error in microprocessor simulation
SSR '01 Proceedings of the 2001 symposium on Software reusability: putting software reuse in context
Improving Latency Tolerance of Multithreading through Decoupling
IEEE Transactions on Computers
Designing a Modern Memory Hierarchy with Hardware Prefetching
IEEE Transactions on Computers
Measuring memory hierarchy performance of cache-coherent multiprocessors using micro benchmarks
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Measuring Experimental Error in Microprocessor Simulation
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches
IEEE Transactions on Computers
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
A Unified Formalization of Four Shared-Memory Models
IEEE Transactions on Parallel and Distributed Systems
Improving Memory Utilization in Cache Coherence Directories
IEEE Transactions on Parallel and Distributed Systems
The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared Memory Multiprocessor
IEEE Transactions on Parallel and Distributed Systems
Analytic Evaluation of Shared-Memory Architectures
IEEE Transactions on Parallel and Distributed Systems
Compiler-Directed Dynamic Frequency and Voltage Scheduling
PACS '00 Proceedings of the First International Workshop on Power-Aware Computer Systems-Revised Papers
How Can We Design Better Networks for DSM Systems?
PCRCW '97 Proceedings of the Second International Workshop on Parallel Computer Routing and Communication
Implementing Snoop-Coherence Protocol for Future SMP Architectures
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Improving the Performance of Heterogeneous DSMs via Multithreading
VECPAR '00 Selected Papers and Invited Talks from the 4th International Conference on Vector and Parallel Processing
Microprocessors - 10 Years Back, 10 Years Ahead
Informatics - 10 Years Back. 10 Years Ahead.
Content-Based Prefetching: Initial Results
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Inferential queueing and speculative push for reducing critical communication latencies
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Improving the Data Cache Performance of Multiprocessor Operating Systems
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Just Say No: Benefits of Early Cache Miss Determination
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Fighting the memory wall with assisted execution
Proceedings of the 1st conference on Computing frontiers
ACM Transactions on Computer Systems (TOCS)
Balanced scheduling: instruction scheduling when memory latency is uncertain
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Cache Refill/Access Decoupling for Vector Machines
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Tolerating memory latency through push prefetching for pointer-intensive applications
ACM Transactions on Architecture and Code Optimization (TACO)
Techniques for Efficient Processing in Runahead Execution Engines
Proceedings of the 32nd annual international symposium on Computer Architecture
Dynamically configurable shared CMP helper engines for improved performance
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
International Journal of Parallel Programming
Inferential queueing and speculative push
International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
A Case for MLP-Aware Cache Replacement
Proceedings of the 33rd annual international symposium on Computer Architecture
CAVA: Using checkpoint-assisted value prediction to hide L2 misses
ACM Transactions on Architecture and Code Optimization (TACO)
International Journal of Parallel Programming
Improving the performance and power efficiency of shared helpers in CMPs
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Accurate memory data flow modeling in statistical simulation
Proceedings of the 20th annual international conference on Supercomputing
The design space of data-parallel memory systems
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Coherence Ordering for Ring-based Chip Multiprocessors
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Scalable Cache Miss Handling for High Memory-Level Parallelism
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Compiler Optimization Technique for Data Cache Prefetching Using a Small CAM Array
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Cache memory for microprocessors
ACM SIGARCH Computer Architecture News
The NYU Ultracomputer Designing an MIMD Shared Memory Parallel Computer
IEEE Transactions on Computers
IEEE Transactions on Computers
Speeding-up multiprocessors running DBMS workloads through coherence protocols
International Journal of High Performance Computing and Networking
Focused prefetching: performance oriented prefetching based on commit stalls
Proceedings of the 22nd annual international conference on Supercomputing
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
3D-Stacked Memory Architectures for Multi-core Processors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
The performance of pollution control victim cache for embedded systems
Proceedings of the 21st annual symposium on Integrated circuits and system design
On the potential of latency tolerant execution in speculative multithreading
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
A shared cache for a chip multi vector processor
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A light-weight fairness mechanism for chip multiprocessor memory systems
Proceedings of the 6th ACM conference on Computing frontiers
A case for bufferless routing in on-chip networks
Proceedings of the 36th annual international symposium on Computer architecture
Performance evaluation of NEC SX-9 using real science and engineering applications
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
A load-instruction unit for pipelined processors
IBM Journal of Research and Development
Application-aware prioritization mechanisms for on-chip networks
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Improving memory bank-level parallelism in the presence of prefetching
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Accelerating multi-core simulators
Proceedings of the 2010 ACM Symposium on Applied Computing
MLP-aware dynamic cache partitioning
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Impact analysis of performance faults in modern microprocessors
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Aérgia: exploiting packet latency slack in on-chip networks
Proceedings of the 37th annual international symposium on Computer architecture
On-Chip Network Evaluation Framework
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
The ZCache: Decoupling Ways and Associativity
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
A workload-adaptive and reconfigurable bus architecture for multicore processors
International Journal of Reconfigurable Computing
Dynamic cache partitioning based on the MLP of cache misses
Transactions on high-performance embedded architectures and compilers III
Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs
ACM Transactions on Architecture and Code Optimization (TACO)
Processor directed dynamic page policy
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
When Prefetching Works, When It Doesn’t, and Why
ACM Transactions on Architecture and Code Optimization (TACO)
A multizone pipelined cache for IP routing
NETWORKING'05 Proceedings of the 4th IFIP-TC6 international conference on Networking Technologies, Services, and Protocols; Performance of Computer and Communication Networks; Mobile and Wireless Communication Systems
Minimalist open-page: a DRAM page-mode scheduling policy for the many-core era
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
ACM Transactions on Computer Systems (TOCS)
A high performance adaptive miss handling architecture for chip multiprocessors
Transactions on High-Performance Embedded Architectures and Compilers IV
NCOR: an FPGA-friendly nonblocking data cache for soft processors with runahead execution
International Journal of Reconfigurable Computing - Special issue on Selected Papers from the International Conference on Reconfigurable Computing and FPGAs (ReConFig'10)
Write performance improvement by hiding R drift latency in phase-change RAM
Proceedings of the 49th Annual Design Automation Conference
A case for exploiting subarray-level parallelism (SALP) in DRAM
Proceedings of the 39th Annual International Symposium on Computer Architecture
Disjoint out-of-order execution processor
ACM Transactions on Architecture and Code Optimization (TACO)
APCR: an adaptive physical channel regulator for on-chip interconnects
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
APC: a performance metric of memory systems
ACM SIGMETRICS Performance Evaluation Review
Adaptive cache management for a combined SRAM and DRAM cache hierarchy for multi-cores
Proceedings of the Conference on Design, Automation and Test in Europe
On the Impact of Performance Faults in Modern Microprocessors
Journal of Electronic Testing: Theory and Applications
Virtually split cache: An efficient mechanism to distribute instructions and data
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.03 |
In the past decade, there has been much literature describing various cache organizations that exploit general programming idiosyncrasies to obtain maximum hit rate (the probability that a requested datum is now resident in the cache). Little, if any, has been presented to exploit: (1) the inherent dual input nature of the cache and (2) the many-datum reference type central processor instructions. No matter how high the cache hit rate is, a cache miss may impose a penalty on subsequent cache references. This penalty is the necessity of waiting until the missed requested datum is received from central memory and, possibly, for cache update. For the two cases above, the cache references following a miss do not require the information of the datum not resident in the cache, and are therefore penalized in this fashion. In this paper, a cache organization is presented that essentially eliminates this penalty. This cache organizational feature has been incorporated in a cache/memory interface subsystem design, and the design has been implemented and prototyped. An existing simple instruction set machine has verified the advantage of this feature; future, more extensive and sophisticated instruction set machines may obviously take more advantage. Prior to prototyping, simulations verified the advantage.