Performance Analysis of Cache Memories
Journal of the ACM (JACM)
ACM Computing Surveys (CSUR)
Cold-start vs. warm-start miss ratios
Communications of the ACM
Economies of scale and the IBM system/360
Communications of the ACM
Computer Structures: Principles and Examples
Computer Structures: Principles and Examples
Computer Engineering; A DEC View of Hardware Systems Design
Computer Engineering; A DEC View of Hardware Systems Design
A study of instruction cache organizations and replacement policies
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Reduction of memory interference in multiprocessor systems
ISCA '77 Proceedings of the 4th annual symposium on Computer architecture
Cache coherence protocols: evaluation using a multiprocessor simulation model
ACM Transactions on Computer Systems (TOCS)
Memory coherence in shared virtual memory systems
PODC '86 Proceedings of the fifth annual ACM symposium on Principles of distributed computing
ATUM: a new technique for capturing address traces using microcode
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Software-controlled caches in the VMP multiprocessor
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
On the use of registers vs. cache to minimize memory traffic
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
A class of compatible cache consistency protocols and their support by the IEEE futurebus
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Multiprocessor cache synchronization: issues, innovations, evolution
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
An implementation independent approach to cache memories
ACM SIGARCH Computer Architecture News
On cacheability of lock-variables in tightly coupled multiprocessor systems
ACM SIGARCH Computer Architecture News
Hardware support for interprocess communication
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Cache design of a sub-micron CMOS system/370
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Correct memory operation of cache-based multiprocessors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Hierarchical cache/bus architecture for shared memory multiprocessors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Multiprocessor cache design considerations
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
VLSI assist for a multiprocessor
ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
Coherency for multiprocessor virtual address caches
ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
An analysis of Memnet—an experiment in high-speed shared-memory local networking
SIGCOMM '88 Symposium proceedings on Communications architectures and protocols
A simulation study of two-level caches
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Multiprocessor cache analysis using ATUM
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A cache coherence scheme with fast selective invalidation
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A cache-based message passing scheme for a shared-bus multiprocessor
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
The VMP multiprocessor: initial experience, refinements, and performance evaluation
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
The Wisconsin multicube: a new large-scale cache-coherent multiprocessor
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A Case for Direct-Mapped Caches
Computer
A two-tier memory architecture for high-performance multiprocessor systems
ICS '88 Proceedings of the 2nd international conference on Supercomputing
A cache coherence approach for large multiprocessor systems
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Memory-reference characteristics of multiprocessor applications under MACH
SIGMETRICS '88 Proceedings of the 1988 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Efficient (stack) algorithms for analysis of write-back and sector memories
ACM Transactions on Computer Systems (TOCS)
Cache Memory Organization to Enhance the Yield of High Performance VLSI Processors
IEEE Transactions on Computers
ACM Transactions on Computer Systems (TOCS)
Evaluating the performance of software cache coherence
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Analysis of cache invalidation patterns in multiprocessors
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Design and performance of a coherent cache for parallel logic programming architectures
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Multiple vs. wide shared bus multiprocessors
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
A cache consistency protocol for multiprocessors with multistage networks
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Memory coherence in shared virtual memory systems
ACM Transactions on Computer Systems (TOCS)
Analysis and Comparison of Cache Coherence Protocols for a Packet-Switched Multiprocessor
IEEE Transactions on Computers
Memory Access Dependencies in Shared-Memory Multiprocessors
IEEE Transactions on Software Engineering
Analysis of multithreaded architectures for parallel computing
SPAA '90 Proceedings of the second annual ACM symposium on Parallel algorithms and architectures
Performance evaluation of a commercial cache-coherent shared memory multiprocessor
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The effects of processor architecture on instruction memory traffic
ACM Transactions on Computer Systems (TOCS)
High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
LimitLESS directories: A scalable cache coherence scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
An efficient cache-based access anomaly detection scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Combining hardware and software cache coherence strategies
ICS '91 Proceedings of the 5th international conference on Supercomputing
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Evaluating Design Choices for Shared Bus Multiprocessors in a Throughput-Oriented Environment
IEEE Transactions on Computers
Analysis of directory based cache coherence schemes with multistage networks
CSC '92 Proceedings of the 1992 ACM annual conference on Communications
Cache Invalidation Patterns in Shared-Memory Multiprocessors
IEEE Transactions on Computers
A Model of Workloads and its Use in Miss-Rate Prediction for Fully Associative Caches
IEEE Transactions on Computers
Design choices for the TOP-1 multiprocessor workstation
IBM Journal of Research and Development
Life span strategy—a compiler-based approach to cache coherence
ICS '92 Proceedings of the 6th international conference on Supercomputing
A performance evaluation of optimal hybrid cache coherency protocols
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
A scalable coherent cache system with a dynamic pointing scheme
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons
ACM Computing Surveys (CSUR)
Cache write policies and performance
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
A unified architectural tradeoff methodology
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Optimal allocation of on-chip memory for multiple-API operating systems
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Decoupled sectored caches: conciliating low tag implementation cost
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Cache designs with partial address matching
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Surpassing the TLB performance of superpages with less operating system support
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
A new page table for 64-bit address spaces
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
CAT—caching address tags: a technique for reducing area cost of on-chip caches
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Don't use the page number, but a pointer to it
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Architecture Technique Trade-Offs Using Mean Memory Delay Time
IEEE Transactions on Computers
IEEE Transactions on Computers
An efficient caching support for critical sections in large-scale shared-memory multiprocessors
ICS '90 Proceedings of the 4th international conference on Supercomputing
The selection of optimal cache lines for microprocessor-based controllers
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
A memory management unit and cache controller for the MARS system
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
Transactional client-server cache consistency: alternatives and performance
ACM Transactions on Database Systems (TODS)
Evaluating parallel logic programming systems on scalable multiprocessors
PASCO '97 Proceedings of the second international symposium on Parallel symbolic computation
Minimizing Area Cost of On-Chip Cache Memories by Caching Address Tags
IEEE Transactions on Computers
Retrospective: using cache memory to reduce processor-memory traffic
25 years of the international symposia on Computer architecture (selected papers)
Retrospective: a low-overhead coherence solution for multiprocessors with private cache memories
25 years of the international symposia on Computer architecture (selected papers)
A low-overhead coherence solution for multiprocessors with private cache memories
25 years of the international symposia on Computer architecture (selected papers)
An evaluation of directory schemes for cache coherence
25 years of the international symposia on Computer architecture (selected papers)
Multicast snooping: a new coherence method using a multicast address network
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
The pool of subsectors cache design
ICS '99 Proceedings of the 13th international conference on Supercomputing
IEEE Transactions on Parallel and Distributed Systems
SIGMETRICS '86/PERFORMANCE '86 Proceedings of the 1986 ACM SIGMETRICS joint international conference on Computer performance modelling, measurement and evaluation
A version control approach to Cache coherence
ICS '89 Proceedings of the 3rd international conference on Supercomputing
An architecture for mostly functional languages
LFP '86 Proceedings of the 1986 ACM conference on LISP and functional programming
The design and development of a very high speed system bus—the encore Mutlimax nanobus
ACM '86 Proceedings of 1986 ACM Fall joint computer conference
Synchronization with multiprocessor caches
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
PLUS: a distributed shared-memory system
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Improvements in multiprocessor system design
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Implementing a cache consistency protocol
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Distributed and Parallel Databases - Special issue on mobile data management and applications
A process cache memory for tightly coupled multiprocessor systems
ACM-SE 30 Proceedings of the 30th annual Southeast regional conference
IEEE Transactions on Parallel and Distributed Systems
ADir_pNB: A Cost-Effective Way to Implement Full Map Directory-Based Cache Coherence Protocols
IEEE Transactions on Computers
Cache performance in vector supercomputers
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Motorola's 88000 Family Architecture
IEEE Micro
Performance Implications of Tolerating Cache Faults
IEEE Transactions on Computers
Design and Analysis of Cache Coherent Multistage Interconnection Networks
IEEE Transactions on Computers
Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps
IEEE Transactions on Parallel and Distributed Systems
Design of an Adaptive Cache Coherence Protocol for Large Scale Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Improving Memory Utilization in Cache Coherence Directories
IEEE Transactions on Parallel and Distributed Systems
The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared Memory Multiprocessor
IEEE Transactions on Parallel and Distributed Systems
A Trace-Driven Simulator for Performance Evaluation of Cache-Based Multiprocessor Systems
IEEE Transactions on Parallel and Distributed Systems
The Influence of Architectural Parameters on the Performance of Parallel Logic Programming Systems
PADL '99 Proceedings of the First International Workshop on Practical Aspects of Declarative Languages
The Impact of Cache Coherence Protocols on Parallel Logic Programming Systems
CL '00 Proceedings of the First International Conference on Computational Logic
Experimental evaluation of on-chip microprocessor cache memories
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
The use of static column ram as a memory hierarchy
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Dynamic decentralized cache schemes for mimd parallel processors
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
A low-overhead coherence solution for multiprocessors with private cache memories
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
An economical solution to the cache coherence problem
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Creating a wider bus using caching techniques
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Transactional Memory Coherence and Consistency
Proceedings of the 31st annual international symposium on Computer architecture
A Two-Level Directory Architecture for Highly Scalable cc-NUMA Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Characterization of TCC on Chip-Multiprocessors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
International Journal of Parallel Programming
Pipelined Execution of Critical Sections Using Software-Controlled Caching in Network Processors
Proceedings of the International Symposium on Code Generation and Optimization
A consistency architecture for hierarchical shared caches
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Synapse tightly coupled multiprocessors: a new approach to solve old problems
AFIPS '84 Proceedings of the July 9-12, 1984, national computer conference and exposition
Transient blocking synchronization
Transient blocking synchronization
Exploit temporal locality of shared data in SRC enabled CMP
NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
Cohesion: a hybrid memory model for accelerators
Proceedings of the 37th annual international symposium on Computer architecture
WAYPOINT: scaling coherence to thousand-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Switch-based packing technique to reduce traffic and latency in token coherence
Journal of Parallel and Distributed Computing
Complexity-effective multicore coherence
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
ARI: Adaptive LLC-memory traffic management
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.04 |
The importance of reducing processor-memory bandwidth is recognized in two distinct situations: single board computer systems and microprocessors of the future. Cache memory is investigated as a way to reduce the memory-processor traffic. We show that traditional caches which depend heavily on spatial locality (look-ahead) for their performance are inappropriate in these environments because they generate large bursts of bus traffic. A cache exploiting primarily temporal locality (look-behind) is then proposed and demonstrated to be effective in an environment where process switches are infrequent. We argue that such an environment is possible if the traffic to backing store is small enough that many processors can share a common memory and if the cache data consistency problem is solved. We demonstrate that such a cache can indeed reduce traffic to memory greatly, and introduce an elegant solution to the cache coherency problem.