Using cache memory to reduce processor-memory traffic
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Effects of cache coherency in multiprocessors
ISCA '82 Proceedings of the 9th annual symposium on Computer Architecture
Throughput analysis and configuration design of a shared-resource multiprocessor system: PUMPS
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Efficient interprocessor communication for MIMD multiprocessor systems
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Performance evaluation of multiple processor systems.
Performance evaluation of multiple processor systems.
Software structures for ultraparallel computing
Software structures for ultraparallel computing
Cache coherence protocols: evaluation using a multiprocessor simulation model
ACM Transactions on Computer Systems (TOCS)
Parallel algorithms and architectures for rule-based systems
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
A class of compatible cache consistency protocols and their support by the IEEE futurebus
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Multiprocessor cache synchronization: issues, innovations, evolution
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
An implementation independent approach to cache memories
ACM SIGARCH Computer Architecture News
On cacheability of lock-variables in tightly coupled multiprocessor systems
ACM SIGARCH Computer Architecture News
Correct memory operation of cache-based multiprocessors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Hierarchical cache/bus architecture for shared memory multiprocessors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Multiprocessor cache design considerations
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Simple, efficient, asynchronous parallel algorithms for maximization
ACM Transactions on Programming Languages and Systems (TOPLAS)
An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A cache coherence scheme with fast selective invalidation
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
The Wisconsin multicube: a new large-scale cache-coherent multiprocessor
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A cache coherence approach for large multiprocessor systems
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Memory-reference characteristics of multiprocessor applications under MACH
SIGMETRICS '88 Proceedings of the 1988 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Multiprocessor architectures are converging
C3P Proceedings of the third conference on Hypercube concurrent computers and applications: Architecture, software, computer systems, and general issues - Volume 1
High-speed implementations of rule-based systems
ACM Transactions on Computer Systems (TOCS)
Efficient synchronization primitives for large-scale cache-coherent multiprocessors
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Analysis of cache invalidation patterns in multiprocessors
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Evaluating the performance of four snooping cache coherency protocols
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
A cache consistency protocol for multiprocessors with multistage networks
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Cache considerations for multiprocessor programmers
Communications of the ACM
Memory Access Dependencies in Shared-Memory Multiprocessors
IEEE Transactions on Software Engineering
Snoopy cache test-and-test-and-set without execessive bus contention
ACM SIGARCH Computer Architecture News
Counting networks and multi-processor coordination
STOC '91 Proceedings of the twenty-third annual ACM symposium on Theory of computing
Algorithms for scalable synchronization on shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
An efficient cache-based access anomaly detection scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Synchronization without contention
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Performance Measurement and Modeling to Evaluate Various Effects on a Shared memory Multiprocessor
IEEE Transactions on Software Engineering
Architectural primitives for a scalable shared memory multiprocessor
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Race-free interconnection networks and multiprocessor consistency
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Low contention load balancing on large-scale multiprocessors
SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
Life span strategy—a compiler-based approach to cache coherence
ICS '92 Proceedings of the 6th international conference on Supercomputing
Waiting algorithms for synchronization in large-scale multiprocessors
ACM Transactions on Computer Systems (TOCS)
A methodology for implementing highly concurrent data objects
ACM Transactions on Programming Languages and Systems (TOPLAS)
Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The verification of cache coherence protocols
SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Contention in shared memory algorithms
STOC '93 Proceedings of the twenty-fifth annual ACM symposium on Theory of computing
Journal of the ACM (JACM)
Reactive synchronization algorithms for multiprocessors
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
ACM Transactions on Computer Systems (TOCS)
The communication requirements of mutual exclusion
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Boosting the performance of hybrid snooping cache protocols
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A steady state analysis of diffracting trees (extended abstract)
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
An Architecture for Tolerating Processor Failures in Shared-Memory Multiprocessors
IEEE Transactions on Computers
Verification techniques for cache coherence protocols
ACM Computing Surveys (CSUR)
An efficient caching support for critical sections in large-scale shared-memory multiprocessors
ICS '90 Proceedings of the 4th international conference on Supercomputing
A memory management unit and cache controller for the MARS system
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Efficient synchronization: let them eat QOLB
Proceedings of the 24th annual international symposium on Computer architecture
Contention in shared memory algorithms
Journal of the ACM (JACM)
An evaluation of directory schemes for cache coherence
25 years of the international symposia on Computer architecture (selected papers)
Weak ordering—a new definition
25 years of the international symposia on Computer architecture (selected papers)
ICS '99 Proceedings of the 13th international conference on Supercomputing
IEEE Transactions on Parallel and Distributed Systems
SIGMETRICS '86/PERFORMANCE '86 Proceedings of the 1986 ACM SIGMETRICS joint international conference on Computer performance modelling, measurement and evaluation
A version control approach to Cache coherence
ICS '89 Proceedings of the 3rd international conference on Supercomputing
Weak ordering—a new definition
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Implementing a cache consistency protocol
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Speculative lock elision: enabling highly concurrent multithreaded execution
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
High performance dynamic lock-free hash tables and list-based sets
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
International Journal of Parallel Programming
Characterizing the Performance of Algorithms for Lock-Free Objects
IEEE Transactions on Computers
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps
IEEE Transactions on Parallel and Distributed Systems
A New Approach for the Verification of Cache Coherence Protocols
IEEE Transactions on Parallel and Distributed Systems
Efficient synchronization for nonuniform communication architectures
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Inferential queueing and speculative push for reducing critical communication latencies
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Two techniques for improving performance on bus-based multiprocessors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Two Adaptive Hybrid Cache Coherency Protocols
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Evaluation of cache consistency algorithm performance
MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects
IEEE Transactions on Parallel and Distributed Systems
Inferential queueing and speculative push
International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
Landing openMP on cyclops-64: an efficient mapping of openMP to a many-core system-on-a-chip
Proceedings of the 3rd conference on Computing frontiers
Proceedings of the conference on Design, automation and test in Europe
Critical sections: re-emerging scalability concerns for database storage engines
Proceedings of the 4th international workshop on Data management on new hardware
A method for analysis and solution of scalability bottleneck in DBMS
Proceedings of the 2010 Symposium on Information and Communication Technology
vNUMA: a virtual shared-memory multiprocessor
USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
Bottleneck identification and scheduling in multithreaded applications
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Reducing contention through priority updates
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Hi-index | 0.02 |
This paper presents two cache schemes for a shared-memory shared bus multiprocessor. Both schemes feature decentralized consistency control and dynamic type classification of the datum cached (i.e. read-only, local, or shared). It is shown how to exploit these features to minimize the shared bus traffic. The broadcasting ability of the shared bus is used not only to signal an event but also to distribute data. In addition, by introducing a new synchronization construct, i.e. the Test-and-Test-and-Set instruction, many of the traditional. parallell processing “hot spots” or bottlenecks are eliminated. Sketches of formal correctness proofs for the proposed schemes are also presented. It appears that moderately large parallel processors can be designed by employing the principles presented in this paper.