Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Coherence controller architectures for SMP-based CC-NUMA multiprocessors
Proceedings of the 24th annual international symposium on Computer architecture
Design Verification of the S3.mp Cache-Coherent Shared-Memory System
IEEE Transactions on Computers
Smart Memories: a modular reconfigurable architecture
Proceedings of the 27th annual international symposium on Computer architecture
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
IEEE Micro
Imagine: Media Processing with Streams
IEEE Micro
Impulse: Building a Smarter Memory Controller
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
Proceedings of the 30th annual international symposium on Computer architecture
Design Trade-Offs in High-Throughput Coherence Controllers
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Transactional Memory Coherence and Consistency
Proceedings of the 31st annual international symposium on Computer architecture
The STAMPede approach to thread-level speculation
ACM Transactions on Computer Systems (TOCS)
Phoenix: Detecting and Recovering from Permanent Processor Design Bugs with Programmable Hardware
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Transactional Memory (Synthesis Lectures on Computer Architecture)
Transactional Memory (Synthesis Lectures on Computer Architecture)
The AMD Opteron Northbridge Architecture
IEEE Micro
Using Field-Repairable Control Logic to Correct Design Errors in Microprocessors
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Using a configurable processor generator for computer architecture prototyping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
ACM Transactions on Architecture and Code Optimization (TACO)
On-chip communication and synchronization mechanisms with cache-integrated network interfaces
Proceedings of the 7th ACM international conference on Computing frontiers
Understanding sources of inefficiency in general-purpose chips
Proceedings of the 37th annual international symposium on Computer architecture
Removing overhead from high-level interfaces
Proceedings of the 49th Annual Design Automation Conference
PARDIS: a programmable memory controller for the DDRx interfacing standards
Proceedings of the 39th Annual International Symposium on Computer Architecture
Extensible sparse functional arrays with circuit parallelism
Proceedings of the 15th Symposium on Principles and Practice of Declarative Programming
A programmable memory controller for the DDRx interfacing standards
ACM Transactions on Computer Systems (TOCS)
Hi-index | 0.00 |
As CPU cores become building blocks, we see a great expansion in the types of on-chip memory systems proposed for CMPs. Unfortunately, designing the cache and protocol controllers to support these memory systems is complex, and their concurrency and latency characteristics significantly affect the performance of any CMP. To address this problem, this paper presents a microarchitecture framework for cache and protocol controllers, which can aid in generating the RTL for new memory systems. The framework consists of three pipelined engines' request-tracking, state-manipulation, and data movement' which are programmed to implement a higher-level memory model. This approach simplifies the design and verification of CMP systems by decomposing the memory model into sequences of state and data manipulations. Moreover, implementing the framework itself produces a polymorphic memory system. To validate the approach, we implemented a scalable, flexible CMP in silicon. The memory system was then programmed to support three disparate memory models' cache coherent shared memory, streams and transactional memory. Measured overheads of this approach seem promising. Our system generates controllers with performance overheads of less than 20% compared to an ideal controller with zero internal latency. Even the overhead of directly implementing a fully programmable controller was modest. While it did double the controller's area, the amortized effective area in the system grew by roughly 7%.