Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
Active pages: a computation model for intelligent memory
Proceedings of the 25th annual international symposium on Computer architecture
Exploiting ILP in page-based intelligent memory
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Mapping irregular applications to DIVA, a PIM-based data-intensive architecture
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Using a user-level memory thread for correlation prefetching
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
IEEE Micro
Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A PIM-based Multiprocessor System
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Impulse: Building a Smarter Memory Controller
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Performance Evaluation of Two Emerging Media Processors: VIRAM and Imagine
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
FlexRAM: Toward an Advanced Intelligent Memory System
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
Computer Architecture: A Quantitative Approach
Computer Architecture: A Quantitative Approach
Cache Coherence in Intelligent Memory Systems
IEEE Transactions on Computers
Correlation Prefetching with a User-Level Memory Thread
IEEE Transactions on Parallel and Distributed Systems
Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Hi-index | 0.00 |
Combining ideas from several previous proposals, such as Active Pages, DIVA, and ULMT, we present the Memory Arithmetic Unit and Interface (MAUI) architecture. Because the "intelligence" of the MAUI intelligent memory system architecture is located in the memory-controller, logic and DRAM are not required to be integrated into a single chip, and use of off-the-shelf DRAMs is permitted. The MAUI's computational engine performs memory-bound SIMD computations close to the memory system, enabling more efficient memory pipelining. A simulator modeling the MAUI architecture was added to the SimpleScalar v4.0 tool-set. Not surprisingly, simulations show that application speedup increases as the memory system speed increases and the dataset size increases. Simulation results show single-threaded application speedup of over 100% is possible, and suggest that a total system speedup of about 300% is possible in a multi-threaded environment.