Lightweight remote procedure call
ACM Transactions on Computer Systems (TOCS)
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Data prefetching in multiprocessor vector cache memories
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Increasing the number of strides for conflict-free vector access
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Reducing memory latency via non-blocking and prefetching caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Avoiding conflict misses dynamically in large direct-mapped caches
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Data relocation and prefetching for programs with large data sets
Data relocation and prefetching for programs with large data sets
Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The intrinsic bandwidth requirements of ordinary programs
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Design and evaluation of dynamic access ordering hardware
ICS '96 Proceedings of the 10th international conference on Supercomputing
ICS '90 Proceedings of the 4th international conference on Supercomputing
Memory-system design considerations for dynamically-scheduled processors
Proceedings of the 24th annual international symposium on Computer architecture
Active pages: a computation model for intelligent memory
Proceedings of the 25th annual international symposium on Computer architecture
Increasing TLB reach using superpages backed by shadow memory
Proceedings of the 25th annual international symposium on Computer architecture
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
A bandwidth-efficient architecture for media processing
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Access ordering and memory-conscious cache utilization
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Distributed Prefetch-buffer/Cache Design for High Performance Memory Systems
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
The Impact of Instruction-Level Parallelism on Multiprocessor Performance and Simulation Methodology
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Access Order and Effective Bandwidth for Streams on a Direct Rambus Memory
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Impulse: Building a Smarter Memory Controller
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Command Vector Memory Systems: High Performance at Low Cost
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Architectural adaptation for application-specific locality optimizations
ICCD '97 Proceedings of the 1997 International Conference on Computer Design (ICCD '97)
Gemini: An Optical Interconnection Network for Parallel Processing
IEEE Transactions on Parallel and Distributed Systems
Optimizing Graph Algorithms for Improved Cache Performance
IEEE Transactions on Parallel and Distributed Systems
Stream Programming on General-Purpose Processors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Executing irregular scientific applications on stream architectures
Proceedings of the 21st annual international conference on Supercomputing
Hi-index | 0.00 |
Impulse is a new memory system architecture that adds two important features to a traditional memory controller. First, Impulse supports application-specific optimizations through configurable physical address remapping. By remapping physical addresses, applications control how their data is accessed and cached, improving their cache and bus utilization. Second, Impulse supports prefetching at the memory controller, which can hide much of the latency of DRAM accesses. Because it requires no modification to processor, cache, or bus designs, Impulse can be adopted in conventional systems. In this paper we describe the design of the Impulse architecture, and show how an Impulse memory system can improve the performance of memory-bound scientific applications. For instance, Impulse decreases the running time of the NAS conjugate gradient benchmark by 67%. We expect that Impulse will also benefit regularly strided, memory-bound applications of commercial importance, such as database and multimedia programs.