Lightweight remote procedure call
ACM Transactions on Computer Systems (TOCS)
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A simulation based study of TLB performance
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Avoiding conflict misses dynamically in large direct-mapped caches
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Surpassing the TLB performance of superpages with less operating system support
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Reducing TLB and memory overhead using online superpage promotion
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The intrinsic bandwidth requirements of ordinary programs
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Fast volume rendering using a shear-warp factorization of the viewing transformation
Fast volume rendering using a shear-warp factorization of the viewing transformation
ICS '90 Proceedings of the 4th international conference on Supercomputing
Memory-system design considerations for dynamically-scheduled processors
Proceedings of the 24th annual international symposium on Computer architecture
Active pages: a computation model for intelligent memory
Proceedings of the 25th annual international symposium on Computer architecture
Increasing TLB reach using superpages backed by shadow memory
Proceedings of the 25th annual international symposium on Computer architecture
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
Interactive ray tracing for isosurface rendering
Proceedings of the conference on Visualization '98
A bandwidth-efficient architecture for media processing
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A look at several memory management units, TLB-refill mechanisms, and page table organizations
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Proceedings of the 27th annual international symposium on Computer architecture
Digital Image Warping
Image Processing for Computer Graphics
Image Processing for Computer Graphics
3-D transformations of images in scanline order
SIGGRAPH '80 Proceedings of the 7th annual conference on Computer graphics and interactive techniques
Access ordering and memory-conscious cache utilization
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Software-Managed Address Translation
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Impulse: Building a Smarter Memory Controller
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Memory System Support for Image Processing
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Architectural adaptation for application-specific locality optimizations
ICCD '97 Proceedings of the 1997 International Conference on Computer Design (ICCD '97)
Reevaluating Online Superpage Promotion with Hardware Support
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Using virtual memory to improve cache and tlb performance
Using virtual memory to improve cache and tlb performance
Architectural Support for Uniprocessor and Multiprocessor Active Memory Systems
IEEE Transactions on Computers
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Performance directed energy management for main memory and disks
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Distributed Data Cache Designs for Clustered VLIW Processors
IEEE Transactions on Computers
Performance directed energy management for main memory and disks
ACM Transactions on Storage (TOS)
Hardware supported memory access for high performance main memory databases
DaMoN '05 Proceedings of the 1st international workshop on Data management on new hardware
Efficient address remapping in distributed shared-memory systems
ACM Transactions on Architecture and Code Optimization (TACO)
Programmable bus/memory controllers in modern computer architecture
Proceedings of the 43rd annual Southeast regional conference - Volume 1
Memory Prefetching Using Adaptive Stream Detection
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
ALP: Efficient support for all levels of parallelism for complex media applications
ACM Transactions on Architecture and Code Optimization (TACO)
Limiting the power consumption of main memory
Proceedings of the 34th annual international symposium on Computer architecture
Data layouts for object-oriented programs
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Predicting locality phases for dynamic memory optimization
Journal of Parallel and Distributed Computing
Proceedings of the 21st annual international conference on Supercomputing
Scalable barrier synchronisation for large-scale shared-memory multiprocessors
International Journal of High Performance Computing and Networking
HMTT: a platform independent full-system memory trace monitoring system
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Configurable data memory for multimedia processing
Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP
Online Phase-Adaptive Data Layout Selection
ECOOP '08 Proceedings of the 22nd European conference on Object-Oriented Programming
Proceedings of the 7th ACM international conference on Computing frontiers
Enigma: architectural and operating system support for reducing the impact of address translation
Proceedings of the 24th ACM International Conference on Supercomputing
Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Streaming Data Movement for Real-Time Image Analysis
Journal of Signal Processing Systems
Constructing application-specific memory hierarchies on FPGAs
Transactions on high-performance embedded architectures and compilers III
Page placement in hybrid memory systems
Proceedings of the international conference on Supercomputing
Adaptive granularity memory systems: a tradeoff between storage efficiency and throughput
Proceedings of the 38th annual international symposium on Computer architecture
Dymaxion: optimizing memory access patterns for heterogeneous systems
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Hybrid DRAM/PRAM-based main memory for single-chip CPU/GPU
Proceedings of the 49th Annual Design Automation Conference
The Journal of Supercomputing
Hi-index | 14.99 |
Impulse is a memory system architecture that adds an optional level of address indirection at the memory controller. Applications can use this level of indirection to remap their data structures in memory. As a result, they can control how their data is accessed and cached, which can improve cache and bus utilization. The Impulse design does not require any modification to processor, cache, or bus designs since all the functionality resides at the memory controller. As a result, Impulse can be adopted in conventional systems without major system changes. We describe the design of the Impulse architecture and how an Impulse memory system can be used in a variety of ways to improve the performance of memory-bound applications. Impulse can be used to dynamically create superpages cheaply, to dynamically recolor physical pages, to perform strided fetches, and to perform gathers and scatters through indirection vectors. Our performance results demonstrate the effectiveness of these optimizations in a variety of scenarios. Using Impulse can speed up a range of applications from 20 percent to over a factor of 5. Alternatively, Impulse can be used by the OS for dynamic superpage creation; the best policy for creating superpages using Impulse outperforms previously known superpage creation policies.