The Impulse Memory Controller

Authors:
Lixin Zhang;Zhen Fang;Mike Parker;Binu K. Mathew;Lambert Schaelicke;John B. Carter;Wilson C. Hsieh;Sally A. McKee
Affiliations:
Univ. of Utah, Salt Lake City, UT;Univ. of Utah, Salt Lake City, UT;Univ. of Utah, Salt Lake City, UT;Univ. of Utah, Salt Lake City, UT;Univ. of Utah, Salt Lake City, UT;Univ. of Utah, Salt Lake City, UT;Univ. of Utah, Salt Lake City, UT;Univ. of Utah, Salt Lake City, UT
Venue:
IEEE Transactions on Computers
Year:
2001

Abstract

Impulse is a memory system architecture that adds an optional level of address indirection at the memory controller. Applications can use this level of indirection to remap their data structures in memory, which lets them control how their data is accessed and cached and can improve cache and bus utilization. Because all of the functionality resides at the memory controller, the Impulse design requires no modification to processor, cache, or bus designs, so Impulse can be adopted in conventional systems without major system changes. We describe the design of the Impulse architecture and how an Impulse memory system can be used in a variety of ways to improve the performance of memory-bound applications. Impulse can be used to dynamically create superpages cheaply, to dynamically recolor physical pages, to perform strided fetches, and to perform gathers and scatters through indirection vectors. Our performance results demonstrate the effectiveness of these optimizations in a variety of scenarios. Using Impulse can speed up a range of applications by 20 percent to over a factor of 5. Alternatively, Impulse can be used by the OS for dynamic superpage creation; the best policy for creating superpages using Impulse outperforms previously known superpage creation policies.
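
The strided fetches and gathers through indirection vectors mentioned in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions: in Impulse the remapping is performed by hardware at the memory controller, whereas here it is imitated with Python lists. The function names (`strided_alias`, `gather_alias`, `lines_touched`), the 8-word cache line, and the 16x16 matrix are hypothetical choices made for the example and do not come from the paper.

```python
"""Toy software model of controller-side address remapping (illustrative only)."""

CACHE_LINE_WORDS = 8  # assumed line size in words; not a value from the paper


def strided_alias(memory, base, stride, count):
    """Dense view of `count` words that sit `stride` words apart, starting at `base`."""
    return [memory[base + i * stride] for i in range(count)]


def gather_alias(memory, base, indirection):
    """Dense view of memory[base + k] for each offset k in an indirection vector."""
    return [memory[base + k] for k in indirection]


def lines_touched(addresses, line_words=CACHE_LINE_WORDS):
    """Number of distinct cache lines covered by a sequence of word addresses."""
    return len({addr // line_words for addr in addresses})


# A 16x16 row-major matrix stored in a flat "physical memory".
# Each word holds 10 * its address so values and addresses are easy to tell apart.
N = 16
memory = [10 * addr for addr in range(N * N)]

# Diagonal elements are N + 1 words apart in the original layout.
diag_addresses = [i * (N + 1) for i in range(N)]
diagonal = strided_alias(memory, base=0, stride=N + 1, count=N)

# Gather four scattered words through an indirection vector.
gathered = gather_alias(memory, base=0, indirection=[3, 200, 41, 7])

# In the original layout every diagonal element falls in a different cache line;
# through the dense alias the same 16 words occupy consecutive addresses.
original_lines = lines_touched(diag_addresses)
remapped_lines = lines_touched(range(N))
```

In this model the diagonal spans 16 cache lines in the original layout but only 2 when accessed through the dense alias, which is the kind of cache and bus utilization improvement the abstract describes. The model ignores the cost of the remapping itself, which a real memory controller would pay in DRAM accesses.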