Memory Controller Optimizations for Web Servers

Author: Scott Rixner
Affiliation: Rice University, Houston, TX
Venue: Proceedings of the 37th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-37)
Year: 2004



Abstract

This paper analyzes memory access scheduling and virtual channels as mechanisms to reduce the latency of main memory accesses by the CPU and peripherals in web servers. Despite the filtering effect of the CPU's cache hierarchy on the address stream, there is significant locality and bank parallelism in the DRAM access stream of a web server, which includes traffic from the operating system, the application, and peripherals. However, a sequential memory controller leaves much of this locality and parallelism unexploited, because serialization and bank conflicts increase the latency that is actually realized. Aggressive scheduling within the memory controller to exploit the available parallelism and locality can reduce the average read latency of the SDRAM. However, bank conflicts and the limited ability of the SDRAM's internal row buffers to act as a cache hinder further latency reduction. Virtual channel SDRAM overcomes these limitations by providing a set of channel buffers that can hold segments from rows of any internal SDRAM bank. This paper presents memory controller policies that make effective use of these channel buffers to further reduce the average read latency of the SDRAM.
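To illustrate why reordering requests inside the controller can lower read latency, the following Python sketch compares a sequential (in-order) controller with a generic row-hit-first policy on one SDRAM bank with an open-row policy. It is a simplified model built on stated assumptions and is not the policy set evaluated in this paper. The timing values T_CAS, T_RCD, and T_RP are hypothetical. All requests are assumed to be queued at time zero and are serviced one at a time, so bank parallelism and virtual channels are not modeled. The names Request, access_cost, in_order, and row_hit_first are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical SDRAM timing parameters in memory-clock cycles (illustrative only).
T_CAS = 3  # column access when the target row is already open (row hit)
T_RCD = 3  # activate a row before a column access (row-to-column delay)
T_RP = 3   # precharge the currently open row before activating another


@dataclass(frozen=True)
class Request:
    bank: int
    row: int


def access_cost(open_rows: dict, req: Request) -> int:
    """Cycles to service req under an open-row policy; updates open_rows."""
    current = open_rows.get(req.bank)
    if current == req.row:
        cost = T_CAS                 # row hit: data already in the row buffer
    elif current is None:
        cost = T_RCD + T_CAS         # no row open in this bank: activate, then read
    else:
        cost = T_RP + T_RCD + T_CAS  # row conflict: precharge, activate, then read
    open_rows[req.bank] = req.row
    return cost


def in_order(queue):
    """Sequential controller: service requests strictly in arrival order.

    Returns the completion time of each request, in service order.
    """
    open_rows, clock, finish = {}, 0, []
    for req in queue:
        clock += access_cost(open_rows, req)
        finish.append(clock)
    return finish


def row_hit_first(queue):
    """Reordering controller: pick the oldest pending row hit, else the oldest request.

    This is a generic first-ready, first-come-first-served style policy.
    Returns the completion time of each request, in service order.
    """
    pending = list(queue)  # copy so the caller's queue is left unchanged
    open_rows, clock, finish = {}, 0, []
    while pending:
        hit = next((r for r in pending if open_rows.get(r.bank) == r.row), None)
        req = hit if hit is not None else pending[0]
        pending.remove(req)
        clock += access_cost(open_rows, req)
        finish.append(clock)
    return finish


if __name__ == "__main__":
    queue = [Request(0, 1), Request(0, 2), Request(0, 1), Request(0, 2)]
    for policy in (in_order, row_hit_first):
        times = policy(queue)
        print(policy.__name__, times, "average latency:", sum(times) / len(times))
```

With the queue [Request(0, 1), Request(0, 2), Request(0, 1), Request(0, 2)], in_order yields completion times [6, 15, 24, 33] (average 19.5 cycles) while row_hit_first yields [6, 9, 18, 21] (average 13.5 cycles), because servicing the second access to each open row before switching rows turns two row conflicts into row hits. Per-bank row buffers can only capture this kind of reuse while a row stays open, which is the limitation the abstract attributes to conventional SDRAM.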