Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines

Authors:
Brian R. Gaeke; Parry Husbands; Xiaoye S. Li; Leonid Oliker; Katherine A. Yelick; Rupak Biswas
Venue:
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Year:
2002


Abstract

The increasing gap between processor and memory performance has led to new architectural models for memory-intensive applications. In this paper, we use a set of memory-intensive benchmarks to evaluate a mixed logic and DRAM processor called VIRAM as a building block for scientific computing. For each benchmark, we explore the fundamental hardware requirements of the problem as well as alternative algorithms and data structures that can help expose fine-grained parallelism or simplify memory access patterns. Results indicate that VIRAM is significantly faster than conventional cache-based machines for problems that are truly limited by the memory system, and that it has a significant power advantage across all the benchmarks.