Massively Parallel Algorithms for Trace-Driven Cache Simulations

Authors: D. M. Nicol, A. G. Greenberg, B. D. Lubachevsky
Venue: IEEE Transactions on Parallel and Distributed Systems
Year: 1994

Abstract

This paper considers the use of massively parallel architectures to execute a trace-driven simulation of a single cache set. For the least-recently-used (LRU) policy, we present a method that, regardless of the set size C, runs in O(log N) time using N processors on the EREW (exclusive read, exclusive write) parallel model, where N is the length of the trace. A simpler LRU simulation algorithm runs in O(C log N) time using N/log N processors, and we present timings of its implementation on the MasPar MP-1, a machine with 16,384 processors. We also consider a broad class of reference-based line replacement policies, which includes LRU as well as the least-frequently-used (LFU) and random replacement policies. For any such policy, we present a simulation method that, on any trace of length N directed to a C-line set, runs in O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
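The LRU results are easier to follow with the standard stack-distance characterization of LRU in mind. With an initially empty C-line set, the reference at time t to line x is a hit if and only if x was referenced before and fewer than C distinct lines were referenced strictly between that previous reference and t. The Python sketch below is an illustration under that characterization, not the paper's algorithm, and all function names are hypothetical. It finds each reference's previous occurrence by sorting reference indices on (address, time), a step that maps naturally onto a parallel sort plus one neighbor comparison per processor. It then applies the hit criterion and checks the result against an ordinary sequential LRU simulation.

```python
from collections import OrderedDict


def lru_hits_sequential(trace, C):
    """Baseline: simulate one C-line LRU set reference by reference."""
    stack = OrderedDict()  # resident lines, least recently used first
    hits = []
    for x in trace:
        if x in stack:
            stack.move_to_end(x)  # x becomes the most recently used line
            hits.append(True)
        else:
            if len(stack) == C:
                stack.popitem(last=False)  # evict the least recently used line
            stack[x] = None
            hits.append(False)
    return hits


def previous_occurrence(trace):
    """prev[t] = index of the previous reference to trace[t], or -1 if none."""
    n = len(trace)
    # Sort reference indices by (address, time): references to the same line
    # become adjacent and appear in time order.
    order = sorted(range(n), key=lambda t: (trace[t], t))
    prev = [-1] * n
    for k in range(1, n):  # each k is independent: one comparison per processor
        a, b = order[k - 1], order[k]
        if trace[a] == trace[b]:
            prev[b] = a
    return prev


def lru_hits_by_stack_distance(trace, C):
    """Hit iff fewer than C distinct lines were referenced since the last use."""
    prev = previous_occurrence(trace)
    hits = []
    for t, p in enumerate(prev):  # each t is independent of the others
        if p < 0:
            hits.append(False)  # first reference to this line: cold miss
            continue
        # Position j in (p, t) is the first reference to its line inside the
        # window exactly when its own previous reference lies before p.
        distinct = sum(1 for j in range(p + 1, t) if prev[j] < p)
        hits.append(distinct < C)
    return hits


if __name__ == "__main__":
    trace = [1, 2, 1, 3, 2]
    print(lru_hits_sequential(trace, 2))         # [False, False, True, False, False]
    print(lru_hits_by_stack_distance(trace, 2))  # [False, False, True, False, False]
```

Once the previous-occurrence array is known, each reference's hit test is independent of the others, which is what makes a data-parallel formulation natural. The window count here is a naive loop that costs O(N) per reference; reaching logarithmic parallel time would require a more careful counting scheme, which this sketch does not attempt.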
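To make the class of reference-based replacement policies concrete, the following sequential Python sketch simulates one C-line set under a pluggable victim-selection rule. For illustration only, it assumes that such a policy picks its victim from per-line reference history (time of last reference and number of references since the line was loaded), plus a seeded random draw for random replacement. `LineState`, `simulate_set`, and the victim functions are hypothetical names, and the LFU tie-breaking rule is an arbitrary choice. The sketch defines the hit/miss sequence a parallel method must reproduce; it is not the O(C log N) algorithm itself.

```python
import random


class LineState:
    """Reference history kept for one resident line (assumed model)."""

    def __init__(self, t):
        self.last_ref = t  # time of the most recent reference
        self.count = 1     # references since the line was loaded


def lru_victim(lines):
    # Evict the line whose most recent reference is oldest.
    return min(lines, key=lambda a: lines[a].last_ref)


def lfu_victim(lines):
    # Evict the least frequently used line; ties go to the least recent (assumption).
    return min(lines, key=lambda a: (lines[a].count, lines[a].last_ref))


def make_random_victim(seed=0):
    rng = random.Random(seed)

    def random_victim(lines):
        # Sort so the choice depends only on the seed, not on dict order.
        return rng.choice(sorted(lines))

    return random_victim


def simulate_set(trace, C, choose_victim):
    """Return a hit (True) / miss (False) flag for each reference."""
    lines = {}  # line address -> LineState, at most C entries
    hits = []
    for t, x in enumerate(trace):
        if x in lines:
            lines[x].last_ref = t
            lines[x].count += 1
            hits.append(True)
        else:
            if len(lines) == C:
                del lines[choose_victim(lines)]  # policy picks the victim
            lines[x] = LineState(t)
            hits.append(False)
    return hits


if __name__ == "__main__":
    trace = [1, 1, 2, 3, 1]
    print(simulate_set(trace, 2, lru_victim))  # [False, True, False, False, False]
    print(simulate_set(trace, 2, lfu_victim))  # [False, True, False, False, True]
```

On this short trace, LRU evicts line 1 when line 3 arrives because line 1 was referenced least recently, while LFU keeps it because it has been referenced twice. That single difference changes the outcome of the final reference.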