This paper considers the use of massively parallel architectures to execute a trace-driven simulation of a single cache set. A method is presented for the least-recently-used (LRU) policy that, regardless of the set size C, runs in O(log N) time using N processors on the EREW (exclusive read, exclusive write) parallel model, where N is the length of the trace. A simpler LRU simulation algorithm is also given; it runs in O(C log N) time using N/log N processors. Timings are presented for an implementation of this simpler algorithm on the MasPar MP-1, a machine with 16,384 processors. A broad class of reference-based line replacement policies is then considered, which includes LRU as well as the least-frequently-used (LFU) and random replacement policies. A simulation method is presented for any such policy that, on any trace of length N directed to a C-line set, runs in O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
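To make the problem concrete, the following is a minimal sequential sketch of a trace-driven LRU simulation of a single cache set. It is not the parallel method described above; it is only the straightforward O(N)-step baseline that such methods aim to speed up. The function name `simulate_lru_set` and the representation of the trace as a list of line tags are assumptions made for illustration.

```python
from collections import OrderedDict


def simulate_lru_set(trace, set_size):
    """Sequentially simulate one cache set under LRU replacement.

    trace    -- iterable of line tags referencing this set (hypothetical format)
    set_size -- number of lines C in the set
    Returns a list of booleans, True where the reference is a hit.
    """
    lines = OrderedDict()  # keys ordered from least to most recently used
    hits = []
    for tag in trace:
        if tag in lines:
            lines.move_to_end(tag)         # now the most recently used line
            hits.append(True)
        else:
            if len(lines) >= set_size:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = None
            hits.append(False)
    return hits


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "d", "b"]
    print(simulate_lru_set(trace, 3))
```

Each reference depends on the set contents left by all earlier references, which is why a naive simulation is inherently sequential in the trace length N.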