Evaluating multicore algorithms on the unified memory model
Scientific Programming - Software Development for Multi-core Computing Systems
With the advent of multicore and many-core architectures, we face a problem that is new to parallel computing: the management of hierarchical parallel caches. A major limitation of earlier models is their inability to describe multicore processors in which caches are shared to varying degrees at different levels. We propose a unified memory hierarchy model that addresses this limitation; it extends the MHG model, which was developed for a single processor with a multilevel memory hierarchy. We demonstrate that our unified framework applies to a number of multicore architectures and a variety of applications. In particular, we derive lower bounds on the memory traffic between different levels of the hierarchy for financial and scientific computations. We also give a multicore algorithm for a financial application whose memory traffic between cache levels is within a constant factor of optimal. We implemented the algorithm on a multicore system with two Quad-Core Intel Xeon 5310 1.6 GHz processors, for a total of 8 cores. Our algorithm outperforms compiler-optimized and auto-parallelized code by a factor of up to 7.3.
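The abstract does not spell out the financial algorithm. To show the kind of memory-traffic tradeoff involved, here is a minimal sketch. It assumes a European call priced on a Cox-Ross-Rubinstein binomial lattice, which is an assumption since the abstract says only "a financial application". The sketch contrasts a level-by-level sweep with an overlapped ("ghost-zone") tiling that advances several time steps on one small chunk of the lattice at a time. That tiling is a generic locality technique, not the authors' algorithm. The function names and the `width`/`depth` parameters are hypothetical. In Python the lists do not model a real cache; the sketch only illustrates the access pattern a compiled implementation would exploit.

```python
import math


def lattice_params(s0, strike, rate, sigma, maturity, steps):
    """Cox-Ross-Rubinstein parameters and terminal call payoffs (assumed model)."""
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    growth = math.exp(rate * dt)
    p = (growth - down) / (up - down)  # risk-neutral probability of an up-move
    disc = 1.0 / growth                # one-step discount factor
    # Node i at the final step has i up-moves and (steps - i) down-moves.
    payoff = [max(s0 * up**i * down**(steps - i) - strike, 0.0)
              for i in range(steps + 1)]
    return p, disc, payoff


def price_naive(s0, strike, rate, sigma, maturity, steps):
    """Backward induction that sweeps the entire lattice level once per step."""
    p, disc, values = lattice_params(s0, strike, rate, sigma, maturity, steps)
    for n in range(steps, 0, -1):
        # Level n has n + 1 nodes; level n - 1 has n nodes.
        values = [disc * (p * values[i + 1] + (1 - p) * values[i])
                  for i in range(n)]
    return values[0]


def price_tiled(s0, strike, rate, sigma, maturity, steps, width=64, depth=16):
    """Same recurrence, advanced `depth` steps at a time on chunks of `width`
    output nodes (hypothetical tile sizes). Each chunk touches only
    width + depth input values, a working set meant to fit in one cache level."""
    p, disc, values = lattice_params(s0, strike, rate, sigma, maturity, steps)
    n = steps  # the current level has n + 1 nodes
    while n > 0:
        k = min(depth, n)
        new_len = n - k + 1  # node count at level n - k
        out = [0.0] * new_len
        for a in range(0, new_len, width):
            w = min(width, new_len - a)
            # Ghost zone: w outputs k steps back depend on w + k inputs.
            buf = values[a:a + w + k]
            for _ in range(k):
                buf = [disc * (p * buf[i + 1] + (1 - p) * buf[i])
                       for i in range(len(buf) - 1)]
            out[a:a + w] = buf  # exactly w values remain after k steps
        values = out
        n -= k
    return values[0]
```

Both functions apply the identical recurrence to identical inputs, so they return the same price. The tiled version reads each lattice value from the large array once per `depth` steps instead of once per step. The cost is redundant work in the overlapping ghost zones, about `depth * (depth - 1) / 2` extra node updates per chunk.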