One of the challenges in achieving good performance on multicore architectures is the effective utilization of the underlying memory hierarchy. While this is also an issue for single-core architectures, it is a critical problem for multicore chips. In this paper, we formulate the unified multicore model (UMM) to help understand the fundamental limits on cache performance on these architectures. The UMM seamlessly handles different types of multiple-core processors with varying degrees of cache sharing at different levels. We demonstrate that our model can be used to study a variety of multicore architectures and applications. In particular, we use it to analyze an option pricing problem based on the trinomial model and develop an algorithm for it that has near-optimal memory traffic between cache levels. We have implemented the algorithm on two Quad-Core Intel Xeon 5310 1.6 GHz processors (8 cores in total). It achieves a peak performance of 19.5 GFLOPS, which is 38% of the theoretical peak of the multicore system. We demonstrate that our algorithm outperforms compiler-optimized and auto-parallelized code by a factor of up to 7.5.
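The trinomial option pricing computation mentioned above is a backward induction over a lattice, and its memory behavior is what a cache-aware algorithm targets. The sketch below is a minimal illustration, assuming a European call and the standard Boyle/Hull trinomial parameterization. `price_naive`, `price_blocked`, and the `tile`/`height` parameters are hypothetical names. The blocked variant uses overlapped tiling (each tile recomputes a small triangle of values at its edges) as a simple stand-in for, not a reproduction of, the paper's near-optimal algorithm. Python only shows the access pattern here; it says nothing about actual cache performance.

```python
import math

def trinomial_params(r, sigma, T, steps):
    """Boyle/Hull-style trinomial parameters (an assumption; the paper's
    exact parameterization may differ)."""
    dt = T / steps
    a = math.exp(r * dt / 2)
    b_up = math.exp(sigma * math.sqrt(dt / 2))
    b_dn = math.exp(-sigma * math.sqrt(dt / 2))
    pu = ((a - b_dn) / (b_up - b_dn)) ** 2
    pd = ((b_up - a) / (b_up - b_dn)) ** 2
    pm = 1.0 - pu - pd
    u = math.exp(sigma * math.sqrt(2 * dt))
    disc = math.exp(-r * dt)
    return u, pu, pm, pd, disc

def terminal_payoffs(S0, K, u, steps):
    # Level `steps` has 2*steps+1 nodes; node j has price S0 * u**(j - steps).
    return [max(S0 * u ** (j - steps) - K, 0.0) for j in range(2 * steps + 1)]

def step(values, pu, pm, pd, disc):
    # One backward step: node j at level m-1 reads nodes j, j+1, j+2 at level m.
    return [disc * (pd * values[j] + pm * values[j + 1] + pu * values[j + 2])
            for j in range(len(values) - 2)]

def price_naive(S0, K, r, sigma, T, steps):
    u, pu, pm, pd, disc = trinomial_params(r, sigma, T, steps)
    v = terminal_payoffs(S0, K, u, steps)
    for _ in range(steps):
        v = step(v, pu, pm, pd, disc)  # sweeps the entire level on every step
    return v[0]

def price_blocked(S0, K, r, sigma, T, steps, tile=64, height=16):
    # Hypothetical overlapped tiling: advance `height` time steps on a small
    # buffer (a tile plus its dependency cone) before moving to the next tile.
    u, pu, pm, pd, disc = trinomial_params(r, sigma, T, steps)
    v = terminal_payoffs(S0, K, u, steps)
    level = steps
    while level > 0:
        h = min(height, level)
        out_len = len(v) - 2 * h          # node count at level (level - h)
        out = []
        for lo in range(0, out_len, tile):
            hi = min(lo + tile, out_len)
            buf = v[lo:hi + 2 * h]        # inputs needed for outputs lo..hi-1
            for _ in range(h):            # buffer shrinks by 2 per step
                buf = step(buf, pu, pm, pd, disc)
            out.extend(buf)
        v = out
        level -= h
    return v[0]

if __name__ == "__main__":
    print(price_naive(100.0, 100.0, 0.05, 0.2, 1.0, 200))
    print(price_blocked(100.0, 100.0, 0.05, 0.2, 1.0, 200))
```

Both functions apply the same arithmetic to the same inputs, so they agree exactly. The blocked one simply reorders the work so that each buffer is reused for `height` steps instead of one. For scale, assuming each Xeon 5310 core can retire 4 double-precision flops per cycle, the 8-core peak is 8 × 4 × 1.6 GHz = 51.2 GFLOPS, and 19.5 / 51.2 ≈ 0.38, consistent with the 38% figure quoted above.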