Today's computers have several levels of memory hierarchy. To obtain good performance on these processors, it is necessary to design algorithms that minimize I/O traffic to the slower memories in the hierarchy. In this article, we study option pricing with the binomial and trinomial models on processors with a multilevel memory hierarchy. We derive lower bounds on the memory traffic between different levels of the hierarchy for these two models. We also develop algorithms for the binomial and trinomial models whose memory traffic between levels is near-optimal. We have implemented these algorithms on an UltraSparc IIIi processor with a 4-level memory hierarchy and demonstrated that they outperform algorithms without cache blocking by a factor of up to 5 and operate at 70% of peak performance.
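To make the idea of cache blocking for the binomial model concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a European call priced on a Cox-Ross-Rubinstein tree, and it swaps in a generic time-skewed tiling of the backward induction. All function names and the parameters `block` and `steps` are hypothetical and chosen only for illustration. The unblocked routine sweeps the whole array once per time step; the blocked routine advances each tile of about `block` nodes through up to `steps` time steps before moving on, so a tile's working set could stay in a fast memory level.

```python
import math


def terminal_values(s0, strike, rate, sigma, maturity, n):
    """Return call payoffs at expiry plus the CRR up-probability and discount."""
    dt = maturity / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(rate * dt) - d) / (u - d)
    disc = math.exp(-rate * dt)
    v = [max(s0 * u**i * d ** (n - i) - strike, 0.0) for i in range(n + 1)]
    return v, p, disc


def induct_unblocked(v, p, disc):
    """Baseline: one full sweep over the live nodes per time step."""
    n = len(v) - 1
    q = 1.0 - p
    for k in range(1, n + 1):
        for i in range(n - k + 1):
            v[i] = disc * (p * v[i + 1] + q * v[i])
    return v[0]


def induct_blocked(v, p, disc, block, steps):
    """Time-skewed tiling (illustrative): each tile runs several steps at once.

    Tiles are parallelograms that shift one node to the left per time step,
    so every value a tile reads was produced by itself or an earlier tile.
    """
    n = len(v) - 1
    q = 1.0 - p
    done = 0  # number of time steps already applied to the whole array
    while done < n:
        chunk = min(steps, n - done)
        width = n - done  # live nodes after the first step of this chunk
        tiles = -(-width // block)  # ceiling division
        for j in range(tiles):
            for s in range(1, chunk + 1):
                lo = max(0, j * block - (s - 1))
                hi = min((j + 1) * block - (s - 1), n - done - s + 1)
                for i in range(lo, hi):
                    v[i] = disc * (p * v[i + 1] + q * v[i])
        done += chunk
    return v[0]


def price(n, blocked, block=16, steps=8):
    """Price an at-the-money call with hypothetical example parameters."""
    v, p, disc = terminal_values(100.0, 100.0, 0.05, 0.2, 1.0, n)
    if blocked:
        return induct_blocked(v, p, disc, block, steps)
    return induct_unblocked(v, p, disc)
```

Both routines perform the same arithmetic on the same operands and differ only in traversal order, so `price(200, True)` and `price(200, False)` should agree. In Python the sketch shows only the access pattern; any speedup of the kind reported above would depend on a compiled implementation and on tile sizes tuned to the actual cache levels.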