The input/output complexity of sorting and related problems
Communications of the ACM
Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Modern computer algebra
A locality-preserving cache-oblivious dynamic dictionary
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Cache oblivious search trees via binary trees of small height
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Funnel Heap - A Cache Oblivious Priority Queue
ISAAC '02 Proceedings of the 13th International Symposium on Algorithms and Computation
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
A comparison of cache aware and cache oblivious static search trees using program instrumentation
Experimental algorithmics
SCG '05 Proceedings of the twenty-first annual symposium on Computational geometry
Cache oblivious stencil computations
Proceedings of the 19th annual international conference on Supercomputing
Cache-oblivious string B-trees
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Cache-oblivious streaming B-trees
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
An Optimal Cache-Oblivious Priority Queue and Its Application to Graph Algorithms
SIAM Journal on Computing
Journal of Symbolic Computation
A cache oblivious algorithm for matrix multiplication based on Peano's space filling curve
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
A paradigm for parallel matrix algorithms: scalable Cholesky
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
We examine a cache-oblivious reformulation of the iterative polygon indecomposability test of [19]. We analyse the cache complexity of the iterative test within the ideal-cache model and identify the bottlenecks affecting its memory performance. Our analysis reveals that the iterative algorithm does not address data locality: its memory accesses progress with arbitrarily sized jumps in the address space. We reformulate the iterative computations of [19] as a depth-first search (DFS) traversal of the computation tree and obtain a cache-oblivious variant that exhibits asymptotically improved spatial and temporal locality over the original. In particular, we show that the DFS variant ensures spatial locality, and we quantify the asymptotic improvements in both spatial and temporal locality. In an extension of this work appearing in [3], the DFS variant is implemented in the context of absolute irreducibility testing of bivariate polynomials over arbitrary fields, and is tested against both the original version given in [19] and the computer algebra system MAGMA. The results demonstrate significantly improved performance for the DFS variant, as measured by L1 misses, L2 misses, and total execution time.
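To make the locality argument concrete, the following is a minimal sketch of how an access pattern with large jumps in the address space compares with a sequential one under a simple cache model. It is not the algorithm of [19] or its DFS variant. The function name `count_misses`, the cache parameters, and the two access sequences are hypothetical choices made for illustration, and a fully associative LRU cache is assumed here as a stand-in for the optimal replacement policy of the ideal-cache model.

```python
from collections import OrderedDict


def count_misses(addresses, cache_lines, line_size):
    """Count misses of a fully associative LRU cache.

    Assumption: LRU is used as a simple stand-in for the optimal
    replacement policy of the ideal-cache model.
    """
    cache = OrderedDict()  # block id -> True, ordered from least to most recent
    misses = 0
    for addr in addresses:
        block = addr // line_size
        if block in cache:
            cache.move_to_end(block)       # hit: mark block as most recent
        else:
            misses += 1                    # miss: load the block
            cache[block] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


# Toy parameters (hypothetical): 1024 words, lines of 8 words, 4 cache lines.
N, LINE, LINES, STRIDE = 1024, 8, 4, 64

# Sequential sweep: consecutive accesses fall in the same or the next line.
sequential = list(range(N))

# Same addresses, visited with jumps of STRIDE words between accesses.
strided = [i + j for j in range(STRIDE) for i in range(0, N, STRIDE)]

if __name__ == "__main__":
    print("sequential misses:", count_misses(sequential, LINES, LINE))
    print("strided misses:   ", count_misses(strided, LINES, LINE))
```

Both sequences touch exactly the same 1024 addresses. Under these toy parameters the sequential sweep misses once per cache line (128 misses), while the strided order misses on every access (1024 misses), which is the kind of gap that a locality-preserving traversal order aims to close.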